Sunday, May 22, 2016

A quick and dirty medical dictionary for your Mac

I work on a lot of health IT projects and, as such, I face a constant stream of red squiggles from the spellcheckers in almost every major editor.  Over the years, I've built up quite a few 'User Dictionaries' by adding terms to each spellchecker one by one.

Today, I decided to look into whether my Mac offers a more centralized user dictionary that I could pre-populate with all these medical terms instead.  Answer: Yup. And it's just a simple text file (~/Library/Spelling/LocalDictionary) with one term per line.

Now this is where the little-known SPECIALIST Lexicon from the US National Library of Medicine comes in.  The lexicon is a freely available "syntactic lexicon of biomedical and general English" that includes terms in the following format:
{base=cotton roll gingivitis
spelling_variant=cotton-roll gingivitis
entry=E0000001
        cat=noun
        variants=glreg
        variants=uncount
}
{base=2060 virus
entry=E0000010
        cat=noun
        variants=reg
}
...
There may be a version with one term per line, but I figured it would be easier just to use some unix-fu to get it in the right format.  So, if you're following along, simply open a Terminal and do the following:

1. Navigate to the latest Lexicon release and download the Lexicon (text) by right-clicking the link and choosing the appropriate save option (don't left-click/navigate to the link unless you want your browser to hang for a bit...the file is fairly large).

2. Make a LocalDictionary backup (optional)
If you already have a LocalDictionary, you may want to back it up first in case something goes wrong:
cp ~/Library/Spelling/LocalDictionary ~/Library/Spelling/LocalDictionary.backup
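If you aren't sure whether you have a LocalDictionary yet, a slightly more careful version of the backup might look like the sketch below. The function name backup_dictionary is made up for illustration, and the path is the one assumed throughout this post.

```shell
# Hypothetical helper: copy the dictionary to <path>.backup, but only if it exists.
backup_dictionary() {
    dict="$1"
    if [ -f "$dict" ]; then
        # -p keeps the original modification time on the copy
        cp -p "$dict" "$dict.backup"
        echo "Backed up to $dict.backup"
    else
        echo "No dictionary found at $dict; nothing to back up"
    fi
}

# Path assumed from this post.
backup_dictionary "$HOME/Library/Spelling/LocalDictionary"
```

If the file doesn't exist, the function just says so and moves on, which is fine because the later steps create it with >>.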
3. Navigate to where you downloaded the LEXICON file and add the terms from it
Perhaps a little dirty, but I don't think this LEXICON format has changed in like 10 years, so here are a couple of lines to add the terms to your LocalDictionary in the proper format.  The first grabs the base form of every entry and the second grabs the spelling variants:
grep "{base=" LEXICON | cut -c 7- >> ~/Library/Spelling/LocalDictionary
grep "spelling_variant=" LEXICON | cut -c 18- >> ~/Library/Spelling/LocalDictionary
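A quick way to sanity-check the extraction before touching your real dictionary is to try it on a couple of records first. This is only a sketch: it swaps the grep/cut pair for sed, which strips the field names by pattern rather than by character position, and sample.lex, terms.txt, and the scratch directory are made-up names for illustration.

```shell
# Work in a scratch directory so the real dictionary is untouched.
workdir=$(mktemp -d)

# Two records copied from the sample earlier in this post.
cat > "$workdir/sample.lex" <<'EOF'
{base=cotton roll gingivitis
spelling_variant=cotton-roll gingivitis
entry=E0000001
        cat=noun
        variants=glreg
        variants=uncount
}
{base=2060 virus
entry=E0000010
        cat=noun
        variants=reg
}
EOF

# Strip the field names by pattern and print only the lines
# where a substitution actually happened.
sed -n -e 's/^{base=//p' -e 's/^spelling_variant=//p' \
    "$workdir/sample.lex" > "$workdir/terms.txt"

cat "$workdir/terms.txt"
```

Every line of terms.txt should be a bare term with no leftover field name in front of it. If that looks right, the same sed line pointed at LEXICON and appended with >> should give the same result as the two commands above.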
4. Sort the terms
According to that tutsplus article I linked earlier, the Mac spellcheck file needs to be sorted alphabetically.
sort -o ~/Library/Spelling/LocalDictionary ~/Library/Spelling/LocalDictionary 
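Since step 3 appends with >>, running it twice (or having terms in both the Lexicon and your old dictionary) leaves duplicate lines behind. I don't know whether the spellchecker cares, but if you'd like to drop them, sort -u does it in the same pass. Here's a small sketch on a stand-in file; the scratch directory and the $dict variable are assumptions for the demo, not the real path.

```shell
workdir=$(mktemp -d)
dict="$workdir/LocalDictionary"   # stand-in for ~/Library/Spelling/LocalDictionary

# Simulate a dictionary where the same term was appended twice.
printf '%s\n' 'cotton roll gingivitis' '2060 virus' 'cotton roll gingivitis' > "$dict"

# -u drops exact duplicate lines while sorting; -o lets the output
# file be the same as the input file.
sort -u -o "$dict" "$dict"

# -c exits non-zero (and complains) if the file is not in sorted order.
if sort -c "$dict"; then
    echo "sorted, $(wc -l < "$dict" | tr -d ' ') unique terms"
fi
```

To use it for real, point the two sort commands at ~/Library/Spelling/LocalDictionary instead of the stand-in.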
If you made a LocalDictionary backup, I won't tell you to go delete it just yet.  I've only just made the change, so I'm not sure whether having an 8 MB LocalDictionary will have any downsides, but so far it seems to be working.