A small Hunspell dictionary for professional, scientific writing.
- High Quality: based on the SCOWL en-US dictionary and thoroughly tested.
- High Sensitivity:
  - removed words such as thee, posses, fatuous, or jerk.
  - fewer missed errors (but probably more false positives)
- Academic Language:
  - added words such as overapproximation, whitepaper, and bitmask.
  - select the scientific domains you need during build
- Easy Install: available as an extension for LibreOffice and Firefox/Thunderbird.
Download this repository and then decide for which applications you want to install the dictionary:
Install System-Wide for Hunspell
- Copy en-Academic.dic and en-Academic.aff to /usr/share/hunspell
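Several targets in this section (system-wide Hunspell, Sublime Text, VS Code, TeXstudio) come down to copying the same two files. A minimal Python sketch of that step, assuming both files sit in the directory passed as source_dir; the function name install_dictionary is hypothetical and not part of the repository:

```python
import shutil
from pathlib import Path

# The two files that make up the dictionary (names as used in this README).
DICT_FILES = ("en-Academic.dic", "en-Academic.aff")


def install_dictionary(source_dir, target_dir):
    """Copy the .dic/.aff pair from source_dir into target_dir.

    Creates target_dir if needed and returns the destination paths.
    Writing to system folders such as /usr/share/hunspell needs root rights.
    """
    source = Path(source_dir).expanduser()
    target = Path(target_dir).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    installed = []
    for name in DICT_FILES:
        # copy2 copies the file content and also preserves its timestamps
        shutil.copy2(source / name, target / name)
        installed.append(target / name)
    return installed
```

For example, install_dictionary(".", "/usr/share/hunspell"), run with root rights from the folder holding the files, would perform the system-wide install; passing "~/.config/sublime-text-3/Packages/Language - English/" as the target would cover the Sublime Text case instead.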
Install LibreOffice Extension
- Automatic: Download and open the acamedic-libreoffice.oxt file in the addons folder.
- Manual:
  - Start LibreOffice and select Tools → Extension Manager... → Add.
  - Open acamedic-libreoffice.oxt from the addons folder.
Install Thunderbird Extension
- Automatic: Download and open acamedic-mozilla.xpi.
- Manual:
  - Start Thunderbird and select Tools → Add-ons → ⚙ → Install Add-on from file.
  - Open acamedic-mozilla.xpi from the addons folder.
Install for Sublime-Text
Copy en-Academic.dic and en-Academic.aff to ~/.config/sublime-text-3/Packages/Language - English/
Install for Visual Studio Code
- Install the extension denisgerguri.hunspell-spellchecker (Ctrl+Shift+P, type Ext install, type hunspell)
- Copy en-Academic.dic and en-Academic.aff to ~/.vscode/extensions/denisgerguri.hunspell-spellchecker-1.0.1/languages/
- Follow the instructions at https://marketplace.visualstudio.com/items?itemName=denisgerguri.hunspell-spellchecker#adding-new-language
Install for TeXstudio
- Start TeXstudio and select 'Options → Configure TeXstudio... → Language ...'
- Under the spell check sub-group, check the path of the spelling dictionary. On Windows, the path is 'C:\Program Files (x86)\texstudio\dictionaries'.
- Copy en-Academic.dic and en-Academic.aff to that path.
- In the same configuration window, choose en-Academic from the drop-down menu of the default language.
- Restart TeXstudio.
The project is at an early stage, and you might find that many words from your domain are missing. Please collect them over time and create an issue with your list of words. Again, the idea is to include only words that you use often, or rare words that are very distinct and cannot be confused with other terms.
The dictionary is constructed from several individual dictionary files in the src folder:
- /base/ contains common words.
- /academic/ contains special mathematical, technical, chemical, etc. terms.
- /names/ contains special names starting with a capital letter.
- /codes/ contains keywords of programming languages.
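The build script itself is not described here. As a hypothetical sketch, assuming each file below src/base, src/academic, src/names, and src/codes holds one Hunspell entry per line (such as clean/SDG), merging the folders into the text of a .dic file could look like this. A .dic file starts with the approximate number of entries. The subfolders parameter only mirrors the idea of selecting domains during build; it is an assumption, not the project's actual option.

```python
from pathlib import Path

# Subfolders of src/ as listed above, in the order they are merged.
SUBFOLDERS = ("base", "academic", "names", "codes")


def build_dic(src_dir, subfolders=SUBFOLDERS):
    """Merge all word-list files below src_dir into the text of a .dic file.

    Assumes (hypothetically) one Hunspell entry per line, e.g. 'clean/SDG'.
    Blank lines are skipped and exact duplicate entries are merged.
    """
    entries = set()
    for sub in subfolders:
        folder = Path(src_dir) / sub
        if not folder.is_dir():
            continue  # a domain that is not present is simply left out
        for path in sorted(folder.iterdir()):
            if not path.is_file():
                continue
            for line in path.read_text(encoding="utf-8").splitlines():
                entry = line.strip()
                if entry:
                    entries.add(entry)
    ordered = sorted(entries)
    # First line of a .dic file: the number of entries that follow.
    return "\n".join([str(len(ordered)), *ordered]) + "\n"
```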
For spell checking of academic documents, it is not useful if dictionaries include words such as thee or wee. They will most likely mask a spelling error of the words the or we. Since they are archaic or uncommon words, probably no one is going to write them; one would rather read them in historic text fragments. And even if you do write them, a misspelling of these words will most likely be masked by other words, because a mistyped wee easily turns into we, see, or weed.
Furthermore, the standard dictionaries include a lot of problematic words, such as wit, dome, or wont, which a single typo can produce from with, done, or want.
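The masking argument can be made concrete with a simple single-edit typo model (one deleted, swapped, substituted, or inserted letter, in the style of Peter Norvig's spelling corrector). This model is an illustrative assumption: Hunspell itself simply accepts any word it can derive from the dictionary, so every typo that lands on another entry goes unreported. The tiny word list below is hypothetical.

```python
import string


def one_edit_variants(word):
    """All strings one deletion, transposition, substitution or insertion
    away from word (lowercase ASCII letters only), excluding word itself."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return (deletes | transposes | substitutes | inserts) - {word}


def masked_typos(word, dictionary):
    """Single-edit typos of word that the dictionary would accept silently."""
    return sorted(one_edit_variants(word) & set(dictionary))


# Tiny illustrative word list, not the real dictionary.
WORDS = {"the", "thee", "we", "wee", "see", "weed", "with", "wit"}
```

With this list, masked_typos("the", WORDS) returns ["thee"], masked_typos("wee", WORDS) returns ["see", "we", "weed"], and masked_typos("with", WORDS) returns ["wit"]: each of these typos would pass the spell check unnoticed.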
I have created this dictionary using the following process:
- Take a very small base dictionary: SCOWL-20
- Manually go through it and remove non-scientific terms
- Use it to check reference papers
- Add newly found terms that are in SCOWL-60
- Manually check and add further unrecognized terms
The removed words include:
- archaic terms such as brethren, cobbler, sod, thee, thou, unto, wive
- inappropriate words such as cum, slut, gnome, sexy, slave
- narrative adjectives such as cunning, fatuous, fierce, ghastly, hitherto, pompous, sheer
- uncommon words with common alternatives, such as envisage (use envision), futile (use useless), or horrific (use horrible)
- colloquial words such as eh, gig, hey, lad, lousy, oh, bugger
- words that are very far away from technical contexts, such as horde, hail, hog, hut, lark, mummy, bigot
- religious terms such as heresy, sermon, sinful
- words often used to insult: hypocrite, illiterate, inane, jerk, moron, snobbery
- British / Scottish words that sneaked into the US dictionary, such as lorry, nay, dole, duff, dustbin
- potential mistakes found in SCOWL-20: cs, alias's (should be alias'), also species's, die's, elect's, feel's, want's
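The reference-paper steps of the process (check papers, add terms that SCOWL-60 knows, review the rest by hand) can be sketched in Python. This is an illustration under assumptions, not the author's script: dictionary, scowl60, and removed are plain sets of words with affix flags already stripped, removed stands for deliberately removed words (such as the categories above) so they are not re-added from SCOWL-60, and tokenization is a simple regex.

```python
import re

# Words are runs of letters, optionally with one inner apostrophe (e.g. it's).
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def is_known(word, words):
    """Accept a word as-is or lowercased (e.g. a sentence-initial 'We')."""
    return word in words or word.lower() in words


def review_candidates(paper_text, dictionary, scowl60, removed=frozenset()):
    """Sort the words of a reference paper that the dictionary rejects.

    Returns (to_add, to_check): unknown words found in SCOWL-60 that were not
    deliberately removed, and all remaining unknown words for manual review.
    """
    unknown = {w for w in WORD_PATTERN.findall(paper_text)
               if not is_known(w, dictionary)}
    to_add = {w for w in unknown
              if is_known(w, scowl60) and not is_known(w, removed)}
    return sorted(to_add), sorted(unknown - to_add)
```

Keeping removed words out of the automatic list means a paper containing thee still produces a warning to be reviewed instead of silently re-adding the word.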
I do automatic testing on the resulting dictionary file:
- check that there are no duplicates, or that duplicates are justified, e.g. clean can be a verb (clean/SDG) or an adjective (clean/YRT)
- run it on some reference papers and check that not many words are missing
- check that all words are spelled correctly:
- using the SCOWL-60 and SCOWL-95 dictionaries
- manually double-check all remaining words with
- the suffixes -ic and -ical are mostly interchangeable
- words with the -ical or -ual suffix should (normally) have the /Y adverb extension
- words with the -tive suffix often have /SYP
- words should have either the -icity or the -ness suffix, and such words should not be pluralized, e.g. use simplifications instead of simplicities.
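Some of these checks are easy to automate. The following hypothetical sketch parses .dic text (skipping the leading count line), lists words that occur with several flag sets so each duplicate can be justified by hand, and reports entries that break the suffix heuristics above. The function names are illustrative; since the heuristics only hold "normally" or "often", the output is a list of warnings, not errors.

```python
from collections import defaultdict


def parse_dic(text):
    """Yield (word, flags) pairs from .dic text, skipping the count line."""
    for line in text.splitlines()[1:]:
        word, _, flags = line.strip().partition("/")
        if word:
            yield word, flags


def find_duplicates(entries):
    """Map each word that occurs more than once to its list of flag strings,
    e.g. {'clean': ['SDG', 'YRT']}, so duplicates can be reviewed."""
    seen = defaultdict(list)
    for word, flags in entries:
        seen[word].append(flags)
    return {word: flags for word, flags in seen.items() if len(flags) > 1}


def suffix_warnings(entries):
    """Heuristic checks modeled on the suffix rules listed above."""
    warnings = []
    for word, flags in entries:
        if word.endswith(("ical", "ual")) and "Y" not in flags:
            warnings.append(f"{word}: -ical/-ual word without /Y")
        if word.endswith("tive") and not set("SYP") <= set(flags):
            warnings.append(f"{word}: -tive word without /SYP")
        if word.endswith(("icity", "ness")) and "S" in flags:
            warnings.append(f"{word}: -icity/-ness word with plural /S")
    return warnings
```

Because parse_dic is a generator, convert its result to a list before passing it to both checks.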
Hunspell's affix compression could be used to indicate the part of speech (POS) of a word. For example, /SDG means that s, ed, or ing can be appended to the word, indicating a regular verb. However, the compression is only concerned with size, so every flag assignment needs to be checked for confusing suffixes.
Misleading expansion examples:
- we/DGT expands to we, wed, wing, and west
- be/DT expands to be, bed, and best
- should/RZ expands to should and shoulder/S
As a result, I started to encode regular verbs with /SDG, nouns with /MS, and adjectives with /RT, or with /Y for the adjective/adverb combination.
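The misleading expansions can be reproduced with a toy affix expander. The rule table below is a heavily simplified assumption modeled on the en_US-style flags discussed here (S, D, G, T, R, Z); a real .aff file has many more rules and conditions, and Hunspell distributions typically ship an unmunch utility for a full expansion. The sketch only illustrates why a flag set must be checked against the stem it is attached to.

```python
import re

# Hypothetical, heavily simplified subset of en_US-style suffix rules.
# Each flag maps to (strip, append, condition) rules; a rule applies when the
# condition regex matches the end of the word.
SUFFIX_RULES = {
    "S": [("", "s", r"[^sxzhy]$")],                     # plural / 3rd person
    "D": [("", "d", r"e$"), ("", "ed", r"[^ey]$")],     # -ed
    "G": [("e", "ing", r"e$"), ("", "ing", r"[^e]$")],  # -ing
    "T": [("", "st", r"e$"), ("", "est", r"[^ey]$")],   # -est
    "R": [("", "r", r"e$"), ("", "er", r"[^ey]$")],     # -er
    "Z": [("", "rs", r"e$"), ("", "ers", r"[^ey]$")],   # -ers
}


def expand(entry):
    """Return the word forms a .dic entry such as 'we/DGT' accepts."""
    word, _, flags = entry.partition("/")
    forms = [word]
    for flag in flags:
        for strip, append, condition in SUFFIX_RULES.get(flag, []):
            if re.search(condition, word):
                # drop the stripped characters, then append the suffix
                forms.append(word[: len(word) - len(strip)] + append)
    return forms
```

With these rules, expand("we/DGT") returns ["we", "wed", "wing", "west"] and expand("should/RZ") returns ["should", "shoulder", "shoulders"], while expand("clean/SDG") yields the intended verb forms clean, cleans, cleaned, and cleaning.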