2018-11-12: Hunspell 1.7.0 release:
New features and bug fixes by László Németh, supported by FSF.hu Foundation:
- Suggestion times are no longer annoyingly long, especially in languages
with compound word handling and complex morphology. With balanced
multi-level time limits, suggestions are now guaranteed
within half a second, not seconds (or dozens of seconds or more
in extreme cases), even for longer misspellings.
- add SPELLML support for run-time dictionary extension with optional
affixation of user words. See new "Grammar By" feature of
language-specific user dictionaries of LibreOffice 6.0:
News: https://wiki.documentfoundation.org/ReleaseNotes/6.0#.E2.80.9CGrammar_By.E2.80.9D_spell_checking
Screencast with English example: https://www.youtube.com/watch?v=EsS3gaBTfOo
Screencast with German example: https://www.youtube.com/watch?v=aYVFDqCUb6I
- Improved, highly customizable suggestions on level of dictionary words:
Pronunciations and typical misspellings defined by optional "ph:" fields of
the dictionary words are used not only in n-gram suggestions, but as
elements of the REP replacement list getting the highest priority in normal
suggestions, which also gives the best suggestions for short words.
More information: see "ph:" in man 5 hunspell.
- Handling multiple word suggestions is much easier. As in a
traditional spelling dictionary, for example, to get the correct suggestion
"a lot" for the typical misspelling "alot" in first place, it is now
enough to add the following line to the dic(tionary) file:
a lot
- Limit compound overgeneration by dictionary based word pairs:
Now it's possible to filter bad compound words by listing
the correct word pairs with space in the dictionary, as in a traditional
spelling dictionary.
- clean-up suggestion:
- no n-gram and compound word suggestions if a "good" suggestion
exists, i.e. uppercase, REP, ph: or dictionary word pair suggestions
- word pairs are always suggested, if they exist in the dic file
- word pairs have top priority in suggestions, and
these are the only suggestions if there is no other good suggestion.
- also dictionary word pairs separated by dash instead of space
are handled specially in two-word suggestion (depending on the
language)
- limit bad suggestions by improved n-gram suggestion rules:
don't suggest capitalized dictionary words for lower
case misspellings in n-gram suggestions, except
- PHONE usage, or
- in the case of German, where not only proper
nouns are capitalized, or
- the capitalized word has special pronunciation
and don't suggest if the difference of lengths of misspellings and
suggestions is 5 or more characters.
- Extend dotless i and dotted I rules to Crimean Tatar language
Allow dotted I in dictionary, and disable bad capitalization of i.
- BREAK: extended recursive word breaking algorithm to handle words or
words with suffixes when they already contain word break characters,
for example, "e-mail" is a dictionary word with a word break character, and
it wasn't accepted before in compounds in some languages.
- FORBIDDENWORD precedes BREAK: Now it's possible to forbid compound
forms recognized by BREAK word breaking by adding the bad compounds to
the dictionary with FORBIDDENWORD flags.
- lower limit for "doubletwochars" suggestion algorithm:
one of the typical misspellings recognized by the Hunspell suggestion
mechanism is syllable duplication. In addition to the old pattern
ABABA -> ABA, for example nutrITITIon -> nutrITIon, now also the
simpler ABAB -> AB pattern is recognized in non-starting position,
for example, regretTETEd -> regretTEd.
- lower limit for longswapchar and movechar: only distances of at most
4 characters are recognized, to avoid slow and bad suggestions.
- fix compound handling for new Hungarian orthography reform
- Allow suggestion search for prefix + *two suffixes*:
Remove artificial performance limit to get correct
suggestions for relatively simple misspellings in
Hungarian, etc., when the word form contains prefix
and both derivative and inflectional suffixes, too:
lefikszálása -> lefixálása
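The two "doubletwochars" patterns described above (ABABA -> ABA and
ABAB -> AB in non-starting position) can be illustrated with a minimal
Python sketch. This is not Hunspell's implementation: the function name
doubled_syllable_candidates is hypothetical, and treating A and B as
single characters is an assumption drawn from the examples.

```python
def doubled_syllable_candidates(word):
    """Return candidates with one duplicated two-character sequence removed.

    Assumption: A and B are single characters, as in the NEWS examples.
    """
    candidates = []
    for i in range(len(word) - 3):
        a, b = word[i], word[i + 1]
        # Look for the sequence A B A B starting at position i.
        if word[i + 2] != a or word[i + 3] != b:
            continue
        # Old pattern: A B A B A anywhere in the word.
        is_ababa = i + 4 < len(word) and word[i + 4] == a
        # Newer, simpler pattern: A B A B, but not at the start of the word.
        if is_ababa or i > 0:
            candidate = word[:i] + word[i + 2:]
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates


if __name__ == "__main__":
    print(doubled_syllable_candidates("nutritition"))
    print(doubled_syllable_candidates("regretteted"))
```

A real spell checker would keep only the candidates that are accepted
by the dictionary.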
Improvements for command-line Hunspell:
- Remove false alarms during checking OpenDocument (ODF)
documents by ignoring <text:span> elements. (LibreOffice
creates a lot of <text:span> elements, even within words,
during text re-editing, which often resulted in a huge number of broken
words before this fix.)
- List filenames when filtering multiple files on the command line:
Examples:
$ hunspell -l *.odt
a.odt: mispelling
b.odt: egzample
$ hunspell -l -G *.odt
a.odt: good
b.odt: words
- Dictionary search by option -D doesn't wait for the standard input
(fixed by Siva Mahadevan)
Other improvements:
- makealias dictionary compression: add option --minimize-diff
to reuse free positions of alias lists to create minimal and
readable diffs for alias compressed dictionaries stored in
revision control systems, such as the dictionaries of LibreOffice.
- Brazilian-Portuguese translation by Rafael Fontenelle
- Catalan translation by robert dot buj at gmail
- Minor bug fixes by several contributors, see git log
2017-09-03: Hunspell 1.6.2 release:
- Library changes: no. Same as 1.6.1.
- Command line tool:
- Added German translation
- Fixed bug with wrong output encoding, not respecting system locale.
2017-03-25: Hunspell 1.6.1 release:
- Library changes:
- Performance improvements in suggest()
- Fixes regressions for Hungarian related to compounding.
- Fixes regressions for Korean related to ICONV.
- Command line tool:
- Added Tajik translation
- Fix regarding searching of OOo dicts installed in user folder
- Manpages:
- Fix microsoft-cp1251 to cp1251. Dicts should not use the first.
- Typos.
2016-12-22: Hunspell 1.6.0 release:
- Library changes:
- Performance improvement in ngsuggest(), suggestions should be faster.
- Revert MAXWORDLEN to 100 as in 1.3.3 for performance reasons.
- MAXWORDLEN can be set during build time with -D defines.
- Fix crash when word with 102 consecutive X is spelled.
- Command line tool:
- -D shows all loaded dictionaries instead of only the first.
- -D properly lists all available dictionaries on Windows.
2016-11-30: Hunspell 1.5.4 release:
- Fixes the command COMPOUNDSYLLABLE used in Hungarian dictionary.
2016-11-28: Hunspell 1.5.3 release:
- Removed a #include from hunspell.hxx that was creating trouble
2016-11-27: Hunspell 1.5.2 release:
- Reverted full backward compatibility with 1.4 public API, again
2016-11-27: Hunspell 1.5.1 release:
- Reverted full backward compatibility with 1.4 public API
2016-11-18: Hunspell 1.5.0 release:
- Lots of stability fixes
- Fixed compilation errors on various systems (Windows, FreeBSD)
- Small performance improvement compared to 1.4.0
- The C++ API is updated to use modern C++ types (string, vector).
Backward compatibility is kept for most of the functions except for
the following:
- get_wordchars();
- get_version();
- input_conv(string, string);
- removed get_csconv();
2016-04-15: Hunspell 1.4.0 release:
- various ABI changes due to moving away from char* to std::string
2014-06-02: Hunspell 1.3.3 release:
- OpenDocument (ODF and Flat ODF) support (ODF needs unzip program)
- various bug fixes
2011-02-02: Hunspell 1.3.2 release:
- fix library versioning
- improved manual
2011-02-02: Hunspell 1.3.1 release:
- bug fixes
2011-01-26: Hunspell 1.2.15/1.3 release:
- new features: MAXDIFF, ONLYMAXDIFF, MAXCPDSUGS, FORBIDWARN, see manual
- bug fixes
2011-01-21:
- new features: FORCEUCASE and WARN, see manual
- new options: -r to filter potential mistakes (rare words
marked by the WARN flag in the dictionary)
- limited and optimized suggestions
2011-01-06: Hunspell 1.2.14 release:
- bug fix
2011-01-03: Hunspell 1.2.13 release:
- bug fixes
- improved compound handling and
other improvements supported by OpenTaal Foundation, Netherlands
2010-07-15: Hunspell 1.2.12 release
2010-05-06: Hunspell 1.2.11 release:
- Maintenance release bug fixes
2010-04-30: Hunspell 1.2.10 release:
- Maintenance release bug fixes
2010-03-03: Hunspell 1.2.9 release:
- Maintenance release bug fixes and warnings
- MAP support for composed characters or character sequences
2008-11-01: Hunspell 1.2.8 release:
- Default BREAK feature and better hyphenated word suggestion to accept
and fix (compound) words with hyphen characters by spell checker
instead of by the word breaking code of OpenOffice.org. With this feature
it's possible to accept hyphenated compound words, such as "scot-free",
where "scot" is not a correct English word.
- ICONV & OCONV: input and output conversion tables for optional character
handling or using special inner format. Example:
# Accepting de facto replacements of the Romanian comma acuted letters
SET UTF-8
ICONV 4
ICONV ş ș
ICONV ţ ț
ICONV Ş Ș
ICONV Ţ Ț
Typical usage of ICONV/OCONV is to manage an inner format for a segmental
writing system, like the Ethiopic script of the Amharic language.
- Extended CHECKCOMPOUNDPATTERN to handle compound word alternations, like
sandhi feature of Telugu and other writing systems.
- SIMPLIFIEDTRIPLE compound word feature: allow simplified Swedish and
Norwegian compound word forms, like tillåta (till|låta) and
bussjåfør (buss|sjåfør)
- wordforms: word generator script for dictionary developers (Hunspell
version of unmunch).
- bug fixes
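How an input conversion table such as the ICONV example above could be
applied is sketched below in Python. This is only an illustration, not
Hunspell's code: the names ICONV_TABLE and convert_input are
hypothetical, the longest-match-first strategy is an assumption, and the
table contents assume that the example maps the cedilla forms of the
Romanian letters to the comma-below forms.

```python
# Assumed table: cedilla letters (de facto replacements) -> comma-below letters.
ICONV_TABLE = {
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
}


def convert_input(word, table):
    """Replace every occurrence of a table key, trying longer keys first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No conversion applies at this position; copy the character.
            out.append(word[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(convert_input("\u015fi \u0162ara", ICONV_TABLE))
```

With such a table the dictionary only needs the comma-below spellings,
while input typed with the cedilla letters is still accepted.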
2008-08-15: Hunspell 1.2.7 release:
- FULLSTRIP: new option for affix handling. With FULLSTRIP, affix rules can
strip full words, rather than having to leave at least one character.
- COMPOUNDRULE works with all flag types. (COMPOUNDRULE is for pattern
matching. For example, en_US dictionary of OpenOffice.org uses COMPOUNDRULE
for ordinal number recognition: 1st, 2nd, 11th, 12th, 22nd, 112th, 1000122nd
etc.).
- optimized suggestions:
- modified 1-character distance suggestion algorithms: search a TRY character
in all positions instead of all TRY characters in a character position
(it can give more readable suggestion order, also better suggestions
in the first positions, when TRY characters are sorted by frequency.)
For example, suggestions for "moze":
ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6),
maze, more, mote, ooze, mole etc. (Hunspell 1.2.7).
- extended compound word checking for better COMPOUNDRULE related
suggestions, for example English ordinal numbers: 121323th -> 121323rd
(it needs also a th->rd REP definition).
- bug fixes
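The change of loop order in the 1-character distance suggestion
described above can be illustrated with a small Python sketch. It is not
Hunspell's code: the function names, the toy dictionary and the TRY
string (assumed to be sorted by frequency) are all hypothetical.

```python
def replace_char_position_first(word, try_chars, dictionary):
    """Older order: for each position, test every TRY character."""
    found = []
    for pos in range(len(word)):
        for ch in try_chars:
            candidate = word[:pos] + ch + word[pos + 1:]
            if candidate != word and candidate in dictionary and candidate not in found:
                found.append(candidate)
    return found


def replace_char_try_first(word, try_chars, dictionary):
    """Newer order: for each TRY character, test every position."""
    found = []
    for ch in try_chars:
        for pos in range(len(word)):
            candidate = word[:pos] + ch + word[pos + 1:]
            if candidate != word and candidate in dictionary and candidate not in found:
                found.append(candidate)
    return found


if __name__ == "__main__":
    try_chars = "esianrtolcdugmphbyfvkwz"  # assumed frequency-sorted TRY string
    toy_dictionary = {"ooze", "doze", "maze", "more", "mote", "mole"}
    print(replace_char_position_first("moze", try_chars, toy_dictionary))
    print(replace_char_try_first("moze", try_chars, toy_dictionary))
```

Both functions find the same words; only the order differs. With the
TRY-character-first order, replacements that use frequent characters
come first regardless of their position in the word.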
2008-07-15: Hunspell 1.2.6 release:
- bug fix release (fix affix rule condition checking of sk_SK dictionary,
iconv support in stemming and morphological analysis of the Hunspell
utility, see also Changelog)
2008-07-09: Hunspell 1.2.5 release:
- bug fix release (fix affix rule condition checking of en_GB dictionary,
also morphological analysis by dictionaries with two-level suffixes)
2008-06-18: Hunspell 1.2.4-2 release:
- fix GCC compiler warnings
2008-06-17: Hunspell 1.2.4 release:
- add free_list() for C, C++ interfaces to deallocate suggestion lists
- bug fixes
2008-06-17: Hunspell 1.2.3 release:
- extended XML interface to use morphological functions by standard
spell checking interface, spell() and suggest(). See hunspell.3 manual page.
- default dash suggestions for compound words: newword-> new word and new-word
- new manual pages: hunspell.3, hzip.1, hunzip.1.
- bug fixes
2008-04-12: Hunspell 1.2.2 release:
- extended dictionary (dic file) support to use multiple base and
special dictionaries.
- new and improved options of command line hunspell:
-m: morphological analysis or flag debug mode (without affix
rule data it shows the flags of the affix rules)
-s: stemming mode
-D: list available dictionaries and search path
-d: support extra dictionaries by comma separated list. Example:
hunspell -d en_US,en_med,de_DE,de_med,de_geo UNESCO.txt
- forbidding in personal dictionary (with asterisk, / signs affixation)
- optional compressed dictionary format "hzip" for aff and dic files
usage:
hzip example.aff example.dic
mv example.aff example.dic /tmp
hunspell -d example
hunzip example.aff.hz >example.aff
hunzip example.dic.hz >example.dic
- new affix compression tool "affixcompress": compression tool for
large (millions of words) dictionaries.
- support encrypted dictionaries for closed OpenOffice.org extensions or
other commercial programs
- improved manual
- bug fixes
2007-11-01: Hunspell 1.2.1 release:
- new memory efficient condition checking algorithm for affix rules
- new morphological functions:
- stem() for stemming
- analyze() for morphological analysis
- generate() for morphological generation
- new demos:
- analyze: stemming, morphological analysis and generation
- chmorph: morphological conversion of texts
2007-09-05: Hunspell 1.1.12 release:
- dictionary based phonetic suggestion for words with
special or foreign pronunciation or alternative (bad) transliteration
(see Changelog, tests/phone.* and manual).
- improved data structure and memory optimization for dictionaries
with variable count fields
- bug fixes for Unicode encoding dictionaries and ngram suggestions
- improved REP suggestions with space: it works without dictionary
modification
- updated and new project files for Windows API
2007-08-27: Hunspell 1.1.11 release:
- portability fixes
2007-08-23: Hunspell 1.1.10 release:
- pronunciation based suggestion using Björn Jacke's original Aspell
phonetic transcription algorithm (http://aspell.net), relicensed under
GPL/LGPL/MPL tri-license with the permission of the author
- keyboard based suggestion by KEY (see manual)
- better time limits for suggestion search
- test environment for suggestion based on Wikipedia data
- bug fixes for non standard Mozilla platforms etc.
2007-07-25: Hunspell 1.1.9 release:
- better tokenization:
- for URLs, mail addresses and directory paths (default: skip these tokens)
- for colons in words (for Finnish and Swedish)
- new examples:
- affixation of personal dictionary words
- digits in words
- bug fixes (see ChangeLog)
2007-07-16: Hunspell 1.1.8 release:
- better Mac OS X/Cygwin and Windows compatibility
- fix Hunspell's Valgrind environment and memory handling errors
detected by Valgrind
- other bug fixes (see ChangeLog)
2007-07-06: Hunspell 1.1.7 release:
- fix warning messages of OpenOffice.org build
2007-06-29: Hunspell 1.1.6 release:
- check capitalization of the following word forms
- words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG
- allcap words and suffixes: UNICEF's - UNICEF'S
- prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA
- suggestion for missing sentence spacing: something.The -> something. The
- Hunspell executable: improved locale support
- -i option: custom input encoding
- use locale data for default dictionary names.
- tools/hunspell.cxx: fix 8-bit tokenization (letters without
casing, like à or Hebrew characters now are handled well)
- dictionary search path (automatic detection of OpenOffice.org directories)
- DICPATH environment variable
- -D option: show directory path of loaded dictionary
- patches and bug fixes for Mozilla, OpenOffice.org.
2007-03-19: Hunspell 1.1.5 release:
- optimizations: 10-100% speed up, smaller code size and memory footprint
(conditional experimental code and warning messages)
- extended Unicode support:
- non BMP Unicode characters in dictionary words and affixes (except
affix rules and conditions)
- support BOM sequence in aff and dic files
- IGNORE feature for Arabic diacritics and other optional characters
- New edit distance suggestion methods:
- capitalisation: nasa -> NASA
- long swap: permenant -> permanent
- long move: Ghandi -> Gandhi, greatful -> grateful
- double two characters: vacacation -> vacation
- spaces in REP sug.: REP alot a_lot (NOTE: "a lot" must be a dictionary word)
- patches and bug fixes for Mozilla, OpenOffice.org, Emacs, MinGW, Aqua,
German and Arabic language, etc.
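The "long swap" and "long move" methods listed above can be sketched in
Python as follows. This is an illustration only, not Hunspell's code:
the function names are hypothetical, and the max_distance parameter
with its default of 4 is an assumption borrowed from the distance limit
mentioned in the 1.7.0 notes.

```python
def long_swap_candidates(word, max_distance=4):
    """Swap two different, non-adjacent characters at most max_distance apart."""
    candidates = []
    chars = list(word)
    for i in range(len(chars)):
        last = min(i + max_distance, len(chars) - 1)
        for j in range(i + 2, last + 1):
            if chars[i] == chars[j]:
                continue
            swapped = chars[:]
            swapped[i], swapped[j] = swapped[j], swapped[i]
            candidates.append("".join(swapped))
    return candidates


def long_move_candidates(word, max_distance=4):
    """Move one character by 2 to max_distance positions in either direction."""
    candidates = []
    for i in range(len(word)):
        rest = word[:i] + word[i + 1:]
        for j in range(len(rest) + 1):
            # A shift of 0 gives the word back; a shift of 1 is a plain
            # swap of neighbours, which is not a "long" move.
            if abs(j - i) < 2 or abs(j - i) > max_distance:
                continue
            candidate = rest[:j] + word[i] + rest[j:]
            if candidate != word and candidate not in candidates:
                candidates.append(candidate)
    return candidates


if __name__ == "__main__":
    print("permanent" in long_swap_candidates("permenant"))
    print("Gandhi" in long_move_candidates("Ghandi"))
    print("grateful" in long_move_candidates("greatful"))
```

As with the other edit distance methods, only candidates found in the
dictionary would be offered as suggestions.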
2006-02-01: Hunspell 1.1.4 release:
- Improved suggestion for typical OCR bugs (missing spaces between
capitalized words). For example: "aNew" -> "a New".
http://qa.openoffice.org/issues/show_bug.cgi?id=58202
- tokenization fixes (fix incomplete tokenization of input texts on big-endian
platforms, and locale-dependent tokenization of dictionary entries)
2006-01-06: Hunspell 1.1.3.2 release:
- fix Visual C++ compiling errors
2006-01-05: Hunspell 1.1.3 release:
- GPL/LGPL/MPL tri-license for Mozilla integration
- Alias compression of flag sets and morphological descriptions.
(For example, 16 MB Arabic dic file can be compressed to 1 MB.)
- Improved suggestion.
- Improved, language independent German sharp s casing with CHECKSHARPS
declaration.
- Unicode tokenization in Hunspell program.
- Bug fixes (at new and old compound word handling methods), etc.
2005-11-11: Hunspell 1.1.2 release:
- Bug fixes (MAP Unicode, COMPOUND pattern matching, ONLYINCOMPOUND
suggestions)
- Checked with 51 regression tests in Valgrind debugging environment,
and tested with 52 OOo dictionaries on i686-pc-linux platform.
2005-11-09: Hunspell 1.1.1 release:
- Compound word patterns for complex compound word handling and
simple word-level lexical scanning. Ideal for checking
Arabic and Roman numbers, ordinal numbers in English, affixed
numbers in agglutinative languages, etc.
http://qa.openoffice.org/issues/show_bug.cgi?id=53643
- Support ISO-8859-15 encoding for French (French oe ligatures are
missing from the latin-1 encoding).
http://qa.openoffice.org/issues/show_bug.cgi?id=54980
- Implemented a flag to forbid obscene word suggestion:
http://qa.openoffice.org/issues/show_bug.cgi?id=55498
- Checked with 50 regression tests in Valgrind debugging environment,
and tested with 52 OOo dictionaries.
- other improvements and bug fixes (see ChangeLog)
2005-09-19: Hunspell 1.1.0 release
* complete comparison with MySpell 3.2 (from OpenOffice.org 2 beta)
* improved ngram suggestion with swap character detection and
case insensitivity
------ examples for ngram improvement (input word and suggestions) -----
1. pernament (instead of permanent)
MySpell 3.2: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
ornament, ornamentals, ornamental, ornamentally
Hunspell 1.0.9: ornamental, ornament, tournament
Hunspell 1.1.0: permanent
Note: swap character detection
2. PERNAMENT (instead of PERMANENT)
MySpell 3.2: -
Hunspell 1.0.9: -
Hunspell 1.1.0: PERMANENT
3. Unesco (instead of UNESCO)
MySpell 3.2: Genesco, Ionesco, Genesco's, Ionesco's, Frescoing, Fresco's,
Frescoed, Fresco, Escorts, Escorting
Hunspell 1.0.9: Genesco, Ionesco, Fresco
Hunspell 1.1.0: UNESCO
4. siggraph's (instead of SIGGRAPH's)
MySpell 3.2: serigraph's, photograph's, serigraphs, physiography's,
physiography, digraphs, serigraph, stratigraphy's, stratigraphy
epigraphs
Hunspell 1.0.9: serigraph's, epigraph's, digraph's
Hunspell 1.1.0: SIGGRAPH's
--------------- end of examples --------------------
* improved testing environment with suggestion checking and memory debugging
memory debugging of all tests with a simple command:
VALGRIND=memcheck make check
* lots of other improvements and bug fixes (see ChangeLog)
2005-08-26: Hunspell 1.0.9 release
* improved related character map suggestion
* improved ngram suggestion
------ examples for ngram improvement (O=old, N = new ngram suggestions) --
1. Permenant (instead of Permanent)
O: Endangerment, Ferment, Fermented, Deferment's, Empowerment,
Ferment's, Ferments, Fermenting, Countermen, Weathermen
N: Permanent, Supermen, Preferment
Note: Ngram suggestions were case sensitive.
2. permenant (instead of permanent)
O: supermen, newspapermen, empowerment, endangerment, preferments,
preferment, permanent, preferment's, permanently, impermanent
N: permanent, supermen, preferment
Note: new suggestions are also weighted with longest common subsequence,
first letter and common character positions
3. pernemant (instead of permanent)
O: pimpernel's, pimpernel, pimpernels, permanently, permanents, permanent,
supernatant, impermanent, semipermanent, impermanently
N: permanent, supernatant, pimpernel
Note: new method also prefers root words over
irrelevant affixes ('s, s and ly)
4. pernament (instead of permanent)
O: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
ornament, ornamentals, ornamental, ornamentally
N: ornamental, ornament, tournament
Note: Both ngram methods miss here.
5. obvus (instead of obvious):
O: obvious, Corvus, obverse, obviously, Jacobus, obtuser, obtuse,
obviates, obviate, Travus
N: obvious, obtuse, obverse
Note: new method also prefers common first letters.
6. unambigus (instead of unambiguous)
O: unambiguous, unambiguity, unambiguously, ambiguously, ambiguous,
unambitious, ambiguities, ambiguousness
N: unambiguous, unambiguity, unambitious
7. consecvence (instead of consequence)
O: consecutive, consecutively, consecutiveness, nonconsecutive, consequence,
consecutiveness's, convenience's, consistences, consistence
N: consequence, consecutive, consecrates
An example in a language with rich morphology:
8. Misisipiben (instead of Mississippiben [`in Mississippi' in Hungarian]):
O: Misikédéiben, Pisisedéiben, Misikéiéiben, Pisisekéiben, Misikéiben,
Misikéidéiben, Misikékéiben, Misikéikéiben, Misikéiméiben, Mississippiiben
N: Mississippiben, Mississippiiben, Misiiben
Note: Suggesting irrelevant affixes was the biggest fault in ngram
suggestion for languages with a lot of affixes.
--------------- end of examples --------------------
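The idea of weighting n-gram suggestions with the longest common
subsequence, the first letter and common character positions can be
sketched in Python. This is only an illustration under stated
assumptions: all function names are hypothetical, the four components
are simply added with equal weight, and Hunspell's real scoring differs
in its details.

```python
def ngram_score(misspelling, candidate, n=3):
    """Count the 1..n-grams of the misspelling that occur in the candidate."""
    score = 0
    for size in range(1, n + 1):
        for start in range(len(misspelling) - size + 1):
            if misspelling[start:start + size] in candidate:
                score += 1
    return score


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    previous = [0] * (len(b) + 1)
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def weighted_score(misspelling, candidate):
    """Case-insensitive sum of the four components (equal weights assumed)."""
    a, b = misspelling.lower(), candidate.lower()
    same_first = 1 if a[:1] == b[:1] else 0
    same_positions = sum(1 for x, y in zip(a, b) if x == y)
    return ngram_score(a, b) + lcs_length(a, b) + same_first + same_positions


def rank(misspelling, candidates):
    """Sort candidates from best to worst score."""
    return sorted(candidates, key=lambda c: weighted_score(misspelling, c), reverse=True)


if __name__ == "__main__":
    print(rank("permenant", ["supermen", "newspapermen", "permanent"]))
```

In this toy ranking "permanent" comes first for "permenant", because it
shares the first letter, most character positions and a long common
subsequence with the misspelling.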
* support twofold prefix cutting
* lots of other improvements and bug fixes (see ChangeLog)
* test Hunspell with 54 OpenOffice.org dictionaries:
source: ftp://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries
testing shell script:
-------------------------------------------------------
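# Unpack each dictionary archive named like xx_YY.zip and test it.
for i in $(ls *zip | grep '^[a-z]*_[A-Z]*[.]')
do
dic=$(basename "$i" .zip)
mkdir "$dic"
echo unzip "$dic"
unzip -d "$dic" "$i" 2>/dev/null
cd "$dic"
echo unmunch and test "$dic"
# Generate all word forms and list those that Hunspell rejects.
unmunch "$dic.dic" "$dic.aff" 2>/dev/null | awk '{print$0"\t"}' |
hunspell -d "$dic" -l -1 >"$dic.result" 2>"$dic.err" || rm -f "$dic.result"
cd ..
done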
--------------------------------------------------------
test result (0 size is o.k.):
$ for i in *_*/*.result; do wc -c $i; done
0 af_ZA/af_ZA.result
0 bg_BG/bg_BG.result
0 ca_ES/ca_ES.result
0 cy_GB/cy_GB.result
0 cs_CZ/cs_CZ.result
0 da_DK/da_DK.result
0 de_AT/de_AT.result
0 de_CH/de_CH.result
0 de_DE/de_DE.result
0 el_GR/el_GR.result
6 en_AU/en_AU.result
0 en_CA/en_CA.result
0 en_GB/en_GB.result
0 en_NZ/en_NZ.result
0 en_US/en_US.result
0 eo_EO/eo_EO.result
0 es_ES/es_ES.result
0 es_MX/es_MX.result
0 es_NEW/es_NEW.result
0 fo_FO/fo_FO.result
0 fr_FR/fr_FR.result
0 ga_IE/ga_IE.result
0 gd_GB/gd_GB.result
0 gl_ES/gl_ES.result
0 he_IL/he_IL.result
0 hr_HR/hr_HR.result
200694989 hu_HU/hu_HU.result
0 id_ID/id_ID.result
0 it_IT/it_IT.result
0 ku_TR/ku_TR.result
0 lt_LT/lt_LT.result
0 lv_LV/lv_LV.result
0 mg_MG/mg_MG.result
0 mi_NZ/mi_NZ.result
0 ms_MY/ms_MY.result
0 nb_NO/nb_NO.result
0 nl_NL/nl_NL.result
0 nn_NO/nn_NO.result
0 ny_MW/ny_MW.result
0 pl_PL/pl_PL.result
0 pt_BR/pt_BR.result
0 pt_PT/pt_PT.result
0 ro_RO/ro_RO.result
0 ru_RU/ru_RU.result
0 rw_RW/rw_RW.result
0 sk_SK/sk_SK.result
0 sl_SI/sl_SI.result
0 sv_SE/sv_SE.result
0 sw_KE/sw_KE.result
0 tet_ID/tet_ID.result
0 tl_PH/tl_PH.result
0 tn_ZA/tn_ZA.result
0 uk_UA/uk_UA.result
0 zu_ZA/zu_ZA.result
In the en_AU dictionary, there is an abbreviation with two dots (`eqn..'), but
`eqn.' is missing. Presumably it is a dictionary bug. MySpell
doesn't accept it either.
The Hungarian dictionary contains pseudoroots and forbidden words.
Unmunch doesn't support these features yet, and generates bad words, too.
* check affix rules and OOo dictionaries. Detected bugs in the cs_CZ,
es_ES, es_NEW, es_MX, lt_LT, nn_NO, pt_PT, ro_RO, sk_SK and sv_SE dictionaries.
Details:
--------------------------------------------------------
cs_CZ
warning - incompatible stripping characters and condition:
SFX D us ech [^ighk]os
SFX D us y [^i]os
SFX Q os ech [^ghk]es
SFX M o ech [^ghkei]a
SFX J ém ej ám
SFX J ém ejme ám
SFX J ém ejte ám
SFX A oužit up oupit
SFX A oužit upme oupit
SFX A oužit upte oupit
SFX A nout l [aeiouyáéíóúýůěr][^aeiouyáéíóúýůěrl][^aeiouy
SFX A nout l [aeiouyáéíóúýůěr][^aeiouyáéíóúýůěrl][^aeiouy
es_ES
warning - incompatible stripping characters and condition:
SFX W umar úse [ae]husar
SFX W emir iñáis eñir
es_NEW
warning - incompatible stripping characters and condition:
SFX I unan únen unar
es_MX
warning - incompatible stripping characters and condition:
SFX A a ote e
SFX W umar úse [ae]husar
SFX W emir iñáis eñir
lt_LT
warning - incompatible stripping characters and condition:
SFX U ti siuosi tis
SFX U ti siuosi tis
SFX U ti siesi tis
SFX U ti siesi tis
SFX U ti sis tis
SFX U ti sis tis
SFX U ti simės tis
SFX U ti simės tis
SFX U ti sitės tis
SFX U ti sitės tis
nn_NO
warning - incompatible stripping characters and condition:
SFX D ar rar [^fmk]er
SFX U Øre orde ere
SFX U Øre ort ere
pt_PT
warning - incompatible stripping characters and condition:
SFX g ãos oas ão
SFX g ãos oas ão
ro_RO
warning - bad field number:
SFX L 0 le [^cg] i
SFX L 0 i [cg] i
SFX U 0 i [^i] ii
warning - incompatible stripping characters and condition:
SFX P l i l [<- there is an unnecessary tabulator here)
SFX I a ii [gc] a
warning - bad field number:
SFX I a ii [gc] a
SFX I a ei [^cg] a
sk_SK
warning - incompatible stripping characters and condition:
SFX T ľať olú klať
SFX T ľať olúc klať
SFX T sľať šlú slať
SFX T sľať šlúc slať
SFX R ľcť lčiem ĺcť
SFX R iásť ätie miasť
SFX R iezť iem [^i]ezť
SFX R iezť ieš [^i]ezť
SFX R iezť ie [^i]ezť
SFX R iezť eme [^i]ezť
SFX R iezť ete [^i]ezť
SFX R iezť ú [^i]ezť
SFX R iezť úc [^i]ezť
SFX R iezť z [^i]ezť
SFX R iezť me [^i]ezť
SFX R iezť te [^i]ezť
sv_SE
warning - bad field number:
SFX C 0 net nets [^e]n
--------------------------------------------------------
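The kind of check behind the "incompatible stripping characters and
condition" warnings above can be sketched in Python. This is a
reconstruction for illustration, not Hunspell's code: the function
names are hypothetical, and it assumes suffix rules whose condition
consists of literal characters, "." and bracket expressions such as
[abc] or [^abc].

```python
import re


def condition_elements(condition):
    """Split a condition such as '[^ighk]os' into one element per character."""
    return re.findall(r"\[[^\]]*\]|.", condition)


def element_matches(element, ch):
    """Check a single character against one condition element."""
    if element == ".":
        return True
    if len(element) > 1 and element.startswith("["):
        body = element[1:-1]
        if body.startswith("^"):
            return ch not in body[1:]
        return ch in body
    return element == ch


def strip_is_compatible(strip, condition):
    """A suffix rule strips characters from the end of the word, and its
    condition is matched against the end of the word as well, so the two
    must agree where they overlap."""
    if strip == "0":  # "0" means that nothing is stripped
        return True
    elements = condition_elements(condition)
    for element, ch in zip(reversed(elements), reversed(strip)):
        if not element_matches(element, ch):
            return False
    return True


if __name__ == "__main__":
    # SFX D us ech [^ighk]os  (cs_CZ, reported above)
    print(strip_is_compatible("us", "[^ighk]os"))
    # SFX U ti siuosi tis     (lt_LT, reported above)
    print(strip_is_compatible("ti", "tis"))
```

A rule that fails this check can never apply, because no word can both
satisfy the condition and end with the characters to be stripped.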
2005-08-01: Hunspell 1.0.8 release
- improved compound word support
- fix German S handling
- port MySpell files and MAP feature
2005-07-22: Hunspell 1.0.7 release
2005-07-21: new home page: http://hunspell.sourceforge.net