forked from hunspell/hyphen
-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.compound
87 lines (61 loc) · 2.9 KB
/
README.compound
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
New option of Libhyphen 2.7: NOHYPHEN
Hyphen, apostrophe and other characters may be word boundary characters,
but they don't need (extra) hyphenation. With NOHYPHEN option
it's possible to hyphenate the words parts correctly.
Example:
ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL
Description:
1-1 and 1'1 declare hyphen and apostrophe as word boundary characters
and NOHYPHEN with the comma separated character (or character sequence)
list forbid the (extra) hyphens at the hyphen and apostrophe characters.
Implicite NOHYPHEN declaration
Without explicite NEXTLEVEL declaration, Hyphen 2.8 uses the
previous settings, plus in UTF-8 encoding, endash (U+2013) and
typographical apostrophe (U+2019) are NOHYPHEN characters, too.
It's possible to enlarge the hyphenation distance from these
NOHYPHEN characters by using COMPOUNDLEFTHYPHENMIN and
COMPOUNDRIGHTHYPHENMIN attributes.
Compound word hyphenation
Hyphen library supports better compound word hyphenation and special
rules of compound word hyphenation of German languages and other
languages with arbitrary number of compound words. The new options,
COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN help to set the right
style for the hyphenation of compound words.
Algorithm
The algorithm is an extension of the original pattern based hyphenation
algorithm. It uses two hyphenation pattern sets, defined in the same
pattern file and separated by the NEXTLEVEL keyword. First pattern
set is for hyphenation only at compound word boundaries, the second one
is for hyphenation within words or word parts.
Recursive compound level hyphenation
The algorithm is recursive: every word parts of a successful
first (compound) level hyphenation will be rehyphenated
by the same (first) pattern set.
Finally, when first level hyphenation is not possible, Hyphen uses
the second level hyphenation for the word or the word parts.
Word endings and word parts
Patterns for word endings (patterns with ellipses) match the
word parts, too.
Options
COMPOUNDLEFTHYPHENMIN: min. hyph. dist. from the left compound word boundary
COMPOUNDRIGHTHYPHENMIN: min. hyph. dist. from the right comp. word boundary
NEXTLEVEL: sign second level hyphenation patterns
Default hyphenmin values
Default values of COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN are 0,
and 0 under the hyphenation, too. ("0" values of
LEFTHYPHENMIN and RIGHTHYPHENMIN mean the default "2" under the hyphenation.)
Examples
See tests/compound* test files.
Preparation of hyphenation patterns
It hasn't been special pattern generator tool for compound hyphenation
patterns, yet. It is possible to use PATGEN to generate both of
pattern sets, concatenate it manually and set the requested HYPHENMIN values.
(But don't forget the preprocessing steps by substrings.pl before
concatenation.) One of the disadvantage of this method, that PATGEN
doesn't know recursive compound hyphenation of Hyphen.
László Németh
<nemeth (at) openoffice.org>