What does "hyphenation vector" mean?


The Hyphen library seems to be a very popular and free way to have hyphenation in your app.

What does hyphenation vector mean?

I am running the example attached to the library source code. Example output:

hibernate   // input word
030412000   // output hyphenation vector
hi=ber=nate  // hyphen points
 - hi=bernate
 - hiber=nate

Odd numbers in the vector indicate hyphenation points. But what do all of those values mean?

Answer by Jongware (best answer)

László Németh describes the algorithm in OpenOffice's documentation in full detail.

The library uses the algorithm developed by Frank M. Liang ("Word Hy-phen-a-tion by Com-pu-ter"). Each pattern (a digram, trigram, or longer letter sequence) carries numerical values that mark the positions in it as a 'usual' place (an odd number) or an 'unusual' place (an even number) for a hyphen to occur. The higher the number, the greater its importance: when several patterns cover the same position the highest value wins, so a word will almost never be broken at a larger even number, and almost always at a larger odd number. The number sequences are statistically determined on a corpus of pre-hyphenated words.
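To make the "highest value wins" rule concrete, here is a minimal Python sketch. It is not the Hyphen library's code, and the patterns in `TOY_PATTERNS` are made up for illustration (real pattern files contain thousands of entries); the function names are my own as well. It only shows how competing pattern values could combine into the per-position numbers.

```python
import re

# Hypothetical patterns, invented for this example only.
# A digit sits in the gap it applies to; gaps without a digit count as 0.
TOY_PATTERNS = ["i1b", "hi3b", "e4r", "r1n", "n2a"]

def parse_pattern(pattern):
    """Split a pattern such as 'e4r' into its letters and per-gap values."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)   # value for the gap before letters[pos]
        else:
            pos += 1
    return letters, values

def gap_values(word, patterns):
    """Return one value per gap between adjacent letters of `word`."""
    dotted = "." + word + "."             # dots mark the word boundaries
    scores = [0] * (len(dotted) + 1)      # scores[k]: gap before dotted[k]
    for letters, values in map(parse_pattern, patterns):
        start = dotted.find(letters)
        while start != -1:
            for offset, value in enumerate(values):
                # Competing patterns: keep the highest value per gap.
                scores[start + offset] = max(scores[start + offset], value)
            start = dotted.find(letters, start + 1)
    # The gap after word[i] is the gap before dotted[i + 2].
    return scores[2:len(word) + 1]

print(gap_values("hibernate", TOY_PATTERNS))
```

With these toy patterns, `i1b` and `hi3b` both cover the gap between `i` and `b`, and the larger value, 3, is kept.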

Note that the numbers are for positions between two characters. A better notation would have been

h i b e r n a t e
 0 3 0 4 1 2 0 0 (0)

(where the last 0 is redundant, because no hyphen can follow the final letter).
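Reading the vector this way, the hyphen points fall out directly. A small Python sketch (the helper names `hyphen_points` and `hyphenate` are mine, not part of the library), assuming digit `i` of the vector describes the gap after letter `i` of the word:

```python
def hyphen_points(word, vector):
    """Return the indices i such that a hyphen may follow word[i]."""
    # The final digit is ignored: nothing can be hyphenated after the last letter.
    gaps = vector[:len(word) - 1]
    return [i for i, digit in enumerate(gaps) if int(digit) % 2 == 1]

def hyphenate(word, vector, mark="="):
    """Insert `mark` at every position whose vector value is odd."""
    points = set(hyphen_points(word, vector))
    return "".join(ch + (mark if i in points else "") for i, ch in enumerate(word))

print(hyphenate("hibernate", "030412000"))  # hi=ber=nate
```

The odd values 3 (after `i`) and 1 (after `r`) produce the two break points from the question; the even values 4 and 2 mark places where a break is discouraged.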