The Hyphen library seems to be a very popular and free way to have hyphenation in your app.
What does hyphenation vector mean?
I am running the example attached to the library source code. Example output:
hibernate // input word
030412000 // output hyphenation vector
hi=ber=nate // hyphen points
- hi=bernate
- hiber=nate
Odd numbers in the vector indicate hyphenation points. But what do all of those values mean?
László Németh describes the algorithm in OpenOffice's documentation in full detail.
The library uses the algorithm developed by Frank M. Liang ("Word Hy-phen-a-tion by Com-pu-ter"): all letters in digrams, trigrams, and longer patterns are assigned numerical values to indicate it's a 'usual' place (an odd number) or an 'unusual' place (an even number) for a hyphen to occur. The higher the number is, the greater importance -- a pattern will almost never be broken on a larger even number, and almost always on a larger odd number. The number sequences are statistically determined on a corpus of pre-hyphenated words.
Note that the numbers are for positions between two characters. A better notation would have been
(where the last
0
is obsolete).