I'm using JLanguageTool with the German language (`de-DE`) for spellchecking and noticed that digits seem to be used as word separators (just like spaces?). For example, `We8lt` is not reported as a single incorrect word but as two spelling errors (one for `We` and one for `lt`). And `bis8`, for example, is not reported as an error at all.
Example call (I'm using it as a Java library, but the command-line tool behaves the same way):
```
$ echo "Hallo We8lt bis8 Test" | java -jar languagetool-commandline.jar -l de-DE -
Expected text language: German (Germany)
Working on STDIN...
1.) Line 1, column 7, Rule ID: GERMAN_SPELLER_RULE prio=-3
Message: Möglicher Tippfehler gefunden.
Suggestion: WE; Der; Den; Des; Dem
Hallo We8lt bis8 Test
      ^^
2.) Line 1, column 10, Rule ID: GERMAN_SPELLER_RULE prio=-3
Message: Möglicher Tippfehler gefunden.
Suggestion: LT; als; lag; alt; elf
Hallo We8lt bis8 Test
         ^^
Time: 1618ms for 1 sentences (0.6 sentences/sec)
```
This is a big problem for us because, for example, missing spaces between words and numbers are not detected. How can I get the library/tool to not treat numbers as word separators? Thanks a lot.
Yes, you are right: LanguageTool treats numbers as word separators in German.
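To illustrate the effect, here is a simplified stand-in. It is not LanguageTool's real tokenizer or dictionary; the class name and the three-word dictionary are hypothetical. It compares splitting on every non-letter character with splitting on whitespace only. When digits act as separators, `We8lt` turns into the two unknown fragments `We` and `lt`, and `bis8` shrinks to the valid word `bis`, which matches the output shown in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Simplified illustration, NOT LanguageTool's real tokenizer or dictionary:
 * shows how treating digits as separators changes which tokens a speller sees.
 */
class DigitSeparatorDemo {

    // Hypothetical mini dictionary, just large enough for the example sentence.
    private static final Set<String> DICTIONARY = Set.of("Hallo", "bis", "Test");

    // Everything that is not a letter (digits included) separates words: "We8lt" -> "We", "lt".
    static List<String> splitOnNonLetters(String text) {
        return Arrays.stream(text.split("[^\\p{L}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Only whitespace separates words: "We8lt" stays one token.
    static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Tokens a speller would report because they are missing from the dictionary.
    static List<String> unknownWords(List<String> tokens) {
        return tokens.stream()
                .filter(token -> !DICTIONARY.contains(token))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Hallo We8lt bis8 Test";
        System.out.println(unknownWords(splitOnNonLetters(text))); // [We, lt]
        System.out.println(unknownWords(splitOnWhitespace(text))); // [We8lt, bis8]
    }
}
```

With digits as separators, only the fragments `We` and `lt` are reported. With whitespace-only splitting, `We8lt` and `bis8` are reported as whole words, which is the behaviour the question asks for.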
To change this behaviour, you have to modify the source code: change this line in `GermanSpellerRule.java` from
to
Alternatively, you could add another rule to `grammar.xml` that complains about missing spaces before/after numbers:
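If you would rather run a similar check in your own Java code, instead of or in addition to a `grammar.xml` rule, it can be sketched with plain `java.util.regex`. `MissingSpaceCheck` and `findMissingSpaces` are hypothetical names, not part of LanguageTool's API. The helper reports every whitespace-separated token in which a letter directly touches a digit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of LanguageTool: reports whitespace-separated
 * tokens in which a letter directly touches a digit, e.g. "We8lt" or "bis8".
 */
class MissingSpaceCheck {

    // A letter immediately followed by a decimal digit, or a digit immediately
    // followed by a letter. \p{L} also covers umlauts and ß.
    private static final Pattern LETTER_DIGIT_BOUNDARY =
            Pattern.compile("\\p{L}\\p{Nd}|\\p{Nd}\\p{L}");

    /** Returns every whitespace-separated token that contains a letter/digit boundary. */
    static List<String> findMissingSpaces(String text) {
        List<String> suspicious = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (LETTER_DIGIT_BOUNDARY.matcher(token).find()) {
                suspicious.add(token);
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        // Prints the tokens that probably lack a space between word and number.
        System.out.println(findMissingSpaces("Hallo We8lt bis8 Test")); // [We8lt, bis8]
    }
}
```

This check is deliberately crude. It also flags legitimate tokens such as `A4`, `MP3` or `3D`, so in practice you would probably want an exception list.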