Java library for Fuzzy Full-Text Search


I am aware that Lucene and Solr can do this, but is there a simple Java library that does just the fuzzy full-text search part, e.g.:

SomeScore score = fuzzyFullTextSearch(String text, String searchTerm, int maxDistance)

where "score" measures how often the (fuzzy) searchTerm was found and how similar each match was to the original searchTerm.

The reason I'm not using Lucene or similar is that it is too bulky for my use case, and I need the search only once. In addition, Lucene's FuzzyQuery caps the maximum edit distance at 2, which is not enough for my use case.

Is there a lightweight library that can do something like the above?


There is 1 answer

Mysterion On

As usual, Apache Commons comes to the rescue.

org.apache.commons.lang3.StringUtils has several methods for this, such as getFuzzyDistance and getLevenshteinDistance, plus a few other similarity metrics (e.g. getJaroWinklerDistance).

So a naive approach, in pseudocode, would be something like this:

split the text into tokens by spaces, commas, etc.
for each token
    calcDistanceBetweenTokenAndSearchTerm
getSumScore // or avg or whatever
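
The pseudocode above could be sketched in Java roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name FuzzyTextSearch, the Score type (standing in for the question's SomeScore), and the similarity formula are all assumptions. To keep the sketch free of dependencies it uses a hand-written Levenshtein distance; StringUtils.getLevenshteinDistance could be swapped in instead.

```java
import java.util.Locale;

public class FuzzyTextSearch {

    /** Hypothetical result type standing in for the question's SomeScore. */
    public static final class Score {
        public final int matches;              // number of tokens within maxDistance
        public final double averageSimilarity; // 1.0 = exact match, lower = more edits

        Score(int matches, double averageSimilarity) {
            this.matches = matches;
            this.averageSimilarity = averageSimilarity;
        }
    }

    /** Plain Levenshtein distance using two rolling rows of the DP table. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Tokenizes the text and scores every token within maxDistance of the term. */
    public static Score fuzzyFullTextSearch(String text, String searchTerm, int maxDistance) {
        String term = searchTerm.toLowerCase(Locale.ROOT);
        int matches = 0;
        double similaritySum = 0.0;
        // Split on anything that is not a letter or digit (spaces, commas, etc.).
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            int distance = levenshtein(token, term);
            if (distance <= maxDistance) {
                matches++;
                int longer = Math.max(token.length(), term.length());
                // Assumed similarity measure: 1 - distance / length of the longer string.
                similaritySum += 1.0 - (double) distance / longer;
            }
        }
        return new Score(matches, matches == 0 ? 0.0 : similaritySum / matches);
    }
}
```

Note that plain Levenshtein counts a transposition as two edits, so "quikc" only matches "quick" when maxDistance is at least 2. Because this compares whole tokens, it will not find a multi-word search term; that would need a sliding window over adjacent tokens.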

Another approach could be to use org.apache.commons.text.similarity.FuzzyScore from commons-text, which computes a similarity score between two strings (a higher score means a closer match, unlike an edit distance). Of course, a lot depends on your exact requirements.

I'm not saying this covers every possible answer, but you could give it a try.