Using Elasticsearch for fast similarity scoring


I need to find a way to quickly compute a similarity score (a weighted average of the Jaccard and Sørensen-Dice similarities) between a person's name and approximately 1.5M names spread across 7 CSV lists.
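To make the score concrete, here is a minimal sketch of the weighted average. It assumes names are compared as sets of lowercase character bigrams; the tokenization, the `NameSimilarity` class name, and the `jaccardWeight` parameter are illustrative assumptions, not the actual implementation.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class NameSimilarity {
    // Splits a name into its set of character bigrams (assumed tokenization).
    static Set<String> bigrams(String name) {
        String s = name.toLowerCase(Locale.ROOT).trim();
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            grams.add(s.substring(i, i + 2));
        }
        return grams;
    }

    // Weighted average of Jaccard and Sorensen-Dice over the two bigram sets.
    // jaccardWeight is in [0, 1]; Dice gets the remaining weight.
    static double score(String a, String b, double jaccardWeight) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 0.0; // no bigrams on either side, nothing to compare
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        int inter = common.size();
        int union = x.size() + y.size() - inter;
        double jaccard = (double) inter / union;
        double dice = 2.0 * inter / (x.size() + y.size());
        return jaccardWeight * jaccard + (1.0 - jaccardWeight) * dice;
    }

    public static void main(String[] args) {
        System.out.println(score("Maria Rossi", "Mario Rossi", 0.5));
    }
}
```

With equal weights, `score("night", "nacht", 0.5)` shares one bigram (`ht`) out of seven distinct ones, so Jaccard is 1/7, Dice is 2/8, and the result is their mean.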

Searching online, I found that Elasticsearch might be the tool I'm looking for, but I would appreciate feedback from anyone who has worked on similar problems, and would like to know whether they used the ELK Stack or another tool.

Any practical hints would be appreciated too. For each of the 7 lists, the solution has to return the similarity score of the name most similar to the input name (in terms of the average of Jaccard and Dice similarity) when a perfect match isn't found, and it has to do so in about 0.1 s.

The current solution is a Java API that filters the lists by the first two letters and then parallelizes the scoring operations, but it slows down as the workload increases and eventually crashes. It has to handle a peak of up to 50 searches per second.
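For reference, a rough sketch of that kind of approach (not the actual code) might look like the following. The `PrefixIndexedList` class and its methods are hypothetical names; the bigram tokenization, the equal weights, and building the prefix buckets once up front are all assumptions made for illustration.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class PrefixIndexedList {
    // Names of one list, grouped by their first two letters.
    private final Map<String, List<String>> byPrefix;

    PrefixIndexedList(Collection<String> names) {
        byPrefix = names.stream()
                .map(PrefixIndexedList::normalize)
                .collect(Collectors.groupingBy(PrefixIndexedList::prefix));
    }

    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).trim();
    }

    static String prefix(String s) {
        return s.length() <= 2 ? s : s.substring(0, 2);
    }

    static Set<String> bigrams(String s) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            grams.add(s.substring(i, i + 2));
        }
        return grams;
    }

    // Plain average of Jaccard and Dice over bigram sets (equal weights assumed).
    static double score(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        int inter = common.size();
        double jaccard = (double) inter / (x.size() + y.size() - inter);
        double dice = 2.0 * inter / (x.size() + y.size());
        return 0.5 * jaccard + 0.5 * dice;
    }

    // Best score for this list: 1.0 on a perfect match, otherwise the maximum
    // score among names sharing the query's two-letter prefix, scored in parallel.
    double bestScore(String query) {
        String q = normalize(query);
        List<String> bucket = byPrefix.getOrDefault(prefix(q), Collections.emptyList());
        if (bucket.contains(q)) {
            return 1.0;
        }
        return bucket.parallelStream()
                .mapToDouble(name -> score(q, name))
                .max()
                .orElse(0.0);
    }
}
```

One instance would be built per CSV list, and `bestScore` called on each of the 7 instances for every incoming name. Note that `parallelStream()` runs on the shared common fork-join pool by default, so concurrent requests compete for the same worker threads.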
