Using Elasticsearch for fast similarity scoring

114 views Asked by MLonzo At 13 June 2023 at 11:10

I need a fast way to compute a similarity score (a weighted average of the Jaccard and Sørensen-Dice similarities) between a person's name and approximately 1.5M names split across 7 CSV lists.
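To make the score concrete, here is a minimal sketch of the weighted average. It assumes the two similarities are computed over sets of character bigrams, which the question does not actually specify, and the weight `w` (default 0.5) and all function names are hypothetical.

```python
def bigrams(name):
    """Set of character bigrams of a lowercased, whitespace-normalized name."""
    s = " ".join(name.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """2 |A ∩ B| / (|A| + |B|); defined as 0.0 when both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def similarity(x, y, w=0.5):
    """Weighted average of Jaccard and Sørensen-Dice (w is an assumed weight)."""
    a, b = bigrams(x), bigrams(y)
    return w * jaccard(a, b) + (1 - w) * dice(a, b)


print(similarity("night", "nacht"))
```

On sets the two measures are tied by J = D / (2 - D), so for any weight between 0 and 1 the weighted average ranks candidates in the same order as either measure alone. Only the reported value depends on the weight.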

Searching online, I found that Elasticsearch might be the tool I'm looking for, but I would appreciate feedback from anyone who has worked on similar problems, whether they used the ELK Stack or another tool.

Any practical hints would be appreciated too. For each of the 7 lists, the solution must return the similarity score of the name most similar to the input name (in terms of the average of Jaccard and Dice similarity) when no perfect match is found, and it has to do so in about 0.1s.

The current solution is a Java API that filters the lists by the first two letters and then parallelizes the scoring operations, but it slows down as the workload increases and eventually crashes. It has to handle a peak of up to 50 searches/second.
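The approach described above (filter by the first two letters, then score the remaining candidates per list, short-circuiting on a perfect match) can be sketched roughly as follows. This is an illustration in Python rather than the Java code in question. The bigram-based `score`, the list names, and the sample names are all hypothetical, and parallelization is left out.

```python
from collections import defaultdict


def bigrams(name):
    s = name.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def score(x, y, w=0.5):
    """Weighted average of Jaccard and Dice over character bigrams (assumed)."""
    a, b = bigrams(x), bigrams(y)
    inter = len(a & b)
    if inter == 0:
        return 0.0
    return w * inter / len(a | b) + (1 - w) * 2 * inter / (len(a) + len(b))


def build_index(names):
    """Group one list's names by their first two letters (built once at load)."""
    buckets = defaultdict(list)
    for n in names:
        buckets[n.lower()[:2]].append(n)
    return buckets


def best_scores(query, indexes):
    """Return {list_name: best score}; 1.0 as soon as a list has an exact match."""
    q = query.lower()
    result = {}
    for list_name, buckets in indexes.items():
        candidates = buckets.get(q[:2], [])
        if any(c.lower() == q for c in candidates):
            result[list_name] = 1.0
            continue
        result[list_name] = max((score(q, c) for c in candidates), default=0.0)
    return result


# Made-up sample data standing in for two of the 7 CSV lists.
indexes = {
    "list_a": build_index(["Maria Rossi", "Mario Rossi", "Luca Bianchi"]),
    "list_b": build_index(["Marta Russo", "Paolo Verdi"]),
}
print(best_scores("Mario Rossi", indexes))
print(best_scores("Marco Rossi", indexes))
```

One limitation of this filter is that a query whose first two letters differ from the stored name (for example, a typo in the first character) never reaches the scoring step, so the best match can be missed.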


There are 0 answers
