Whoosh fuzzy matching of the queried word list

Question

896 views Asked by Deepali Semwal At 13 November 2014 at 05:38

By fuzzy match here, I mean finding the documents that contain roughly 60-70% of the words from the word list in the query.

E.g.:

>> from whoosh.qparser import QueryParser
>> # Query string as passed by the user
>> query = "i am searching for a document that is matched fuzzily with what i am giving here."
>> QueryParser("content", ix.schema).parse(query)

This query will look for documents containing all of the words, but I want to find all documents that contain at least 60% of the above words.

The number of words I am dealing with is large, so I do not want to programmatically partition the word set into different subsets (for ORing).

Original Q&A

There is 1 answer

**Assem** · Answer 1 · 2015-05-28T07:58:27+00:00

This does not seem to be implemented in Whoosh yet (checked 28/05/2015).

However, in the documentation of [whoosh.query.Or][1], there is a reference to a minmatch argument:

class whoosh.query.Or (subqueries, boost=1.0, minmatch=0, scale=None)

Parameters:

subqueries – a list of Query objects to search for.

boost – a boost factor to apply to the scores of all matching documents.

minmatch – not yet implemented.

scale – a scaling factor for a “coordination bonus”. If this value is not None, it should be a floating point number greater than 0 and less than 1. The scores of the matching documents are boosted/penalized based on the number of query terms that matched in the document. This number scales the effect of the bonuses.

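If we suppose that minmatch is the minimum number of keywords that must match, the solution would look like this:

from math import ceil
from whoosh.query import Or, Term

raw_query = "i am searching for a document that is matched fuzzily with what i am giving here."
words = raw_query.split()
# Minimum number of words that must match: 60% of the word count, rounded up
min_match = int(ceil(len(words) * 3.0 / 5.0))
# minmatch is documented as "not yet implemented", so this only shows the intended call
query = Or([Term("content", word) for word in words], minmatch=min_match)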

In this case, you should either disable stop-word filtering on the field, or remove the stop words from the query before counting its words, so that the threshold is computed from the words that can actually match.
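As a rough illustration of the second option, here is a minimal plain-Python sketch that removes stop words before computing the threshold. It does not use Whoosh at all: the `STOP_WORDS` set, the helper names and the 60% default are assumptions made for this example, and `matches_enough` only shows what the criterion means for a single piece of text rather than how an index would evaluate it.

```python
# Hypothetical stop-word list for illustration; this is NOT Whoosh's built-in list.
STOP_WORDS = frozenset({"i", "am", "for", "a", "that", "is", "with", "what"})


def query_words(text, stop_words=STOP_WORDS):
    """Lowercase, strip surrounding punctuation, drop stop words and duplicates."""
    words = []
    for token in text.lower().split():
        word = token.strip(".,;:!?\"'()")
        if word and word not in stop_words and word not in words:
            words.append(word)
    return words


def min_match_count(words, percent=60):
    """Smallest word count covering at least `percent` of the list (integer ceiling)."""
    return (len(words) * percent + 99) // 100


def matches_enough(document_text, words, percent=60):
    """True if the text contains at least `percent` of the query words."""
    doc_words = set(query_words(document_text, stop_words=frozenset()))
    hits = sum(1 for word in words if word in doc_words)
    return hits >= min_match_count(words, percent)


raw_query = "i am searching for a document that is matched fuzzily with what i am giving here."
words = query_words(raw_query)
threshold = min_match_count(words)
```

With this stop-word set, the threshold is computed from the six remaining content words instead of the sixteen raw tokens, which is the number you would pass as the minimum match count.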
