I want to find the closest matching string in a database column for a given string. After searching I came up with the following table and query
CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));
INSERT INTO docs VALUES(1, 'California is a state in the US.');
INSERT INTO docs VALUES(2, 'Paris is a city in France.');
INSERT INTO docs VALUES(3, 'France is in Europe.');
INSERT INTO docs VALUES(4, 'Paris');
CREATE INDEX idx_docs ON docs(text)
INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS
('DATASTORE CTXSYS.DEFAULT_DATASTORE');
SELECT SCORE(1), id, text
FROM docs
WHERE CONTAINS(text, 'fuzzy(Parsi,1,1)', 1) > 0;
I have set the similarity score to the minimum, i.e. 1. It works for search strings like "Parsi" or "Parse". It gives me the wanted results. But if the search string is too small like "par" or "pa" it does not show me any results.
What should I do to get the closest match, even when using very short strings to search?
You're basically hitting a limit in the fuzzy operator
and oracle doesn't index shorter strings unless you change the default:
In this case you might actually have to combine fuzzy and wildcard queries using query rewrite/relaxation. In my experience, wildcard expansion tends to significantly slow down everything, although maybe it's just a matter of the right index configuration.