SQL - searching database with the LIKE operator


Given your data stored somewhere in a database:

Hello my name is Tom I like dinosaurs to talk about SQL.  
SQL is amazing. I really like SQL.

We want to implement a site search that lets visitors enter terms and returns related records. A user might search for:

Dinosaurs

And the SQL:

WHERE articleBody LIKE '%Dinosaurs%'

Copes fine with returning the correct set of records.

How would we cope, however, if a user misspells dinosaurs? For example:

Dinosores

(Poor sore dino.) How can we search while allowing for spelling errors? We could map the common misspellings we see in searches to the correct spelling, and then search on the original term plus the corrected term, but that is time-consuming to maintain.
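The mapping approach just described could look roughly like the sketch below. It is written in Python against an in-memory SQLite database purely for illustration; the `articles` table, the `CORRECTIONS` map, and the `search` helper are all hypothetical, and only the `articleBody` column name comes from the question.

```python
import sqlite3

# Hypothetical, hand-maintained map of misspellings to correct spellings.
CORRECTIONS = {"dinosores": "dinosaurs"}


def search(conn, query):
    """Return article bodies matching any search term or its known correction."""
    terms = []
    for word in query.split():
        terms.append(word)
        corrected = CORRECTIONS.get(word.lower())
        if corrected:
            terms.append(corrected)
    if not terms:
        return []
    # One parameterised LIKE per term; user input never goes into the SQL text.
    # (LIKE wildcards typed by the user are not escaped in this sketch.)
    where = " OR ".join("articleBody LIKE ?" for _ in terms)
    params = ["%" + t + "%" for t in terms]
    sql = "SELECT articleBody FROM articles WHERE " + where
    return [row[0] for row in conn.execute(sql, params)]


# Demo data taken from the question (the table layout is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (articleBody TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?)",
    [("Hello my name is Tom I like dinosaurs to talk about SQL.",),
     ("SQL is amazing. I really like SQL.",)],
)
print(search(conn, "Dinosores"))
```

The maintenance cost is visible here: every new misspelling needs its own entry in the map.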

Is there any way to do this programmatically?

Edit

It appears SOUNDEX could help, but can anyone give me an example using SOUNDEX where entering the search term:

Dinosores wrocks

returns records instead of doing:

WHERE articleBody LIKE '%Dinosores%' OR articleBody LIKE '%Wrocks%'

which would return squadoosh?


There are 6 answers

5
James Wiseman (best answer)

If you're using SQL Server, have a look at SOUNDEX.

For your example:

select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')

Both return the same value (D526).

You can also use the DIFFERENCE function (on the same link as SOUNDEX), which rates the similarity of two strings from 0 (least similar) to 4 (most similar).

SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4

Edit:

After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link in the Fuzzy Logic answer provided by @Neil Knight (+1 to that, for me!).

This Stack Overflow article also details possible sources for implementations of fuzzy logic in T-SQL. One respondent also outlined full-text indexing as an option that you might want to investigate.
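For the multi-word case in the question, one option is to compare Soundex codes word by word rather than with LIKE. The following is a minimal Python sketch of that idea, not SQL Server's implementation: `soundex` here is standard American Soundex (SQL Server's SOUNDEX may differ in edge cases), and `matching_articles` and the `articles` list are hypothetical names built from the question's sample data.

```python
import re

# Standard American Soundex digit groups.
_CODES = {}
for _digit, _letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                         "4": "L", "5": "MN", "6": "R"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of word ('' if it has no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        # H and W do not separate letters with the same code; vowels do.
        if ch not in "HW":
            prev = digit
    return (code + "000")[:4]


def matching_articles(search, articles):
    """Return articles containing a word that sounds like any search term."""
    wanted = {soundex(term) for term in search.split()} - {""}
    hits = []
    for body in articles:
        words = re.findall(r"[A-Za-z]+", body)
        if any(soundex(w) in wanted for w in words):
            hits.append(body)
    return hits


articles = [
    "Hello my name is Tom I like dinosaurs to talk about SQL.",
    "SQL is amazing. I really like SQL.",
]
print(soundex("Dinosaurs"), soundex("Dinosores"))   # D526 D526
print(matching_articles("Dinosores wrocks", articles))
```

Here the search matches the first article through "Dinosores". Note that "wrocks" codes to W620 while "rocks" codes to R200, because Soundex always keeps the first letter, so a misspelled first letter still defeats it.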

1
p.campbell

Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.

1
KeithS

Short answer: most SQL engines have nothing built in that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool for finding words that sound alike, and so corrects for phonetic misspellings. It also happens to tolerate dropped or mistyped vowels: "Dinosars" (missing the U) and "Dinosayrs" still code to D526, because Soundex ignores vowels and Y after the first letter. But if the user truly "fat-fingers" a consonant and enters, say, "Dinodaurs" (D536), or mistypes the first letter, SoundEx would not return a match.

Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
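The closest-match step described above can be sketched in the business logic layer as follows. This is a minimal Python sketch, assuming a small in-memory word list: `edit_distance` is the optimal string alignment variant of Damerau-Levenshtein distance (substitutions, insertions, deletions, adjacent transpositions), and `DICTIONARY`, `suggest`, and the `max_distance` cutoff are hypothetical names and values chosen for illustration.

```python
def edit_distance(a, b):
    """Optimal string alignment distance between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ua" <-> "au".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


# Hypothetical dictionary of known "good" search terms.
DICTIONARY = ["dinosaurs", "sql", "amazing", "rocks"]


def suggest(term, dictionary=DICTIONARY, max_distance=2):
    """Return the closest dictionary word, or None if nothing is close enough."""
    term = term.lower()
    if term in dictionary:
        return term
    best = min(dictionary, key=lambda word: edit_distance(term, word))
    return best if edit_distance(term, best) <= max_distance else None


print(suggest("Dinosayrs"))  # dinosaurs
print(suggest("Dinosars"))   # dinosaurs
```

A linear scan like this is fine for a small dictionary; a large one would need an index or precomputed candidates. Note that "dinosores" is three edits away from "dinosaurs", so a phonetic misspelling can fall outside a cutoff that catches typos, which is one reason the two techniques complement each other.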

0
Neil Knight

Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.

SSIS Fuzzy Lookup

0
Ed Schembor

I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.

Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help: http://msdn.microsoft.com/en-us/library/ms142491.aspx

Here is another SO question from someone setting up a thesaurus to handle common misspellings: FORMSOF Thesaurus in SQL Server

0
z atef

You can also try a substring function to compare only the first 3 or so characters of each value instead of the whole string. Below is an example of how that can be achieved (this uses Oracle-style substr and ||; in SQL Server, use SUBSTRING and + instead):

SELECT Table1.Fname, Table1.Lname
FROM Table1, Table2
WHERE substr(Table1.Fname, 1, 3) || substr(Table1.Lname, 1, 3) = substr(Table2.Fname, 1, 3) || substr(Table2.Lname, 1, 3)
ORDER BY Table1.Fname;