Simple PHP search page with three fields querying a MySQL database with millions of rows. The data consists of profiles, mostly USA.
Fulltext indexing increases speed by a factor of 9.6 when the search term is rare (eg "Canada").
Fulltext indexing slows speed by a factor of 3.6 when the search term is common (eg "United States").
Is there some way to enjoy the benefits of both technique? Maybe searching only a small portion of the table and constructing a "MATCH AGAINST" query if the term is rare (found in less than half of the records) and a standard "LIKE" query if the term is common? That sounds like a convoluted solution. I want a better way.
Here is my SQL and search times:
SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%canada%'
1.14 seconds (slow)
SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST(' +canada' IN BOOLEAN MODE) AND country LIKE '%canada%
0.12 seconds (fast)
SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%united states%'
1.09 seconds (comparatively fast)
SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST(' +united +states' IN BOOLEAN MODE) AND country LIKE '%united states%
3.9 seconds (slow)
My solution is to generate my own list of stopwords. I downloaded a sample of data and found the most common terms using Python. Common terms are excluded from MATCH queries through PHP. In my data, "United" and "States" are common terms:
Becomes:
When all of the search terms are "common" the MATCH statement is removed entirely:
Becomes:
The best of both worlds.