SQL Match Against Fulltext Index is slow when searching common terms

879 views Asked by At

Simple PHP search page with three fields querying a MySQL database with millions of rows. The data consists of profiles, mostly USA.

Fulltext indexing increases speed by a factor of 9.6 when the search term is rare (eg "Canada").

Fulltext indexing slows speed by a factor of 3.6 when the search term is common (eg "United States").

Is there some way to enjoy the benefits of both technique? Maybe searching only a small portion of the table and constructing a "MATCH AGAINST" query if the term is rare (found in less than half of the records) and a standard "LIKE" query if the term is common? That sounds like a convoluted solution. I want a better way.

Here is my SQL and search times:

SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%canada%' 

1.14 seconds (slow)

    SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST(' +canada' IN BOOLEAN MODE) AND country LIKE '%canada%

0.12 seconds (fast)

SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%united states%'

1.09 seconds (comparatively fast)

 SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST(' +united +states' IN BOOLEAN MODE) AND country LIKE '%united states%

3.9 seconds (slow)

2

There are 2 answers

0
Ned Hulton On BEST ANSWER

My solution is to generate my own list of stopwords. I downloaded a sample of data and found the most common terms using Python. Common terms are excluded from MATCH queries through PHP. In my data, "United" and "States" are common terms:

$SQL =  "SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST('+sales +united +states' IN BOOLEAN MODE) AND country LIKE '%united states% AND tagline LIKE '%sales%'"

Becomes:

$SQL =  "SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST('+sales' IN BOOLEAN MODE) AND country LIKE '%united states% AND tagline LIKE '%sales%'"

When all of the search terms are "common" the MATCH statement is removed entirely:

$SQL =  "SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST('+united +states' IN BOOLEAN MODE) AND country LIKE '%united states%"

Becomes:

$SQL =  "SELECT COUNT(1) FROM `mytable` WHERE country LIKE '%united states%"

The best of both worlds.

1
Rick James On

If you could know that the search is on a country name and that you need to look only in the country column, then this would be much faster than any of the above:

... WHERE country = 'canada'

What were the COUNTs from each query? That would help explain the timings. How many rows in the table?

Meanwhile, your FT queries are not the same as your LIKE queries. So, I don't want to discuss why the speeds are different.