I'm currently working on a search engine built with PHP and MongoDB.
The new full-text search feature does a large part of the job well, but I have run into a problem.
Here is an example:
I search for "Pizza -restaurant"
(I'm French, but these words are the same in both languages.)
With the negative term, many of the documents containing "restaurant" are removed.
But 3 or 4 documents containing "restaurant" still remain in the results.
In these documents, "restaurant" looks like any other word. It is separated by spaces, with no special characters. It is written in upper case (but the upper case does not seem to be the cause).
If it helps, the debug string is "[queryDebugString] => pizza||restaur||||"
And here are examples of documents that are not removed:
BAR - RESTAURANT LE ST MICHAL CAMPAGNARD, BAR - RESTAURANT LE ST MICHAL
or
HOTEL - RESTAURANT rd 1120 19460 auberge de la route Spécialités gastronomiques du terroir
Edit: Here is the command that runs the search:
$result = $this->_dbLocal->command(
    array(
        'text' => 'boutique', // the name of the collection being searched
        'search' => $q, // the string to search for
        // 'language' => 'french',
        'limit' => 500,
    )
);
EDIT: After some testing, the negative terms work correctly with language: none in both the search and the index. But with language: none, my search no longer uses stop words, which were really useful...
Is there any way to use stop words for everything except negative terms?
Thank you for your time!
Gilles.
Cross-referencing this with the mongodb-user thread on the same issue. This was confirmed as a bug in SERVER-11994 and has since been fixed in 2.5.5 and the forthcoming 2.6 release.
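Until an upgrade to a fixed version is possible, one workaround might be to filter the results on the client side. The sketch below is only an illustration, not part of MongoDB or the PHP driver: negatedTerms and dropNegatedMatches are hypothetical helpers, and the field name passed in ('name' in the usage note) is an assumption about the documents. It also assumes each hit has the score/obj shape returned by the text command.

```php
<?php
// Hypothetical helper: collect the negated terms ("-word") from a search string.
function negatedTerms($query)
{
    // A "-" counts only at the start of a token and directly before a letter.
    preg_match_all('/(?<!\S)-(\p{L}[\p{L}\p{N}]*)/u', $query, $matches);
    return $matches[1];
}

// Hypothetical helper: drop hits whose text field contains a negated term.
// Assumes each hit looks like array('score' => ..., 'obj' => array(...)).
function dropNegatedMatches(array $results, $query, $field)
{
    $terms = negatedTerms($query);
    if (empty($terms)) {
        return $results;
    }

    $alternatives = implode('|', array_map(function ($term) {
        return preg_quote($term, '/');
    }, $terms));

    // Case-insensitive, whole-word match (no stemming is applied here).
    $pattern = '/(?<![\p{L}\p{N}])(?:' . $alternatives . ')(?![\p{L}\p{N}])/iu';

    return array_values(array_filter($results, function ($hit) use ($pattern, $field) {
        $text = isset($hit['obj'][$field]) ? $hit['obj'][$field] : '';
        return preg_match($pattern, $text) !== 1;
    }));
}
```

With the command from the question, this could be applied as dropNegatedMatches($result['results'], $q, 'name'). Because the match is on exact words rather than stems, a document containing only "restaurants" would not be filtered out by the term "restaurant".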