Solr Negative Boost Query result containing Some Specific Words


I have a field item_name indexed in Solr 5.0.0. How do I give a negative boost to query results that contain some specific words?

For example, suppose I have item_name values like:

Feggi Brown Laptop Bags
Dell Laptop (Black) without Laptop Bag by Dell
HP Laptop with Laptop Bag
Sony laptop without bag
Goldendays Laptop Bag

If I search for laptop bags, the results come back in this order:

Dell Laptop (Black) without Laptop Bag by Dell
HP Laptop with Laptop Bag
Feggi Brown Laptop Bags
Sony laptop without bag
Goldendays Laptop Bag

How can I give a negative (or low) boost to item_name values that contain words like:

with, without, ...

so that item_name values containing these words do not end up at the top of the results?

NB: is there any relation to stopwords in this context?


There are 2 answers

alexf

The Solr documentation covers this:

True negative boosts are not supported, but you can use a very "low" numeric boost value on query clauses. In general the problem that confuses people is that a "low" boost is still a boost, it can only improve the score of documents that match. For example, if you want to find all docs matching "foo" or "bar" but penalize the scores of documents matching "xxx" you might be tempted to try...

q = foo^100 bar^100 xxx^0.00001    # NOT WHAT YOU WANT 

...but this will still help a document matching all three clauses score higher than a document matching only the first two. One way to fake a "negative boost" is to give a large boost to everything that does not match. For example...

q = foo^100 bar^100 (*:* -xxx)^999
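To see why the first query misbehaves and the second one works, here is a deliberately simplified scoring model in Python. It assumes each matching clause simply adds its boost to the score and ignores tf, idf, norms and coordination, so the numbers are not what Solr would report; only the ordering argument carries over. The helper names (`toy_score`, `has`, `lacks`) are made up for illustration.

```python
# Toy model only: each matching clause adds its boost to the score.
# Real Lucene scoring also involves tf, idf, norms and coord, ignored here.

def toy_score(doc_terms, clauses):
    """Sum the boosts of the clauses that match a document's term set."""
    return sum(boost for matches, boost in clauses if matches(doc_terms))

def has(term):
    # Models a plain term clause such as foo^100.
    return lambda terms: term in terms

def lacks(term):
    # Models the pure-negative clause (*:* -term): matches every doc without term.
    return lambda terms: term not in terms

docs = {
    "foo bar": {"foo", "bar"},
    "foo bar xxx": {"foo", "bar", "xxx"},
}

low_boost = [(has("foo"), 100), (has("bar"), 100), (has("xxx"), 0.00001)]
fake_negative = [(has("foo"), 100), (has("bar"), 100), (lacks("xxx"), 999)]

for name, clauses in [("low boost", low_boost), ("fake negative", fake_negative)]:
    ranked = sorted(docs, key=lambda d: toy_score(docs[d], clauses), reverse=True)
    print(name, "->", ranked)
# low boost -> ['foo bar xxx', 'foo bar']
# fake negative -> ['foo bar', 'foo bar xxx']
```

The tiny boost on xxx still adds a little to the document that matches it, while the (*:* -xxx) clause rewards every document that does not match, which is what pushes the xxx document down.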

So in your case, you could do something like:

q = item_name:laptop^100 item_name:bags^100 (*:* -item_name:with)^99 (*:* -item_name:without)^99
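If the list of penalty words grows, a small helper can generate this query. This is a sketch: `penalized_query`, the host `localhost:8983` and the core name `products` are hypothetical. It also assumes with and without are actually indexed as terms in item_name (not stripped as stopwords); otherwise `item_name:with` matches nothing and the penalty clauses no longer distinguish between documents.

```python
from urllib.parse import urlencode

def penalized_query(field, terms, penalty_words, term_boost=100, penalty_boost=99):
    """Build a standard-parser q string that boosts `terms` and pushes down
    documents containing any of `penalty_words` via (*:* -field:word)^boost."""
    positive = [f"{field}:{t}^{term_boost}" for t in terms]
    negative = [f"(*:* -{field}:{w})^{penalty_boost}" for w in penalty_words]
    return " ".join(positive + negative)

q = penalized_query("item_name", ["laptop", "bags"], ["with", "without"])
print(q)
# item_name:laptop^100 item_name:bags^100 (*:* -item_name:with)^99 (*:* -item_name:without)^99

# Hypothetical host and core name; adjust to your installation.
url = "http://localhost:8983/solr/products/select?" + urlencode({"q": q, "wt": "json"})
print(url)
```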

If you are using (e)dismax, the Solr documentation says:

When using (e)dismax, people sometimes expect that specifying a pure negative query with a large boost in the "bq" param will work (since Solr automatically makes top level purely negative queries positive by adding an implicit "*:*"), but this doesn't work with "bq", because of how queries specified via "bq" are added directly to the main query. You need to be explicit...

?defType=dismax
&q=foo bar
&bq=(*:* -xxx)^999
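Applied to the question's data, the dismax request could look like the sketch below, which only builds the query string with Python's standard library. The penalty words and the boost of 999 are illustrative choices, and it passes one bq per word, since bq can be given more than once. Sending the request to a real Solr instance is left out.

```python
from urllib.parse import urlencode

# Illustrative penalty words; each gets an explicit (*:* -term)^N boost query,
# the form the documentation recommends for bq.
PENALTY_WORDS = ["with", "without"]

params = [
    ("defType", "dismax"),
    ("q", "laptop bags"),
    ("qf", "item_name"),
]
# One bq per word: each matches (and boosts) only documents lacking that word.
params += [("bq", f"(*:* -item_name:{w})^999") for w in PENALTY_WORDS]

query_string = urlencode(params)
print(query_string)
```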

In your case, there doesn't seem to be a connection with stopwords.

teemutalja

In DisMax, you can reduce the relevance score of documents that contain the word 'with' or 'without' in the field 'item_name' with the following setting:

   - ['bf', "if(or(tf(item_name,'with'),tf(item_name,'without')),-5,0)"]

This syntax works in DisMax and, as far as I know, also with the Solr and EDisMax parsers. The code above is in YAML format, which VuFind uses for its relevance settings.
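To make the function query's effect concrete, here is a toy Python re-implementation evaluated against the item names from the question. It is an approximation: tokens come from lowercasing and splitting on non-letters, whereas Solr's tf() counts terms as produced by the field's analyzer. Under DisMax the bf value is added to each matching document's score, so the -5 lowers the items that mention with or without.

```python
import re

# Toy version of: if(or(tf(item_name,'with'),tf(item_name,'without')),-5,0)
# Assumption: lowercase + split on non-letters stands in for Solr's analyzer.

def tf(text, term):
    """Count whole-token occurrences of `term` in `text`."""
    return re.findall(r"[a-z]+", text.lower()).count(term)

def bf_penalty(item_name):
    # or() is true when either term frequency is nonzero; if() then yields -5.
    return -5 if (tf(item_name, "with") or tf(item_name, "without")) else 0

items = [
    "Feggi Brown Laptop Bags",
    "Dell Laptop (Black) without Laptop Bag by Dell",
    "HP Laptop with Laptop Bag",
    "Sony laptop without bag",
    "Goldendays Laptop Bag",
]
for name in items:
    print(f"{bf_penalty(name):>3}  {name}")
```

Because tf() works on whole terms, "without" does not count as an occurrence of "with", which is why both words are listed.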

Yes, there is some relation to stopwords. For example, if 'with' is in the stopword list, searching for salad with tomato returns the same result set as salad tomato, and the presence of 'with' in documents does not affect the order of the results.
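A toy analyzer shows the effect. The stopword set below is a hypothetical example rather than any Solr default list, and real analysis chains do more than lowercase and split. It also shows a side effect: once 'with' is removed at index time, there is no 'with' term left for a query such as item_name:with or tf(item_name,'with') to find.

```python
# Toy analyzer: lowercase, split on whitespace, drop stopwords.
# The stopword set is a hypothetical example, not Solr's default list.
STOPWORDS = {"with", "a", "an", "the"}

def analyze(text, stopwords=STOPWORDS):
    return [tok for tok in text.lower().split() if tok not in stopwords]

print(analyze("salad with tomato"))  # ['salad', 'tomato']
print(analyze("salad tomato"))       # ['salad', 'tomato']

# With 'with' stripped at index time, no 'with' term remains to be matched.
print(analyze("HP Laptop with Laptop Bag"))  # ['hp', 'laptop', 'laptop', 'bag']
```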