ElasticSearch: snowball not working?


I create the following index:

curl -XDELETE "http://localhost:9200/testindex"
curl -XPOST "http://localhost:9200/testindex" -d'
{
  "mappings" : {
    "article" : {
      "dynamic" : false,
      "properties" : {
        "text" : {
          "type" : "string",
          "analyzer" : "snowball"
        }
      }
    }
  }
}'

... I populate the following:

curl -XPUT "http://localhost:9200/testindex/article/1" -d'{"text": "grey"}'
curl -XPUT "http://localhost:9200/testindex/article/2" -d'{"text": "gray"}'
curl -XPUT "http://localhost:9200/testindex/article/3" -d'{"text": "greyed"}'
curl -XPUT "http://localhost:9200/testindex/article/4" -d'{"text": "greying"}'

... I see the following when I search:

curl -XPOST "http://localhost:9200/testindex/_search" -d'
{
     "query": {
         "query_string": {
             "query": "grey",
             "analyzer" : "snowball"
         }
     }
}'

The result is:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "testindex",
        "_type": "article",
        "_id": "1",
        "_score": 0.30685282,
        "_source": {
          "text": "grey"
        }
      }
    ]
  }
}

... I'm expecting 3 hits: grey, greyed, and greying. Why doesn't this work? Note that I'm not interested in adding fuzziness to the search, since that will by default match on gray (but not greying).

What am I doing wrong here?

Answer by James R (best answer):

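Your problem is that you are using query_string without defining a default_field, so the query runs against the _all field, which uses your default analyzer (most likely standard). The _all field was therefore indexed without stemming: it contains the literal tokens grey, gray, greyed and greying, and the stemmed query term grey only matches the first of them.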
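One way to check this is the _analyze API, which returns the tokens an analyzer produces for a given text. The following is a minimal sketch, assuming an Elasticsearch 1.x-era node on localhost:9200 where _analyze accepts the analyzer as a query parameter and the text as the request body (newer versions expect a JSON body instead). The names analyze_url and show_tokens are hypothetical helpers, not part of Elasticsearch.

```shell
# Assumption: a local 1.x-era node and the testindex from the question.
ES="http://localhost:9200"

# Hypothetical helper: build the _analyze URL for a named analyzer.
analyze_url() {
  printf '%s/testindex/_analyze?analyzer=%s' "$ES" "$1"
}

# Hypothetical helper: print the tokens an analyzer emits for a text.
show_tokens() {
  curl -s -XGET "$(analyze_url "$1")" -d "$2"
}

# Usage (requires a running node):
#   show_tokens standard "greying"
#   show_tokens snowball "greying"
```

If the explanation above holds, standard should return greying unchanged while snowball should reduce it to grey.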

To fix this, set default_field on the query:

curl -XPOST "http://localhost:9200/testindex/_search" -d'
{
     "query": {
         "query_string": {
             "default_field": "text",
             "query": "grey"
         }
     }
}'

{"took":7,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":3,"max_score":0.30685282,"hits":[{"_index":"testindex","_type":"article","_id":"4","_score":0.30685282, "_source" : {"text": "greying"}},{"_index":"testindex","_type":"article","_id":"1","_score":0.30685282, "_source" : {"text": "grey"}},{"_index":"testindex","_type":"article","_id":"3","_score":0.30685282, "_source" : {"text": "greyed"}}]}}
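As a side note that is not part of the original answer: the Lucene query syntax used by query_string also allows the field to be named inside the query string itself. A minimal sketch, assuming the same testindex and local node; the BODY variable and search function are hypothetical names used only for illustration.

```shell
# Assumption: same local node and testindex as in the question.
ES="http://localhost:9200"

# The "text:" prefix targets the text field instead of _all.
BODY='{
  "query": {
    "query_string": {
      "query": "text:grey"
    }
  }
}'

# Hypothetical helper: run the search (requires a running node).
search() {
  curl -s -XPOST "$ES/testindex/_search" -d "$BODY"
}

# Usage:
#   search
```

This should behave like setting default_field to text for this single term, since the term is then analyzed with the text field's analyzer.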

I try to stay away from query_string searches, though, unless I really can't avoid them. People coming from Solr sometimes prefer this style of searching over the query DSL. In this case, try a match query instead:

curl -XPOST "http://localhost:9200/testindex/_search" -d'
{
     "query": {
         "match": {
             "text": "grey"
         }
     }
}'
{"took":5,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":3,"max_score":0.30685282,"hits":[{"_index":"testindex","_type":"article","_id":"4","_score":0.30685282, "_source" : {"text": "greying"}},{"_index":"testindex","_type":"article","_id":"1","_score":0.30685282, "_source" : {"text": "grey"}},{"_index":"testindex","_type":"article","_id":"3","_score":0.30685282, "_source" : {"text": "greyed"}}]}}

Either way yields the correct results.

See the query_string documentation here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html