Performing searches on JSON data in Elasticsearch

I have imported JSON data into Elasticsearch via Logstash. The import worked, and I can see the data in Elasticsearch-Head.

My problem is querying the data. I can run a search on a field, but it returns the entire contents of the type as a single search result. I have tried a few variations without any luck.

Here is the logstash shipper file:

input {
   exec {
     type => "recom_db"
     command => "curl -s -X GET http://www.test.com/api/edselastic/recom_db.json"
     interval => 86400
     codec => "json"
   }
   exec {
     type => "recom_ki"
     command => "curl -s -X GET http://www.test.com/api/edselastic/recom_ki.json"
     interval => 86400
     codec => "json"
   }
   exec {
     type => "recom_un"
     command => "curl -s -X GET http://www.test.com/api/edselastic/recom_un.json"
     interval => 86400
     codec => "json"
   }
}
output {
        elasticsearch {
                host => localhost
                index => "lib-recommender-%{+yyyy.MM.dd}"
                template_name => "recommender-template"
        }
}

A document in the Elasticsearch index has the following form:

{
    "_index": "lib-recommender-2015.06.11",
    "_type": "recom_un",
    "_id": "qoZE4aF-SkS--tq_8MhH4A",
    "_version": 1,
    "_score": 1,
    "_source": {
        "item": [{
            "name": "AAM219 -- reading lists",
            "link": "http://www.test.com/modules/aam219.html",
            "description": "AAM219 -- reading lists",
            "terms": {
                "term": ["AAM219"]
            }
        },
        {
            "name": "AAR410 -- reading lists",
            "link": "http://www.test.com/modules/aar410.html",
            "description": "AAR410 -- reading lists",
            "terms": {
                "term": ["AAR410"]
            }
        }
        ...

I have tried querying the data in the various ways shown in the Elasticsearch documentation, but I am unable to get the desired result. Here is one of the many queries I have tried:

curl -XPOST "http://localhost:9200/lib-recommender/recom_un/_search" -d'
{
    "fields": ["item.name", "item.link"],
    "query": {
        "term": {
            "item.terms.term": "AAM219"
        }
    }
}'

but it returns every item of the type in a single hit. The selected fields are correct, but the values from all items are listed together and are no longer grouped by item:

{
    "took": 13,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.006780553,
        "hits": [{
            "_index": "lib-recommender-2015.06.11",
            "_type": "recom_un",
            "_id": "qoZE4aF-SkS--tq_8MhH4A",
            "_score": 0.006780553,
            "fields": {
                "item.link": ["http://www.test.com/modules/aam219.html",
                "http://www.test.com/modules/aar410.html",
                "http://www.test.com/modules/ac1201.html",
                "http://www.test.com/modules/aca401.html",

I am after the following result:

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.006780553,
        "hits": [{
            "_index": "lib-recommender-2015.06.11",
            "_type": "recom_un",
            "_id": "qoZE4aF-SkS--tq_8MhH4A",
            "_score": 0.006780553,
            "_source": {
                "item": [{
                    "name": "AAM219 -- reading lists",
                    "link": "http://www.test.com/modules/aam219.html",
                    "description": "AAM219 -- reading lists",
                    "terms": {
                        "term": ["AAM219"]
                    }
                }]
            }
        }]
    }
}

What am I missing? Is the index mapping wrong for this kind of search, so that I need to create a mapping manually before importing the data? Or is there a parameter missing from the query? I have been looking for an answer but feel like I am going around in circles. I suspect it is something simple that I am overlooking.

There are 2 answers

0
adamst85 (BEST ANSWER)

Props to Val's answer. It was mostly what I needed, but with another level of nesting. Here is the mapping:

{
  "recom_un": {
    "properties": {
      "item": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "string"
          },
          "link": {
            "type": "string"
          },
          "description": {
            "type": "string"
          },
          "terms": {
            "type": "nested",
            "properties": {
              "term": {
                "type": "string"
              }
            }
          }
        }
      }
    }
  }
}

The search query I used to get what I wanted:

curl -XPOST "http://localhost:9200/lib-recommender/recom_un/_search" -d'
{
  "_source": false,
  "query": {
    "filtered": {
      "filter": {
        "nested": {
          "path": "item",
          "query": {
            "nested": {
              "path": "item.terms",
              "query": {
                "match": {
                  "term": "AAM219"
                }
              }
            }
          },
          "inner_hits": { }
        }
      }
    }
  }
}'
4
Val

Yes. For this use case to work you need to create a custom mapping and make sure your item structure is of type nested. Otherwise, all the fields in your item will be collapsed together, as you saw in the results you showed.
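
To see why the fields get collapsed, here is a rough Python sketch of what happens to an array of plain (non-nested) objects at index time. It is only a simulation for illustration, not Elasticsearch code; the flatten helper is hypothetical, and the sample data is the first two items from the question.

```python
def flatten(doc, prefix=""):
    """Collapse arrays of objects into one multi-valued list per dotted field path."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                # Merge the sub-object's fields into the parent's value lists.
                for sub_path, sub_values in flatten(v, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(v)
    return flat


source = {
    "item": [
        {
            "name": "AAM219 -- reading lists",
            "link": "http://www.test.com/modules/aam219.html",
            "description": "AAM219 -- reading lists",
            "terms": {"term": ["AAM219"]},
        },
        {
            "name": "AAR410 -- reading lists",
            "link": "http://www.test.com/modules/aar410.html",
            "description": "AAR410 -- reading lists",
            "terms": {"term": ["AAR410"]},
        },
    ]
}

flat = flatten(source)
# The association between a term and its link is lost: each field is one flat list.
print(flat["item.terms.term"])
print(flat["item.link"])
```

After flattening, a match on any single term matches the whole document, and asking for item.link returns every link. That is the behaviour shown in the question.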

So the mapping needs to be like this:

{
  "recom_un": {
    "properties": {
      "item": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "string"
          },
          "link": {
            "type": "string"
          },
          "description": {
            "type": "string"
          },
          "terms": {
            "properties": {
              "term": {
                "type": "string"
              }
            }
          }
        }
      }
    }
  }
}
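
The Logstash output in the question writes to a new dated index each day (lib-recommender-%{+yyyy.MM.dd}), so the mapping has to exist before each index is created. One way to arrange that is an index template. The sketch below only builds the template body in Python and prints it. The lib-recommender-* pattern and the build_template helper are assumptions, and the "template" key is the legacy form that goes with the "string" type used in this mapping.

```python
import json

# Mapping from this answer: "item" is nested so each array element is indexed separately.
recom_un_mapping = {
    "properties": {
        "item": {
            "type": "nested",
            "properties": {
                "name": {"type": "string"},
                "link": {"type": "string"},
                "description": {"type": "string"},
                "terms": {"properties": {"term": {"type": "string"}}},
            },
        }
    }
}


def build_template(index_pattern, mappings_by_type):
    """Wrap per-type mappings in a legacy index-template body (hypothetical helper)."""
    return {"template": index_pattern, "mappings": mappings_by_type}


# Assumed pattern: matches the daily indices created by the Logstash output.
template_body = build_template("lib-recommender-*", {"recom_un": recom_un_mapping})
print(json.dumps(template_body, indent=2))
```

The printed JSON could then be stored with a PUT to /_template/recommender-template (the template name from the question's Logstash config) before the next import. Indices that already exist keep their old mapping and would need to be reindexed.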

Then modify your query to use a nested query, like this. Note that I'm including inner_hits so that each hit also reports only the nested documents that matched:

curl -XPOST "http://localhost:9200/lib-recommender/recom_un/_search" -d'
{
  "fields": [
    "item.name",
    "item.link"
  ],
  "query": {
    "nested": {
      "path": "item",
      "query": {
        "term": {
          "item.terms.term": "AAM219"
        }
      },
      "inner_hits": {}
    }
  }
}'
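
With inner_hits, the matching nested documents come back inside each hit, not at the top level, so the client still has to pull them out. Here is a minimal Python sketch of that step. It assumes the inner hits are reported under the nested path name ("item"), and the sample response is a trimmed, hand-written stand-in built from the data in the question, not real server output.

```python
def matching_items(response, inner_name="item"):
    """Collect the _source of every inner hit reported under inner_name."""
    items = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(inner_name, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            items.append(inner_hit["_source"])
    return items


# Hand-written stand-in for a search response (assumed shape, trimmed).
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "lib-recommender-2015.06.11",
                "_type": "recom_un",
                "_id": "qoZE4aF-SkS--tq_8MhH4A",
                "inner_hits": {
                    "item": {
                        "hits": {
                            "total": 1,
                            "hits": [
                                {
                                    "_source": {
                                        "name": "AAM219 -- reading lists",
                                        "link": "http://www.test.com/modules/aam219.html",
                                        "description": "AAM219 -- reading lists",
                                        "terms": {"term": ["AAM219"]},
                                    }
                                }
                            ],
                        }
                    }
                },
            }
        ],
    }
}

for item in matching_items(sample_response):
    print(item["name"], item["link"])
```

Hits that carry no inner_hits section yield nothing here, so the same helper should also cope with responses from queries that did not request them.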