Elasticsearch array of strings being tokenized even with not_analyzed in mapping

This has been driving me nuts. I've got a few arrays in my data; here's a slimmed-down version:

{
    "fullName": "Jane Doe",
    "comments": [],
    "tags": [
        "blah blah tag 1",
        "blah blah tag 2"
    ],
    "contactInformation": {
        "attachments": [
            "some file 1",
            "some file 2",
            "some file 3"
        ]
    }
}

OK, so my mapping in Elasticsearch is as follows:

curl -XPOST localhost:9200/myindex -d '{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "docs" : {
            "properties" : {
                "tags" : { "type" : "string", "index" : "not_analyzed" },
                "attachments" : { "type" : "string", "index" : "not_analyzed" }
            }
        }
    }
}'
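As a sanity check (these steps aren't in the original post; the doc ID is hypothetical), indexing the sample document and reading the mapping back makes the problem visible:

curl -XPOST 'localhost:9200/myindex/docs/1' -d '{
    "fullName": "Jane Doe",
    "tags": ["blah blah tag 1", "blah blah tag 2"],
    "contactInformation": {
        "attachments": ["some file 1", "some file 2", "some file 3"]
    }
}'

# Fetch the stored mapping back. Fields not covered by the explicit
# mapping (here, the nested attachments) get dynamically mapped on
# first use as plain analyzed strings.
curl -XGET 'localhost:9200/myindex/_mapping?pretty'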

Now, if I display these as facets, the tags appear fine, like so:

[ ] - blah blah tag 1

[ ] - blah blah tag 2

However, the attachments are tokenized and I get a facet for every single word (see the reconstructed request after this list), e.g.:

[ ] - some

[ ] - file

[ ] - 1
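The original post doesn't show the search request, but a terms facet along these lines (the ES 1.x facets API, since replaced by aggregations) would reproduce both lists; note that in queries the inner field is addressed by its full dotted path:

curl -XPOST 'localhost:9200/myindex/docs/_search?pretty' -d '{
    "query" : { "match_all" : {} },
    "facets" : {
        "attachments" : {
            "terms" : { "field" : "contactInformation.attachments" }
        }
    }
}'

# With the field left analyzed, each whitespace-separated token
# ("some", "file", "1", ...) comes back as its own facet entry.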

I was thinking that since the attachments property lives inside contactInformation, my mapping might need to look like this: "contactInformation.attachments" : { "type" : "string", "index" : "not_analyzed" }

But that threw an error about the unexpected dot.

Any ideas?

1 Answer

Answered by femtoRgon (accepted):

See the "Complex Core Field Types" documentation (in particular, the section titled "Mapping for Inner Objects").

It should look something like this:

"mappings" : {
  "docs" : {
    "properties" : {
      “tags” : { "type" : "string", "index" : "not_analyzed" },
      "contactInformation": {
        "type": "object",
        "properties": {
          “attachments” : { "type" : "string", "index" : "not_analyzed" }
        }
      }
    }
  }
}
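One practical caveat (mine, not part of the original answer): an existing field's analysis setting can't be changed in place, so if the index already contains the dynamically mapped attachments field, the index has to be dropped and recreated with the corrected mapping, and the data reindexed:

curl -XDELETE 'localhost:9200/myindex'

curl -XPOST 'localhost:9200/myindex' -d '{
    "settings" : { "number_of_shards" : 1 },
    "mappings" : {
        "docs" : {
            "properties" : {
                "tags" : { "type" : "string", "index" : "not_analyzed" },
                "contactInformation" : {
                    "type" : "object",
                    "properties" : {
                        "attachments" : { "type" : "string", "index" : "not_analyzed" }
                    }
                }
            }
        }
    }
}'

# After recreating the index, reindex the documents; the facet on
# contactInformation.attachments should then return whole values
# ("some file 1") instead of individual tokens.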