Elasticsearch data model

547 views Asked by At

I'm currently parsing text from internal résumés in my company. The goal is to index everything in elasticsearch to perform search on them.

for the moment I have the following JSON document with no mapping defined :

Each coworker has a list of project with the client name

{
name: "Jean Wisser"
position: "Junior Developer"
"projects": [
        {
            "client": "SutrixMedia",
            "missions": [
                "Responsible for the quality on time and within budget",
                "Writing specs, testing,..."
            ],
            "technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"
        },
        {
            "client": "Société Générale",
            "missions": [
                " Writing test cases and scenarios",
                " UAT"
             ],
            "technologies": "HP QTP/QC"
        }
    ]
}

The 2 main questions we would like to answer are :

  1. Which coworker has already worked in this company ?
  2. Which client use this technology ?

The first question is really easy to answer, for example: Projects.client="SutrixMedia" returns me the right resume.

But how can I answer to the second one ?

I would like to make a query like this : Projects.technologies="HP QTP/QC" and the answer would be only the client name ("Société Générale" in this case) and NOT the entire document.

Is it possible to get this answer by defining a mapping with nested type ? Or should I go for a parent/child mapping ?

1

There are 1 answers

2
Val On BEST ANSWER

Yes, indeed, that's possible with ES 1.5.* if you map projects as nested type and then retrieve nested inner_hits.

So here goes the mapping for your sample document above:

curl -XPUT localhost:9200/resumes -d '
{
  "mappings": {
    "resume": {
      "properties": {
        "name": {
          "type": "string"
        },
        "position": {
          "type": "string"
        },
        "projects": {
          "type": "nested",        <--- declare "projects" as nested type
          "properties": {
            "client": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            },
            "missions": {
              "type": "string"
            },
            "technologies": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      }
    }
  }
}'

Then, you can index your sample document from above:

curl -XPUT localhost:9200/resumes/resume/1 -d '{...}'

Finally, with the following query which only retrieves the nested inner_hits you can retrieve only the nested object that matches Projects.technologies="HP QTP/QC"

curl -XPOST localhost:9200/resumes/resume/_search -d '
{
  "_source": false,
  "query": {
    "nested": {
      "path": "projects",
      "query": {
        "term": {
          "projects.technologies.raw": "HP QTP/QC"
        }
      },
      "inner_hits": {           <----- only retrieve the matching nested document
        "_source": "client"     <----- and only the "client" field 
      }
    }
  }
}'

which yields only the client name instead of the whole matching document:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.4054651,
    "hits" : [ {
      "_index" : "resumes",
      "_type" : "resume",
      "_id" : "1",
      "_score" : 1.4054651,
      "inner_hits" : {
        "projects" : {
          "hits" : {
            "total" : 1,
            "max_score" : 1.4054651,
            "hits" : [ {
              "_index" : "resumes",
              "_type" : "resume",
              "_id" : "1",
              "_nested" : {
                "field" : "projects",
                "offset" : 1
              },
              "_score" : 1.4054651,
              "_source":{"client":"Société Générale"}  <--- here is the client name
            } ]
          }
        }
      }
    } ]
  }
}