Spring Data Elasticsearch does not give expected result

I am using Spring Data Elasticsearch to query my Elasticsearch documents. My Elasticsearch entity class:

// other annotations omitted (Lombok, de/serializers, etc.)
@Document(indexName = "project", type = "project")
@EqualsAndHashCode
public class ProjectEntity extends CommonProperty implements Serializable {
    @Id
    private String id;
    private String projectName;
    private String description;
    private String parentProjectId;
    private Long projectOwner;
    private String projectOwnerName;
    private Long projectManager;
    private String projectManagerName;
    private String departmentId;
    private String status;
    private String organizationId;

    @Field(type = FieldType.Nested)
    private List<ActionStatusEntity> actionStatusList = new ArrayList<>();

    @Field(type = FieldType.Nested)
    private List<TeamMember> teamMemberList;

    @Field(type = FieldType.Nested)
    private List<UserDefineProperty> riskList;

}

I have also set up the repositories and the rest of the configuration; I am omitting that for brevity. The search call:

projectRepository.findByOrganizationIdAndProjectName(
        userEntity.getOrganizationId().toString(), projectRequest.getProjectName().trim());
// userEntity.getOrganizationId().toString() = "28", projectName = "Team Test"

The query Spring generated for the above call:

{
  "from": 0,
  "size": 10000,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "28",
            "fields": [
              "organizationId^1.0"
            ],
            "type": "best_fields",
            "default_operator": "and",
            "max_determinized_states": 10000,
            "enable_position_increments": true,
            "fuzziness": "AUTO",
            "fuzzy_prefix_length": 0,
            "fuzzy_max_expansions": 50,
            "phrase_slop": 0,
            "escape": false,
            "auto_generate_synonyms_phrase_query": true,
            "fuzzy_transpositions": true,
            "boost": 1
          }
        },
        {
          "query_string": {
            "query": "Team Test",
            "fields": [
              "projectName^1.0"
            ],
            "type": "best_fields",
            "default_operator": "and",
            "max_determinized_states": 10000,
            "enable_position_increments": true,
            "fuzziness": "AUTO",
            "fuzzy_prefix_length": 0,
            "fuzzy_max_expansions": 50,
            "phrase_slop": 0,
            "escape": false,
            "auto_generate_synonyms_phrase_query": true,
            "fuzzy_transpositions": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "version": true
}

Query Result:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 4.1767306,
    "hits" : [
      {
        "_index" : "project",
        "_type" : "project",
        "_id" : "215",
        "_version" : 2,
        "_score" : 4.1767306,
        "_source" : {
          "projectName" : "team member only test",
          "description" : "team member only test",
          "projectOwner" : 50,
          "projectOwnerName" : "***",
          "departmentId" : "team member only test",
          "organizationId" : "28"
        }
      },
      {
        "_index" : "project",
        "_type" : "project",
        "_id" : "408",
        "_version" : 17,
        "_score" : 4.1767306,
        "_source" : {
          "projectName" : "Category & Team adding test",
          "description" : "Category & Team adding test",
          "projectOwner" : 50,
          "projectOwnerName" : "***",
          "projectManager" : 50,
          "projectManagerName" : "***",
          "departmentId" : "cat",
          "organizationId" : "28"
        }
      },
      {
        "_index" : "project",
        "_type" : "project",
        "_id" : "452",
        "_version" : 4,
        "_score" : 3.4388955,
        "_source" : {
          "projectName" : "team member not in system test",
          "description" : "id-452",
          "projectOwner" : 53,
          "projectOwnerName" : "***",
          "projectManager" : 202,
          "projectManagerName" : "***",
          "departmentId" : "abc",
          "organizationId" : "28"
        }
      }
    ]
  }
}

Look at the result set: the projectName value was matched as if by a contains check, not against the full given parameter.
Why is this happening, and how can I fix it?
Additional note: the organizationId and projectName fields were set with fieldData=true.

There are 2 answers

P.J.Meisch On

The query that Spring Data Elasticsearch derives from the method name is an Elasticsearch query_string query with the given arguments, as you noticed. For such a query, Elasticsearch analyzes the argument, splits it into terms, and then searches for documents that contain all of these terms (the generated query uses "default_operator": "and").

Your query with "Team Test" has two terms, "team" and "test", and all the documents you show have these terms in the project name, so they are returned.

If you had a document with "Team Test" and no other terms between these two, this would be returned with a higher score.

This implementation was chosen because it is what is normally expected when searching in Elasticsearch. Imagine an index of names: with exact matching, a search for "Harry Miller" would not find a document containing "Harry B. Miller".

You can write a custom repository method that builds a query that fulfills your needs and use that instead. Or, if you always want to do exact searches on this field, you could define it as a keyword field to prevent parsing and analyzing.
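To make the difference concrete, here is a toy simulation in plain Java of the two behaviours. It is not Elasticsearch code and makes several assumptions: `analyze` is a hypothetical stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), `matchesAllTerms` mimics an analyzed query with the "and" operator, `matchesKeyword` mimics an exact comparison on a keyword field, and the fourth project name is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class AnalyzedVersusKeyword {

    // Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Analyzed search with the "and" operator: every query term must occur in the field.
    static boolean matchesAllTerms(String fieldValue, String query) {
        return analyze(fieldValue).containsAll(analyze(query));
    }

    // Keyword field: the stored value is compared as a whole, without analysis.
    static boolean matchesKeyword(String fieldValue, String query) {
        return fieldValue.equals(query);
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "team member only test",
                "Category & Team adding test",
                "team member not in system test",
                "Team Test"); // hypothetical fourth document
        for (String name : names) {
            System.out.println(name
                    + " -> allTerms=" + matchesAllTerms(name, "Team Test")
                    + ", keyword=" + matchesKeyword(name, "Team Test"));
        }
    }
}
```

In this sketch all four names pass the all-terms check, while only the last one passes the keyword check. Keep in mind that a keyword comparison is case-sensitive unless a normalizer is configured on the field.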

You could use a match_phrase query with this repository method definition (only using one parameter here, you'd need to add the organization id, but then the resulting query would be too complex for this small code sample):

@Query("{\"match_phrase\": {\"projectName\": \"?0\"}}\n")
SearchHits<ProjectEntity> findByProjectName(String name);
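For completeness, a two-parameter variant could look roughly like the sketch below. It is an illustration under stated assumptions, not something verified against a cluster: the method name `findByOrganizationIdAndProjectNamePhrase` is hypothetical, the annotation appears only in a comment so that the snippet compiles without Spring on the classpath, and `bind` is a hypothetical helper that imitates the `?0`/`?1` placeholder substitution purely to show the resulting JSON.

```java
final class PhraseQuerySketch {

    // Value that would go into @Query: organizationId must match and
    // projectName must contain the given phrase.
    static final String QUERY =
            "{\"bool\": {\"must\": ["
            + "{\"match\": {\"organizationId\": \"?0\"}}, "
            + "{\"match_phrase\": {\"projectName\": \"?1\"}}"
            + "]}}";

    // In the repository interface this might be used as (hypothetical method name):
    //   @Query(PhraseQuerySketch.QUERY)
    //   SearchHits<ProjectEntity> findByOrganizationIdAndProjectNamePhrase(
    //           String organizationId, String name);

    // Hypothetical helper imitating placeholder substitution, for illustration only.
    // It does no JSON escaping; higher indexes are replaced first so that "?1"
    // cannot clobber a longer placeholder such as "?10".
    static String bind(String query, String... args) {
        String bound = query;
        for (int i = args.length - 1; i >= 0; i--) {
            bound = bound.replace("?" + i, args[i]);
        }
        return bound;
    }

    public static void main(String[] args) {
        System.out.println(bind(QUERY, "28", "Team Test"));
    }
}
```

Putting both conditions into the `must` clause of one `bool` query mirrors the structure of the query that was generated for the original derived method, with only the projectName clause swapped for a phrase match.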
ESCoder On

I am not familiar with Spring Data Elasticsearch, but here is a working example with index data, search query, and search result in JSON format.

Index Data:

I indexed the three documents given in the question and inserted a fourth document, shown below.

{
    "projectName": "team test",
    "description": "id-452",
    "projectOwner": 53,
    "projectOwnerName": "***",
    "projectManager": 202,
    "projectManagerName": "***",
    "departmentId": "abc",
    "organizationId": "28"
}

Search Query:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "organizationId": 28
          }
        },
        {
          "multi_match": {
            "query": "Team Test",
            "type": "phrase",
            "fields": [
              "projectName"
            ]
          }
        }
      ]
    }
  }
}

Search Result:

"hits": [
      {
        "_index": "stof_64151693",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.5003766,
        "_source": {
          "projectName": "team test",
          "description": "id-452",
          "projectOwner": 53,
          "projectOwnerName": "***",
          "projectManager": 202,
          "projectManagerName": "***",
          "departmentId": "abc",
          "organizationId": "28"
        }
      }
    ]
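Why only the fourth document survives the phrase query can be illustrated with a toy phrase matcher in plain Java. This is a simplified sketch, not how Elasticsearch is implemented: `analyze` is a hypothetical stand-in for the standard analyzer, and `matchesPhrase` assumes a slop of 0, meaning the query terms must appear consecutively and in order.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class PhraseMatchSketch {

    // Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Phrase match with slop 0: the query terms must form a contiguous run in the field.
    static boolean matchesPhrase(String fieldValue, String query) {
        return Collections.indexOfSubList(analyze(fieldValue), analyze(query)) >= 0;
    }

    // Returns the project names that contain the query as a phrase.
    static List<String> search(List<String> projectNames, String query) {
        return projectNames.stream()
                .filter(name -> matchesPhrase(name, query))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> projectNames = List.of(
                "team member only test",
                "Category & Team adding test",
                "team member not in system test",
                "team test");
        System.out.println(search(projectNames, "Team Test"));
    }
}
```

In the first three names other terms sit between "team" and "test", so the phrase check fails for them even though both terms are present.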