Elasticsearch asciifolding only for non-strict queries


Is it possible to apply the asciifolding filter only to non-strict queries? Or do I need to use a different analyzer for strict queries somehow?

My index settings:

protected function createIndex(string $indexName, array $properties): array {
    return [
      'index' => $indexName,
      'body' => [
        'settings' => [
          'number_of_shards' => 1,
          'number_of_replicas' => 0,
          'analysis' => [
            'analyzer' => [
              'special_chars' => [
                'type' => 'custom',
                'tokenizer' => 'standard',
                'filter' => [
                  'preserve_asciifolding',
                ],
              ],
            ],
            'filter' => [
              'preserve_asciifolding' => [
                'type' => 'asciifolding',
                'preserve_original' => TRUE,
              ],
            ],
          ],
        ],
        'mappings' => [
          '_source' => [
            'enabled' => TRUE,
          ],
          'properties' => $properties,
        ],
      ],
    ];
}

I have multiple indexes with different fields indexed, and my current query looks like this:

{
    "_source": [
        "first_name",
        "last_name",
        "title",
        "lead",
         ...
    ],
    "from": 0,
    "size": 20,
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "\"Stephane\"",
                        "default_operator": "AND"
                    }
                },
                {
                    "nested": {
                        "path": "target",
                        "query": {
                            "bool": {
                                "should": [
                                    {
                                        "match": {
                                            "target.id": 1
                                        }
                                    }
                                ]
                            }
                        }
                    }
                },
                {
                    "nested": {
                        "path": "location",
                        "query": {
                            "bool": {
                                "should": [
                                    {
                                        "match": {
                                            "location.id": 1
                                        }
                                    }
                                ]
                            }
                        }
                    }
                }
            ],
            "filter": [],
            "should": []
        }
    },
    "aggs": {
      .....  
        }
    }
}

What I want to accomplish:

Searching Stephane (non-strict query): expected results with both Stéphane and Stephane. This one works fine.

Searching "Stephane" (strict query, in quotes): expected results with only Stephane. This doesn't work, because I get results with both Stéphane and Stephane.

Thanks for your help!


Answer by Musab Dogan

Thanks for explaining your request clearly. You can index your data as a field with a sub-field: the main field uses your ASCII-folding analyzer, and the sub-field (raw in the example below) keeps the default standard analyzer, which does not fold accents. If a query contains only strict values, send it to the raw sub-field; that way you will only get the strict results.

The challenge is on the query side: Elasticsearch can't know by itself whether a query is meant to be strict (for example, because it contains quotes or other special characters). If you check this in advance in your application and send the query to the corresponding field, you can achieve what you want. In practice, you keep separate queries: one for exact matches against the raw sub-field and one for analyzed matches against the main field.
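A minimal sketch of that check in PHP, to match the question's code. It assumes that a query counts as strict when the whole input is wrapped in double quotes, and that every searched field has a raw sub-field like the one in the mapping below; the function name buildTextClause is hypothetical.

```php
<?php
// Hypothetical helper: route strict (quoted) input to the ".raw" sub-fields
// and everything else to the ASCII-folded main fields.
function buildTextClause(string $userInput, array $fields): array
{
    $query = trim($userInput);

    // Assumption: "strict" means the whole input is wrapped in double quotes,
    // e.g. "Stephane". Anything else is treated as a non-strict query.
    $isStrict = strlen($query) >= 2
        && $query[0] === '"'
        && substr($query, -1) === '"';

    // first_name -> first_name.raw (standard analyzer, no accent folding).
    $targetFields = $isStrict
        ? array_map(fn(string $field): string => $field . '.raw', $fields)
        : $fields;

    return [
        'query_string' => [
            'query' => $query,
            'fields' => $targetFields,
            'default_operator' => 'AND',
        ],
    ];
}

// Usage: a strict search only looks at the raw sub-fields.
$clause = buildTextClause('"Stephane"', ['first_name', 'last_name']);
```

The returned array can take the place of the query_string entry in the question's bool.must list; the nested target and location clauses stay unchanged. The arrow function needs PHP 7.4 or later, and the quote check is deliberately simple: input such as "a" b "c" would also be classified as strict, so a real implementation may need stricter parsing.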

PUT /stephane
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "special_chars": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["preserve_asciifolding"]
        }
      },
      "filter": {
        "preserve_asciifolding": {
          "type": "asciifolding",
          "preserve_original": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field": {
        "type": "text",
        "analyzer": "special_chars",
        "fields": {
          "raw": {
            "type": "text"
          }
        }
      }
    }
  }
}
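In the question's PHP code, the same mapping goes into the $properties argument of createIndex(). A minimal sketch, assuming the special_chars analyzer defined in the question's settings; the helper name textPropertiesWithRaw is hypothetical, and the field list is taken from the question's _source.

```php
<?php
// Hypothetical helper: builds the "properties" array passed to createIndex()
// so that every listed field is indexed twice: once with the ASCII-folding
// analyzer and once, as "raw", with the default standard analyzer.
function textPropertiesWithRaw(array $fieldNames, string $analyzer = 'special_chars'): array
{
    $properties = [];
    foreach ($fieldNames as $name) {
        $properties[$name] = [
            'type' => 'text',
            'analyzer' => $analyzer,
            'fields' => [
                // Same value, standard analyzer: "Stéphane" stays "stéphane".
                'raw' => ['type' => 'text'],
            ],
        ];
    }
    return $properties;
}

// Example: mapping for the text fields returned in the question's _source.
$properties = textPropertiesWithRaw(['first_name', 'last_name', 'title', 'lead']);
```

The nested target and location mappings used by the question's query are not covered here and still need to be merged into the same array. Documents indexed before the sub-field existed are not covered by it until they are reindexed.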

# refresh makes the documents searchable right away
POST /stephane/_bulk?refresh
{"index":{"_id":"1"}}
{"field":"Stéphane"}
{"index":{"_id":"2"}}
{"field":"Stephane"}

# analyzed field (ASCII folding): 2 hits
GET stephane/_search
{
  "query": {
    "match": {
      "field": {
        "query": "Stephane"
      }
    }
  }
}

# raw sub-field (no folding), exact match: 1 hit
GET stephane/_search
{
  "query": {
    "match": {
      "field.raw": {
        "query": "Stephane"
      }
    }
  }
}