I am having a hard time debugging unexpected Elasticsearch query results. I indexed the following document into Elasticsearch:
{
  "group": "J00-I99",
  "codes": [
    { "id": "J15", "description": "hello world" },
    { "id": "J15.0", "description": "test one world" },
    { "id": "J15.1", "description": "test two world J15.0" },
    { "id": "J15.2", "description": "test two three world J15" },
    { "id": "J15.3", "description": "hello world J18" },
    ............................ // similar records here
    { "id": "J15.9", "description": "hello world new" },
    { "id": "J16.0", "description": "new description" }
  ]
}
My aim here is to implement autocomplete functionality, and for that I used the edge n-gram approach. I don't want to use the completion suggester.
Currently I am stuck on two issues:
- Search query (on both the id and description fields): J15
Expected result: all of the above documents that contain J15. Actual result: only a few results (J15.0, J15.1, J15.8).
- Search query (on both the id and description fields): test two
Expected result:
{ "id": "J15.1", "description": "test two world J15.0" },
{ "id": "J15.2", "description": "test two three world J15" },
Actual result:
{ "id": "J15.0", "description": "test one world" },
{ "id": "J15.1", "description": "test two world J15.0" },
{ "id": "J15.2", "description": "test two three world J15" },
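To illustrate the second issue, here is a minimal Python sketch (my own simulation, not Elasticsearch itself) of how the index-time analyzer (whitespace-like split, lowercase, edge n-grams with min_gram 2) and a standard search analyzer interact under a match query's default OR operator:

```python
def index_analyze(text, min_gram=2, max_gram=20):
    """Simulate index time: split on whitespace, lowercase, emit edge n-grams."""
    tokens = set()
    for word in text.lower().split():
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.add(word[:n])
        if len(word) < min_gram:
            tokens.add(word)  # very short words survive as-is in this sketch
    return tokens

def search_analyze(text):
    """Simulate the standard search analyzer: split + lowercase, no n-grams."""
    return text.lower().split()

def matches(query, doc, operator="or"):
    """OR: any query token found in the index; AND: all of them."""
    indexed = index_analyze(doc)
    hits = [t in indexed for t in search_analyze(query)]
    return all(hits) if operator == "and" else any(hits)

# Default OR: the token "test" alone matches "test one world".
print(matches("test two", "test one world"))               # True
print(matches("test two", "test two world J15.0"))         # True
# AND requires every query token to be present:
print(matches("test two", "test one world", "and"))        # False
print(matches("test two", "test two world J15.0", "and"))  # True
```

This reproduces exactly the extra hit I am seeing: under OR, "test one world" matches "test two" because the shared token test is enough.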
The mapping is defined like this:
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "group": {
        "type": "text"
      },
      "codes": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "text",
            "analyzer": "ngram_analyzer",
            "search_analyzer": "standard"
          },
          "description": {
            "type": "text",
            "analyzer": "ngram_analyzer",
            "search_analyzer": "standard"
          }
        }
      }
    }
  }
}
Search Query:
GET myindex/_search
{
"_source": {
"excludes": [
"codes"
]
},
"query": {
"nested": {
"path": "codes",
"query": {
"bool": {
"should": [
{
"match": {
"codes.description": "J15"
}
},
{
"match": {
"codes.id": "J15"
}
}
]
}
},
"inner_hits": {}
}
}
}
Note: the real index will be large; only sample data is shown here.
For the second issue, can I use multi_match with the AND operator, like the query below?
GET myindex/_search
{
"_source": {
"excludes": [
"codes"
]
},
"query": {
"nested": {
"path": "codes",
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "J15",
"fields": ["codes.id", "codes.description"],
            "operator": "and"
}
}
]
}
},
"inner_hits": {}
}
}
}
Any help would be really appreciated, as I am having a hard time fixing this.
Adding another answer, as this is a different issue and my first answer focused on the first one.
The issue is that your second query, test two, also returns test one world. While indexing, you are using ngram_analyzer, which relies on the standard tokenizer to split the text on whitespace before the edge_ngram filter runs, and your search analyzer is standard. If you run the Analyze API on your indexed document and on your search term, you can compare the generated tokens and see exactly which ones match.
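For example (a sketch, assuming the index name myindex from your queries), you can compare the two analyzers side by side:

```
GET myindex/_analyze
{
  "analyzer": "ngram_analyzer",
  "text": "test one world"
}

GET myindex/_analyze
{
  "analyzer": "standard",
  "text": "test two"
}
```

The first request should return the edge n-grams per word (te, tes, test, on, one, wo, wor, worl, world), while the second returns just test and two. The overlap on the single token test is what produces the unwanted hit.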
Your search term test two is analyzed into the two tokens test and two. As you can see, the test token is present in the document test one world, and because a match query uses the OR operator by default, that single token is enough to produce the hit. This can be solved by using the AND operator in the query, which requires every query token to match.
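A sketch of the corrected search query with the AND operator (index and field names taken from the question; note that the operator value must be a quoted string):

```
GET myindex/_search
{
  "_source": {
    "excludes": ["codes"]
  },
  "query": {
    "nested": {
      "path": "codes",
      "query": {
        "multi_match": {
          "query": "test two",
          "fields": ["codes.id", "codes.description"],
          "operator": "and"
        }
      },
      "inner_hits": {}
    }
  }
}
```

With this, test two should no longer match test one world, because the token two is not present in that description, while J15.1 and J15.2 still match on both tokens.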