Jest provides a brilliant async API for Elasticsearch, and we find it very useful. However, sometimes the resulting requests turn out to be slightly different from what we would expect.
Usually we didn't care, since everything worked fine, but in this case it did not.
I want to create an index with a custom ngram analyzer. Following the Elasticsearch REST API docs, I make the call below:
curl -XPUT 'localhost:9200/test' --data '
{
"settings": {
"number_of_shards": 3,
"analysis": {
"filter": {
"keyword_search": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 15
}
},
"analyzer": {
"keyword": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"keyword_search"
]
}
}
}
}
}'
and then I confirm the analyzer is configured properly using:
curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens'
In response I receive multiple tokens like exp, expe, expec and so on.
Now, using the Jest client, I put the config JSON in a file on my classpath; its content is exactly the same as the body of the PUT request above. I execute the Jest action constructed like this:
new CreateIndex.Builder(name)
.settings(
ImmutableSettings.builder()
.loadFromClasspath(
"settings.json"
).build().getAsMap()
).build();
As a result:
Primo - I checked with tcpdump that what is actually posted to Elasticsearch is (pretty printed):
{
  "settings.analysis.filter.keyword_search.max_gram": "15",
  "settings.analysis.filter.keyword_search.min_gram": "3",
  "settings.analysis.analyzer.keyword.tokenizer": "whitespace",
  "settings.analysis.filter.keyword_search.type": "edge_ngram",
  "settings.number_of_shards": "3",
  "settings.analysis.analyzer.keyword.filter.0": "lowercase",
  "settings.analysis.analyzer.keyword.filter.1": "keyword_search",
  "settings.analysis.analyzer.keyword.type": "custom"
}
Secundo - the resulting index settings are:
{
  "test": {
    "settings": {
      "index": {
        "settings": {
          "analysis": {
            "filter": {
              "keyword_search": {
                "type": "edge_ngram",
                "min_gram": "3",
                "max_gram": "15"
              }
            },
            "analyzer": {
              "keyword": {
                "filter": [
                  "lowercase",
                  "keyword_search"
                ],
                "type": "custom",
                "tokenizer": "whitespace"
              }
            }
          },
          "number_of_shards": "3" <-- the only difference from the one created with rest call
        },
        "number_of_shards": "3",
        "number_of_replicas": "0",
        "version": {"created": "1030499"},
        "uuid": "Glqf6FMuTWG5EH2jarVRWA"
      }
    }
  }
}
Tertio - checking the analyzer with
curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens'
I get just one token!
Question 1. Why does Jest not post my original settings JSON, but a processed version instead?
Question 2. Why do the settings generated by Jest not work?
Glad you found Jest useful, please see my answer below.
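It's not Jest but Elasticsearch's ImmutableSettings doing that: by default it flattens the loaded JSON into dot-separated keys, and that flattened map is what getAsMap() returns and what ends up in the request body.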
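To illustrate the kind of flattening involved, here is a minimal standalone sketch in plain Java. It is not the ImmutableSettings implementation, only an imitation of the observable result (nested objects become dot-joined keys, array items get numeric suffixes, values become strings). The class name FlattenSketch and its methods are made up for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is NOT Elasticsearch code.
final class FlattenSketch {

    // Copies a nested map into a flat map whose keys are dot-joined paths.
    static Map<String, String> flatten(Map<String, Object> nested) {
        Map<String, String> flat = new TreeMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Object value, Map<String, String> out) {
        if (value instanceof Map) {
            // Descend into nested objects, extending the key path.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                flattenInto(join(prefix, String.valueOf(e.getKey())), e.getValue(), out);
            }
        } else if (value instanceof List) {
            // Array items are keyed by their index: filter.0, filter.1, ...
            List<?> items = (List<?>) value;
            for (int i = 0; i < items.size(); i++) {
                flattenInto(join(prefix, String.valueOf(i)), items.get(i), out);
            }
        } else {
            // Leaf values are stored as strings.
            out.put(prefix, String.valueOf(value));
        }
    }

    private static String join(String prefix, String key) {
        return prefix.isEmpty() ? key : prefix + "." + key;
    }

    public static void main(String[] args) {
        Map<String, Object> keyword = new LinkedHashMap<>();
        keyword.put("type", "custom");
        keyword.put("tokenizer", "whitespace");
        keyword.put("filter", Arrays.asList("lowercase", "keyword_search"));

        Map<String, Object> analyzer = new LinkedHashMap<>();
        analyzer.put("keyword", keyword);

        Map<String, Object> analysis = new LinkedHashMap<>();
        analysis.put("analyzer", analyzer);

        Map<String, Object> settings = new LinkedHashMap<>();
        settings.put("number_of_shards", 3);
        settings.put("analysis", analysis);

        // The top-level "settings" wrapper from the question's file.
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("settings", settings);

        flatten(root).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Note that every key produced from the question's file starts with the settings. prefix, because the wrapper element is flattened along with everything else.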
Because your usage of settings JSON/map is not the intended case. I have created this test to reproduce your case (it's a bit long but bear with me):
When you run it, you'll see that in the case where settingsAsMap is used, the actual settings are totally wrong (settings includes another settings, which is your JSON, but they should have been merged), and so the analyze call fails.
Why is this not the intended usage?
Simply because that's how Elasticsearch behaves in this situation. If the settings data is flattened (as is done by default by the ImmutableSettings class), then it should not have the top-level element settings; it can have that top-level element if the data is not flattened (and that's why the test case with settingsAsString works).
tl;dr:
Your settings JSON should not include the top-level "settings" element (if you run it through ImmutableSettings).
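The simplest fix is to delete the wrapper element from settings.json so the file starts directly with number_of_shards and analysis. If the file has to stay as it is, one possible workaround is to drop the leading settings. from the flattened keys before passing the map to the builder. The sketch below assumes a flat map of strings like the one captured with tcpdump in the question; stripTopLevel is a hypothetical helper, not part of Jest or Elasticsearch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Jest or Elasticsearch.
final class SettingsPrefixSketch {

    // Returns a copy of the flat map with "<element>." removed from the start of each key.
    // Keys that do not start with that prefix are copied unchanged.
    static Map<String, String> stripTopLevel(Map<String, String> flat, String element) {
        String prefix = element + ".";
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            String key = e.getKey();
            String stripped = key.startsWith(prefix) ? key.substring(prefix.length()) : key;
            result.put(stripped, e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // A few of the flattened keys from the question.
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("settings.number_of_shards", "3");
        flat.put("settings.analysis.analyzer.keyword.type", "custom");
        flat.put("settings.analysis.analyzer.keyword.filter.0", "lowercase");

        stripTopLevel(flat, "settings")
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The resulting map could then be given to the settings(...) call from the question in place of the unmodified getAsMap() result.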