AWS ElasticSearch service - any stemmers I can use on it?


I'm trying to make search queries match both singular and plural forms of a word by using an English stemmer similar to Snowball.

Is there a standard one on the AWS Elasticsearch service, or do I need to install a plugin? I've tried the two attempts below, and both fail with this error:

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[dd63ee99c9186dd4b38e282ea58cbe6b][x.x.x.x:9300][indices:admin/create]"}],"type":"illegal_argument_exception","reason":"unknown setting [index.filter.my_stemmer.language] please check that any required plugins are installed, or check the breaking changes documentation for removed settings","suppressed":[{"type":"illegal_argument_exception","reason":"unknown setting [index.filter.my_stemmer.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"}]},"status":400}

Attempt 1:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stemmer"
          ]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "language": "light_german"
        }
      }
    }
  }
}

Attempt 2:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stemmer"
          ]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "snowball",
          "language": "English"
        }
      }
    }
  }
}

Answer by emraldinho:

Got it working with the following (Java client, using the built-in "stemmer" token filter with the English stemmer):

public void createSettingsWithEnglishStemAnalyzer() throws ExecutionException, InterruptedException, IOException {
    CreateIndexRequest request = new CreateIndexRequest(indexName);
    request.settings(Settings.builder()
            .put("index.max_inner_result_window", 250)
            .put("index.write.wait_for_active_shards", 1)
            .put("index.query.default_field", "paragraph")
            .put("index.number_of_shards", 3)
            .put("index.number_of_replicas", 2)
            .loadFromSource(Strings.toString(jsonBuilder()
                    .startObject()
                        .startObject("analysis")
                            .startObject("filter")
                                // Built-in stemmer token filter; "name" is accepted as an alias for "language".
                                .startObject("english_stemmer")
                                    .field("type", "stemmer")
                                    .field("name", "english")
                                .endObject()
                            .endObject()
                            .startObject("analyzer")
                                .startObject("EnglishStopWordAnalyzer")
                                    .field("tokenizer", "standard")
                                    .field("filter", new String[]{"lowercase","english_stemmer"})
                                .endObject()
                            .endObject()
                        .endObject()
                    .endObject()), XContentType.JSON)
    );
    CreateIndexResponse createIndexResponse = client.admin().indices().create(request).get();
    System.out.println("Index : "+createIndexResponse.index()+" Created");
    getSettingsWithAnalyzer();
}
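
For anyone who wants the same thing over plain REST rather than the transport client, here is a minimal sketch using only the JDK's java.net.http (Java 11+). The "stemmer" token filter is part of core Elasticsearch, so no plugin should be needed for it. The class name, the analyzer name "english_stem_analyzer", the index name and the localhost endpoint are placeholders made up for this sketch, and it assumes the domain accepts unsigned requests (for example via an IP-based access policy); a domain that requires signed requests would need a signing step that is not shown. Also, the error in the question names index.filter.my_stemmer.* rather than index.analysis.filter.my_stemmer.*, which may mean the request that actually reached the cluster had "filter" outside "analysis"; that is only a guess from the setting path.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class EnglishStemmerIndexSketch {

    // Hypothetical analyzer name chosen for this sketch.
    static final String ANALYZER = "english_stem_analyzer";

    // Settings body: both "filter" and "analyzer" sit under settings.analysis.
    static String createIndexBody() {
        return "{"
            + "\"settings\":{\"analysis\":{"
            + "\"filter\":{\"english_stemmer\":{\"type\":\"stemmer\",\"language\":\"english\"}},"
            + "\"analyzer\":{\"" + ANALYZER + "\":{"
            + "\"tokenizer\":\"standard\","
            + "\"filter\":[\"lowercase\",\"english_stemmer\"]}}"
            + "}}}";
    }

    // Builds a JSON request without sending it.
    static HttpRequest json(String method, String url, String body) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("Content-Type", "application/json")
            .method(method, HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    // PUT /<index> with the analysis settings above.
    static HttpRequest createIndexRequest(String endpoint, String index) {
        return json("PUT", endpoint + "/" + index, createIndexBody());
    }

    // POST /<index>/_analyze to inspect the tokens the analyzer produces.
    // Assumes `text` contains no quotes or backslashes (no JSON escaping is done here).
    static HttpRequest analyzeRequest(String endpoint, String index, String text) {
        String body = "{\"analyzer\":\"" + ANALYZER + "\",\"text\":\"" + text + "\"}";
        return json("POST", endpoint + "/" + index + "/_analyze", body);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder endpoint; pass the real domain endpoint as the first argument.
        String endpoint = args.length > 0 ? args[0] : "http://localhost:9200";
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest[] requests = {
            createIndexRequest(endpoint, "my_index"),
            analyzeRequest(endpoint, "my_index", "foxes jumping")
        };
        for (HttpRequest r : requests) {
            HttpResponse<String> resp = http.send(r, HttpResponse.BodyHandlers.ofString());
            System.out.println(r.method() + " " + r.uri() + " -> " + resp.statusCode());
            System.out.println(resp.body());
        }
    }
}
```

Running main with the domain endpoint as the first argument creates the index and then prints the _analyze response, which is a quick way to check whether plural and singular forms end up as the same token.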