Elasticsearch: Custom Token Filter


Since there is no documentation about the subject, it is very complicated to understand how to implement a custom token filter plugin from scratch in Java.

I'd like to get an analyzer filter that returns only tokens that are numbers for example.

Any idea?


Val (best answer)

There are existing filters that do this. For instance the keep_types token filter can do exactly that.

If you use the <NUM> type, the filter only lets numeric tokens through and drops all others.

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ]
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}

Result:

[1, 2]

You can achieve a similar result with the pattern_capture token filter as well.

But if you really want to go the Java way, then your best bet is to clone an existing analysis plugin and roll your own.
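
For the Java route, the heart of such a filter is a per-token keep-or-drop decision. Here is a minimal sketch of that logic in plain Java, with no Elasticsearch or Lucene dependency so it runs on its own. The class and method names (NumericTokenSketch, isNumeric, keepNumeric) are made up for illustration, and "numeric" is assumed to mean "digits only", which is narrower than what the standard tokenizer tags as <NUM>. In an actual plugin the same check would most likely go into the accept() method of a subclass of Lucene's FilteringTokenFilter, reading the term from a CharTermAttribute, but the exact package and wiring depend on your Lucene/Elasticsearch version.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: plain-Java version of the keep/drop rule
// that a custom token filter would apply to each token.
class NumericTokenSketch {

    // Hypothetical helper: true when the token is non-empty and all digits.
    // Assumption: decimals such as "3.14" are NOT treated as numeric here.
    static boolean isNumeric(String token) {
        if (token.isEmpty()) {
            return false;
        }
        for (int i = 0; i < token.length(); i++) {
            if (!Character.isDigit(token.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Keeps only the numeric tokens, preserving their original order.
    static List<String> keepNumeric(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (isNumeric(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same tokens the standard tokenizer produces for the text above.
        List<String> tokens = List.of("1", "quick", "fox", "2", "lazy", "dogs");
        System.out.println(keepNumeric(tokens)); // prints [1, 2]
    }
}
```

Keeping the predicate in a small static method like this makes it easy to unit test before wrapping it in the plugin boilerplate.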