Solr tokenizer filter substring

Question

Asked by Erik on 31 August 2017 at 23:23 · 68 views

Is there a way to index a field so that each leading substring that ends on a word boundary is treated as a separate token?

For example, input: "hello world, how are you?"

output: "hello world how are you", "hello world how are", "hello world how", "hello world", "hello"

This would be used in combination with SuggestComponent to provide autosuggestions for users.

There is 1 answer

**Mysterion** · Answer 1 · 2017-09-01T07:13:41+00:00

In principle, something like solr.ShingleFilterFactory could do the trick. Its main parameters are minShingleSize and maxShingleSize. Note that it emits every run of adjacent words within that size range, not only the runs that start at the first word, so it will generate many tokens that are of no use to you and waste disk space.
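To get a feel for how many extra tokens that means, here is a rough pure-Python simulation of word shingling. It is only a sketch and not Solr code: the function name `shingles` and the regex-based word splitting are assumptions made for illustration, and it ignores the single-word tokens a real shingle filter can also emit.

```python
import re

def shingles(text, min_size=2, max_size=5):
    """Return (start_index, shingle) pairs for every run of
    min_size..max_size adjacent words in the text."""
    words = re.findall(r"\w+", text.lower())
    out = []
    for start in range(len(words)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                out.append((start, " ".join(words[start:start + size])))
    return out

all_tokens = shingles("hello world, how are you?")
# Only shingles that begin at the first word are useful for the
# prefix-style suggestions asked about in the question.
useful = [token for start, token in all_tokens if start == 0]
```

With the example sentence, only 4 of the 10 generated shingles start at "hello"; the rest are the overhead mentioned above, and the share of unwanted tokens grows with longer inputs.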

So you would need either to filter out the unneeded tokens afterwards or to write your own filter.
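As a sketch of what such a custom filter would have to produce, the logic is small. The Python below is only an illustration of the token output, not a Solr or Lucene filter; `word_prefixes` is a hypothetical name, and splitting on `\w+` is an assumption about how the field would be tokenized.

```python
import re

def word_prefixes(text):
    """Return every leading run of words, longest first."""
    words = re.findall(r"\w+", text.lower())
    return [" ".join(words[:n]) for n in range(len(words), 0, -1)]

prefixes = word_prefixes("hello world, how are you?")
# ['hello world how are you', 'hello world how are',
#  'hello world how', 'hello world', 'hello']
```

This yields one token per word in the input, which matches the output in the question. The same logic could also be applied on the client side before documents are sent to Solr, if writing a filter turns out to be more effort than it is worth.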
