Is there a way to index a field so that each word-level prefix of its text is treated as a separate token?
For example, input: "hello world, how are you?"
output: "hello world how are you", "hello world how are", "hello world how", "hello world", "hello"
This would be used in combination with the SuggestComponent to provide autosuggestions for users.
In principle, something like solr.ShingleFilterFactory could do the trick for you. It has two parameters, minShingleSize and maxShingleSize. Because it builds shingles starting at every word position (for example "world how" and "how are you"), not only at the first word, it will generate a lot of tokens, and some of them may not be useful to you. That also means a lot of wasted disk space. You would need either to filter out the tokens you don't need or to write your own filter.