I need to remove certain keywords and then remove all spaces in the string during index analysis. I am using these filters:
'analysis' => array(
    'filter' => array(
        'whitespace_remove' => array(
            'type' => 'pattern_replace',
            'pattern' => ' ',
            'replacement' => ''
        ),
        'my_stop' => array(
            'type' => 'stop',
            'stopwords' => array('bad', 'horrible', 'useless')
        ),
        'edge' => array(
            'type' => 'edge_ngram',
            'min_gram' => '1',
            'max_gram' => '5'
        )
    ),
and the analyzer with
'keyword_space_ngram' => array(
    'type' => 'custom',
    'tokenizer' => 'keyword',
    'filter' => array(
        'lowercase',
        'my_stop',
        'whitespace_remove',
        'edge'
    )
)
How do I ensure that the filters are applied in this order, that is: convert to lowercase, remove the keywords, remove the spaces, and then perform the ngram analysis?
You can remove the stop words and the whitespace with custom char_filter definitions at index time. This will transform "bad angry man" to "angryman", for example.

To add your edge_ngram filter, just add edge at the end of your filter array.

Note: your stop words will only be substituted if they are lowercase.
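
For illustration, here is a minimal sketch of what such settings could look like. It is not necessarily the exact configuration meant above: the char_filter names remove_stopwords and remove_whitespace and both regex patterns are assumptions made up for this example. It relies on the fact that Elasticsearch runs char filters first (in the order listed), then the tokenizer, then the token filters in the order they appear in the filter array. With the keyword tokenizer the whole string is a single token, so a stop token filter would never see "bad" on its own, which is why the stop words are stripped in a char filter here.

```php
<?php
// Sketch only: filter names and patterns below are hypothetical.
$analysis = array(
    'char_filter' => array(
        // Assumed name. Strips the stop words before tokenization.
        // Runs before 'lowercase', so it only matches lowercase input.
        'remove_stopwords' => array(
            'type' => 'pattern_replace',
            'pattern' => '\\b(bad|horrible|useless)\\b',
            'replacement' => ''
        ),
        // Assumed name. Strips all remaining whitespace.
        'remove_whitespace' => array(
            'type' => 'pattern_replace',
            'pattern' => '\\s+',
            'replacement' => ''
        )
    ),
    'filter' => array(
        'edge' => array(
            'type' => 'edge_ngram',
            'min_gram' => 1,
            'max_gram' => 5
        )
    ),
    'analyzer' => array(
        'keyword_space_ngram' => array(
            'type' => 'custom',
            // Char filters are applied in this order, before the tokenizer.
            'char_filter' => array('remove_stopwords', 'remove_whitespace'),
            'tokenizer' => 'keyword',
            // Token filters are applied in this order, after the tokenizer.
            'filter' => array('lowercase', 'edge')
        )
    )
);
```

If mixed-case stop words also need to be removed, one option is to prefix the stop word pattern with the inline Java regex flag (?i). You can check the resulting tokens for a sample string with the _analyze API before reindexing.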