Solr: Perform stemming on a field and get the stemmed words sorted by frequency


Is there a way to apply stemming to a field at index time and then, at query time, retrieve the stemmed words sorted by how often their original forms occur?

For example, assume my 'text' field holds the contents of a document that contains only these words:

walk walking walked moved run running.

I want stemming to reduce these words to their base forms and then get those base forms sorted by how often their original words occur, i.e.:

walk run move
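To make the desired result concrete, the computation can be sketched outside Solr in Python. This is only an illustration under stated assumptions: `toy_stem` and `most_frequent_stems` are hypothetical helpers, and `toy_stem` is a deliberately tiny suffix stripper that handles just the words in this example. It is not Solr's Porter stemmer.

```python
import re
from collections import Counter

VOWELS = "aeiou"


def toy_stem(word: str) -> str:
    """Hypothetical, heavily simplified stand-in for a Porter-style stemmer.

    It only strips the -ing and -ed suffixes that occur in the example and is
    NOT equivalent to Solr's PorterStemFilterFactory.
    """
    w = word.lower()
    for suffix in ("ing", "ed"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            w = w[: -len(suffix)]
            if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in VOWELS + "lsz":
                # "runn" -> "run": undouble a trailing double consonant
                w = w[:-1]
            elif (
                len(w) == 3
                and w[0] not in VOWELS
                and w[1] in VOWELS
                and w[2] not in VOWELS + "wxy"
            ):
                # "mov" -> "move": a short consonant-vowel-consonant stem gets its "e" back
                w += "e"
            break
    return w


def most_frequent_stems(text: str) -> list[str]:
    """Stem every word and return the stems ordered by how many original words map to them."""
    tokens = re.findall(r"[a-z]+", text.lower())  # drops punctuation such as the final "."
    counts = Counter(toy_stem(t) for t in tokens)
    # most_common() sorts by count, highest first (ties keep first-seen order)
    return [stem for stem, _ in counts.most_common()]


print(most_frequent_stems("walk walking walked moved run running."))
# ['walk', 'run', 'move']
```

Each base form is ranked by how many original words reduce to it (walk: 3, run: 2, move: 1), which gives the ordering described above.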

My understanding is that Solr uses stemming to reduce walk, walking and walked to the single base form walk and stores that in the index. I am not interested in retrieving the counts, just the sorted list of words. Does Solr keep track of such word counts at index time? Here is my configuration:

My schema.xml has the text field:

<field name="text" type="text_general" indexed="true" stored="true" multiValued="true" />

The field type 'text_general' is defined as:

<fieldType class="solr.TextField" name="text_general" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Lowercase before stemming: the Porter stemmer expects lowercase input -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks for any help.
