Solr spellcheck's top suggestion is unexpected

I'm using the Solr 4.6.1 spellcheck component for spelling suggestions. I configured it to use DirectSolrSpellChecker with the default distance function and comparator, which, as I understand it, means suggestions are ranked by edit distance (primary key), then by document frequency (secondary key).

However, for the term papaer, the top suggestion is papier, which has a far lower document frequency than paper. Both alternatives are one edit away from papaer.

Is this a bug or a quirk of the edit distance algorithm I don't understand?
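
To make the comparison concrete, here is a minimal Python sketch of the two candidates. It confirms that both are one edit away from papaer, and then computes a length-normalized similarity. The normalization formula (1 - edits / length of the shorter term) is an assumption about how an "internal" score might be derived, not something confirmed from the Solr 4.6.1 source; the function names are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_similarity(query: str, candidate: str) -> float:
    """ASSUMPTION: similarity = 1 - edits / min(len(query), len(candidate))."""
    edits = levenshtein(query, candidate)
    return 1.0 - edits / min(len(query), len(candidate))


for candidate in ("paper", "papier"):
    print(candidate,
          levenshtein("papaer", candidate),
          round(normalized_similarity("papaer", candidate), 3))
# paper 1 0.8
# papier 1 0.833
```

If the score is normalized this way, papier would outrank paper on the primary key even though the raw edit distance is the same, and document frequency would never be consulted as a tie-breaker.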

My spellcheck config:

<!-- a spellchecker built from a field of the main index -->
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">spellfield</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
  <str name="distanceMeasure">internal</str>
  <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
  <float name="accuracy">0.5</float>
  <!-- sort suggestions by score (string distance), then by frequency -->
  <str name="comparatorClass">score</str>
  <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
  <int name="maxEdits">2</int>
  <!-- the minimum shared prefix when enumerating terms -->
  <int name="minPrefix">0</int>
  <!-- maximum number of inspections per result. -->
  <int name="maxInspections">5</int>
  <!-- minimum length of a query term to be considered for correction -->
  <int name="minQueryLength">3</int>
  <!-- maximum threshold of documents a query term can appear to be considered for correction -->
  <float name="maxQueryFrequency">0.01</float>
  <!-- minimum document frequency a term needs to be suggested: a fraction (e.g. 0.01 = 1% of documents) or, for values >= 1, an absolute document count -->
  <float name="thresholdTokenFrequency">2</float>
</lst>
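
For reference, the following Python sketch builds a request that exercises this spellchecker and asks for extended results, so that the document frequency of each suggestion appears in the response. The host, core name (`collection1`) and handler path (`/spell`) are assumptions and need to be adjusted to the actual setup; `spellcheck_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def spellcheck_url(base_url: str, term: str, dictionary: str = "default") -> str:
    """Build a Solr spellcheck request URL for a single term."""
    params = {
        "q": term,
        "spellcheck": "true",
        "spellcheck.q": term,
        "spellcheck.dictionary": dictionary,   # matches <str name="name"> in the config
        "spellcheck.count": 5,                 # number of suggestions to return
        "spellcheck.extendedResults": "true",  # include frequencies in the response
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


# ASSUMPTION: local Solr with the example core name and a /spell handler.
url = spellcheck_url("http://localhost:8983/solr/collection1/spell", "papaer")
print(url)
```

Comparing the frequencies reported for paper and papier in that response helps separate a ranking quirk from a problem with the indexed data.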