Support for both EdgeNGram analysis and phrase search in Solr 3.4.0

Question

Asked by mihaela on 19 January 2012 at 12:53 · 1.5k views

I want to enable "startsWith" (prefix) search for each term in a Solr query, while still being able to run phrase searches (given in quotes). For prefix search I first appended "*" to each term. That supports both prefix and phrase search, but I don't like it: it turns the query into a wildcard search, and wildcard terms are not analyzed.

So instead I added EdgeNGramFilterFactory to the index-time analyzer only. Prefix search now works fine, but exact phrase search no longer works.

Does anyone know how to keep phrase search working when EdgeNGram is enabled?

Thanks!

Here is the relevant part of schema.xml:

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- index time only: side="back" emits suffixes of each word,
         then side="front" emits prefixes of each of those suffixes -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="back"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

I have also noticed that highlighting no longer works well when WordDelimiterFilterFactory is used.

There are 3 answers

**Persimmonium** · Answer 1 · 2012-06-02T07:47:02+00:00

Yet another option is to upgrade to Solr 3.6.0, where a wildcard in the query no longer prevents the query terms from being analyzed.
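A minimal sketch of a Solr 3.6+ field type that declares the `multiterm` analyzer explicitly; the type name `text_prefix` is hypothetical. As I understand it, Solr applies this analyzer to wildcard and prefix terms, and when it is omitted Solr derives one from the query analyzer, keeping only multi-term-aware filters such as lowercasing. Whole-word filters like stemming are left out here because they would mangle partial terms such as `Hel*`.

```xml
<!-- hypothetical field type; type="multiterm" needs Solr 3.6 or later -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- applied to wildcard/prefix terms, so Hel* is lowercased to hel* -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```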

**Grimmo** · Answer 2 · 2012-02-07T01:28:00+00:00

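Phrase search does not work because EdgeNGram produces additional terms and, surprisingly, gives each gram of a word its own position. A phrase query expects its terms at consecutive positions: with the default slop of 0, each term must come exactly one position after the previous one. With grams, the indexed text looks different. Imagine you have indexed the text "Hello World" (lowercased) using `<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>` (maxGramSize has to be set here, since its default of 1 is smaller than minGramSize). The indexed tokens would then be "he hel hell hello wo wor worl world", each at its own position. A phrase query for "hello world" no longer matches, because "hello" sits at position 3 and "world" at position 7 (counting from 0), whereas a meaningless phrase such as "hel hell" does match.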

One option is to allow some distance between the words by increasing the `qs` (query phrase slop) parameter of the dismax query parser.
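A sketch of setting `qs` as a handler default in solrconfig.xml, assuming the searched field is called `text`; the field name and the handler name `/sloppy` are hypothetical. The slop has to cover the grams that sit between two words: in the "Hello World" example, "hello" is at position 3 and "world" at position 7, so that phrase needs a slop of at least 3. With front edge n-grams this is roughly the length of the second word minus minGramSize, so longer words need a larger value.

```xml
<!-- solrconfig.xml: hypothetical handler and field names -->
<requestHandler name="/sloppy" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">text</str>
    <!-- slop for explicit "quoted phrases" in q; 3 is enough for
         "hello world" in the example, longer words need more -->
    <str name="qs">3</str>
  </lst>
</requestHandler>
```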

But an inexact phrase search may be unacceptable, because it would also match unintended sequences such as "hel hell".

A better option is to index the n-grams in a separate field. The text is then indexed in two fields, and the n-grams no longer break up the positions of the original text.
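A minimal sketch of the separate-field approach in a Solr 3.x schema.xml. The type and field names (`text_exact`, `text_edge`, `title`, `title_edge`) are hypothetical, and the analysis chains are reduced to tokenizing and lowercasing. Edge n-grams are created only at index time and only in `title_edge`, so phrase queries on `title` see the original, unbroken positions.

```xml
<types>
  <!-- plain text: positions stay intact, so phrase queries work -->
  <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <!-- prefix grams at index time only; query terms are not n-grammed -->
  <fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>
<fields>
  <field name="title" type="text_exact" indexed="true" stored="true"/>
  <field name="title_edge" type="text_edge" indexed="true" stored="false"/>
</fields>
<!-- index the same input a second time, into the n-gram field -->
<copyField source="title" dest="title_edge"/>
```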

**Max Schmidt** · Answer 3 · 2012-02-09T16:17:00+00:00

You can use two fields: one for prefix and suffix search and another for exact matches.

```xml
<field indexed="true" name="myfield_edgy"       type="edgy"/>
<field indexed="true" name="myfield_exactmatch" type="exactmatch"/>
<!-- copy the input into the n-gram field at index time -->
<copyField source="myfield_exactmatch" dest="myfield_edgy"/>
```

Now you can search both fields and even give them different boosts, e.g. to rank matches in myfield_exactmatch higher.
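One way to query both fields at once is a dismax handler in solrconfig.xml that lists both in `qf`. This is a sketch: the handler name `/exactfirst` and the boost `^2.0` are hypothetical, and it assumes the `edgy` type creates n-grams at index time only. dismax builds one clause per field and scores each term (or quoted phrase) by its best-matching field, so a phrase like "hello world" can still match in `myfield_exactmatch` even though it fails against `myfield_edgy`, while a partial term like `hel` matches through `myfield_edgy`.

```xml
<!-- solrconfig.xml: hypothetical handler name and boost -->
<requestHandler name="/exactfirst" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- search both fields; hits in the exact field score higher -->
    <str name="qf">myfield_exactmatch^2.0 myfield_edgy</str>
  </lst>
</requestHandler>
```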

TechQA.
