Solr MultiPhraseQuery Not Returning Correct Results


I am having trouble building a Solr search that matches substrings of a query. For example, when a user searches for "Alfa Romeo Land Car", I want to match only complete brands (only "Alfa Romeo", not "Land Rover"). My approach is to create shingles from the query and then match each shingle exactly against my "car brands" Solr core.

So if a user searches for "A B C", I would like to get the shingles [A, AB, ABC, B, BC, C].
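
To make the expected output concrete, here is a minimal sketch of the shingling I have in mind. It is plain Python, not Solr code: the function name `shingles` and its `max_size` parameter are made up for illustration, with `max_size` loosely mirroring `maxShingleSize` and the empty join mirroring `tokenSeparator=""`.

```python
def shingles(query: str, max_size: int = 5) -> list[str]:
    """Return every run of up to max_size adjacent words, joined without a separator."""
    words = query.split()
    result = []
    for start in range(len(words)):
        # The longest run starting here is capped by max_size and by the end of the query.
        last_end = min(start + max_size, len(words))
        for end in range(start + 1, last_end + 1):
            result.append("".join(words[start:end]))
    return result


print(shingles("A B C"))  # ['A', 'AB', 'ABC', 'B', 'BC', 'C']
```

Each of these strings would then be looked up as a whole brand name, so "AB" should match a document named "A B" but never part of "A B C".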

But with the Solr configuration below, searching for "A B C" (using either EDisMax or the standard query parser) returns nothing, whereas searching for "ABC" returns the matching result "ABC".

Here is my schema.xml file:

<field name="id"             type="tint" indexed="true" stored="true" required="true"/>
<field name="name"           type="text_exact" indexed="true" stored="true" required="true"/>
<field name="seoAlias"       type="string" indexed="true" stored="true" required="true"/>


<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="0" generateWordParts="0" catenateAll="1" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="0" generateWordParts="1" catenateAll="0" />
    <filter class="solr.ShingleFilterFactory" outputUnigrams="true" outputUnigramsIfNoShingles="true" tokenSeparator="" maxShingleSize="5"/>
  </analyzer>
</fieldType>

Here are the documents in my Solr core:

"response": {
    "numFound": 7,
    "start": 0,
    "docs": [
      {
        "id": 1,
        "name": "A B C D",
        "seoAlias": "abce",
        "_version_": 1524585748644233200
      },
      {
        "id": 2,
        "name": "A B C",
        "seoAlias": "abce",
        "_version_": 1524586301229105200
      },
      {
        "id": 3,
        "name": "B C D",
        "seoAlias": "abce",
        "_version_": 1524586311147585500
      },
      {
        "id": 4,
        "name": "A B",
        "seoAlias": "abce",
        "_version_": 1524586322261442600
      },
      {
        "id": 5,
        "name": "B C",
        "seoAlias": "abce",
        "_version_": 1524586329997836300
      },
      {
        "id": 6,
        "name": "C D",
        "seoAlias": "abce",
        "_version_": 1524586338173583400
      },
      {
        "id": 7,
        "name": "B",
        "seoAlias": "abce",
        "_version_": 1524652609127841800
      }
    ]
  },

In the Solr admin webpage, if I go to "Schema Browser", then select the field in question, and press "Load Term Info" I can see the following indexed terms:

Top-Terms (6 of 6), each with a document count of 1:
ABC
ABCD
BC
BCD
CD
AB

When I search for "A B C" I want the following shingles [ABC AB BC A B C] but from debug query I get:

"response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  },
  "debug": {
    "rawquerystring": "*:*",
    "querystring": "*:*",
    "parsedquery": "MatchAllDocsQuery(*:*)",
    "parsedquery_toString": "*:*",
    "explain": {},
    "QParser": "LuceneQParser",
    "filter_queries": [
      "name:\"A B C\""
    ],
    "parsed_filter_queries": [
      "MultiPhraseQuery(name:\"(A AB ABC) (B BC) C\")"
    ], 

I think the problem may be related to the MultiPhraseQuery. It appears to create the correct shingles, but Solr does not seem to search with these strings as independent terms. Does anybody know what I'm missing?
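
One possible reading of the parsed filter query, offered as an assumption rather than confirmed Solr behaviour: a phrase-style query such as `(A AB ABC) (B BC) C` asks for three consecutive token positions in one document, with one alternative matching at each position. The index analyzer above appears to keep each name as a single token, so no document has three positions to match. The sketch below simulates that reading in plain Python. The helpers `index_tokens` and `multi_phrase_match` are hypothetical and simplified; they are not Solr or Lucene APIs.

```python
def index_tokens(name: str) -> list[str]:
    # Assumed, simplified index-time behaviour: the whole value stays one token
    # (KeywordTokenizer) and its word parts are concatenated (catenateAll="1").
    return ["".join(name.split())]


def multi_phrase_match(doc_tokens: list[str], positions: list[set[str]]) -> bool:
    """True if some run of consecutive doc tokens matches one alternative per query position."""
    n = len(positions)
    for start in range(len(doc_tokens) - n + 1):
        if all(doc_tokens[start + i] in positions[i] for i in range(n)):
            return True
    return False


# Positions taken from the parsed filter query: (A AB ABC) (B BC) C
query_positions = [{"A", "AB", "ABC"}, {"B", "BC"}, {"C"}]
names = ["A B C D", "A B C", "B C D", "A B", "B C", "C D", "B"]

hits = [n for n in names if multi_phrase_match(index_tokens(n), query_positions)]
print(hits)  # []
```

Under the same assumptions, a single-position query such as `[{"ABC"}]` does match the document "A B C", which is consistent with the search for "ABC" returning a result while "A B C" returns nothing.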

Thanks a lot in advance
