I have indexed a small collection (about 150k documents). I let users run filtered queries through dropdown boxes. The “filter query” (fq) fields are: apo_taxonomy, apo_dik, apo_number, and apo_date. Below is a portion of schema.xml:
<fieldType name="text_efe_dioi_s" class="solr.TextField" positionIncrementGap="100" >
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="20"/>
<filter class="solr.GreekLowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="20"/>
<filter class="solr.GreekLowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_efe_dioi" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.GreekLowerCaseFilterFactory"/>
<filter class="solr.GreekStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.GreekLowerCaseFilterFactory"/>
<filter class="solr.GreekStemFilterFactory"/>
</analyzer>
</fieldType>
<fields>
<field name="ida" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="solr_id" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="apo_number" type="text_efe_dioi" indexed="true" stored="true" multiValued="true"/>
<field name="apofasi_date" type="text_efe_dioi" indexed="true" stored="true"/>
<field name="apo_dik" type="text_efe_dioi" indexed="true" stored="true"/>
<field name="apo_taxonomy" type="text_efe_dioi" indexed="true" stored="true"/>
<field name="content" type="text_efe_dioi" indexed="true" stored="true" multiValued="true"/>
<field name="type" type="string" indexed="true" stored="true"/>
<field name="model" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="url" type="string" indexed="true" stored="true"/>
<field name="search_tag" type="text_efe_dioi" indexed="true" stored="true"/>
<field name="contentbin" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="last_modified" type="string" indexed="true" stored="true"/>
<field name="title" type="text_efe_dioi" indexed="true" stored="true" multiValued="true"/>
<field name="grid_title" type="text_efe_dioi" indexed="true" stored="true"/>
<field name="contentS" type="text_efe_dioi_s" indexed="true" stored="true"/>
</fields>
<copyField source="apo_number" dest="content" />
<copyField source="apo_date" dest="content" />
<copyField source="apo_dik" dest="content" />
<copyField source="apo_taxonomy" dest="content" />
<copyField source="title" dest="content" />
<copyField source="search_tag" dest="content" />
<copyField source="contentbin" dest="content"/>
<copyField source="content" dest="contentS" />
I also include the portion of solrconfig.xml that configures the “SearchHandler”. I set it up this way in order to boost “exactish” (anchored) phrase matches:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<!--<str name="defType">edismax</str>
<str name="qf">content contentS^10</str>
<str name="pf">content^10 contentS^100</str>
<str name="ps">100</str>-->
<str name="echoParams">explicit</str>
<int name="rows">150</int>
<str name="sort">score desc</str>
<str name="defType">edismax</str>
<str name="qf">content contentS^10</str>
<str name="pf">content^10 contentS^100</str>
<str name="ps">100</str>
<str name="wt">json</str>
<str name="hl">true</str>
<str name="fl">solr_id,ida,type,model,keywordlist,title,apo_taxonomy,apo_dik,apo_date,grid_title</str>
<str name="hl.fl">content,title</str>
<str name="f.content.hl.alternateField">content</str>
<str name="hl.maxAlternateFieldLength">800</str>
<str name="hl.fragsize">800</str>
</lst>
</requestHandler>
Some relevant notes:
- The “apo_taxonomy” field can hold values like: “Πόρτα”, “Πόρτα-1”, and “Πόρτα-ασφ1”
- The “apo_dik” field can hold values like: “Μια”, “Μιάμιση”, and “ΟΧΤΟ”
- The “apo_date” and “apo_number” fields can hold numeric values.
- All of the above fields use “”. I use the "solr.TextField" class so that these fields can be copied into a single field (“content”) and searched via Solr’s basic query (the “q” parameter).
- The whole collection is in Greek.
My questions:
1. When a user selects (via the dropdown boxes) the apo_taxonomy value “Πόρτα”, Solr also returns documents containing “Πόρτα-1” and “Πόρτα-ασφ1” (http://example.com/solr/efe_dioi/select/?q=:&fq=apo_taxonomy:( Πόρτα)+apo_date:(2009)&start=0&rows=100). This is not what the user needs: someone filtering the collection for documents of “Πόρτα” (apo_taxonomy) does not want to see documents of “Πόρτα-1” and/or “Πόρτα-ασφ1”. Is that feasible using “solr.TextField”? As noted above, I need the “filter fields” to remain searchable through the “q” parameter, together with the boost on “exactish” matches.
2. I am thinking of adding one more filter, “apo_ses”. The field would hold values like “ΜΕΡΑ”, “ΜΕΣΗΜΕΡΙ”, “ΑΠΟΓΕΥΜΑ”, and “ΒΡΑΔΥ”. Is it possible to instruct Solr, when filtering on a value such as “ΜΕΡΑ”, to return documents matching “ΜΕΡΑ” AND “ΜΕΣΗΜΕΡΙ”, or “ΜΕΡΑ” OR “ΜΕΣΗΜΕΡΙ”?
Any help would be greatly appreciated.
I hope not to bore you with my writing.
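For your question 1, I suggest using the string type. A tokenized text field most likely splits a value such as “Πόρτα-1” into separate tokens at the hyphen, which would explain why a filter on “Πόρτα” also matches it; a string field is indexed as a single untokenized term and only matches the exact value. If the field (for example, apo_taxonomy) is also going to be used for search, consider adding apo_taxonomy_exact with the string type and using it for fq, where apo_taxonomy_exact is a copy of apo_taxonomy in its non-tokenized form.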
<copyField source="apo_taxonomy" dest="apo_taxonomy_exact" />
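As a minimal sketch, the destination field for that copyField could be declared as follows. This assumes the stock string fieldType (class solr.StrField) is already defined in schema.xml, as the other string fields in the question suggest; the attribute values are illustrative and not taken from the original setup.

```xml
<!-- Assumed: <fieldType name="string" class="solr.StrField"/> exists in schema.xml. -->
<!-- Exact-match twin of apo_taxonomy: the whole value is indexed as one term,
     so "Πόρτα" does not match "Πόρτα-1" or "Πόρτα-ασφ1". -->
<!-- stored="false" is an assumption: the field is only filtered on, never returned. -->
<field name="apo_taxonomy_exact" type="string" indexed="true" stored="false"/>

<!-- Filter on the exact field; q keeps searching content/contentS as before:
     fq=apo_taxonomy_exact:"Πόρτα" -->
```

Because apo_taxonomy itself stays a text field and is still copied into content, it should remain searchable through the q parameter with the existing qf/pf boosts.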
The type for apo_taxonomy_exact would be string.
For your second question: yes, use something like fq=apo_ses:(("ΜΕΡΑ" AND "ΜΕΣΗΜΕΡΙ") OR "ΜΕΡΑ" OR "ΜΕΣΗΜΕΡΙ")
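To make the second suggestion more concrete, here is a sketch of how the proposed apo_ses field might be declared. The field does not exist yet, so every attribute is an assumption. In particular, multiValued="true" is assumed because an AND of two different values can only match when a single document carries both values in the field.

```xml
<!-- Hypothetical declaration for the proposed apo_ses filter field.
     type="string" keeps each value as one exact, untokenized term. -->
<field name="apo_ses" type="string" indexed="true" stored="true" multiValued="true"/>

<!-- Example filter queries (URL-encode them before sending):
     either value:  fq=apo_ses:("ΜΕΡΑ" OR "ΜΕΣΗΜΕΡΙ")
     both values:   fq=apo_ses:("ΜΕΡΑ" AND "ΜΕΣΗΜΕΡΙ") -->
```

If selecting “ΜΕΡΑ” in the dropdown should always expand to one of these expressions, the application can map the selected value to the corresponding fq string before sending the request.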