SolrNet query not working for Scandinavian characters


When making a query through SolrNet that contains Scandinavian characters such as ø, æ, or å, the query returns no results, while queries containing only regular words work fine.

The query is added to the FilterQueries collection using the SolrQueryByField class, with "ss_content" as the field name, \"søren\" as the value, and Quoted set to false. Even if I test without the quotation marks around søren, it returns no results.

When I run the same query through the Solr Admin page in the browser, it works fine.

Am I missing some configuration in SolrNet that could be causing the issue?

The Solr version is 3.6 on Tomcat 8, and it is being called from a .NET 4.5 application.

Any help would be very much appreciated.


There are 2 answers

TMBT On

If it works from the admin panel, I would suspect some kind of encoding issue. Use UTF-8 throughout; on Tomcat 8 you can set the URIEncoding attribute on the connector. You can also use the analysis tool in the admin panel to see how Solr interprets your non-ASCII search term.
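As a rough sketch of the URIEncoding setting mentioned above, the attribute goes on the HTTP connector in Tomcat's conf/server.xml. The port, timeout, and redirect values here are placeholders rather than values from the question; only URIEncoding is the relevant part.

```xml
<!-- conf/server.xml: every attribute except URIEncoding is a placeholder -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />
```

Tomcat needs a restart to pick up the change.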

Again, since it works from the admin panel, I'm not sure this will help, but try adding an ASCIIFoldingFilterFactory to whatever fields you're querying against. Your special characters fall outside the 128-character "Basic Latin" (ASCII) block, and Solr appears to handle them differently. Here are the docs for the filter: ASCIIFoldingFilterFactory. Usage looks something like this:

<fieldType . . . >
    <analyzer>
        . . .
        <filter class="solr.ASCIIFoldingFilterFactory" />
    </analyzer>
</fieldType>

As a last-ditch, "nuclear" option, have you considered using a MappingCharFilterFactory, if that's possible in your setup? It lets you normalize your special characters before they are tokenized.
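A minimal sketch of what that could look like in schema.xml. The field type name text_folded, the tokenizer and filter choices, and the file name mapping-scandinavian.txt are all hypothetical, chosen only for illustration:

```xml
<fieldType name="text_folded" class="solr.TextField">
    <analyzer>
        <!-- char filters run on the raw text, before the tokenizer -->
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-scandinavian.txt" />
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>
```

The mapping file (saved as UTF-8 in the core's conf directory) would then hold one rule per line, for example:

```
"ø" => "o"
"æ" => "ae"
"å" => "a"
```

Because the same analyzer applies at index and query time, existing documents would need to be reindexed after a change like this.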

Andrei Mărcuţ On

The Solr Admin page query is a plain HTML <form method=get action="#">[...]</form>, meaning that the browser automatically URL-encodes all input values. That is why it works from the Admin page.

You need to URL-encode parameter values when forming the requests. In .NET 4.5 you can use WebUtility.UrlEncode(String).

Please try to replace the "søren" string with WebUtility.UrlEncode("søren") and see if it works.
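To see what the encoded value looks like, here is a small standalone sketch. The QueryEncoding class and EncodeValue helper are hypothetical names for illustration; only WebUtility.UrlEncode comes from the framework.

```csharp
using System;
using System.Net;

public static class QueryEncoding
{
    // Hypothetical helper: percent-encodes a value using its UTF-8 bytes.
    public static string EncodeValue(string value)
    {
        return WebUtility.UrlEncode(value);
    }

    public static void Main()
    {
        // 'ø' is U+00F8, which is the two bytes C3 B8 in UTF-8,
        // so the encoded term should contain %C3%B8 in place of 'ø'.
        Console.WriteLine(EncodeValue("søren"));
    }
}
```

It is worth checking the actual request URL (for example in the Tomcat access log) before and after the change: if the client library already encodes parameter values, encoding them a second time would send a literal %C3%B8 to Solr instead of ø.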