"Sunspot" gem makes a distinction between accented UTF-8 characters

Question

Asked by Manu Artero on 21 August 2012 at 15:55

In a Rails app I started using Sunspot (https://github.com/sunspot/sunspot/blob/master/README.md).

Everything went fine until I noticed this (taken from the Rails console):

1.9.3p194 :002 > MyModel.search{fulltext "leon"}.results
=> [#<MyModel id: 16, name: "Leon">]
1.9.3p194 :003 > MyModel.search{fulltext "león"}.results
=> [#<MyModel id: 18, name: "León">]

How can I tell Sunspot not to distinguish between "leon" and "león"? I want something like search{fulltext "leon"} to return both records: [#<MyModel id: 16 ...>, #<MyModel id: 18 ...>].

I've searched for this problem and keep finding the same answer:

Adding this line to the Gemfile works until the next release of rsolr: gem 'rsolr', :git => "https://github.com/mwmitchell/rsolr.git"

Thanks.

There are 3 answers

Blacksad On 21 August 2012 at 15:59

You need to change the configuration files of Solr itself (the application, not the gem). Solr is "embedded" in the gem, but you can access its configuration as if it were installed separately. Have a look at the Solr documentation.

vvlad On 21 August 2012 at 16:13

In schema.xml you need to add a character filter to the <analyzer> of the field type used for your text fields, as described on the Solr wiki page AnalyzersTokenizersTokenFilters. For example:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

Then, in the same conf directory as schema.xml, you should have the file mapping-ISOLatin1Accent.txt, with entries that map each Unicode character sequence to an ASCII character sequence. You can see an example here: mapping-ISOLatin1Accent.txt
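The mapping file is plain text with one rule per line of the form "source" => "target" (for example "\u00F3" => "o"), and lines starting with # are comments. To show what the char filter does with such a file, here is a minimal plain-Ruby imitation. It is an illustration under assumptions, not Solr's implementation: the helper names load_mapping, unescape_rule and apply_mapping are hypothetical, and only the \uXXXX, \" and \\ escapes are handled.

```ruby
# Hypothetical plain-Ruby imitation of Solr's MappingCharFilterFactory:
# parse `"source" => "target"` rules, then rewrite text before tokenizing.

# One rule per line: a quoted source, "=>", a quoted target.
MAPPING_RULE = /\A"((?:[^"\\]|\\.)*)"\s*=>\s*"((?:[^"\\]|\\.)*)"\z/

# Expand the escapes this sketch supports: \uXXXX, \" and \\.
def unescape_rule(str)
  str.gsub(/\\u(\h{4})|\\(["\\])/) { $1 ? [$1.hex].pack('U') : $2 }
end

# Build a { source => target } hash, skipping blank lines and # comments.
def load_mapping(text)
  text.each_line.with_object({}) do |line, rules|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    m = MAPPING_RULE.match(line)
    next unless m
    rules[unescape_rule(m[1])] = unescape_rule(m[2])
  end
end

# Replace every occurrence of a source string with its target.
def apply_mapping(rules, input)
  # Longer sources first, so a multi-character source beats its prefix.
  pattern = Regexp.union(rules.keys.sort_by { |k| -k.length })
  input.gsub(pattern, rules)
end

# A three-rule excerpt written in the mapping-file syntax.
SAMPLE_MAPPING = <<~'MAP'
  # accented Latin-1 letters -> ASCII
  "\u00C1" => "A"
  "\u00E9" => "e"
  "\u00F3" => "o"
MAP

rules = load_mapping(SAMPLE_MAPPING)
puts apply_mapping(rules, "León")   # → Leon
```

In Solr the char filter runs before the tokenizer, so as long as the same analyzer is used at index and query time, both the stored "León" and a query for "león" end up as "leon".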

Manu Artero On 22 August 2012 at 09:57 (accepted answer)

Thanks for the responses. In the end I solved it last night with another idea I took from http://codeshooter.wordpress.com/2011/01/13/full-text-search-in-in-rails-with-sunspot-and-solr/

The idea is to index a normalized copy of the name, in Restaurant.rb:

searchable do
  # index an accent-stripped, lower-cased copy of the name
  text :name do
    name.my_normalize
  end
end

and the body of the my_normalize helper:

to_s.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/,'').downcase

That line turns strings like "äáàÁÄÀ" into "aaaaaa".
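On a newer Ruby the same normalization can be sketched without ActiveSupport; the answer used mb_chars because Ruby 1.9.3 had no built-in Unicode normalization. This sketch assumes Ruby 2.4 or later (String#unicode_normalize and Hash#transform_values). Defining my_normalize directly on String mirrors how the answer calls it but is an assumption, and the toy in-memory index (NAMES, INDEX, toy_search) is hypothetical. It also illustrates a point the answer leaves implicit: the query must go through the same normalization as the indexed text, otherwise searching for "león" would still miss the indexed "leon".

```ruby
# Hypothetical plain-Ruby version of the answer's `my_normalize` helper.
# Assumes Ruby >= 2.4; no ActiveSupport needed.
class String
  def my_normalize
    # NFKD decomposes "é" into "e" + U+0301 (combining acute accent);
    # the gsub then deletes every non-ASCII code point, which removes the
    # combining marks, and the result is lower-cased.
    unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, '').downcase
  end
end

# Toy in-memory "index" (hypothetical data mirroring the question):
# the indexed text is normalized once, when the record is indexed.
NAMES = { 16 => "Leon", 18 => "León" }
INDEX = NAMES.transform_values(&:my_normalize)

# The query is normalized the same way before it is compared.
def toy_search(query)
  q = query.my_normalize
  INDEX.select { |_id, text| text == q }.keys
end

p toy_search("leon")   # → [16, 18]
p toy_search("león")   # → [16, 18]
```

One side effect of the gsub: letters with no Unicode decomposition, such as "ß" or "ø", are removed rather than transliterated, so "Straße" becomes "strae".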
