I am trying to integrate NHibernate.Search into a multilingual website. The website contains a class, Article, which is multilingual. This is done with a separate class, Article_CultureInfo, which stores the language-specific content. The fields of Article are:
Article
-------
ID
Name
And the fields of Article_CultureInfo are:
Article_CultureInfo
-------
ID
ArticleId
CultureCode
PageTitle
Content
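As a plain C# sketch, the two classes listed above might look like the following. Only the class and member names come from the lists; the member types (int, string), the example culture codes, and the virtual modifiers (which NHibernate usually needs for proxying) are assumptions.

```csharp
// Hypothetical shape of the two entities; member types are assumed.
public class Article
{
    public virtual int ID { get; set; }
    public virtual string Name { get; set; }
}

// One instance per article per language.
public class Article_CultureInfo
{
    public virtual int ID { get; set; }
    public virtual int ArticleId { get; set; }       // refers to Article.ID
    public virtual string CultureCode { get; set; }  // assumed values such as "en", "fr", "it"
    public virtual string PageTitle { get; set; }
    public virtual string Content { get; set; }
}
```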
I am using NHibernate.Search.Mapping to map out the field/document information. I would like to incorporate search features like stemming and synonym analysis where possible, based on the language. Is there any way the Lucene Analyser can be specified at run-time, rather than at compile time or initialisation?
Say we are analysing the content of PageTitle, which is to be stored in the respective Lucene index. This content can be English, French, Italian, etc., depending on the value of CultureCode, so the analyser should change based on this value. I have tried implementing a custom MultilingualAnalyser, but the only data available to it is the string to be analysed, i.e. the value of PageTitle. From that alone, I cannot deduce the language. (I could look into language detection techniques, but that is out of scope: I already know exactly what the language is, so detection would be overkill and not 100% reliable.)
If, apart from the tokens, I had access to the instance of the object, I would be able to read the CultureCode value from it and analyse accordingly. Any ideas would be greatly appreciated. I would really like to avoid using Lucene.Net directly, since NHibernate.Search looks to integrate very nicely.
Thanks!
I've basically worked around this. It is quite an overkill, but it works.
I've created a new implementation of IGetter, used for multilingual properties, which I called MultilingualGetter. It is basically the same as BasicGetter. I couldn't extend from it because for some reason it is sealed, so I copied the code.
What this IGetter does is: when the Get() method is called on it, it is given the target object. This is the instance of the class that contains the property. I check that it implements an interface for multilingual objects which I've created, IMultilingualContentInfo. It then retrieves the current culture from the IMultilingualContentInfo and prepends it to the actual text, e.g. [en]Hello World!
This text is then passed on to a custom analyzer I created, which parses the culture prefix to work out the language. It then uses a SnowballFilter to stem the text based on that language.
Below is the code for the Get() method of the custom IGetter implementation - IMultilingualContentInfo
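A minimal, library-free sketch of the prefixing idea described in this answer, not the original listing. The CultureCode member of IMultilingualContentInfo, the CulturePrefix helper class, and the culture-to-stemmer-name mapping are all hypothetical, introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface: the answer names it but does not show its members,
// so the CultureCode property is an assumption.
public interface IMultilingualContentInfo
{
    string CultureCode { get; }
}

// Hypothetical helper covering both halves of the workaround.
public static class CulturePrefix
{
    // Getter side: prepend "[culture]" to the raw property value.
    public static string Add(object target, string value)
    {
        var info = target as IMultilingualContentInfo;
        if (info == null || value == null)
            return value; // not a multilingual object: leave the text untouched
        return "[" + info.CultureCode + "]" + value;
    }

    // Analyzer side: split "[en]Hello" into culture "en" and text "Hello".
    // Returns false, with text set to the unchanged input, when no prefix is found.
    public static bool TryParse(string input, out string culture, out string text)
    {
        culture = null;
        text = input;
        if (string.IsNullOrEmpty(input) || input[0] != '[')
            return false;
        int close = input.IndexOf(']');
        if (close < 2) // no closing bracket, or nothing between the brackets
            return false;
        culture = input.Substring(1, close - 1);
        text = input.Substring(close + 1);
        return true;
    }

    // Assumed mapping from culture code to a Snowball stemmer name.
    private static readonly Dictionary<string, string> Stemmers =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "en", "English" },
            { "fr", "French" },
            { "it", "Italian" },
        };

    // Returns null when no stemmer is known for the culture.
    public static string StemmerFor(string culture)
    {
        if (culture == null)
            return null;
        string name;
        if (Stemmers.TryGetValue(culture, out name))
            return name;
        return null;
    }
}
```

In a real getter, Add would be applied to the value read from the target inside Get(). In a real analyzer, the prefix would have to be read from the text handed to the analyzer before tokenizing, and the stemmer name passed on to the SnowballFilter; that wiring depends on the Lucene.Net version in use and is not shown here.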