Multi-language full-text search with stemming in Django / Python

We currently use Djapian + Xapian for full-text search in our Django-based multi-language projects. To get stemming for each language, we create a separate search index per language. Inside Django, we choose the stemmer and the search index based on the user's language. That works fine, but Djapian no longer seems to be maintained, and its code breaks in more and more places. So we switched to Haystack, but Haystack doesn't seem to offer the kind of dynamic stemming that we need.
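The per-language routing described above could be sketched like this. All names here (SearchTarget, search_target_for), the index root path and the language table are hypothetical placeholders for illustration, not code from Djapian or Haystack.

```python
from dataclasses import dataclass

# Hypothetical mapping from language code to stemmer name.
STEMMERS = {"en": "english", "de": "german", "fr": "french"}
DEFAULT_LANGUAGE = "en"


@dataclass(frozen=True)
class SearchTarget:
    index_path: str     # where this language's index lives
    stem_language: str  # stemmer to use for both indexing and querying


def search_target_for(language_code, index_root="/var/search-indexes"):
    """Pick the index and stemmer for a language code such as "de" or "de-AT"."""
    # Reduce regional variants ("de-AT", "de_AT") to the base language "de".
    base = language_code.lower().replace("_", "-").split("-")[0]
    if base not in STEMMERS:
        # Unknown languages fall back to the default index and stemmer.
        base = DEFAULT_LANGUAGE
    return SearchTarget(f"{index_root}/{base}", STEMMERS[base])
```

In a Django view, the language code would typically come from request.LANGUAGE_CODE or django.utils.translation.get_language().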

Is there any way to get this behaviour, either in Haystack 1.x or 2.x, or in any other Python/Django-based backend?

There is 1 answer

Sym On

So, as I understand it, you can index content correctly, but can't search it with the correct stemmer? Or do you want to change the stemmer when indexing as well as when searching?

By default, xapian-haystack sets the stemming language from settings.HAYSTACK_XAPIAN_LANGUAGE. For searching, however, once you have instantiated a SearchBackend you should be able to set SearchBackend.language before constructing a search to change the stemming language.
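A rough sketch of that idea: a small context manager that swaps the language attribute for the duration of one search and restores it afterwards. The helper name stemming_language is made up for illustration, and the stand-in object only imitates a backend with a language attribute so the snippet runs without Haystack installed. Whether the real backend reads the attribute at query time is an assumption.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def stemming_language(backend, language):
    """Temporarily set backend.language, restoring the old value afterwards."""
    previous = backend.language
    backend.language = language
    try:
        yield backend
    finally:
        # Restore even if the search raises, so later requests are unaffected.
        backend.language = previous


# Stand-in for a search backend; hypothetical, only here to make the sketch runnable.
fake_backend = SimpleNamespace(language="english")

with stemming_language(fake_backend, "german") as backend:
    language_during_search = backend.language  # a search would be built here

language_after_search = fake_backend.language
```

In Haystack 2.x the real backend would presumably come from haystack.connections['default'].get_backend(), but check that against the version you run.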

Note that I haven't tried this at all; I've only looked at the code on GitHub.

Also, I should note that, although Haystack is great, sometimes it's better to just use Xapian directly. It's documented well enough, and for complex, Xapian-only features it might be quicker and easier. Obviously that doesn't apply if you already have an existing application, but it's worth considering if you're just starting out. :)