We are comparing different search engines for our research archives. Having browsed the Xapian-Omega documentation, we decided to try it out, since Omega appears to be an appropriate solution with several interesting search options.
We installed Xapian-Omega on a Linux server (Debian 7) and tested the setup successfully. However, we are unsure how to enable and use wildcards or regular expressions with Xapian-Omega.
We read that with Xapian one has to enable wildcards through the "QueryParser flags". Could someone clarify this, i.e. explain it or point to a page with an example or two?
However, we found few examples for the Omega CGI. The CGI itself runs well, but wildcards (such as * for any sequence of characters and ? for a single character) do not seem to work as expected by default, even though stemming, substrings, etc. appear to function. Wildcards would be useful to us.
For example, it would be useful to run simple, reasonably precise wildcard searches such as medic* (matching medicine, medical, medicament), or searches using ? for a single character.
Can Omega handle regular expressions, e.g. sep[ae]r[ae]te(\w+)?, or searches for structured formats such as email addresses, credit card numbers or certain types of formula in research papers?
In an old note from Olly Betts on the dev mailing list, one suggestion on this subject was to grep the index file. However, that would defeat Omega's RAD advantage.
Any examples of Omega searches using wildcards or regular expressions would be much appreciated. A pointer to a page that presents this topic well would also be most welcome, especially one with examples of building advanced searches with Xapian alone (perhaps in PHP or Python).
(For the moment we are not concerned about any substantial increase in index size or in the time needed to index the archive.)
You can enable right-truncation wildcards (such as "medic*") in Omega using $set{flag_wildcard,1}, which is covered in the OmegaScript documentation. This enables the QueryParser's FLAG_WILDCARD. There's a section in the user manual on using wildcards.
Xapian doesn't provide support for regular expression searching. In theory I believe it could be supported, although it could be costly depending on the regex. It would have to run the regular expression against the unstemmed terms in the database and then feed the matching terms into the search. It becomes difficult when the regex expands to a lot of terms (e.g. just 'a' as a regex). There's also some subtlety in making it efficient. It's easy to jump through the term list to the terms starting with a constant prefix, and you'd want to take advantage of that where possible.
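To make the term-expansion idea more concrete, here is a minimal pure-Python sketch. It is not Xapian's API. A sorted Python list stands in for the database's term list, and `literal_prefix`, `expand_regex` and `max_terms` are hypothetical names chosen for illustration. The prefix extraction is deliberately naive: it gives up on alternation and drops a literal made optional by a quantifier. Once a constant prefix is known, the search can start with a binary search (`bisect`) and stop as soon as terms no longer share the prefix, rather than testing every term.

```python
import bisect
import re


def literal_prefix(regex):
    """Return a constant prefix that every match of `regex` must start with.

    Deliberately naive: stops at the first non-alphanumeric character,
    gives up on alternation, and drops a literal made optional by ?, * or {.
    """
    if "|" in regex:
        return ""  # alternation: matches need not share any prefix
    prefix = []
    for ch in regex:
        if not ch.isalnum():
            if ch in "?*{" and prefix:
                prefix.pop()  # e.g. "colou?r": the "u" is optional
            break
        prefix.append(ch)
    return "".join(prefix)


def expand_regex(sorted_terms, regex, max_terms=1000):
    """Return the terms in `sorted_terms` that `regex` matches in full.

    `sorted_terms` stands in for the database's sorted term list.
    Raises ValueError if more than `max_terms` terms match, guarding
    against regexes that expand to a huge number of terms.
    """
    pattern = re.compile(regex)
    prefix = literal_prefix(regex)
    # Jump straight to the first term >= prefix instead of scanning from the start.
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        if pattern.fullmatch(term):
            matches.append(term)
            if len(matches) > max_terms:
                raise ValueError(f"regex {regex!r} expands to more than {max_terms} terms")
    return matches


if __name__ == "__main__":
    terms = sorted(["medical", "medicament", "medicine", "meditate",
                    "separate", "separately", "seperate", "sepsis", "september"])
    print(expand_regex(terms, r"sep[ae]r[ae]te(\w+)?"))  # ['separate', 'separately', 'seperate']
    print(expand_regex(terms, r"medic\w*"))              # ['medical', 'medicament', 'medicine']
```

A right-truncation wildcard such as medic* is the special case where the pattern is only a prefix. With the Xapian Python bindings, the matching terms could then presumably be combined using `xapian.Query(xapian.Query.OP_OR, matches)`, but treat that as an assumption to check against the bindings' documentation.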
For your example of sep[ae]r[ae]te(\w+)?, it sounds like what you actually want is spelling correction for the a/e substitutions, which you can enable using $set{flag_spelling_correction,1}. Combine that with stemming for the trailing letters after 'te' (Omega defaults to English stemming, but that can be changed), or alternatively with wildcard or partial match support.
If you do need regular expressions for your use case, I'd suggest bringing it up on the xapian-discuss mailing list. Xapian has moved on since the last discussion, and I believe it would be easier to build such support now than it was then.
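On the spelling-correction suggestion: Xapian suggests corrections that are within a small edit distance of the query word, drawn from spelling data gathered at index time. The library's exact algorithm and ranking are its own and aren't reproduced here. As a rough illustration of why a/e substitutions like those in sep[ae]r[ae]te are a good fit, this sketch computes plain Levenshtein distance and picks the closest word from a small made-up vocabulary. The names `edit_distance` and `suggest`, the word list, and the default limit of 2 are all hypothetical choices for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(word, vocabulary, max_edit_distance=2):
    """Return the vocabulary word closest to `word` (the first one wins ties),
    or None if no word is within `max_edit_distance` edits."""
    best, best_dist = None, max_edit_distance + 1
    for candidate in vocabulary:
        d = edit_distance(word, candidate)
        if d < best_dist:
            best, best_dist = candidate, d
    return best


if __name__ == "__main__":
    vocabulary = ["separate", "desperate", "operate"]
    print(suggest("seperate", vocabulary))  # separate (1 substitution)
    print(suggest("seperete", vocabulary))  # separate (2 substitutions)
    print(suggest("xyz", vocabulary))       # None
```

Each a/e swap costs one substitution, so both misspellings fall within two edits of "separate". The trailing (\w+)? part is a different kind of variation, which is why stemming or wildcards are suggested for it instead.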