How to do synonym matching on regular expressions in Solr 5.3.1?


I'm writing a Sunspot application for a large gene database. Ligands and receptors for genes are named with the normal gene name followed by an 'l' or an 'r', respectively; for example, a ligand for the gene 'MIP2' would be called 'MIP2l'. However, I want to account for cases in which scientists search using the syntax "MIP2 ligand". How can I combine the two tokens "MIP2" and "ligand" into one, so that the query becomes "MIP2l"?

I tried using the Synonym Graph Filter Factory, but my Solr is version 5.3.1, so it won't load. A quick upgrade is not feasible. I also tried the technique illustrated in this article (https://lucidworks.com/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/), but the database is too large for a simple synonyms.txt file. I want to use regular expressions for this, but I can't without first combining the two tokens into one.

This is my current search function. The SQL lookup and weird hashing are there because it replaces an old search function, and the SQL lookup is how I get the properly formatted data for the view.

search = GeneName.search do
  fulltext params[:search][:search_str]
  order_by(:use_name, :asc)
  order_by(:score, :desc)
end
gene_ids = []
for gene_name in search.results
  gene_ids << gene_name.gene_id unless gene_name.nil? or gene_ids.include? gene_name.gene_id
end
gene_ids_to_s = gene_ids.to_s.gsub("[","(").gsub("]",")")
#raise gene_ids_to_s.inspect
@genes = Gene.find_by_sql("select distinct g.id gene_id from genes g, gene_names gn where g.id = gn.gene_id and g.id in #{gene_ids_to_s} order by use_name desc") unless gene_ids_to_s == "()"   

There are 2 answers

AMoon01

I believe I fixed it, but it's a lame workaround where I just added

    @str.downcase!
    @str.gsub!(" ligand", "l")
    @str.gsub!(" receptor","r")
    params[:search][:search_str] = @str

before the search code shown in the question. @str is a parsed version of params[:search][:search_str].
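A slightly stricter variant of the same workaround could use a single regular expression, so that only whole words are rewritten (plain gsub on " ligand" would also turn "mip2 ligands" into "mip2ls"). This is only a sketch: the helper name `normalize_gene_query` and the `SUFFIXES` table are made up for illustration, and it assumes, like the snippet above, that lowercasing the query is acceptable.

```ruby
# Hypothetical lookup table: search word => suffix used in the gene naming scheme.
SUFFIXES = { "ligand" => "l", "receptor" => "r" }.freeze

# Hypothetical helper: rewrites "<gene> ligand" to "<gene>l" and
# "<gene> receptor" to "<gene>r". The trailing \b keeps words such as
# "ligands" from being rewritten.
def normalize_gene_query(query)
  query.downcase.gsub(/(\S+)\s+(ligand|receptor)\b/) do
    "#{Regexp.last_match(1)}#{SUFFIXES[Regexp.last_match(2)]}"
  end
end
```

It would be called the same way as above, for example `params[:search][:search_str] = normalize_gene_query(@str)`, before the Sunspot search runs.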

jvillian

I realize this isn't really your question. But, it seems like here:

gene_ids = []
for gene_name in search.results
  gene_ids << gene_name.gene_id unless gene_name.nil? or gene_ids.include? gene_name.gene_id
end

You could be using map, compact, and uniq, like:

gene_ids = search.results.map do |result|
  result.gene_id unless result.nil?
end.compact.uniq
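To check that the chained version behaves like the loop, here is a small stand-alone sketch. `FakeGeneName` is a made-up stand-in for the real model (which isn't shown in the question), and it assumes the nil check is meant to apply to each result itself, as in the original loop.

```ruby
# Made-up stand-in for a search result; only gene_id matters here.
FakeGeneName = Struct.new(:gene_id)

results = [FakeGeneName.new(3), nil, FakeGeneName.new(1), FakeGeneName.new(3)]

# Loop version from the question: skip nils and ids already collected.
loop_ids = []
results.each do |gene_name|
  loop_ids << gene_name.gene_id unless gene_name.nil? || loop_ids.include?(gene_name.gene_id)
end

# Chained version: drop nils, pull out the ids, then remove duplicates.
# uniq keeps the first occurrence of each id, so the order matches the loop.
chained_ids = results.compact.map(&:gene_id).uniq

puts loop_ids.inspect
puts chained_ids.inspect
```

Both arrays should come out identical, with nil results skipped and each gene id listed once.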

Also, I never use find_by_sql and I don't really understand what you're doing there. But, I wonder if you could use a standard ActiveRecord query there?