Lets say I have my list of ingredients: {'potato','rice','carrot','corn'}
and I want to return lists from a database that are most similar to mine:
{'beans','potato','oranges','lettuce'}, {'carrot','rice','corn','apple'} {'onion','garlic','radish','eggs'}
My query would return this first: {'carrot','rice','corn','apple'}
I've used Solr, and have looked at CloudSearch, ElasticSearch, Algolia, Searchify and Swiftype. These engines only seem to let me put in one query string and then filter by other facets.
In a real scenario my search list will be about 200 items long and will be matching against about a million lists in my database.
What technology should I use to accomplish what I want to do?
Should I look away from search indexers and more towards database-esque things like mongo, map reduce, hadoop... All I know are the names of other technologies and I just need someone to point me in the right direction on what technology path I should be exploring for this.
With so much data I can't really loop through it, I need to query everything at once.
I wonder what keeps you from trying it with Solr, as Solr provides much of what you need. You can declare the field as
type="string" multiValued="true
and save each list item as a value. Then, when querying, you specify each of the items in the list to look for as a search term for that field, and Solr will – by default – return the closest match. If you need exact control over what will be regarded as a match (e.g. at least 40% of the terms from the search list have to be in a matching list) you can use themm
EDisMax parameter, cf. Solr WikiHaving said that, I must add that I’ve never searched for 200 query terms (do I unerstand correctly that the list whose contents should be searched will contain about 200 items?) and do not know how well that performs. But I guess that setting up a test core and filling it with random lists using a script should not take more than a few hours, so it should be possible to evaluate the performance of this approach without investing too much time.