Don't understand how to do a search query with with or,and,not

196 views Asked by At

I am reposting this question as I fixed it now to make it easier to understand exactly what I need to do.

I have a function declared:

set<string> findQueryMatches(map<string, set<string>>& index, string sentence);

The map is an already filled map with keys and values, while the string will be a sentence that will look something like this "fish +red". The keys and values for the map come from a file that I read in previous functions with an example being:

www.shoppinglist.com
EGGS! milk, fish,      @  bread cheese
www.rainbow.org
red ~green~ orange yellow blue indigo violet
www.dr.seuss.net
One Fish Two Fish Red fish Blue fish !!!
www.bigbadwolf.com
I'm not trying to eat you

The website names are the values while the separate words (also have a clean token function that cleans punctuation, so egg! becomes egg and all the weird symbols get deleted) are keys for the map. So if you search for fish, you get the list of values for that keyword.

In the above function SearchQueryMatches, I input a string sentence which has to handle the terms as a compound query, where individual terms are synthesized into one combined result.

The entered string will possess whitespaces, +, and -. a minus means a result must match one term without matching the other, a plus means the results must match both items, while a white space without any preface means they are unioned, so they match one or the other.

For example,

"tasty -mushrooms simple +cheap" translates into "tasty WITHOUT mushrooms OR simple AND cheap"

I started by doing stringstream that separates the sentence and then did if statements like

if (word[0] == '+').....

After I separate these words and know what to do with them I will also have to call my helper clean function again to clean them up from the + and - before I begin the search.

But now, I am struggling with what I would need to do next. I heard of set_intersection functions from the C++ set library but I never used them and honestly have absolutely 0 idea on how to use it.

The return will be a set of the websites that satisfy the search query.

What would be a good way to program the inside of the if statements with what they would be doing each time there is a +, -, or no preface? I am completely lost on this.

1

There are 1 answers

0
Torilas On

Certainly you can solve the problem using set_intersection, set_difference and set_union. And here is a example about how to use those functions in your problem:

std::set<std::string> findQueryMatches(std::map<std::string, 
    std::set<std::string>>& index, std::string sentence) {
    std::set<std::string> url_set;
    std::stringstream ss;
    std::string str;
    ss.str(sentence);
    while(ss >> str) {
        if(str[0] == '-') { //difference
            std::set<std::string> difference_data;
            std::set_difference(url_set.begin(), url_set.end(), index[str.substr(1, str.size() - 1)].begin(), index[str.substr(1, str.size() - 1)].end(), 
                 std::inserter(difference_data, difference_data.begin()));
            url_set = difference_data;
            std::cout<<str<<": ";
            for(auto const& x: url_set) {
                std::cout<<x<<' ';
            }
            std::cout<<'\n';
        } else if(str[0] == '+') { //intersection
            std::set<std::string> intersection_data;
            std::set_intersection(index[str.substr(1, str.size() - 1)].begin(), index[str.substr(1, str.size() - 1)].end(), url_set.begin(), url_set.end(),
                 std::inserter(intersection_data, intersection_data.begin()));
            url_set = intersection_data;
            std::cout<<str<<": ";
            for(auto const& x: url_set) {
                std::cout<<x<<' ';
            }
            std::cout<<'\n';
        } else { //union
            std::set<std::string> union_data;
            std::set_union(index[str].begin(), index[str].end(), url_set.begin(), url_set.end(),
                 std::inserter(union_data, union_data.begin()));
            url_set = union_data;
            std::cout<<str<<": ";
            for(auto const& x: url_set) {
                std::cout<<x<<' ';
            }
            std::cout<<'\n';
        }
    }
    return url_set;
}

Keep in mind that you have to provide an output operator to set_intersection, set_difference and set_union (look at this: https://en.cppreference.com/w/cpp/algorithm/set_difference or this how to find the intersection of two std::set in C++?). Those output operators can be defined of this way:

template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator std::set_union ( InputIterator1 first1, InputIterator1 last1,
                                  InputIterator2 first2, InputIterator2 last2,
                                  OutputIterator result );
template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator std::set_intersection ( InputIterator1 first1, InputIterator1 last1,
                                  InputIterator2 first2, InputIterator2 last2,
                                  OutputIterator result );      
template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator std::set_difference ( InputIterator1 first1, InputIterator1 last1,
                                  InputIterator2 first2, InputIterator2 last2,
                                  OutputIterator result );  

For example given this data:

www.shoppinglist.com
EGGS! milk, fish      @  bread cheese
www.rainbow.org
red ~green~ orange yellow blue indigo violet
www.dr.seuss.net
One Fish Two Fish Red fish Blue fish !!!
www.bigbadwolf.com
I'm not trying to eat you milk,

And this sentence:

milk, +milk, Blue +fish -Fish

The result is:

milk,: www.bigbadwolf.com www.shoppinglist.com
+milk,: www.bigbadwolf.com www.shoppinglist.com
Blue: www.bigbadwolf.com www.dr.seuss.net www.shoppinglist.com
+fish: www.dr.seuss.net www.shoppinglist.com
-Fish: www.shoppinglist.com

Cheers!