Find rows that satisfy two conditions and are unique

71 views Asked by At

Hello I have the following tables:

Keyword
--------------------
| id   | Value     |
--------------------
| 1    | test1     |
| 2    | test2     |
|------------------|

URL
---------------------------
| id   | URL               |
---------------------------|
| 1    | www.something.com |
|--------------------------|

Keywords_URL
---------------------------
| Keywordid   | URLid     |
---------------------------
| 1           | 1         |
| 2           | 1         |
|-------------------------|

So the idea is that I have many URLs linked with a keyword, a URL can have more than 1 keywords, and many URLs can have the same Keywords.

So the URL table describes the information about a specific URL, the Keywords table describes the information about the keywords and the other Keywords_URL table links the two tables.

I am looking for a query, such that will return all the URLs that has as conditions two keywords.

For instance, I would like to know, the URLs that have both "test1" and "test2" keywords. So the results should be all the URLs that have both keywords associated with it.

Furthermore, I want my results to be "Distinct" so no duplicate rows.

I am surprised that MySQL doesn't support the INTERSECT clause and now I am looking for a way to solve that.

Here is what I have done so far:

(SELECT kewyord.Value, URL.id id, FROM Keywords_URL JOIN Keyword ON Keywords_URL.Keywordid = Keyword.id JOIN URL ON Keywords_URL.URLid = URL.id WHERE Keywords.Value = "test1")
UNION DISTINCT
(SELECT kewyord.Value, URL.id id, FROM Keywords_URL JOIN Keyword ON Keywords_URL.Keywordid = Keyword.id JOIN URL ON Keywords_URL.URLid = URL.id WHERE Keywords.Value = "test2")
ORDER BY id;

The problem is that I do get duplicates rows, but also I am getting URLs that have one of the two keywords (i.e not all the rows satisfy that they have to have both keywords associated)

Any ideas?

1

There are 1 answers

3
Giorgos Betsos On BEST ANSWER

You can use the following query:

SELECT id, URL
FROM URL
WHERE id IN (SELECT ku.URLid 
             FROM Keywords_URL AS ku 
             INNER JOIN Keyword AS k ON ku.Keywordid = k.id
             WHERE k.value IN ('test1', 'test2')
             GROUP BY ku.URLid
             HAVING COUNT(DISTINCT k.id) = 2)

The sub-query, used in the WHERE clause, selects all URLid having both 'test1', 'test2' as their associated keywords. These values are then used by the IN operator to select all rows of URL with these URLids.