Select rows based on information stored in separate table

75 views Asked by At

First of all I'm sorry for the overly vague title, however I'm unfamiliar with the proper terminology for a problem like this.

I'm attempting to retrieve a list of page titles from Wiktionary (Wikimedia wiki-based dictionary) where the page must be categorized under English_lemmas, but must not be categorized under English_phrases, English_slang, and English_%_forms (where % is the wildcard).

The two necessary tables are page which contains page information (page_id, page_title), and categorylinks which contains the categories that each page is categorized under.

The relevant structure of page is like so:

+---------+------------------+
| page_id | page_title       |
+---------+------------------+
| 0       | race             |
| 1       | racing           |
| 2       | that's all folks |
| 3       | fire             |
| 4       | psychédélique    |
+---------+------------------+

and the relevant structure of categorylinks is like so: (I've added line breaks for easier reading)

+---------+-------------------------+
| cl_from | cl_to                   |
+---------+-------------------------+
| 0       | English_lemmas          |
| 0       | English_verbs           |

| 1       | English_lemmas          |
| 1       | English_verbs           |
| 1       | English_non-lemma_forms |

| 2       | English_lemmas          |
| 2       | English_phrases         |

| 3       | English_lemmas          |
| 3       | English_nouns           |

| 4       | French_lemmas           |
| 4       | French_adjectives       |
+---------+-------------------------+

where categorylinks.cl_from is a direct reference to page.page_id and categorylinks.cl_to is the title of the category.

I need to select race and fire as they are both categorized under English_lemmas, but not racing or that's all folks as in addition to both being categorized under English_lemmas they are also categorized under English_%_forms and English_phrases respectively, and not psychédélique as it is not categorized under English_lemmas.

Therefore the ideal result would be

+---------+------------------+
| page_id | page_title       |
+---------+------------------+
| 0       | race             |
| 3       | fire             |
+---------+------------------+

Is this something that is feasible to achieve efficiently? And if so how can I achieve this?

I have a fairly basic understanding of SQL (basic SELECT, UPDATE, etc statements), so something like this is far beyond my comprehension.

1

There are 1 answers

5
tonypdmtr On BEST ANSWER

If I understood your requirements correctly:

select *
  from page
  where page_id not in (select cl_from from categorylinks
                         where cl_to like 'English_%_forms'
                            or cl_to like 'English_phrases')
    and page_id in (select cl_from from categorylinks
                      where cl_to like 'English_lemmas')