Please consider the following table:
_____________________
| sentence_word |
|---------|---------|
| sent_id | word_id |
|---------|---------|
| 1 | 1 |
| 1 | 2 |
| ... | ... |
| 2 | 4 |
| 2 | 1 |
| ... | ... |
With this table structure I want to store the words of sentences. Now I want to find out which words are together with a specific word in a sentence. The result should look like this:
_____________________
| word_id | counted |
|---------|---------|
| 5 | 1000 |
| 7 | 800 |
| 3 | 600 |
| 1 | 400 |
| 2 | 100 |
| ... | ... |
The query Looks like the following:
SELECT
word_id,
COUNT(*) AS counted
FROM sentence_word
WHERE sentence_word.sent_id IN (SELECT
sent_id
FROM sentence_word
WHERE word_id = [desired word]
)
AND word_id != [desired word]
GROUP BY word_id
ORDER BY counted DESC;
The query is working as it should but it always scans the full table. I created an index for sent_id and word_id. Do you have any ideas to optimize it that it doesn't Need to scan the full table all the time?
You could try a self join like this:
or perhaps even better