Let's assume I have a table with many records called comments
, and each record includes only a text body:
CREATE TABLE comments(id INT NOT NULL, body TEXT NOT NULL, PRIMARY KEY(id));
INSERT INTO comments VALUES (generate_series(1,100), md5(random()::text));
Now, I have an input array with N substrings, with arbitrary length. For example:
abc
xyzw
123456
not_found
For each input value, I want to return all rows that match a certain condition.
For example, given that the table includes the following records:
| id | body |
| -- | ----------- |
| 11 | abcd1234567 |
| 22 | unkown12 |
| 33 | abxyzw |
| 44 | 12345abc |
| 55 | found |
I need a query that returns the following result:
| substring | comments.id | comments.body |
| --------- | ----------- | ------------- |
| abc | 11 | abcd1234567 |
| abc | 44 | 12345abc |
| xyzw | 33 | abxyzw |
| 123456 | 11 | abcd1234567 |
So far, I have this SQL query:
SELECT substrings, comments.id, comments.body
FROM unnest(ARRAY[
'abc',
'xyzw',
'123456',
'not_found'
]) AS substrings
JOIN comments ON comments.id IN (
SELECT id
FROM comments as inner_comments
WHERE inner_comments.body LIKE ('%' || substrings || '%')
);
But the database client gets stuck for more than 10 minutes. And I missing something about joins?
Please note that this is a simplified example of my problem. My current check on the comment is not a LIKE
statement, but a complex switch-case statement of different functions (fuzzy matching).
The detour with the
IN
is unnecessary and unless the optimizer can rewrite this and it likely cannot, adds overhead. Try if it gets better without.But still indexes cannot be used here because of the wildcard at the beginning. You might want to look at Full Text Search and see what options you have with it to improve the situation.