Generate rows from input array

Let's assume I have a table called comments with many records, and each record includes only a text body:

CREATE TABLE comments(id INT NOT NULL, body TEXT NOT NULL, PRIMARY KEY(id));
INSERT INTO comments VALUES (generate_series(1,100), md5(random()::text));

Now, I have an input array of N substrings, each of arbitrary length. For example:

abc
xyzw
123456
not_found

For each input value, I want to return all rows that match a certain condition.

For example, given that the table includes the following records:

| id | body        |
| -- | ----------- |
| 11 | abcd1234567 |
| 22 | unkown12    |
| 33 | abxyzw      |
| 44 | 12345abc    |
| 55 | found       |

I need a query that returns the following result:

| substring | comments.id | comments.body |
| --------- | ----------- | ------------- |
| abc       | 11          | abcd1234567   |
| abc       | 44          | 12345abc      |
| xyzw      | 33          | abxyzw        |
| 123456    | 11          | abcd1234567   |

So far, I have this SQL query:

SELECT substrings, comments.id, comments.body
FROM unnest(ARRAY[
  'abc',
  'xyzw',
  '123456',
  'not_found'
]) AS substrings
JOIN comments ON comments.id IN (
  SELECT id
  FROM comments as inner_comments
  WHERE inner_comments.body LIKE ('%' || substrings || '%')
);

But the database client gets stuck for more than 10 minutes. Am I missing something about joins?

Please note that this is a simplified example of my problem. My current check on the comment is not a LIKE statement, but a complex switch-case statement of different functions (fuzzy matching).

2 Answers

1
sticky bit On Best Solutions

The detour through IN is unnecessary and, unless the optimizer can rewrite it (which it likely cannot), adds overhead. Check whether the query gets faster without it:

SELECT un.substring,
       comments.id,
       comments.body
       FROM unnest(ARRAY['abc',
                         'xyzw',
                         '123456',
                         'not_found']) un (substring)
       INNER JOIN comments
                  ON comments.body LIKE ('%' || un.substring || '%');
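
As a quick sanity check, the direct join can be run against the five sample rows from the question. This is only a sketch: the temporary table name comments_sample and the ord column are made up for illustration, and WITH ORDINALITY is used solely to keep the output in input-array order.

```sql
-- Hypothetical scratch table holding the sample rows from the question.
CREATE TEMP TABLE comments_sample (id INT PRIMARY KEY, body TEXT NOT NULL);

INSERT INTO comments_sample VALUES
  (11, 'abcd1234567'),
  (22, 'unkown12'),
  (33, 'abxyzw'),
  (44, '12345abc'),
  (55, 'found');

-- Same join as above, with the match condition directly in the ON clause.
-- "ord" records each substring's position in the input array.
SELECT un.substring, c.id, c.body
FROM unnest(ARRAY['abc', 'xyzw', '123456', 'not_found'])
     WITH ORDINALITY AS un (substring, ord)
INNER JOIN comments_sample AS c
        ON c.body LIKE ('%' || un.substring || '%')
ORDER BY un.ord, c.id;
```

On this data the query should return the four rows listed in the question's expected result; 'not_found' matches nothing and therefore produces no row. Note that an underscore inside a LIKE pattern matches any single character, so input values containing _ or % would need escaping if they are meant literally.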

Even so, a regular B-tree index cannot be used here because of the leading wildcard. You might want to look at Full Text Search and see what options it offers to improve the situation.
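
One option that is not part of the answer above, but may be worth testing, is a trigram index from the pg_trgm extension, which can support LIKE patterns that are not left-anchored. The sketch below assumes the extension is available on your server and that you are allowed to install it; the index name is hypothetical. Whether the planner actually uses the index for a join against a non-constant pattern depends on the data and the PostgreSQL version, so inspect the plan rather than assuming it.

```sql
-- Assumption: pg_trgm is available and you have the privilege to install it.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name; gin_trgm_ops is the operator class shipped with pg_trgm.
CREATE INDEX comments_body_trgm_idx
    ON comments USING gin (body gin_trgm_ops);

-- Inspect the plan to see whether the trigram index is chosen for the join.
EXPLAIN
SELECT un.substring, comments.id, comments.body
FROM unnest(ARRAY['abc', 'xyzw', '123456', 'not_found']) AS un (substring)
INNER JOIN comments
        ON comments.body LIKE ('%' || un.substring || '%');
```

This only helps while the match condition is an operator the index supports (LIKE, ILIKE, regular expressions, or the similarity operators pg_trgm provides). A predicate built from arbitrary custom functions, as described in the question, would still be evaluated row by row.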

0
Carlos Alves Jorge On

Basically, you are performing a full-text search on a column that most likely doesn't have a FULLTEXT index.

A first step would be to put a FULLTEXT index on your "body" column (see details here) and then perform the search using CONTAINS. Quite honestly, though, since you want to perform fuzzy matching, you cannot rely on the SQL server to perform the search; it would just not work properly. You will need an indexing service such as Elasticsearch, CloudSearch, Azure Search, etc.
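
Note that FULLTEXT indexes and CONTAINS are not PostgreSQL syntax, while the DDL in the question (generate_series, the ::text cast, unnest) is PostgreSQL. A rough PostgreSQL counterpart, sketched here under the assumption that whole-word matching is acceptable, is a GIN index over to_tsvector combined with the @@ match operator. The index name, the term alias, and the choice of the 'simple' text search configuration are illustrative only.

```sql
-- Hypothetical index name. The two-argument form of to_tsvector is required
-- in an index expression because it is immutable.
CREATE INDEX comments_body_fts_idx
    ON comments USING gin (to_tsvector('simple', body));

-- Match each input term against the indexed document vector.
-- The expression must be written the same way as in the index definition
-- for the index to be considered.
SELECT un.term, comments.id, comments.body
FROM unnest(ARRAY['abc', 'found']) AS un (term)
INNER JOIN comments
        ON to_tsvector('simple', comments.body) @@ plainto_tsquery('simple', un.term);
```

Full-text search matches whole tokens rather than arbitrary substrings, so a term such as 'abc' would generally not match a body like 'abcd1234567'. It is therefore not a drop-in replacement for the LIKE '%...%' condition in the question.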