Is there a scalable way to implement database search by multiple tags?

161 views Asked by At

I have a Postgres database and a table of user-tags, with columnss UserId and TagId. Each user can have multiple tags, and vice versa.

Is there any way to implement a search by multiple tags in a scalable way? Example queries:

  • Get all users who have both tag1 and tag2
  • Get all users who have (tag1 or tag2) and tag3
  • Get all users who have tag1 and tag2 and don't have tag3

Since this is not easy to index and scale, I was thinking about using some kind of in-memory caches, for faster lookups. Do you know any readily-available solutions to this problem?

Thanks

1

There are 1 answers

0
tcthanh On

First of all, without knowing much details, I assume there is a number of tags but not so many, which makes the cardinality of column TagIds is low. My answer is based on this assumption.

In general, index on a low cardinality colum does not help to scale up queries on this column. For further details, please refer to Why low cardinality indexes negatively impact performance.

Secondly, the set of example queries given by you clearly gives an impression that other queries ( of this set) can be in disjunctive form (in other words, WHERE condition contains OR boolean predicates), which hints that no index will rescue the performance if the number of disjunctions is big. DBMS will consider between (a) scanning the entire table and test each row with the WHERE condition and (b) scanning the index on column TagIds.

Last but not least, going to in-memory will help you based on the fact that data now resides in memory. However, in principle, the in-memory DBMS also considers (a) and (b) and probably chooses (a) over (b).

I propose to use function index documented here in PostgreSQL. Take it into account if you are not dealing with ad-hoc queries: