I have a table users in a Postgres database that has a one-to-many relationship to another table users_attributes, a simple key-value type table with a foreign key column to the users table.
create table users(
id: uuid primary key,
name: varchar
);
create table users_attributes(
attribute_id: integer primary key,
user_id: uuid references users(id),
attribute_name: varchar,
attribute_value: varchar
);
I need to filter users based on attribute_name and attribute_value in the users_attributes table. I have tried this query, which works but it takes a lot longer to execute:
select * from users u
left join users_attributes ua1 on u.id = ua1.user_id and ua1.attribute_name = 'dog_name'
left join users_attributes ua2 on u.id = ua2.user_id and ua2.attribute_name = 'cat_name'
where ua1.attribute_value = 'Spot' and ua2.attribute_value = 'Mittens';
The number of joins go up for each attribute that I need to filter the users by. This is causing the query to slow down (between 4-10 seconds depending on number of joins) since there are approximately a hundred thousand users. An explain plan on the query supports this.
How do I query the users in a way that returns faster?
The mix of
LEFT JOINandWHEREconditions makes no sense, logically. See:Basic rewrite:
Basically, it's a case of relational-division.
There are many ways to do this. The best query style depends on your cardinalities, your typical filters, and what you are optimizing for. Here is a whole arsenal:
The query I gave is among the fastest options. You need matching indexes, of course. And everything would be more efficient with proper normalization, where the attribute name moves into a separate table
attribute, andattribute_idis an integer FK pointing there. An index onuser_attribute(attribute_id, user_id)(twointegercolumns) would be ideal. See:The query would resolve attribute names to integer IDs (explicitly, or implicitly in the query), and proceed with those.