I have two tables, locations and location groups
CREATE TABLE locations (
location_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(63) UNIQUE NOT NULL
);
INSERT INTO locations (name)
VALUES
('london'),
('bristol'),
('exeter');
CREATE TABLE location_groups (
location_group_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
location_ids VARCHAR(255) NOT NULL,
user_ids VARCHAR(255) NOT NULL,
name VARCHAR(63) NOT NULL,
);
INSERT INTO location_groups (location_ids, user_ids, name)
VALUES
('1', '1,2,4', 'south east'),
('2,3', '2', 'south west');
What I am trying to do is return all location_ids for all of the location_groups where the given user_id exists. I'm using CSV to store the location_ids and user_ids in the location_groups table. I know this isn't normalised, but this is how the database is and it's out of my control.
My current query is:
SELECT location_id
FROM locations
WHERE FIND_IN_SET(location_id,
(SELECT location_ids
FROM location_groups
WHERE FIND_IN_SET(2,location_groups.user_ids)) )
Now this works fine if the user_id = 1 for example (as only 1 location_group row is returned), but if i search for user_id = 2, i get an error saying the sub query returns more than 1 row, which is expected as user 2 is in 2 location_groups. I understand why the error is being thrown, i'm trying to work out how to solve it.
To clarify when searching for user_id 1 in location_groups.user_ids the location_id 1 should be returned. When searching for user_id 2 the location_ids 1,2,3 should be returned.
I know this is a complicated query so if anything isn't clear just let me know. Any help would be appreciated! Thank you.
You could use
GROUP_CONCAT
to combine thelocation_id
s in the subquery.Alternatively, use the problems with writing the query as an example of why normalization is good. Heck, even if you do use this query, it will run more slowly than a query on properly normalized tables; you could use that to show why the tables should be restructured.
For reference (and for other readers), here's what a normalized schema would look like (some additional alterations to the base tables are included).
The compound fields in the
location_groups
table could simply be separated into additional rows to achieve 1NF, but this wouldn't be in 2NF, as thename
column would be dependent on only thelocation
part of the(location, user)
candidate key. (Another way of thinking of this is thename
is an attribute of the regions, not the relations between regions/groups, locations and users.)Instead, these columns will be split off into two additional tables for 1NF: one to connect locations and regions, and one to connect users and regions. It may be that the latter should be a relation between users and locations (rather than regions), but that's not the case with the current schema (which could be another problem of the current, non-normalized schema). The region-location relation is one-to-many (since each location is in one region). From the sample data, we see the region-user relation is many-many. The
location_groups
table then becomes theregion
table.Sample data:
Now, the desired query for the normalized schema:
Result: