Why does having GROUP BY in an IN clause cause the query to become unexecutable?

56 views Asked by At
SELECT id, user_id, item_id
FROM questionnaire_answer_old_0
WHERE item_id IN (
    SELECT item_id
    FROM questionnaire_answer_old_0
    GROUP BY user_id, item_id
    HAVING COUNT(user_id) > 1 AND COUNT(item_id) > 1
);

The code provided above executes the subquery normally. The issue arises when attempting to run the entire query, which results in no response and no error messages; it just shows as 'query in progress.' Removing the GROUP BY from the subquery allows the query to execute normally. What could be the reason for this?

1

There are 1 answers

1
Charlieface On BEST ANSWER

You could use window functions, which might be more efficient.

HAVING COUNT(user_id) > 1 AND COUNT(item_id) > 1 probably doesn't do what you think it does, because COUNT(someValue) just counts the number of non-nulls, not the number of distinct values.

I've tried to guess your intention, it looks like you want to get all rows for an item_id, where any of those are part of a group of item_id, user_id where there are multiple rows. So you need two levels of partitioning.

SELECT id, user_id, item_id
FROM (
    SELECT
      *,
      COUNT(CASE WHEN count > 1 THEN 1 END) OVER (PARTITION BY user_id) AS hasCountGtThan1
    FROM (
        SELECT *,
          COUNT(*) OVER (PARTITION BY user_id, item_id) AS count
        FROM questionnaire_answer_old_0 qa
    ) qa
) qa
WHERE hasCountGtThan1 > 0;