How to group SQL entries by truncated/modified fields?

I want to group all entries in my table by their user_type property. However, the user_type values are stored in a "{type of user} {date account was created}" format, e.g.

id | user_type
--------------
01 | 'admin 2024-03-01'
02 | 'admin 2024-03-01'
03 | 'user 2024-03-02'
04 | 'user 2024-03-02'
05 | 'premium_user 2024-03-03'

I want to get the counts of each type of user, ignoring the date at the end of the value. So for the above example, my query would return:

user_type      | count
----------------------
'admin'        | 2
'user'         | 2
'premium_user' | 1

The user types are subject to change, so I can't hardcode in filters such as select count(id) where user_type like 'admin %'.

There are 2 answers

Answer by Tim Biegeleisen

We can use SUBSTRING() with a regular expression here:

SELECT SUBSTRING(user_type FROM '^\w+') AS user_type, COUNT(*) AS count
FROM yourTable
GROUP BY 1  -- group by the extracted prefix; a bare "user_type" here would resolve to the raw input column
ORDER BY COUNT(*) DESC;
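
To try this locally, a minimal setup with the sample rows from the question might look like the following. The column types are an assumption (the question only shows the values), and the last query is just a way to inspect what the regular expression extracts per row.

```sql
-- Assumed schema: the types are a guess, the question does not state them.
CREATE TABLE yourTable (
  id        int PRIMARY KEY,
  user_type text
);

-- Sample rows copied from the question.
INSERT INTO yourTable (id, user_type) VALUES
  (1, 'admin 2024-03-01'),
  (2, 'admin 2024-03-01'),
  (3, 'user 2024-03-02'),
  (4, 'user 2024-03-02'),
  (5, 'premium_user 2024-03-03');

-- Show the leading run of word characters next to each raw value.
SELECT id, user_type, SUBSTRING(user_type FROM '^\w+') AS extracted
FROM yourTable
ORDER BY id;
```

Note that \w also matches the underscore, which is why 'premium_user' is extracted whole. A type name containing other characters (a hyphen, say) would be cut short by this pattern.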

Answer by Erwin Brandstetter

Obviously, you need to split that column into two columns: user_type text (or some enumeration type) and date_created date.
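
A one-time migration along those lines could be sketched as follows for Postgres. It assumes every existing value ends in a space plus a 'YYYY-MM-DD' date, as in the sample data; the table name tbl and the column name date_created are placeholders.

```sql
-- Hypothetical migration; assumes every value ends in ' YYYY-MM-DD'.
BEGIN;

ALTER TABLE tbl ADD COLUMN date_created date;

-- Both assignments read the old user_type value, so their order does not matter.
UPDATE tbl
SET    date_created = right(user_type, 10)::date   -- trailing 10 characters are the date
     , user_type    = left(user_type, -11);        -- drop the date and the separating space

COMMIT;
```

After that, counting per type becomes a plain GROUP BY user_type with no string manipulation at all.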

While stuck with your unfortunate design - and if the trailing date has the fixed format your sample data suggests - left() with a negative length to truncate the last 11 characters is simplest & cheapest (much cheaper than a regular expression):

SELECT left(user_type, -11) AS user_type, count(*) AS count
FROM   tbl
GROUP  BY 1
ORDER  BY 2 DESC;

This solution is for Postgres. You also tagged Redshift, which is not Postgres at all. There, left() does not support a negative length. You might work around that with:

substring(user_type, 1, length(user_type) - 11)
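
Dropped into the full query, the workaround might read as below. Treat it as an untested sketch: it relies on the same fixed 11-character suffix, and tbl is a placeholder table name.

```sql
-- Same count per user type, without left() and a negative length.
SELECT substring(user_type, 1, length(user_type) - 11) AS user_type
     , count(*) AS count
FROM   tbl
GROUP  BY 1        -- first SELECT item: the truncated value, not the raw column
ORDER  BY 2 DESC;  -- second SELECT item: the count
```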

About the numeric references in GROUP BY 1 and ORDER BY 2: they point to the first and second items of the SELECT list by position. That works in Postgres and Redshift.
