Select one row per day for each value

Asked by ibrewster on 23 June 2015 at 21:17 · 4.2k views

I have a SQL query in PostgreSQL 9.4 that, while more complex due to the tables I am pulling data from, boils down to the following:

SELECT entry_date, user_id, <other_stuff>
FROM <tables, joins, etc>
WHERE <whatever limits I want, such as limiting the date range or users>
GROUP BY entry_date, user_id

The result is one row per user per day for which I have data. In general, this query would be run for an entry_date period of one month, with the desired result of one row per day of the month for each user.

The problem is that there may not be data for every user every day of the month, and this query only returns rows for days that have data.

Is there some way to modify this query so it returns one row per day for each user, even if there is no data (other than the date and the user) in some of the rows?

I tried joining with generate_series(), but that didn't work: it fills in the missing days overall, but not per user. What I really need is something like "for each user in list, generate series of (user,date) records".

EDIT: To clarify, the final result that I am looking for would be that for each user in the database - defined as a record in a user table - I want one row per date. So if I specify a date range of 5/1/15-5/31/15 in my where clause, I want 31 rows per user, even if that user had no data in that range, or only had data for a couple of days.

1 answer

**Erwin Brandstetter** · Accepted Answer · 2015-06-23T21:27:15+00:00

generate_series() was the right idea; you probably just did not get the details right. It could work like this:

WITH cte AS (
   SELECT entry_date, user_id, <other_stuff>
   FROM   <tables, joins, etc>
   WHERE  <whatever limits I want>
   GROUP  BY entry_date, user_id
   ) 
SELECT *
FROM  (SELECT DISTINCT user_id FROM cte) u
CROSS  JOIN (
   SELECT entry_date::date 
   FROM   generate_series(current_date - interval '1 month'
                        , current_date - interval '1 day'
                        , interval '1 day') entry_date
   ) d
LEFT   JOIN cte USING (user_id, entry_date);

I picked a running time window of one month ending "yesterday". You did not define your "month" exactly.

This assumes entry_date is of data type date.
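
To make the pattern concrete, here is a minimal self-contained sketch that can be pasted into psql. The sample rows, the column name hits and the fixed three-day range are made up for illustration; they only stand in for the elided parts of the query above.

```sql
WITH cte (entry_date, user_id, hits) AS (
   -- hypothetical sample data standing in for the aggregated query
   VALUES (date '2015-05-01', 1, 10)
        , (date '2015-05-03', 1,  7)
        , (date '2015-05-02', 2,  4)
   )
SELECT user_id, entry_date, hits
FROM  (SELECT DISTINCT user_id FROM cte) u   -- every user that has any data
CROSS  JOIN (                                -- combined with every day in the range
   SELECT entry_date::date
   FROM   generate_series(timestamp '2015-05-01'
                        , timestamp '2015-05-03'
                        , interval '1 day') entry_date
   ) d
LEFT   JOIN cte USING (user_id, entry_date)  -- attach data where it exists
ORDER  BY user_id, entry_date;
```

This returns six rows (two users times three days); hits is NULL for the user/day combinations that have no sample row.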

Simpler for your updated requirements

To get results for every user in a users table (and not for a current selection) and for your given time range, it gets simpler. You don't need the CTE:

SELECT *
FROM   (SELECT user_id FROM users) u
CROSS  JOIN (
   SELECT entry_date::date 
   FROM   generate_series(timestamp '2015-05-01'
                        , timestamp '2015-05-31'
                        , interval '1 day') entry_date
   ) d
LEFT   JOIN (
   SELECT entry_date, user_id, <other_stuff>
   FROM   <tables, joins, etc>
   WHERE  <whatever>
   GROUP  BY entry_date, user_id
   ) t USING (user_id, entry_date);
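
As a rough, runnable illustration of this second form, the sketch below replaces the real users table and the aggregated subquery with inline VALUES lists. The user ids, the hits column and the COALESCE default are assumptions for demonstration only; COALESCE is optional and just shows one way to turn the missing values into zeros.

```sql
WITH users (user_id) AS (
   VALUES (1), (2), (3)            -- hypothetical users; user 3 has no data at all
   )
, t (entry_date, user_id, hits) AS (
   -- hypothetical stand-in for the aggregated subquery
   VALUES (date '2015-05-01', 1, 10)
        , (date '2015-05-02', 2,  4)
   )
SELECT user_id, entry_date, COALESCE(hits, 0) AS hits
FROM   users u
CROSS  JOIN (
   SELECT entry_date::date
   FROM   generate_series(timestamp '2015-05-01'
                        , timestamp '2015-05-31'
                        , interval '1 day') entry_date
   ) d
LEFT   JOIN t USING (user_id, entry_date)
ORDER  BY user_id, entry_date;
```

With three users and 31 days in the range this yields 93 rows, including 31 rows for user 3, who has no data.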

For why generate_series() is called in this particular way, see:

Generating time series between two dates in PostgreSQL
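
One reason for passing timestamp values can be checked directly. As far as I know, generate_series() has no variant taking date arguments, so date input is coerced to timestamp with time zone and the result depends on the session's time zone setting. A quick check with pg_typeof(), as a sketch:

```sql
-- timestamp input: the series stays timestamp without time zone
SELECT pg_typeof(ts) AS from_timestamp_input
FROM   generate_series(timestamp '2015-05-01'
                     , timestamp '2015-05-02'
                     , interval '1 day') ts
LIMIT  1;   -- timestamp without time zone

-- date input: resolved to the timestamptz variant of the function
SELECT pg_typeof(ts) AS from_date_input
FROM   generate_series(date '2015-05-01'
                     , date '2015-05-02'
                     , interval '1 day') ts
LIMIT  1;   -- timestamp with time zone
```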

Also, it is best to use the ISO 8601 date format (YYYY-MM-DD), which is unambiguous regardless of locale settings.
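
A small sketch of the ambiguity, using the session's DateStyle setting (the literal 05/01/2015 is just an example value):

```sql
SET datestyle = 'ISO, MDY';
SELECT '05/01/2015'::date AS ambiguous   -- read as month/day: 2015-05-01
     , '2015-05-01'::date AS iso;        -- 2015-05-01

SET datestyle = 'ISO, DMY';
SELECT '05/01/2015'::date AS ambiguous   -- read as day/month: 2015-01-05
     , '2015-05-01'::date AS iso;        -- still 2015-05-01

RESET datestyle;
```

The ISO literal is interpreted the same way in both sessions, while the slash-separated one silently changes meaning.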
