postgresql query equivalent of date histogram with extended bounds in elastic search

488 views Asked by At

I have a table with the following columns:

column1 column2 timestamp event_id
c1v1 c2v1 2021-03-11 00:00:00 1
c1v2 c2v2 2021-03-11 01:03:00 1
c1v3 c2v3 2021-03-12 10:00:00 2
c1v4 c2v4 2021-03-13 20:00:00 1
c1v5 c2v5 2021-03-13 11:00:00 2
c1v6 c2v6 2021-03-13 00:00:00 3
c1v7 c2v7 2021-03-14 00:00:00 2

I have start_time = 2021-03-10 05:14:00 and end_time = 2021-03-15 15:12:19

I am doing an elastic search query on this data which

  • partitions by 1 day from start_time to end_time
  • counts the number of documents in each of the partitions (partitions with 0 documents also because of extended bounds)
  • for each of the partitions, finds the no of unique values in the columns event_id
{
    "query": {        
        "bool": {            
            "filter": 
            [
                {                    
                    "term": {"column1": "some_value"}
                },                
                {                   
                    "term": {"column2": "some_value"}                
                },                
                {                    
                    "range": {
                        "timestamp": {
                            "gte": "<start_time>", 
                            "lt": "<end_time>"
                        }
                    }
                }          
            ]        
        }    
    },    
    "aggs": {
        "timestamp": {
            "date_histogram": {
                "field": "timestamp", 
                "fixed_interval": "1d",                
                "extended_bounds": {
                    "min": "<start_time>",                    
                    "max": "<end_time>"
                }            
            },            
            "aggs": {                
                "unique_values": {                    
                    "cardinality": {                        
                        "field": "event_id"
                    }                
                }            
            }        
        }    
    }
}

I need help to create an sql query which does the same.

Output:

timestamp doc_count unique_values
2021-03-10 0 0
2021-03-11 2 1
2021-03-12 1 1
2021-03-13 3 3
2021-03-14 1 1
2021-03-15 0 0

Update: I have come up with this query but the values I am getting are close to the ones in es but not exact. Also this does not return dates with 0 documents.

SELECT 
    date_floor,
    count(date_floor) AS cnt_date_floor,
    count(DISTINCT(event_id)) AS cnt_dst_event_id
    FROM (
        SELECT  
            event_id,
            DATE(timestamp) AS date_floor
        FROM
            <table_name>
        WHERE
            date BETWEEN date'<start_date>' AND date'<end_date>' AND 
            timestamp >= timestamp'<start_time>' AND
            timestamp < timestamp'<end_time>' AND
            column1 IN ('some val') AND 
            column2 = some_val)
GROUP BY date_floor

where start_date and end_date are floor_dates of start_time and end_time

1

There are 1 answers

0
Jeremy On BEST ANSWER

You can use generate_series to generate the days that you need. Left join to this series and group by the days.

SELECT day,
       count(event_id) as doc_count,
       count(distinct event_id) as unique_values
FROM generate_series('2021-03-10T00:00:00', '2021-03-15T00:00:00', interval '1 day') as g(day)
LEFT JOIN test ON date_trunc('day', tstamp) = day
GROUP BY day
ORDER BY day;

Your query has some extra filtering on column1 and column2 that were not part of the question, but it should be pretty trivial to add those back in.

If you don't have the luxury of setting start time to the beginning of the day outside of the query, just add date_trunc('day', <start_time>) instead of the start time.

Here's a fiddle.