I have a table with the following columns:
column1 | column2 | timestamp | event_id |
---|---|---|---|
c1v1 | c2v1 | 2021-03-11 00:00:00 | 1 |
c1v2 | c2v2 | 2021-03-11 01:03:00 | 1 |
c1v3 | c2v3 | 2021-03-12 10:00:00 | 2 |
c1v4 | c2v4 | 2021-03-13 20:00:00 | 1 |
c1v5 | c2v5 | 2021-03-13 11:00:00 | 2 |
c1v6 | c2v6 | 2021-03-13 00:00:00 | 3 |
c1v7 | c2v7 | 2021-03-14 00:00:00 | 2 |
I have start_time = 2021-03-10 05:14:00
and end_time = 2021-03-15 15:12:19
I am doing an elastic search query on this data which
- partitions by 1 day from start_time to end_time
- counts the number of documents in each of the partitions (partitions with 0 documents also because of extended bounds)
- for each of the partitions, finds the no of unique values in the columns event_id
{
"query": {
"bool": {
"filter":
[
{
"term": {"column1": "some_value"}
},
{
"term": {"column2": "some_value"}
},
{
"range": {
"timestamp": {
"gte": "<start_time>",
"lt": "<end_time>"
}
}
}
]
}
},
"aggs": {
"timestamp": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "1d",
"extended_bounds": {
"min": "<start_time>",
"max": "<end_time>"
}
},
"aggs": {
"unique_values": {
"cardinality": {
"field": "event_id"
}
}
}
}
}
}
I need help to create an sql query which does the same.
Output:
timestamp | doc_count | unique_values |
---|---|---|
2021-03-10 | 0 | 0 |
2021-03-11 | 2 | 1 |
2021-03-12 | 1 | 1 |
2021-03-13 | 3 | 3 |
2021-03-14 | 1 | 1 |
2021-03-15 | 0 | 0 |
Update: I have come up with this query but the values I am getting are close to the ones in es but not exact. Also this does not return dates with 0 documents.
SELECT
date_floor,
count(date_floor) AS cnt_date_floor,
count(DISTINCT(event_id)) AS cnt_dst_event_id
FROM (
SELECT
event_id,
DATE(timestamp) AS date_floor
FROM
<table_name>
WHERE
date BETWEEN date'<start_date>' AND date'<end_date>' AND
timestamp >= timestamp'<start_time>' AND
timestamp < timestamp'<end_time>' AND
column1 IN ('some val') AND
column2 = some_val)
GROUP BY date_floor
where start_date and end_date are floor_dates of start_time and end_time
You can use generate_series to generate the days that you need. Left join to this series and group by the days.
Your query has some extra filtering on column1 and column2 that were not part of the question, but it should be pretty trivial to add those back in.
If you don't have the luxury of setting start time to the beginning of the day outside of the query, just add date_trunc('day', <start_time>) instead of the start time.
Here's a fiddle.