I have a subquery that aggregates some UNION ALL selects. Over that I prepare the SELECT
to create cross-tab and limit it to let's say 20. I would like to be able to retrieve the total COUNT of sub query results before I am limiting them in main query. This is for the purpose of trying to build a pagination that receives the total number of records and then the specific page record grid.
Sample query:
SELECT
name,
sumIf(metric_value, metric_name = 'data') AS data,
sumif(....
FROM
(SELECT
name, metric_name, SUM(metric_value) as metric_value
FROM
(SELECT
name, 'data' AS metric_name, SUM(data) AS metric_value
FROM
table
WHERE
date > '2017-01-01 00:00:00'
GROUP BY
name
UNION ALL
SELECT
name, 'data' AS metric_name, SUM(data) AS metric_value
FROM
table2
WHERE
date > '2017-01-01 00:00:00'
GROUP BY
name
UNION ALL
SELECT
name, 'data' AS metric_name, SUM(data) AS metric_value
FROM
table3
WHERE
date > '2017-01-01 00:00:00'
GROUP BY
name
UNION ALL
.
.
.)
GROUP BY
name, metric_name)
GROUP BY
name
ORDER BY
name ASC
LIMIT 0,20;
The first subselect returns tons of data, so I thought I can count it and return as one column value, or row and it would propagate to main select that limits 20 results. Because I need to know the entire set of results but don;t want to call the same query twice without limit and with limit just to get COUNT. There are at least 12 UNION ALL third level sub selects, so why waste resources. I am looking to try generic SQL solutions not necessarily related to ClickHouse
I was thinking of using count(*) OVER ()
, however that is not supported, so if thats only option I know I need to run query twice.
The first thing that one should mention is that nobody is usually interested in the exact number of pages on a query. It can be easily estimated and almost no one will care how exact is the estimation. However, if you have a link to the last page in your GUI, people will often click to link just to see whether it works.
Nevertheless, there are cases when an analyst should visit all the pages, and then the GUI should display the exact amount of work. A good news is that in that latter case, a better strategy is to cache a snapshot of the whole results table and counting the rows in the table becomes not a problem anymore.
I mean, it makes sense to discuss with the customers whether they really need it, because unneeded full scans many times per day may have effect on the database load and billing sums.
Anyway, if you still need to estimate the number of rows, you can simplify the query just to count the number of rows. As I understand this is something like:
or if
data
is a constant metric name