I am currently using apache-drill-1.21.1 I have a scenario in which I need to fetch the count for various tables frequently. Currently I am using the drill count query for doing the same. But if i have millions of records then fetching it frequently gives performance issue.Moreoever the number of records in the table also keeps changing so I need a fresh count each time. Can anyone please suggest what can be done to avoid executing count query and use some other approach instead? Do we have some api in drill which can give the row count? If not do we have some settings in drill related to query execution through which the count can be received efficiently? Also Can anybody also tell me what is the driver that drill uses to connect to database?
Find number of records from a table in apache drill without using count query
67 views Asked by Achyut Muley At
1
One way to make row counts fast is to put the data into Parquet files. If you can't, or prefer not to, do that then you can collect table statistics including a row count in Drill's metastore.
This operation is expensive but you could limit the columns considered if you only want a row count.
Once the metastore is populated you can query the table row count directly from it.
Partitioning the data appropriately would also enable the reuse of row counts calculated for completely loaded partitions.