Find number of records from a table in apache drill without using count query

67 views Asked by At

I am currently using apache-drill-1.21.1 I have a scenario in which I need to fetch the count for various tables frequently. Currently I am using the drill count query for doing the same. But if i have millions of records then fetching it frequently gives performance issue.Moreoever the number of records in the table also keeps changing so I need a fresh count each time. Can anyone please suggest what can be done to avoid executing count query and use some other approach instead? Do we have some api in drill which can give the row count? If not do we have some settings in drill related to query execution through which the count can be received efficiently? Also Can anybody also tell me what is the driver that drill uses to connect to database?

1

There are 1 answers

0
Dzamo Norton On

One way to make row counts fast is to put the data into Parquet files. If you can't, or prefer not to, do that then you can collect table statistics including a row count in Drill's metastore.

apache drill (dfs.tmp)> analyze table foo refresh metadata;
ok       true
summary  Collected / refreshed metadata for table [dfs.tmp.foo]

This operation is expensive but you could limit the columns considered if you only want a row count.

Once the metastore is populated you can query the table row count directly from it.

apache drill (dfs.tmp)> select num_rows from information_schema.`tables` where table_name = 'foo';
num_rows  6000

Partitioning the data appropriately would also enable the reuse of row counts calculated for completely loaded partitions.