Snowflake - Warehouse Node Utilization data?

124 views Asked by At

I vaguely recalling someone stating that there is at least an "estimated" node usage of a Warehouse in Snowflake.

Not a cluster. The nodes of a cluster. For instance a Large Warehouse in snowflake has 8 nodes.

This allows for better resource planning - at least more granular estimates.

I mean for now, you can calculate the 'concurrency' of various queries and how query execution time blends together (average queries running together, query time vs. idle/ buffer/ autosuspend time).

What these calculations largely ignore is "query complexity" - it reduces every query to an execution time which is related to complexity, but not exactly.

Really the 'efficiency' of a warehouse also comes down to how many of its nodes are running at once.

A Large warehouse running a 'very simple' query might only use 1 node of 8.

Well anyway you get the idea. Is there any NODE USAGE data in Snowflake? I can't seem to find anything in the meta data Account Usage schema.

1

There are 1 answers

0
Greg Pavlik On BEST ANSWER

You can get information on what percentage of worker nodes in a warehouse cluster were active during a query. It's an either-or proposition. Either a worker node was involved in the running of a query or it wasn't. If it was involved at all, no matter or much, it counts toward the total percentage.

You can get that in the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view in the QUERY_LOAD_PERCENT column. You can test it with this mini script:

alter warehouse TEST set warehouse_size = 'Large';
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION;
alter warehouse TEST set warehouse_size = 'X-Small';
alter session set use_cached_result = false;
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION;

-- Wait a few minutes, up to 15 minutes:

select  * from SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY 
where   START_TIME > dateadd(minute, -30, current_timestamp)
  and   QUERY_TYPE = 'SELECT'
  and   QUERY_TEXT ilike '%TPCH_SF1%'
;

enter image description here

Because the NATION table in the TPCH sample data is small, it has only one micropartition. Because it's only one micropartition, a select query run against it only requires one worker node. When we run it on a Large warehouse, which has 8 nodes, it uses 1 out of 8 nodes or 12.5% (rounded up to 13%). By scaling down the warehouse to X-Small, which has only one node, we use 1 out of 1 for 100%.