Query max execution and planning times not honoured on trinodb


I am running Trino 428, and with a high number of concurrent queries there are Glue throttling exceptions which result in very high elapsed times. I have specified a max execution time of 5 minutes (with the config properties below); on version 333 the queries used to time out in 5 minutes.

coordinator=true
node-scheduler.include-coordinator=true
protocol.v1.alternate-header-name=Presto

query.max-execution-time=5m
query.max-run-time=5m
query.max-planning-time=3m
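
For reference, I would expect the per-session equivalent to behave the same way as the config property; a minimal sketch of setting it from the client through trino-jdbc (the connection details below are placeholders) looks like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SessionTimeoutSketch
{
    public static void main(String[] args) throws Exception
    {
        // Placeholder host/catalog/schema; the real cluster uses the coordinator config above
        try (Connection conn = DriverManager.getConnection("jdbc:trino://localhost:8080/hive/default", "trino", null);
                Statement stmt = conn.createStatement()) {
            // query_max_execution_time is the session-level counterpart of query.max-execution-time
            stmt.execute("SET SESSION query_max_execution_time = '5m'");
            // Subsequent queries on this connection should carry the 5 minute limit
            stmt.execute("SELECT 1");
        }
    }
}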

I enabled the metastore cache to reduce the load on Glue:

hive.metastore-cache-ttl=20m
hive.metastore-refresh-interval=5m

Below is a log line showing an elapsed time of more than 15 minutes:

2023-10-17T23:59:56.977Z INFO dispatcher-query-8105 io.trino.event.QueryMonitor TIMELINE: Query 20231017_234419_35787_drmh5 :: FAILED (OPTIMIZER_TIMEOUT) :: elapsed 937914ms :: planning 937914ms :: waiting 1ms :: scheduling 0ms :: running 0ms :: finishing 0ms :: begin 2023-10-17T23:44:19.062Z :: end 2023-10-17T23:59:56.976Z

On version 333, the queries timed out after 5 minutes because of the limit on execution time, and there were no OPTIMIZER_TIMEOUT errors.

Also, all the queries that I need to run have the same template:

select columns from table where prtn_col between x and y and col1=x and col2>c  

The data on S3 is partitioned by prtn_col, and the Parquet files are sorted on col1 and col2.

The schema is already known, and the S3 paths to read will be s3://table_location/x and s3://table_location/y (the partition column is based on the time hour, so all paths can be figured out without making a call to Glue).
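
For illustration, assuming the partition values are hour strings in a yyyyMMddHH format and the path layout is simply table_location/<value> (both of these are assumptions), the candidate S3 paths between x and y could be enumerated without any Glue call, roughly like this:

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class PartitionPathSketch
{
    // Assumed hour format for prtn_col; the real format may differ
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyyMMddHH");

    // Enumerate partition locations for prtn_col between x and y (inclusive)
    // purely from the table location and the hour range, without calling Glue
    static List<String> partitionPaths(String tableLocation, LocalDateTime x, LocalDateTime y)
    {
        List<String> paths = new ArrayList<>();
        for (LocalDateTime hour = x; !hour.isAfter(y); hour = hour.plusHours(1)) {
            paths.add(tableLocation + "/" + HOUR.format(hour));
        }
        return paths;
    }

    public static void main(String[] args)
    {
        partitionPaths("s3://table_location", LocalDateTime.of(2023, 10, 17, 0, 0), LocalDateTime.of(2023, 10, 17, 3, 0))
                .forEach(System.out::println);
    }
}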

1. Is there a way to enforce a stricter limit on query execution times?

2. Are there any steps for implementing a custom metastore? I could implement a dummy metastore which returns the partitions without making a Glue catalog call, something like:

private List<Partition> batchGetPartition(Table table, List<String> partitionNames)
{
    // Build synthetic partitions without calling Glue; each partition's values are parsed
    // from its own name (e.g. a helper like HiveUtil.toPartitionValues), not the whole name list
    return partitionNames.stream()
            .map(name -> new Partition(table.getDatabaseName(), table.getTableName(), toPartitionValues(name), table.getStorage(), table.getDataColumns(), new HashMap<>()))
            .collect(Collectors.toList());
}

PS: The coordinator and worker run in the same process, and there are multiple such clusters to serve the high number of concurrent queries, which causes a high load on the Glue catalog. However, this issue was not there in version 333. Although 428 has significant benefits in terms of S3 reads, because of the unbounded elapsed times of a few queries the overall query response times are higher than on 333.
