I use Trino to query HDFS through the Hive connector.

Not always, but sometimes I get this error:

io.trino.spi.TrinoException: Error reading from HDFS at position ... caused by java.io.IOException: 4 missing blocks, the stripe is: AlignedStripe ..

I have an erasure coding (EC) policy on my HDFS.

My table format is ORC.

I am selecting from a huge table with a time range of 3 months.

For instance, it reports a reading error on ... day=20231221/000_5 (this is an ORC file).

When I then read just that one day, 20231221, it works fine, and there are no dead nodes or corrupted blocks on my HDFS.

How can I handle this error?

There is 1 answer

Pohakoo On

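Trino is failing to read your ORC file because HDFS reports missing blocks in an erasure-coded stripe. Since the same file reads fine on its own and you see no dead nodes or corrupted blocks, the blocks are probably only temporarily unreachable during the large scan (for example, because of busy or slow DataNodes) rather than actually lost. Try querying smaller time ranges. Note that erasure-coded files have no replication factor to raise: a stripe can only tolerate as many missing blocks as the policy has parity units (3 for RS-6-3, for example), so the alternatives are a policy with more parity or plain replication for this table. Also check HDFS health (for example with `hdfs fsck` on the table path) and look into Trino's retry options, such as the `retry-policy` setting for fault-tolerant execution.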
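To illustrate the "smaller time ranges" suggestion, here is a minimal Python sketch that splits a long date range into short chunks and builds one query per chunk. It rests on assumptions not stated in the question: the table name `hive.default.events` is hypothetical, and `day` is assumed to be a varchar partition column in `YYYYMMDD` form (as the path `day=20231221` suggests). Adapt both to your schema.

```python
from datetime import date, timedelta


def split_day_range(start, end, chunk_days):
    """Split the inclusive range [start, end] into consecutive chunks
    of at most chunk_days days, returned as (first, last) YYYYMMDD strings."""
    if chunk_days < 1:
        raise ValueError("chunk_days must be at least 1")
    chunks = []
    current = start
    while current <= end:
        # The last day of this chunk never goes past the overall end date.
        last = min(current + timedelta(days=chunk_days - 1), end)
        chunks.append((current.strftime("%Y%m%d"), last.strftime("%Y%m%d")))
        current = last + timedelta(days=1)
    return chunks


def build_queries(table, start, end, chunk_days=7):
    """Build one SELECT per chunk. Assumes `day` is a varchar partition
    column; the table name passed in is the caller's own."""
    return [
        f"SELECT * FROM {table} WHERE day BETWEEN '{lo}' AND '{hi}'"
        for lo, hi in split_day_range(start, end, chunk_days)
    ]


if __name__ == "__main__":
    # Hypothetical table name, used only for illustration.
    for query in build_queries("hive.default.events",
                               date(2023, 10, 1), date(2023, 12, 31)):
        print(query)
```

Each generated query touches far fewer files at once than the 3-month scan, and a failed chunk can be rerun on its own instead of restarting the whole query.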