Impala Concurrent READ & Overwrite


I noticed in one application that a concurrent READ (preceded by invalidating the metadata) and an OVERWRITE of the same table cause the table's underlying files to become corrupt.

Is this a known scenario? I expected that, while the table is being overwritten, a concurrent read would simply fail. It should not be able to corrupt the table's underlying files.

Help will be appreciated!


There is 1 answer

Matt On

If the files are becoming corrupt, concurrent reads and writes should not be the cause. HDFS files can be read and appended to but not modified in place, and Impala always writes new files rather than changing existing ones. When you insert, the files are written to a staging directory that Impala does not read from. Only once the files are complete are they moved into the table/partition directory.
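The write-to-staging-then-move pattern described above can be sketched in a few lines. This is only an illustration on a local filesystem, not Impala's actual code: the `_staging` directory name, the function names, and the rule that readers skip names starting with `_` or `.` are all assumptions made for the example.

```python
import os
import tempfile


def publish_file(table_dir, file_name, data):
    """Write data under a staging directory, then move it into table_dir."""
    # Hypothetical staging location; the real directory name is not given here.
    staging_dir = os.path.join(table_dir, "_staging")
    os.makedirs(staging_dir, exist_ok=True)

    staged_path = os.path.join(staging_dir, file_name)
    with open(staged_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before the move

    # The move is a rename, so a reader sees either no file or the whole file.
    final_path = os.path.join(table_dir, file_name)
    os.replace(staged_path, final_path)
    return final_path


def visible_files(table_dir):
    """Files a reader would scan: regular files not starting with '_' or '.'."""
    return sorted(
        name
        for name in os.listdir(table_dir)
        if not name.startswith(("_", "."))
        and os.path.isfile(os.path.join(table_dir, name))
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        publish_file(table_dir, "part-0.data", b"row1\nrow2\n")
        print(visible_files(table_dir))
```

Because the reader only lists completed files in the table directory, a half-written file in staging is never scanned. Under this pattern a concurrent read can miss new data or fail if a file disappears, but it has no way to alter file contents.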

A few things to consider: If you run the insert on its own, without the concurrent select, are the files OK? What exactly do you mean by corrupt? Can you read the table in Hive? Which version of Impala are you running?