In one of our applications I noticed that a concurrent READ (with INVALIDATE METADATA) while the table is being OVERWRITTEN causes the table's underlying files to become corrupt.
Is this a known scenario? I expected that while the table is being overwritten, a concurrent read would simply fail; I did not expect it to be able to corrupt the table's underlying files.
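For reference, a minimal sketch of the two concurrent statements (the table and source names here are hypothetical):

```sql
-- Session A: invalidate cached metadata, then read the table
INVALIDATE METADATA events;
SELECT COUNT(*) FROM events;

-- Session B, running concurrently: rewrite the table's data files
INSERT OVERWRITE TABLE events
SELECT * FROM events_source;
```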
Any help would be appreciated!
If the files become corrupt, it shouldn't be because of concurrent reads and writes. HDFS is a read/append-only filesystem, and Impala always writes new files: when you insert, files are written to a staging directory that Impala does not read from until they are complete, at which point they are moved into the table/partition directory.
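You can check which data files Impala currently associates with the table; a sketch, using the same hypothetical table name (files in the hidden staging directory, typically `_impala_insert_staging` inside the table directory, should never show up here):

```sql
-- List the data files backing the table as Impala sees them
SHOW FILES IN events;
```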
A few things to consider:
- If you run the insert on its own, without the concurrent select, are the resulting files OK? (See the sketch below.)
- What exactly do you mean by "corrupt"?
- Does the same read work in Hive?
- What version of Impala are you running?
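A sketch of those checks (again with hypothetical table names):

```sql
-- Check the Impala version
SELECT VERSION();

-- Run the insert with no concurrent readers...
INSERT OVERWRITE TABLE events
SELECT * FROM events_source;

-- ...then verify the files read back cleanly
SELECT COUNT(*) FROM events;
```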