I am successfully querying Hive and HBase tables using Drill. In my use case, Storm writes data into an HDFS directory; over that directory I created a Hive table, and I query the data using both Hive and Drill. Whenever Storm is actively writing into that directory (i.e., files in HDFS are open for write), Drill is not able to query the Hive table and gives this error:
Failed with exception java.io.IOException:java.io.IOException: Cannot obtain block length for LocatedBlock{BP-517438351-192.168.1.136-1475035616867:blk_1073793923_53182; getBlockSize()=0; corrupt=false; offset=0; locs=[127.0.0.1:50010]; storageIDs=[DS-be58a5f4-58d9-4c3c-8138-ce18ffa10ef8]; storageTypes=[DISK]}
If we stop writing, Drill is able to query the Hive table again. In both cases Hive itself works properly. I am not able to find the cause.
Can anyone tell me whether Drill can query HDFS files or directories that are still open for write? I have searched a lot but could not find anything about this.
Technically, HDFS does allow a file to be read while another process is still writing to it, but only its finalized blocks have a known length. The last block of a file that is open for write is not yet finalized, so its length cannot be determined — note the getBlockSize()=0 in your error. A reader that needs the complete block length up front, such as Drill, therefore fails with "Cannot obtain block length for LocatedBlock", while Hive evidently tolerates these in-progress files, which is why your Hive queries still work. In my opinion, you should not expect Drill to query a file while another process is writing to it; have Storm close and roll its output files periodically so that queries only see closed files.
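To confirm this diagnosis, you can check whether any files under the table's directory are still open for write using the standard `hdfs fsck` tool. A minimal sketch (the path `/user/storm/events` is a hypothetical example; substitute your Hive table's HDFS location, and note this requires a configured HDFS client on a running cluster):

```shell
# Hypothetical Hive table location on HDFS -- adjust to your setup.
TABLE_DIR="/user/storm/events"

# -openforwrite makes fsck report files that are still open for write.
# Lines containing OPENFORWRITE are the files Drill cannot read yet,
# because their last block has no finalized length.
hdfs fsck "$TABLE_DIR" -files -openforwrite | grep OPENFORWRITE
```

If this prints any files, those are the ones causing the "Cannot obtain block length" error; once the writer closes them (or Storm rolls to a new file), they disappear from this list and Drill can query the table again.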