Can Drill query open HDFS directories?


I can successfully query Hive and HBase tables using Drill. In my use case, Storm writes data into an HDFS directory; I created a Hive table over that directory and query the data with both Hive and Drill. Whenever Storm is writing data into that directory (that is, files in the directory are open for writing), Drill cannot query the Hive table and fails with this error:

Failed with exception java.io.IOException:java.io.IOException: Cannot obtain block length for LocatedBlock{BP-517438351-192.168.1.136-1475035616867:blk_1073793923_53182; getBlockSize()=0; corrupt=false; offset=0; locs=[127.0.0.1:50010]; storageIDs=[DS-be58a5f4-58d9-4c3c-8138-ce18ffa10ef8]; storageTypes=[DISK]}

If we stop writing, Drill can query the Hive table again. Hive works properly in both cases. I have not been able to find the cause.

Can anybody tell me whether Drill can query HDFS files or directories that are open for writing? I have searched a lot but found nothing about this.


There is 1 answer

testtech On

Technically, any file system (ext2, ext3, or HDFS) must stay consistent across reads and writes. When one process is writing data to a directory, the file system puts it in write mode and cannot give read access to another process. Even if you force a read, the reading process gets inconsistent data. This is why a file or directory that is in write mode may not be readable. In my opinion, in HDFS you may not be able to run a read query while another process is writing to the same file.
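To confirm that the failing queries line up with files that are still open for writing, one option is to ask HDFS which files under the table directory are currently in that state. The sketch below is only an illustration: it assumes the `hdfs` CLI is on the PATH, that `hdfs fsck <dir> -files -openforwrite` marks such files with the token `OPENFORWRITE` on a line that starts with the file path, and that paths contain no spaces. The exact fsck output format differs between Hadoop versions, and the helper names and the directory `/data/storm` are hypothetical.

```python
import subprocess


def open_for_write_paths(fsck_output):
    """Return the paths that fsck output marks as OPENFORWRITE.

    Assumption: each file line starts with the absolute path, followed by
    size and block details, and contains the token OPENFORWRITE when the
    file is still open for writing.
    """
    paths = []
    for line in fsck_output.splitlines():
        if line.startswith("/") and "OPENFORWRITE" in line:
            # The path is the first whitespace-separated field on the line.
            paths.append(line.split(" ", 1)[0])
    return paths


def list_open_files(hdfs_dir):
    """Run fsck on hdfs_dir and return the files that are open for writing."""
    result = subprocess.run(
        ["hdfs", "fsck", hdfs_dir, "-files", "-openforwrite"],
        capture_output=True,
        text=True,
        check=False,  # inspect the output ourselves instead of raising on non-zero exit
    )
    return open_for_write_paths(result.stdout)


if __name__ == "__main__":
    # Hypothetical directory; replace with the location of the Hive table.
    for path in list_open_files("/data/storm"):
        print(path)
```

If the files listed here match the ones Storm is currently writing, that supports the idea that the query fails only while those files are open.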