How to read from Azure Blob Storage in Hadoop?


I have a MapReduce job in which the reducer receives the absolute address of a file stored in Azure Blob Storage; the reducer should open that file and read its contents. I added the storage account that contains these files when provisioning my Hadoop cluster (HDInsight), so the reducer should have access to this Blob storage, even though it is not the default HDFS storage for my job. I have the following code in my reducer, but it fails with a FileNotFound error:

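import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
// FileSystem.get(conf) returns the default file system named by fs.defaultFS
FileSystem fs = FileSystem.get(new Configuration());
Path pt = new Path("wasb://mycontainer@accountname...");
FSDataInputStream stream = fs.open(pt);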

Answer by Jonathan Gao

This is covered in the HDInsight documentation on addressing files in Blob storage: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage/#addressing

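The syntax is wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>, for example: wasb://mycontainer@myaccount.blob.core.windows.net/example/jars/hadoop-mapreduce-examples.jar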
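To make the required shape concrete, here is a minimal sketch that builds a fully qualified wasb URI and checks whether a given string has that shape (scheme, container as user-info, host ending in `.blob.core.windows.net`). The class `WasbPaths` and its methods `build` and `isFullyQualified` are hypothetical names for illustration; only the URI format comes from the documentation linked above, and the code uses nothing beyond `java.net.URI`.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: builds and inspects URIs of the documented form
// wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
final class WasbPaths {
    private static final String BLOB_HOST_SUFFIX = ".blob.core.windows.net";

    private WasbPaths() {}

    // Builds a fully qualified wasb (or wasbs, when secure is true) URI string.
    static String build(String container, String account, String path, boolean secure) {
        String scheme = secure ? "wasbs" : "wasb";
        // Drop a leading slash so the result never contains "//" after the host.
        String relative = path.startsWith("/") ? path.substring(1) : path;
        return scheme + "://" + container + "@" + account + BLOB_HOST_SUFFIX + "/" + relative;
    }

    // True if the string uses the wasb/wasbs scheme, names a container in the
    // user-info part, and has a host ending in .blob.core.windows.net.
    static boolean isFullyQualified(String uri) {
        try {
            URI u = new URI(uri);
            String scheme = u.getScheme();
            String userInfo = u.getUserInfo();
            String host = u.getHost();
            return ("wasb".equals(scheme) || "wasbs".equals(scheme))
                    && userInfo != null && !userInfo.isEmpty()
                    && host != null && host.endsWith(BLOB_HOST_SUFFIX);
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

A string such as `wasb://mycontainer@accountname/...` (account name without the `.blob.core.windows.net` suffix) fails the check. Once the string is fully qualified, it can be passed to `new Path(...)`; note that in Hadoop `FileSystem.get(conf)` returns the file system named by `fs.defaultFS`, so for a blob in a non-default account it is usually safer to resolve the file system from the path itself with `path.getFileSystem(conf)` (or `FileSystem.get(path.toUri(), conf)`).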

If "mycontainer" is a private container, you must add the "myaccount" Azure storage account as an additional storage account during the provisioning process.
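As far as we understand it, linking an additional account at provisioning time results in its access key being stored in the cluster's `core-site.xml` under the hadoop-azure property `fs.azure.account.key.<accountname>.blob.core.windows.net`. The sketch below, assuming that property naming, checks whether a configuration carries a key for a given account. The class `WasbAccountCheck` and its methods are hypothetical, and the configuration is modelled as a plain `Map` so the example runs without Hadoop on the classpath.

```java
import java.util.Map;

// Hypothetical helper: checks whether a configuration (modelled as a plain Map,
// e.g. entries loaded from core-site.xml) holds a key for a storage account.
final class WasbAccountCheck {
    private WasbAccountCheck() {}

    // Property name under which hadoop-azure looks up a storage account key.
    static String accountKeyProperty(String account) {
        return "fs.azure.account.key." + account + ".blob.core.windows.net";
    }

    // True if a non-empty key is configured for the given account.
    static boolean hasAccountKey(Map<String, String> conf, String account) {
        String key = conf.get(accountKeyProperty(account));
        return key != null && !key.isEmpty();
    }
}
```

In a real job the equivalent lookup would be `conf.get(WasbAccountCheck.accountKeyProperty("myaccount"))` on the job's `org.apache.hadoop.conf.Configuration`; a missing value suggests the account was not linked to the cluster, which matters for private containers.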