Accessing a CSV file placed in HDFS using Spark


I have placed a CSV file into the HDFS filesystem using the hadoop fs -put command. I now need to access that CSV file using PySpark. My code is something like:

`plaintext_rdd = sc.textFile('hdfs://x.x.x.x/blah.csv')`

I am a newbie to HDFS. How do I find the address that should go in place of hdfs://x.x.x.x?

Here's the output when I run the following:

hduser@remus:~$ hdfs dfs -ls /input

Found 1 items
-rw-r--r--   1 hduser supergroup        158 2015-06-12 14:13 /input/test.csv

Any help is appreciated.


There are 3 answers

Abhishek Choudhary (Best Answer)

You need to provide the full path of your file in HDFS, and the URL is defined in your Hadoop configuration, in core-site.xml or hdfs-site.xml.

Check your core-site.xml and hdfs-site.xml to get the details of the URL (typically the fs.defaultFS property in core-site.xml).
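
If the cluster is running, you can also print the configured default filesystem URL from the command line with the standard hdfs getconf utility:

hdfs getconf -confKey fs.defaultFS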

An easy way to find the URL is to open the HDFS (NameNode) web UI in your browser and get the path from there.

If you are using an absolute path on your local file system, use file:///<your path>.
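
For example, a minimal PySpark sketch, assuming fs.defaultFS is hdfs://localhost:9000 (replace the host and port with whatever your configuration actually says):

# full HDFS URL; the host and port below are assumptions taken from fs.defaultFS
plaintext_rdd = sc.textFile('hdfs://localhost:9000/input/test.csv')
print(plaintext_rdd.count())  # number of lines in the CSV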
vvladymyrov

Try specifying the absolute path without hdfs://:

plaintext_rdd = sc.textFile('/input/test.csv')

When Spark runs on the same cluster as HDFS, it uses hdfs:// as the default filesystem, so the scheme and NameNode address can be omitted.
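
Once the file is loaded, a minimal sketch of turning each line into a list of fields (assuming a simple comma-separated layout with no quoted commas) could be:

plaintext_rdd = sc.textFile('/input/test.csv')
# naive split on commas; this will break on quoted fields containing commas
rows = plaintext_rdd.map(lambda line: line.split(','))
print(rows.first())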

Sairam Asapu

Start spark-shell or spark-submit pointing to the package that can read CSV files, like below:

spark-shell  --packages com.databricks:spark-csv_2.11:1.2.0

And in the Spark code, you can read the CSV file as below:

val data_df = sqlContext.read.format("com.databricks.spark.csv")
              .option("header", "true")
              .schema(<pass schema if required>)
              .load(<location in HDFS/S3>)
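
For PySpark, as in the original question, a rough equivalent, assuming the same spark-csv package is passed via --packages and a hypothetical HDFS path, would be:

# the host, port and path below are placeholders; adjust to your cluster
df = sqlContext.read.format("com.databricks.spark.csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("hdfs://localhost:9000/input/test.csv")
df.show()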