I have placed a CSV file into HDFS using the `hadoop fs -put`
command. I now need to read that CSV file with PySpark. The call looks something like
`plaintext_rdd = sc.textFile('hdfs://x.x.x.x/blah.csv')`
I am a newbie to HDFS. How do I find the address to put in place of `hdfs://x.x.x.x`?
Here's the output when I run `hdfs dfs -ls /input`:

```
hduser@remus:~$ hdfs dfs -ls /input
Found 1 items
-rw-r--r--   1 hduser supergroup   158 2015-06-12 14:13 /input/test.csv
```
Any help is appreciated.
You need to provide the full path of your file in HDFS, and the URL comes from your Hadoop configuration: it is the value of `fs.defaultFS` (or the older, deprecated `fs.default.name`) in `core-site.xml`.
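If you already have a `SparkContext`, you can also read the effective value from the Hadoop configuration Spark loads, rather than hunting through XML files. A minimal sketch, assuming an existing `sc`; note that `_jsc` is an internal PySpark handle to the Java context, so this is a convenience trick rather than a public API:

```python
# Print the configured default filesystem URL, e.g. hdfs://<host>:<port>.
# Falls back to the deprecated key used by older Hadoop versions.
hadoop_conf = sc._jsc.hadoopConfiguration()
print(hadoop_conf.get("fs.defaultFS") or hadoop_conf.get("fs.default.name"))
```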
An easy way to find any HDFS path is to open the NameNode web UI in your browser (typically `http://<namenode-host>:50070` on Hadoop 2.x) and browse the filesystem to your file.
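Once you know the NameNode host and port, the full call would look something like the sketch below. Here `remus` is just taken from your shell prompt and `9000` is only a common default for `fs.defaultFS`; substitute whatever your `core-site.xml` actually says:

```python
from pyspark import SparkContext

sc = SparkContext(appName="read-hdfs-csv")

# hdfs://<namenode-host>:<port><absolute path> -- host and port are
# assumptions here; use the value of fs.defaultFS from core-site.xml.
plaintext_rdd = sc.textFile('hdfs://remus:9000/input/test.csv')
print(plaintext_rdd.take(5))
```

If `fs.defaultFS` is set in the configuration Spark picks up, you can usually omit the host and port entirely and write `sc.textFile('hdfs:///input/test.csv')`, or even just `sc.textFile('/input/test.csv')`, since scheme-less paths resolve against the default filesystem.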