I am writing a simple PySpark script to copy HDFS files and folders from one location to another. I have gone through many docs and answers available online, but I could not find a way to copy folders and files using PySpark, or a way to execute HDFS commands from PySpark (in particular, copying folders and files).
Below is my code:
hadoop = sc._jvm.org.apache.hadoop
Path = hadoop.fs.Path
FileSystem = hadoop.fs.FileSystem
conf = hadoop.conf.Configuration()
fs = FileSystem.get(conf)

source = Path('/user/xxx/data')
destination = Path('/user/xxx/data1')

if fs.exists(source):
    for f in fs.listStatus(source):
        print('File path', str(f.getPath()))
        # **** how to use copy command here ?
Thanks in advance
Create a new Java object for the FileUtil class and use its copy methods, rather than shelling out to the hdfs script commands.
How to move or copy file in HDFS by using JAVA API
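For example, something along these lines should work from PySpark (a minimal sketch built on the objects you already have; the paths are the ones from your question, and the False argument is FileUtil.copy's deleteSource flag, so the source is kept rather than moved):

hadoop = sc._jvm.org.apache.hadoop
conf = hadoop.conf.Configuration()
fs = hadoop.fs.FileSystem.get(conf)

source = hadoop.fs.Path('/user/xxx/data')
destination = hadoop.fs.Path('/user/xxx/data1')

# FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, conf)
# copies a file or directory (recursively); deleteSource=False keeps the original
hadoop.fs.FileUtil.copy(fs, source, fs, destination, False, conf)

Since FileUtil.copy handles directories recursively, you don't need the listStatus loop at all for a plain copy.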
It might be better to just use distcp rather than Spark, though; otherwise, you'll run into race conditions if you try to run that code with multiple executors.
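If you go the distcp route, it's a shell command run outside of Spark, roughly like this (using the paths from the question; it launches a MapReduce job to do the copy in parallel):

hadoop distcp /user/xxx/data /user/xxx/data1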