Fastest way to copy large data from HDFS location to GCP bucket using command


I have 5 TB of data that needs to be transferred to a GCP bucket using a command-line tool.

I tried hadoop distcp -m <num> -strategy dynamic <source_path> <destination_path>, but it has been running for a very long time.

Is there any faster alternative for copying large data from an HDFS location to a GCP bucket from the command line?

To benchmark, I ran distcp on 50 GB of data with different numbers of mappers:

hadoop distcp -m <num> -strategy dynamic <source_path> <destination_path>
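For reference, this is a sketch of how the invocation can be assembled (note the spelling is distcp, not discp). The namenode address, bucket name, and paths below are placeholders, and writing to a gs:// destination assumes the GCS connector jar is on the Hadoop classpath; -bandwidth (per-mapper MB/s throttle) is an optional standard DistCp flag shown here for illustration:

```shell
# Placeholder source and destination; adjust for your cluster and bucket.
SRC="hdfs://namenode:8020/data/source"
DST="gs://my-bucket/dest"   # requires the GCS connector on the Hadoop classpath

# Build the DistCp command: 80 mappers, dynamic strategy,
# optional per-mapper bandwidth cap of 100 MB/s.
CMD="hadoop distcp -m 80 -strategy dynamic -bandwidth 100 $SRC $DST"

# Print the command instead of running it (this sketch has no cluster to run on).
echo "$CMD"
```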

I have tried with below options:

  • with -m 18 -> it took 16 mins
  • with -m 22 -> it took 12 mins
  • with -m 44 -> it took 18 mins
  • with -m 60 -> it took 5 mins 20 sec
  • with -m 72 -> it took 5 mins 9 sec
  • with -m 80 -> it took 5 mins 7 sec
  • with -m 84 -> it took 16 mins 10 sec
  • with -m 88 -> it took 11+ mins
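Extrapolating from the best run above (-m 80: 50 GB in 5 min 7 s, i.e. 307 s), a rough estimate of the 5 TB transfer time at the same sustained rate can be computed like this (this assumes throughput stays flat as the job scales, which is optimistic):

```shell
# Rough extrapolation: 50 GB in 307 s with -m 80, scaled to 5 TB.
awk 'BEGIN {
  gb_per_s = 50 / 307          # observed throughput, ~0.163 GB/s
  total_gb = 5 * 1024          # 5 TB expressed in GB
  hours = total_gb / gb_per_s / 3600
  printf "~%.1f hours for 5 TB at the same rate\n", hours
}'
# → ~8.7 hours for 5 TB at the same rate
```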

Can someone please suggest an alternative to distcp?
