Distcp Mismatch in length of source


I am facing an issue while running distcp between two different Hadoop clusters:

Caused by: java.io.IOException: Mismatch in length of source:hdfs://ip1/xxxxxxxxxx/xxxxx and target:hdfs://nameservice1/xxxxxx/.distcp.tmp.attempt_1483200922993_0056_m_000011_2

I tried using -pb and -skipcrccheck:

hadoop distcp -pb -skipcrccheck -update hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/ 

hadoop distcp -pb  hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/ 

hadoop distcp -skipcrccheck -update hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/ 

but nothing seems to be working.

Any solutions please.


There are 3 answers

0
Edi Bice On

I hit the same issue running distcp between two Hadoop clusters of exactly the same version. In my case the cause was that some files in one of the source directories were still open. I found this by running distcp on each source directory individually: every directory copied fine except the one containing the open files, and within that directory only the open files failed. It is hard to spot at first glance from the error message alone.
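The per-directory approach above could be scripted roughly as follows. This is only a sketch: `distcp_each_dir` is a made-up helper name, it assumes the usual `hdfs dfs -ls` layout (permissions in the first column, path in the last), and it will not cope with paths that contain spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run distcp once per immediate subdirectory of a source
# root, so a failure can be narrowed down to a single directory.
# Assumption: `hdfs dfs -ls` prints the permission string first and the path
# last; directory entries start with "d".
distcp_each_dir() {
  local src_root="$1" dst_root="$2" dir name
  hdfs dfs -ls "$src_root" \
    | awk '$1 ~ /^d/ {print $NF}' \
    | while read -r dir; do
        name=$(basename "$dir")
        # </dev/null keeps the child process from consuming the loop's input.
        if hadoop distcp -update "$dir" "$dst_root/$name" </dev/null; then
          echo "OK   $dir"
        else
          echo "FAIL $dir"
        fi
      done
}

# Example call (placeholder paths, not run here):
#   distcp_each_dir hdfs://ip1/some/source hdfs://nameservice1/some/target
```

Any directory reported as `FAIL` is the place to look for files that are still open.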

0
Aditya On

I resolved the issue by running copyToLocal from cluster 1 to the local Linux filesystem, then copyFromLocal to cluster 2.
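For reference, that workaround might look something like the sketch below. The function name `copy_via_local` and the staging directory are illustrative assumptions, not part of the original answer. Note that this route goes through a single machine, so it needs enough local disk space and loses the parallelism of distcp.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy an HDFS path between clusters via local disk.
#   $1 = source path on cluster 1
#   $2 = target directory on cluster 2
#   $3 = local staging directory (assumed to have enough free space)
copy_via_local() {
  local src="$1" dst="$2" staging="$3"
  mkdir -p "$staging" || return 1
  # Pull the data down; it lands in "$staging/<last component of $src>".
  hdfs dfs -copyToLocal "$src" "$staging" || return 1
  # Push the staged copy up to the second cluster.
  hdfs dfs -copyFromLocal "$staging/$(basename "$src")" "$dst"
}

# Example call (placeholder paths, not run here):
#   copy_via_local hdfs://ip1/some/source hdfs://nameservice1/some/target /tmp/staging
```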

0
tom lee On
  1. Check the status of the source file:

    hdfs fsck hdfs://xxxxxxxxxxx

  2. If the source file is not closed, close it by recovering its lease:

    hdfs debug recoverLease -path hdfs://xxxxxxx

  3. Run distcp again:

    hadoop distcp -bandwidth 15 -m 50 -pb hdfs://xxxxxx hdfs://xxxxxx
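If there are many files, the first two steps could be combined along these lines. Treat it as a sketch under assumptions: `recover_open_files` is a made-up name, and the parsing assumes fsck prints one line per file that begins with the path and contains `OPENFORWRITE` for open files, which can differ between Hadoop versions. Also be aware that recovering the lease of a file that a live client is still writing will interrupt that writer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find files under a path that fsck reports as open for
# write, and recover the lease on each one so the file gets closed.
# Assumptions:
#   - fsck output lines look like "<path> <n> bytes, ..., OPENFORWRITE: ..."
#   - paths contain no spaces
#   - fsck prints paths without scheme or authority, so this is run against
#     the cluster that owns the files (the default filesystem)
recover_open_files() {
  local src="$1" f
  hdfs fsck "$src" -files -openforwrite 2>/dev/null \
    | awk '/OPENFORWRITE/ {print $1}' \
    | while read -r f; do
        echo "recovering lease: $f"
        hdfs debug recoverLease -path "$f" -retries 3 </dev/null
      done
}

# Example call (placeholder path, not run here):
#   recover_open_files /some/source
```

Once no open files remain, the distcp command from step 3 can be rerun.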