Google Dataflow job failing continuously: "Pipe broken"

I've been using the same code for a long time and it used to work. When I re-ran our batch loader recently it failed with a "not enough disk space" error, so I increased the disk size and ran it again. Now it fails with a "Pipe broken" error like the one below:

    (84383c8e79f9b6a1): java.io.IOException: java.io.IOException: Pipe broken
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
    at com.google.cloud.dataflow.sdk.runners.worker.TextSink$TextFileWriter.close(TextSink.java:243)
    at com.google.cloud.dataflow.sdk.util.common.worker.WriteOperation.finish(WriteOperation.java:100)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:254)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:191)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:144)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:180)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:161)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:148)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Pipe broken
    at java.io.PipedInputStream.read(PipedInputStream.java:321)
    at java.io.PipedInputStream.read(PipedInputStream.java:377)
    at com.google.api.client.util.ByteStreams.read(ByteStreams.java:181)
    at com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:629)
    at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409)
    at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
    ... 4 more
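
For context, the TextSink in the trace is what the Dataflow 1.x SDK uses underneath TextIO.Write, so the failure happens while the finished text shards are being uploaded to GCS. A minimal sketch of that kind of write stage (the paths, element type, and pass-through DoFn below are placeholders, not our actual job):

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.TextIO;
    import com.google.cloud.dataflow.sdk.options.PipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.transforms.DoFn;
    import com.google.cloud.dataflow.sdk.transforms.ParDo;

    public class BatchLoaderSketch {
      public static void main(String[] args) {
        PipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline p = Pipeline.create(options);

        p.apply(TextIO.Read.from("gs://my-bucket/input/*"))    // read the input files
         .apply(ParDo.of(new DoFn<String, String>() {          // placeholder transform
           @Override
           public void processElement(ProcessContext c) {
             c.output(c.element());
           }
         }))
         // Writing the output shards to GCS is where the TextSink in the
         // stack trace closes its channel and the "Pipe broken" shows up.
         .apply(TextIO.Write.to("gs://my-bucket/output/part"));

        p.run();
      }
    }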

I've seen this error occasionally before and the batch job would still finish in the end, but now it fails partway through after a couple of hours and never completes.

I'm blocked by this error and not sure how to proceed to get our batch loader running again.


1 Answer

Answered by Adam

Posting an answer to address the last question on the comment thread above.

The message "CoGbkResult has more than 10000 elements, reiteration (which may be slow) is required" is not an error. 10,000 is the maximum number of elements kept in memory at once, and the message is just letting you know that if a key has more than that, the remaining results will have to be re-iterated over (re-read from the shuffled data).
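
To make that concrete, the message shows up when you iterate over the per-key results of a CoGroupByKey and some key has more than 10,000 joined values. A sketch of that pattern (Dataflow 1.x Java SDK; the tags, the orders/clicks PCollections, and the element types are made up for illustration, and the usual imports from the SDK's values and transforms.join packages are omitted):

    // Assumes orders and clicks are PCollection<KV<String, String>> built earlier.
    final TupleTag<String> ordersTag = new TupleTag<String>();
    final TupleTag<String> clicksTag = new TupleTag<String>();

    PCollection<KV<String, CoGbkResult>> joined =
        KeyedPCollectionTuple.of(ordersTag, orders)
            .and(clicksTag, clicks)
            .apply(CoGroupByKey.<String>create());

    joined.apply(ParDo.of(new DoFn<KV<String, CoGbkResult>, String>() {
      @Override
      public void processElement(ProcessContext c) {
        CoGbkResult result = c.element().getValue();
        // getAll() returns a lazy Iterable. Only the first 10,000 elements per
        // key are cached in memory; beyond that, the rest are re-read each
        // time you iterate -- the "reiteration (which may be slow)" the
        // message is warning about.
        for (String order : result.getAll(ordersTag)) {
          for (String click : result.getAll(clicksTag)) {  // re-iterates clicks per order
            c.output(order + "," + click);
          }
        }
      }
    }));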

I'd advise continuing to debug the issue on [email protected], as jkff suggested, rather than in the comment thread, since it has grown beyond the scope of a Stack Overflow question.