Error during the pipeline execution: exceeds allowed maximum skew

207 views Asked by At

I executed a pipeline Friday and it has been executing during the weekend but the Sunday there were a lots of the below error:

14 jun. 2015 14:40:51
(6f550257718f53da): Exception: java.lang.IllegalArgumentException: Timestamp 2015-06-14T12:40:48.731Z exceeds allowed maximum skew.
com.google.api.client.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:119)
com.google.api.client.util.Preconditions.checkArgument(Preconditions.java:69)
com.google.cloud.dataflow.sdk.util.DoFnRunner$DoFnProcessContext.checkTimestamp(DoFnRunner.java:502)
com.google.cloud.dataflow.sdk.util.DoFnRunner$DoFnProcessContext.outputWithTimestamp(DoFnRunner.java:465)
com.xtg.hub.dataflow.stats.common.Util$ExtractTimestampFn.processElement(Util.java:62)

In the pipeline there is a 5 minutes FixedWindow and before doing the .apply for this FixedWindow I'm assigning the TimeStamp to each element of the PCollection:

c.outputWithTimestamp(c.element(), Instant.now());

Am I doing something wrong?

Thanks in advance.

1

There are 1 answers

1
danielm On

It's not current supported in dataflow to call outputWithTimestamp with a timestamp less than the timestamp of the input element. It's possible that, due to clock skew between workers, setting the timestamp to Instant.now() is trying to move the timestamp backwards.

Edit: For example, you could do:

Instant now = Instant.now();
c.outputWithTimestamp(c.element(),
                      now.isAfter(c.timestamp()) ? now : c.timestamp());