Oozie workflow throws a socket error, then submits the workflow twice after 10 minutes


I am facing a very weird issue. I have a workflow XML that contains about 20 fork-join nodes, each with 4-8 actions. When I submit this workflow, it waits for about 5-6 minutes and then throws:

"Error: IO_ERROR : java.net.SocketException: Connection reset"

But what actually happens in the background is that it submits one workflow after 10 minutes and another one after 12 minutes, so the workflow ends up being triggered twice.

I ran validate on my XML and it returned "OK". Since the submission does not return a workflow ID, I am unable to debug it. To be honest, I am not sure where to even start debugging.

I have similar workflows with fewer forks (6), and they all work fine. I am not sure why this one causes all the trouble.


There are 2 answers

0
Spark_user On BEST ANSWER

Those logs did not provide any meaningful information, so I split my workflow into two XML files and called the second workflow from the last action of the first. It works well without any issues.

1
1218985 On

The error you pasted above looks like it comes from the client side. I think it would be a good idea to check the server logs instead:

oozie job -oozie http://localhost:11000/oozie -info <wfid>
oozie job -oozie http://localhost:11000/oozie -log <wfid>
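
Because the failed submit never printed a workflow ID, the IDs have to be looked up first. The following is a minimal sketch of one way to do that with `oozie jobs`, filtering by application name. The URL default, the function names, and the assumption that workflow job IDs end in `-W` in the first column of the listing are illustrative and may need adjusting for your cluster.

```shell
#!/usr/bin/env bash
# Sketch only: OOZIE_URL and the workflow name are placeholders (assumptions).
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# Print the IDs of recent workflow jobs whose application name matches $1.
# Assumes the job ID is the first column and ends in "-W" for workflow jobs.
list_wf_ids() {
  local wf_name="$1"
  oozie jobs -oozie "$OOZIE_URL" -jobtype wf -filter "name=${wf_name}" -len 20 |
    awk '$1 ~ /-W$/ { print $1 }'
}

# Show status and log for every matching job, e.g. to spot a duplicate submission.
inspect_wf() {
  local wf_name="$1" id
  for id in $(list_wf_ids "$wf_name"); do
    oozie job -oozie "$OOZIE_URL" -info "$id"
    oozie job -oozie "$OOZIE_URL" -log "$id"
  done
}
```

For example, `inspect_wf my_workflow` (a hypothetical application name) would print the info and log of each matching job; two IDs created a couple of minutes apart would confirm the double submission.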

It is also possible that you are using an invalid Oozie URL. For instance, if your cluster is kerberized, you have to use the Oozie URL that matches the Kerberos principal. If you are running from a kerberized environment, try running kinit with the principal and keytab (kinit -k -t key_tab user_principal) and then use the fully qualified name (FQN) of the Oozie server in the command, like this:

oozie job -oozie http://node_name@domain:11000/oozie -config xxxx -run
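
The Kerberos steps can be combined into one small wrapper so that the job is only submitted once a valid ticket exists. This is a hedged sketch, not a definitive procedure: the function name and all four arguments are hypothetical placeholders, and it assumes a `klist` that supports `-s` (silent check of the ticket cache via exit status).

```shell
#!/usr/bin/env bash
# Sketch only: every argument is a placeholder supplied by the caller (assumption).
submit_with_kerberos() {
  local principal="$1" keytab="$2" oozie_url="$3" props="$4"

  # Obtain a ticket from the keytab; stop if authentication fails.
  kinit -k -t "$keytab" "$principal" || { echo "kinit failed" >&2; return 1; }

  # klist -s stays silent and exits non-zero when the cache holds no valid ticket.
  klist -s || { echo "no valid Kerberos ticket" >&2; return 1; }

  # Submit and start the workflow against the given Oozie URL.
  oozie job -oozie "$oozie_url" -config "$props" -run
}
```

A call might look like `submit_with_kerberos user@EXAMPLE.COM /path/to/user.keytab http://oozie-host.example.com:11000/oozie job.properties`, where every value is made up for illustration.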