I want to run a simple workflow, but it gets stuck in the PREP state every time I submit the job. I am trying to read comma-separated values from a text file and print them on screen. For this I am using the following properties file, workflow file, and script file.
Environment:
Hadoop: 2.6.0 (1 NameNode, 2 DataNodes)
Oozie: 4.1.0
Pig: 0.14.0
This is my Properties File:
```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>nameNode</name>
        <value>hdfs://<IP/aliasname>:<port></value>
    </property>
    <property>
        <name>jobTracker</name>
        <value><IP/aliasname>:<port></value>
    </property>
    <property>
        <name>oozie.libpath</name>
        <value><path/to/pig/jars></value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value><path/to/workflow app/in hdfs></value>
    </property>
</configuration>
```
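As a side note, Oozie's `-config` option also accepts a plain Java properties file, which is the more common form for job configuration. A minimal sketch of the equivalent key=value file (host names, ports, and paths are placeholders, as above; `oozie.use.system.libpath` is an optional extra not in the XML version, which lets the Pig action pick up jars from the Oozie sharelib instead of a per-workflow `lib` directory):

```properties
nameNode=hdfs://<IP/aliasname>:<port>
jobTracker=<IP/aliasname>:<port>
oozie.libpath=<path/to/pig/jars>
oozie.use.system.libpath=true
oozie.wf.application.path=<path/to/workflow app/in hdfs>
```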
This is my Workflow:
```xml
<workflow-app name="sampleworkflow" xmlns="uri:oozie:workflow:0.2">
    <start to="TestJob"/>
    <action name="TestJob">
        <pig>
            <job-tracker><IP/alias name>:<port></job-tracker>
            <name-node>hdfs://<IP/alias name>:<port></name-node>
            <script><Path/to/pig/script></script>
        </pig>
        <ok to="success"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>The Identity Map-Reduce job failed!</message>
    </kill>
    <end name="success"/>
</workflow-app>
```
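One thing worth checking before submission: the Oozie client can validate a workflow XML file against its schema, which catches problems such as a `<start>` transition pointing at a non-existent action name. A sketch, assuming the Oozie client is on the path and the file is local:

```
oozie validate workflow.xml
```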
This is my Pig script:
```pig
DATA = LOAD 'path/to/sample.txt' USING PigStorage(',') AS (name1:chararray, name2:chararray, number:int);
DUMP DATA;
```
This is the content of my sample.txt:

```
abc,xyz,1
```
Command used to run the job:
```
oozie job -oozie http://<IP address>:<port>/oozie -config <path/to/configuration file> -run
```
After running this command, the job ID is printed on screen.
These are my Oozie job logs:

```
2015-06-08 10:58:56,814 INFO ActionStartXCommand:543 - SERVER[pal-hadoop1.cloudapp.net] USER[hadoop1] GROUP[-] TOKEN[] APP[WorkFlow_R] JOB[0000026-150603135220320-oozie-oozi-W] ACTION[0000026-150603135220320-oozie-oozi-W@:start:] Start action [0000026-150603135220320-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2015-06-08 10:58:56,815 INFO ActionStartXCommand:543 - SERVER[pal-hadoop1.cloudapp.net] USER[hadoop1] GROUP[-] TOKEN[] APP[WorkFlow_R] JOB[0000026-150603135220320-oozie-oozi-W] ACTION[0000026-150603135220320-oozie-oozi-W@:start:] [***0000026-150603135220320-oozie-oozi-W@:start:***]Action status=DONE
2015-06-08 10:58:56,815 INFO ActionStartXCommand:543 - SERVER[pal-hadoop1.cloudapp.net] USER[hadoop1] GROUP[-] TOKEN[] APP[WorkFlow_R] JOB[0000026-150603135220320-oozie-oozi-W] ACTION[0000026-150603135220320-oozie-oozi-W@:start:] [***0000026-150603135220320-oozie-oozi-W@:start:***]Action updated in DB!
```
When I query the job info using the job ID, it always shows the job in PREP state.
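For reference, the status and launcher logs can be pulled with the Oozie CLI (server URL and job ID are placeholders):

```
oozie job -oozie http://<IP address>:<port>/oozie -info <job-id>
oozie job -oozie http://<IP address>:<port>/oozie -log <job-id>
```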
I executed the script independently using Pig and it worked fine.
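If it helps to reproduce, this is how the script can be run standalone in Pig local mode (assuming sample.txt is available on the local filesystem at the path used in the LOAD statement):

```
pig -x local pigscript.pig
```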
My workflow directory structure in HDFS:

```
oozie-wf/pigscript.pig
oozie-wf/workflow.xml
oozie-wf/sample.txt
oozie-wf/lib        (contains all Pig jar files)
```
Can you tell me what the possible issue might be? I could not figure it out on my side. Let me know if you require more details.
I think this is because of a low number of containers on the cluster. How many containers do you have on YARN? Simply put, one container is occupied by Oozie, and the rest are needed to run the job. Pig might also be holding a container, though I am not sure about that. If there are not enough containers to execute the job, it will remain stuck in the PREP state.
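To check how much capacity the cluster actually has, the YARN CLI can report node resources and the applications currently holding containers; a sketch, to be run on any node with the Hadoop client configured:

```
# Show all NodeManagers with their container counts and memory usage
yarn node -list -all

# Show applications currently running (and therefore holding containers)
yarn application -list -appStates RUNNING
```

The ResourceManager web UI (port 8088 by default) shows the same information graphically, including total vs. used memory and vcores per node.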