Saving hive output through oozie using ">"


Is something like this possible in oozie?

    hive -f hiveScript.hql > output.txt

I have the following Oozie Hive action for the script above:

    <!-- "hive-node" is a placeholder action name -->
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>hiveScript.hql</script>
        </hive>
        <ok to="end" />
        <error to="kill" />
    </action>

How can I tell the script where the output should go?


There is 1 answer

brandon.bell

That is not possible with Oozie in the way that you want. This is because Oozie starts most of its workflow actions on nodes within the cluster, so there is no single local filesystem for a shell redirect to write to.

You could use the Oozie Shell action to run hive -f hiveScript.hql > output.txt, but that has its own implications: Hive would have to be installed on every node that might run the action, hiveScript.hql would have to be available on each of them, and so on. It also doesn't quite do what you want, because the output file would end up on whichever node was assigned to run the shell action. https://oozie.apache.org/docs/3.3.0/DG_ShellActionExtension.html
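
For illustration, a Shell action along these lines might look roughly like the sketch below. It assumes a hypothetical wrapper script run_hive.sh that contains the hive -f hiveScript.hql > output.txt line. As far as I know the redirect has to live inside a script, because the exec element is launched directly and not interpreted by a shell. The action name and the wrapper script name are placeholders, and both files are assumed to sit in the workflow application directory on HDFS.

```xml
<!-- Sketch only: "hive-via-shell" and run_hive.sh are placeholder names -->
<action name="hive-via-shell">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- run_hive.sh is assumed to contain: hive -f hiveScript.hql > output.txt -->
        <exec>run_hive.sh</exec>
        <!-- Ship the wrapper and the Hive script to the node that runs the action -->
        <file>run_hive.sh#run_hive.sh</file>
        <file>hiveScript.hql#hiveScript.hql</file>
    </shell>
    <ok to="end" />
    <error to="kill" />
</action>
```

Even with this in place, output.txt is written to the working directory of whichever node ran the action, so the wrapper would still need to copy it somewhere durable before it exits.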

I think your best bet would be to include INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT * FROM ... in your hiveScript.hql file and pull the results down from HDFS afterwards.
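
One way to tell the script where its output should go is to pass the directory in as a parameter. This is a minimal sketch, not a tested workflow: OUTPUT_DIR is a made-up parameter name, hive-node is a placeholder action name, and hiveScript.hql is assumed to be changed to read INSERT OVERWRITE DIRECTORY '${OUTPUT_DIR}' SELECT * FROM ... I used the 0.2 Hive action schema here, so check which schema version your Oozie install supports.

```xml
<!-- Sketch only: "hive-node" and OUTPUT_DIR are placeholder names -->
<action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <script>hiveScript.hql</script>
        <!-- Referenced in the script as ${OUTPUT_DIR} -->
        <param>OUTPUT_DIR=/tmp/hdfs_out</param>
    </hive>
    <ok to="end" />
    <error to="kill" />
</action>
```

Once the workflow has finished, something like hdfs dfs -getmerge /tmp/hdfs_out output.txt, run on the machine where you want the file, should pull the result files down into a single local file.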

Edit: Another option I just thought of would be to use the SSH Action. https://oozie.apache.org/docs/3.2.0-incubating/DG_SshActionExtension.html You could potentially have the SSH Action connect to your target machine and run hive -f hiveScript.hql > output.txt there, so the output file ends up on that machine.
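
A rough sketch of what that SSH Action could look like is below. The host, user, action name and script path are all placeholders, and it assumes a wrapper script on the target machine that contains the hive -f hiveScript.hql > output.txt line, plus passwordless SSH from the Oozie server to that host.

```xml
<!-- Sketch only: the action name, host and script path are placeholders -->
<action name="hive-via-ssh">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>user@target-machine.example.com</host>
        <!-- Assumed to contain: hive -f hiveScript.hql > output.txt -->
        <command>/home/user/run_hive.sh</command>
    </ssh>
    <ok to="end" />
    <error to="kill" />
</action>
```

This way Hive and hiveScript.hql only need to exist on the one target machine, and output.txt lands wherever the wrapper script writes it on that machine.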