How to run an Apache Crunch application without Hadoop?

I heard that Apache Crunch is a facade and that it can run applications without Hadoop. Is this true?

If so, how can I do that?

In the Apache Crunch Getting Started guide, the very first example uses the hadoop command:

$ hadoop jar target/crunch-demo-1.0-SNAPSHOT-job.jar <in> <out>

Is it possible to omit hadoop?

There is 1 answer

OneCricketeer

Perhaps what you actually heard is that you don't need a Hadoop cluster. Hive, Pig, and Spark can all be run locally, or against filesystems other than HDFS.

As far as I know the library, you do, however, still need the Hadoop API (which is what hadoop jar loads for you).

In other words, you could set the input and output directories to a local file:// path to get around needing HDFS.
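As a rough sketch of what that could look like, the helper below turns a local path into a file:// URI that can be passed as the input or output argument. The function name to_file_uri is made up for illustration, and this assumes the job hands its arguments straight to Hadoop as paths.

```shell
# Hypothetical helper: turn a local path into a file:// URI.
# Absolute paths are used as-is; relative paths are resolved against $PWD.
to_file_uri() {
  case "$1" in
    /*) printf 'file://%s' "$1" ;;
    *)  printf 'file://%s/%s' "$PWD" "$1" ;;
  esac
}
```

With that in place, the command from the question would become something like: hadoop jar target/crunch-demo-1.0-SNAPSHOT-job.jar "$(to_file_uri input)" "$(to_file_uri output)"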

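You can put the Hadoop libraries on the classpath yourself and run the JAR with plain java. Note that java -jar ignores the CLASSPATH variable, so pass the JAR and the Hadoop libraries together with -cp and name the main class explicitly.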
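Here is a minimal sketch of running the job with plain java. The library directory and the main class name are assumptions (I don't know your layout), and it assumes HADOOP_LIB_DIR holds the Hadoop JARs plus any other dependencies the job needs, such as Crunch itself, since plain java does not load JARs nested inside the job JAR.

```shell
# Hypothetical defaults; adjust to your build and your Hadoop libraries.
APP_JAR="target/crunch-demo-1.0-SNAPSHOT-job.jar"
HADOOP_LIB_DIR="${HADOOP_LIB_DIR:-/opt/hadoop/lib}"   # assumed location of the JARs
MAIN_CLASS="${MAIN_CLASS:-com.example.WordCount}"     # hypothetical driver class

# Join the application JAR and every JAR in the library directory.
# The trailing /* is expanded by the JVM, not the shell, so it stays quoted.
build_classpath() {
  printf '%s:%s/*' "$1" "$2"
}

# Launch the driver with plain java; extra arguments are passed on to the job.
run_crunch_local() {
  java -cp "$(build_classpath "$APP_JAR" "$HADOOP_LIB_DIR")" "$MAIN_CLASS" "$@"
}
```

You would then call run_crunch_local with the input and output paths as arguments. If a Hadoop install is available on the machine, the hadoop classpath command prints the full classpath it would use, which may be easier than pointing at a single directory.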