How to run an Apache Crunch application without Hadoop?

Question

Asked by Dims on 23 May 2018 at 10:28 · 89 views

I heard that Apache Crunch is a facade and that it can run applications without Hadoop. Is this true?

If so, how do I do that?

In the Apache Crunch Getting Started guide, the very first example uses the hadoop command:

$ hadoop jar target/crunch-demo-1.0-SNAPSHOT-job.jar <in> <out>

Is it possible to omit the hadoop command?

There is 1 answer

**OneCricketeer** · Answer 1 · 2018-05-24T00:52:04+00:00

Maybe what you heard is that you don't need a Hadoop cluster. Hive, Pig, and Spark can all be run locally, or against filesystems other than HDFS.

As far as I know about the library, you do, however, still need the Hadoop API on the classpath (which is what hadoop jar loads for you).

In other words, you could set the input and output directories to a local file:// path to get around needing HDFS.
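As a rough sketch of the local-path idea, the helper below turns a path into a file:// URI that could be passed as the input or output argument. The function name to_file_uri is made up for illustration, and it does no URL-encoding, so it assumes paths without spaces or special characters.

```shell
# Hypothetical helper: print an absolute file:// URI for a path.
# The path does not have to exist yet (a job's output directory
# usually must not exist before the job runs).
to_file_uri() {
  case "$1" in
    /*) printf 'file://%s\n' "$1" ;;            # already absolute
    *)  printf 'file://%s/%s\n' "$PWD" "$1" ;;  # resolve against the current directory
  esac
}

# Possible usage with the command from the question (not run here):
# hadoop jar target/crunch-demo-1.0-SNAPSHOT-job.jar "$(to_file_uri in)" "$(to_file_uri out)"
```

Whether the job then actually runs in a single local process also depends on the Hadoop configuration in use, so treat this as a starting point rather than a guarantee.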

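You can export CLASSPATH yourself to include the Hadoop libraries and then start the application with plain java. Note that java -jar ignores CLASSPATH, so run the main class by name (java your.main.Class) with the job JAR also on the classpath.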
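A minimal sketch of building that classpath by hand, assuming the Hadoop JARs sit in a single directory. The function name build_classpath, the HADOOP_LIB_DIR variable, and the class com.example.WordCount are placeholders, not names from Crunch or Hadoop. If the Hadoop scripts are installed, the hadoop classpath command prints a ready-made classpath and may be simpler.

```shell
# Hypothetical helper: join every .jar in a directory into a
# colon-separated classpath string.
build_classpath() {
  dir=$1
  cp=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # skip the unexpanded glob when there are no jars
    cp="${cp:+$cp:}$jar"        # add a ':' separator only between entries
  done
  printf '%s\n' "$cp"
}

# Possible usage (not run here; HADOOP_LIB_DIR and the class name are placeholders):
# CLASSPATH="$(build_classpath "$HADOOP_LIB_DIR"):target/crunch-demo-1.0-SNAPSHOT-job.jar"
# export CLASSPATH
# java com.example.WordCount file:///tmp/in file:///tmp/out
```

The usage names a main class instead of passing -jar because java only consults CLASSPATH in that form.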