I heard that Apache Crunch is a facade and that it can run applications without Hadoop. Is this true?
If so, how do I do that?
In the Apache Crunch Getting Started guide, the very first example uses the hadoop command:
$ hadoop jar target/crunch-demo-1.0-SNAPSHOT-job.jar <in> <out>
Is it possible to omit hadoop?
Maybe what you heard is that you don't need a Hadoop cluster. Hive, Pig, and Spark can all be run locally, or against filesystems other than HDFS.
As far as I know about the library, you do, however, need the Hadoop API (which is what hadoop jar will load for you).
In other words, you could set the input and output directories to a local file:// path to get around needing HDFS.
You can export CLASSPATH yourself to include the Hadoop libraries, and run the job with plain java instead of hadoop jar. Name the main class on the command line rather than using java -jar, because java -jar ignores CLASSPATH.
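As a rough sketch of the steps above, the script below builds a CLASSPATH from a locally unpacked Hadoop distribution and prints the java command that would run the job against file:// paths. Several names are assumptions, not something taken from the Crunch docs: the /opt/hadoop default for HADOOP_HOME, the share/hadoop/... directory layout of the binary distribution, the input and output directory names, and the main class com.example.WordCount, which is purely hypothetical.

```shell
#!/bin/sh
# Assumption: a Hadoop binary distribution is unpacked locally.
# No cluster and no Hadoop daemons are started by this script.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"

JOB_JAR="target/crunch-demo-1.0-SNAPSHOT-job.jar"
MAIN_CLASS="com.example.WordCount"  # hypothetical; use your job's main class

# Local input and output as file:// URIs, so HDFS is not involved.
IN="file://$PWD/input"
OUT="file://$PWD/output"

# The * stays literal inside the quotes; the JVM itself expands
# "dir/*" to every JAR in dir when it reads the class path.
CLASSPATH="$JOB_JAR"
for d in common common/lib hdfs mapreduce yarn; do
  CLASSPATH="$CLASSPATH:$HADOOP_HOME/share/hadoop/$d/*"
done
export CLASSPATH

# Dry run: print the command. Remove "echo" to actually launch the job.
echo java "$MAIN_CLASS" "$IN" "$OUT"
```

One caveat with this approach: hadoop jar also unpacks any JARs nested under lib/ inside the job JAR and adds them to the class path, which plain java does not do. If the job JAR bundles Crunch and its other dependencies that way, they would need to be added to CLASSPATH as well.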