clojure: parallel processing using multiple computers

Question

439 views Asked by Pradnyesh Sawant At 02 January 2015 at 12:00

I have 500 directories, each containing 1000 files (about 3-4k lines per file). I want to run the same Clojure program (already written) on each of these files. I have 4 octa-core servers. What is a good way to distribute the processes across these cores? Cascalog (Hadoop + Clojure)?

Basically, the program reads a file, uses a 3rd-party Java jar to do computations, and inserts the results into a DB.

Note that:
1. being able to use 3rd-party libraries/jars is mandatory
2. there is no querying of any sort

There are 2 answers

myguidingstar On 04 January 2015 at 03:59

Onyx is a recent pure Clojure alternative to Hadoop/Storm. As long as you're familiar with Clojure, working with Onyx is pretty simple. You should give this data-driven approach a try:

https://github.com/MichaelDrogalis/onyx

**Arthur Ulfeldt** · Accepted Answer · 2015-01-02T21:23:34+00:00

Because there is no "reduce" stage in your overall process, as I understand it, it makes sense to put 125 of the directories on each server and then spend the rest of your time trying to make this program process them faster. Up to the point where you saturate the DB, of course.
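The static split described above (500 directories over 4 servers, 125 each) could be sketched roughly as follows. This is only an illustration in Python, since the split itself is independent of the Clojure program; the directory and server names are made up, and `partition` is a hypothetical helper, not part of any tool mentioned here.

```python
def partition(items, n_chunks):
    """Split items into n_chunks contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Hypothetical names; substitute the real directory paths and hostnames.
directories = [f"dir{i:03d}" for i in range(500)]
servers = ["server1", "server2", "server3", "server4"]

# Map each server to the list of directories it is responsible for.
assignment = dict(zip(servers, partition(directories, len(servers))))
```

Each server would then work through only its own list, so no coordination between machines is needed beyond the shared DB.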

Most of the "big-data" tools available (Hadoop, Storm) focus on processes that need both very powerful map and reduce operations, with perhaps multiple stages of each. In your case, all you really need is a decent way to keep track of which jobs passed and which didn't. I'm as bad as anyone (and worse than many) at predicting development times, though in this case I'd say there's an even chance that rewriting your process on one of the map-reduce-esque tools will take longer than adding a monitoring process to keep track of which jobs finished and which failed so you can rerun the failed ones later (preferably automatically).
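A minimal sketch of such a monitoring process, run once per server, might look like the following. It is an assumption-laden illustration rather than a finished tool: `run_all`, `max_attempts`, and the `flaky` stand-in job are all hypothetical, and the default of 8 workers simply matches the octa-core servers from the question. In practice `process_file` would launch the existing Clojure program for one file (for example via `subprocess.run` with `check=True`), and the results would be written somewhere durable instead of being held in memory.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(files, process_file, workers=8, max_attempts=3):
    """Run process_file on every file, rerunning failures up to max_attempts times.

    Returns (succeeded, failed): a list of files that passed and a dict
    mapping each file that never passed to its last error message.
    """
    pending = list(files)
    succeeded, failed = [], {}
    for _ in range(max_attempts):
        if not pending:
            break
        failed = {}
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {pool.submit(process_file, f): f for f in pending}
            for fut in as_completed(futures):
                f = futures[fut]
                try:
                    fut.result()  # re-raises any exception from the job
                    succeeded.append(f)
                except Exception as exc:
                    failed[f] = repr(exc)
        # Only the jobs that failed in this round are retried in the next.
        pending = list(failed)
    return succeeded, failed


# Stand-in job for demonstration: "b.txt" fails once, "c.txt" always fails.
attempts = {}

def flaky(path):
    attempts[path] = attempts.get(path, 0) + 1
    if path == "b.txt" and attempts[path] < 2:
        raise RuntimeError("transient failure")
    if path == "c.txt":
        raise RuntimeError("permanent failure")


ok, bad = run_all(["a.txt", "b.txt", "c.txt"], flaky, workers=2)
```

Threads are enough here on the assumption that each job spends its time waiting on an external JVM process, so the real CPU work happens outside the monitor.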
