Internal working of Spark - Communication/Synchronization

200 views Asked by At

I am quite new to Spark but already have programming experience in BSP model. In BSP model (e.g. Apache Hama), we have to handle all the communication and synchronization of nodes on our own. Which is good on one side because we have a finer control on what we want to achieve but on the other hand it adds more complexity.

Spark on the other hand, takes all the control and handles everything on its own (which is great) but I don't understand how it works internally especially in cases where we have alot of data and message passing between nodes. Let me put an example

zb = sc.broadcast(z)
r_i = x_i.map(x => Math.pow(norm(x - zb.value), 2))
r_i.checkpoint()
u_i = u_i.zip(x_i).map(ux => ux._1 + ux._2 - zb.value)
u_i.checkpoint()
x_i = f.prox(u_i.map(ui => {zb.value - ui}), rho)
x_i.checkpoint()
x = x_i.reduce(_+_) / f.numSplits.toDouble
u = u_i.reduce(_+_) / f.numSplits.toDouble
z = g.prox(x+u, f.numSplits*rho)
r = Math.sqrt(r_i.reduce(_+_))

This is a method taken from here, which runs in a loop (let's say 200 times). x_i contains our data (let's say 100,000 entries).

In a BSP style program if we have to process this map operation, we will partition this data and distribute on multiple nodes. Each node will process sub part of data (map operation) and will return the result to master (after barrier synchronization). Since master node wants to process each individual result returned (centralized master- see figure below), we send the result of each entry to master (reduce operator in spark). So, (only) master receives 100,000 messages after each iterations. It processes this data and sends the new values to slaves again which again start processing for next iteration.

Now, since Spark takes control from user and does internally everything, I am unable to understand how Spark collects all the data after map operations (asynchronous message passing? i heard it has p2p message passing ? what about synchronization between map tasks? If it does synchronization, then is it right to say that Spark is actually a BSP model ?). Then in order to apply the reduce function, does it collects all the data on a central machine (If yes, does it receives 100,000 messages on a single machine?) or it reduces in a distributed fashion (If yes, then how can this be performed ?)

Following figure shows my reduce function on master. x_i^k-1 represents the i-th value calculated (in previous iteration) against x_i data entry of my input. x_i^k represents the value of x_i calculated in current iteration. Clearly, this equation, needs the results to be collected.

enter image description here

I actually want to compare both styles of distributed programming to understand when to use Spark and when to move to BSP. Further, I looked alot on the internet, all I find is how map/reduce works but nothing useful was available on actual communication/synchronization. Any helful material will be useful aswell.

0

There are 0 answers