Distributed Processing Clarification


I have something in mind, but I don't know the typical solution that could help me achieve it.

I need a distributed environment where not only memory is shared but processing as well; that is, all the shared processors work together as one big processor computing the code I wrote.

Could this be achieved, given that I have limited knowledge of data grids and Hadoop?

As far as I know, a data grid platform shares only memory, while Hadoop ships the code to the nodes, but each node processes the code separately from the others, working on its own subset of the data in HDFS.

But I need a solution that not only shares memory or code (as Hadoop does) but also combines the processing power of all the machines into one single big processor with one single big memory.


There are 2 answers

Radim Vansa On

Do you expect that you just spawn a thread and it gets executed somewhere, with the middleware miraculously balancing the load across nodes and moving threads from one node to another? You won't find this directly. The tagged frameworks don't offer transparent shared memory either, for good reasons.

When you use multiple nodes, you usually need them for processing power, and hiding everything and pretending you're on a single machine tends to cause unnecessary communication, slowing things down.

Instead, you can always design your application around the distribution APIs those frameworks provide. In Infinispan, for example, look at the Map-Reduce or Distributed Executors API.
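To illustrate the distributed-executor pattern (explicit task submission rather than transparent shared memory), here is a minimal local sketch. Python's `ProcessPoolExecutor` stands in for the cluster; the function names are illustrative, not a real Infinispan API:

```python
# Sketch of the "distributed executor" pattern: instead of pretending all
# CPUs form one big processor, you partition the work and submit explicit
# tasks; the framework runs each task on some node. ProcessPoolExecutor
# plays the role of the cluster here.
from concurrent.futures import ProcessPoolExecutor

def process_partition(partition):
    # Each "node" works independently on its own slice of the data.
    return sum(x * x for x in partition)

def run_distributed(data, workers=4):
    # Split the input and ship one partition per worker/node.
    chunks = [data[i::workers] for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(process_partition, chunks)
    return sum(partials)  # combine the partial results

if __name__ == "__main__":
    print(run_distributed(list(range(1000))))
```

The point is that the partitioning and the final combine step are part of your application's design; no middleware migrates running threads for you.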

Ravindra babu On

I need a distributed environment where not only memory is shared but processing as well; that is, all the shared processors work together as one big processor computing the code I wrote.

You don't benefit from processing on a single machine. An application scales when the processing is spread across multiple machines. If you want the effect of one big processor, you can instead virtualize a big physical machine into multiple virtual nodes (using technologies like VMware).

But distributed processing across multiple VM nodes on multiple physical machines in a big cluster is best for distributed applications. Hadoop or Spark is a good fit for these applications, depending on whether you need batch processing (Hadoop) or real-time processing (Spark).
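The Hadoop-style model the answer describes can be sketched in plain Python: each mapper processes only its own input split, and a reduce step merges the independent partial results. The data and split boundaries below are made up for illustration:

```python
# Minimal map-reduce word count mirroring how Hadoop distributes work:
# every mapper sees only its own split of the input, and the reducer
# merges the partial counts produced independently on each node.
from collections import Counter
from functools import reduce

def map_phase(lines):
    # A mapper counts words in its own input split only.
    return Counter(word for line in lines for word in line.split())

def reduce_phase(counters):
    # The reducer merges the partial results from all mappers.
    return reduce(lambda a, b: a + b, counters, Counter())

lines = ["big data big cluster", "big processing", "cluster processing"]
splits = [lines[0:2], lines[2:3]]  # pretend each split lives on a different node
counts = reduce_phase(map_phase(s) for s in splits)
print(counts["big"])  # 3
```

No node ever needs another node's memory: the only coordination point is the merge of the partial counters, which is exactly why this model scales where a simulated single big processor would not.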