Can anyone explain the major differences between HDFS and Grid Computing?
What is the difference between Grid computing and HDFS (Hadoop Distributed File System)?
6.8k views. Asked by enes.acikoglu. There are 3 answers.
I think you have to replace HDFS with Hadoop in your question.
Hadoop is a framework that allows for distributed processing of large data sets across clusters of commodity computers using a simple programming model: MapReduce, which in Hadoop 2 and later runs on YARN (Yet Another Resource Negotiator).
HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
The grid computing approach distributes work across a cluster of machines that access a shared file system hosted by a storage area network (SAN). This works well for predominantly compute-intensive jobs, but it becomes a problem when nodes need to access larger data volumes, because network bandwidth becomes the bottleneck and compute nodes sit idle.
HDFS is just a file system. Since you are comparing processing of data, you have to compare Grid Computing with Hadoop MapReduce (on YARN) instead of HDFS.
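To make the MapReduce programming model a bit more concrete, here is a minimal single-process sketch in plain Python. It does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, roughly what the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["grid hadoop grid", "hadoop hdfs"]))
```

The point of the model is that you only write the map and reduce functions; splitting the input, moving intermediate data and retrying failed tasks is left to the framework.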
Hadoop tries to co-locate the data with the compute nodes, so data access is fast because it is local. This feature, known as data locality, is at the heart of data processing in Hadoop and is the reason for its good performance.
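As a rough illustration of data locality, the sketch below assigns each block's task to a node that already stores a replica of that block, and falls back to any free node otherwise. This is a simplified toy, not Hadoop's actual scheduler; `assign_tasks`, the block IDs and the node names are all hypothetical.

```python
def assign_tasks(block_locations, free_slots):
    """Greedy locality-aware assignment (toy model, not the real YARN scheduler).

    block_locations: dict mapping block id -> list of nodes holding a replica.
    free_slots: dict mapping node -> number of free task slots.
    Returns a dict mapping block id -> (chosen node, True if the read is local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block, replicas in block_locations.items():
        # Prefer a node that already stores the block, so no data crosses the network.
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fall back to any node with a free slot; the block must be read remotely.
            candidates = [node for node, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError("no free slots left for block %s" % block)
            node, is_local = candidates[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
    slots = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}
    for block, (node, is_local) in assign_tasks(blocks, slots).items():
        print(block, node, "local" if is_local else "remote")
```

In the grid/SAN setup every read would be a "remote" one in this picture, which is why it struggles once the data volume grows.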
You can refer to Hadoop: The Definitive Guide (4th edition) to understand the concepts better.
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data.
Grid computing, by contrast, is a different kind of thing:
Grid computing is the collection of computer resources from multiple locations to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files. Grid computing is distinguished from conventional high performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed (thus not physically coupled) than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries.
I think HDFS is not relevant to grid computing. Or perhaps it is used in the super virtual computers of a grid.