HDFS-GPFS connector for use in Apache Spark


Is there a way to read data from IBM GPFS (General Parallel File System) in Apache Spark?

My intention is to use something like this

sc.textFile("gfps://...")

instead of

sc.textFile("hdfs://...")

The intended environment is the Hortonworks Data Platform (HDP). I've read some articles on deploying the IBM Spectrum Scale file system which say you can configure a GPFS connector on HDP that gives you the ability to read from and write to GPFS (similar to what MapR-FS provides for its own file system). Has anyone done this?
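For reference, my understanding is that Spark resolves URI schemes through the Hadoop FileSystem API, so a gpfs:// path would only work once a FileSystem implementation is registered for that scheme. A minimal sketch of what I have in mind, where the property follows the usual fs.<scheme>.impl convention and "com.example.gpfs.GpfsFileSystem" is only a placeholder for whatever class the connector actually ships:

import org.apache.spark.{SparkConf, SparkContext}

object GpfsReadSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("gpfs-read-sketch")
    val sc = new SparkContext(conf)

    // Register a Hadoop FileSystem implementation for the gpfs:// scheme.
    // "com.example.gpfs.GpfsFileSystem" is a placeholder, not the real class
    // name shipped by the IBM Spectrum Scale / GPFS Hadoop connector.
    sc.hadoopConfiguration.set("fs.gpfs.impl", "com.example.gpfs.GpfsFileSystem")

    // Once the scheme resolves, reading looks just like reading from hdfs://
    val lines = sc.textFile("gpfs://mycluster/data/input.txt")
    println(s"line count: ${lines.count()}")

    sc.stop()
  }
}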

Thanks

1 Answer

Answered by user3294904:

@dumitru You can use the Sparkling.data library.

More details - http://datascience.ibm.com/blog/making-data-useful-with-the-sparkling-data-library-2/
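As a more general note, once a GPFS connector is wired into the Hadoop FileSystem layer on HDP, a plain Spark read should need no special code. A minimal sketch, assuming a placeholder gpfs:// path served by such a connector (this is the standard DataFrameReader, not the Sparkling.data API):

import org.apache.spark.sql.SparkSession

object GpfsDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("gpfs-dataframe-sketch")
      .getOrCreate()

    // Standard DataFrameReader call; the gpfs:// URI is illustrative and assumes
    // the deployed connector has registered that scheme with Hadoop.
    val df = spark.read
      .option("header", "true")
      .csv("gpfs://mycluster/data/events.csv")

    df.printSchema()
    spark.stop()
  }
}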