HDFS-GPFS connector for use in Apache Spark


Is there a way to read data from IBM GPFS (General Parallel File System) in Apache Spark?

My intention is to use something like this

sc.textFile("gfps://...")

instead of

sc.textFile("hdfs://...")

The intended environment is the Hortonworks Data Platform (HDP). I've read some articles on deploying the IBM Spectrum Scale file system which say you can configure a GPFS connector on HDP that gives you the ability to read from and write to GPFS (similar to what MapR-FS provides for its own file system). Has anyone done this?
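For reference, my understanding is that Spark resolves URI schemes through the Hadoop FileSystem API, so a gpfs:// path would only work once a FileSystem implementation is registered for that scheme. A minimal sketch of what I have in mind, where the property follows the usual fs.<scheme>.impl convention and "com.example.gpfs.GpfsFileSystem" is only a placeholder for whatever class the connector actually ships:

import org.apache.spark.{SparkConf, SparkContext}

object GpfsReadSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("gpfs-read-sketch")
    val sc = new SparkContext(conf)

    // Register a Hadoop FileSystem implementation for the gpfs:// scheme.
    // "com.example.gpfs.GpfsFileSystem" is a placeholder, not the real class
    // name shipped by the IBM Spectrum Scale / GPFS Hadoop connector.
    sc.hadoopConfiguration.set("fs.gpfs.impl", "com.example.gpfs.GpfsFileSystem")

    // Once the scheme resolves, reading looks just like reading from hdfs://
    val lines = sc.textFile("gpfs://mycluster/data/input.txt")
    println(s"line count: ${lines.count()}")

    sc.stop()
  }
}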

Thanks

1 Answer

Answered by user3294904:

@dumitru You can use the Sparkling.data library.

More details - http://datascience.ibm.com/blog/making-data-useful-with-the-sparkling-data-library-2/
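As a more general note, once a GPFS connector is wired into the Hadoop FileSystem layer on HDP, a plain Spark read should need no special code. A minimal sketch, assuming a placeholder gpfs:// path served by such a connector (this is the standard DataFrameReader, not the Sparkling.data API):

import org.apache.spark.sql.SparkSession

object GpfsDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("gpfs-dataframe-sketch")
      .getOrCreate()

    // Standard DataFrameReader call; the gpfs:// URI is illustrative and assumes
    // the deployed connector has registered that scheme with Hadoop.
    val df = spark.read
      .option("header", "true")
      .csv("gpfs://mycluster/data/events.csv")

    df.printSchema()
    spark.stop()
  }
}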