Greenplum download dump to local cluster in parallel

Question


182 views Asked by VB_ At 21 December 2016 at 14:00

Is there a more efficient way to fetch a full Greenplum dump than going through multiple JDBC connections to the master node?

I need to download a full dump of Greenplum through JDBC. To speed the job up, I plan to use Spark parallelism (fetching data in parallel through multiple JDBC connections). As I understand it, all of those JDBC connections will go to Greenplum's single master node. I am going to store the data in HDFS in Parquet format.
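For reference, the approach described above can be sketched roughly as follows, using Spark's built-in JDBC source with a range-partitioned read. This is only an illustration: the function names, the fetch size, and the choice of a numeric partition column are assumptions, not part of the original question. Greenplum is assumed to be reachable through the standard PostgreSQL JDBC driver.

```python
def jdbc_read_options(url, table, partition_column, lower, upper,
                      num_partitions, user, password):
    """Build options for Spark's JDBC source (hypothetical helper).

    Spark splits [lower, upper] on partition_column into num_partitions
    ranges and opens one JDBC connection per range. Every connection
    still goes to the Greenplum master node.
    """
    return {
        "url": url,
        "dbtable": table,
        "driver": "org.postgresql.Driver",  # assumed driver for Greenplum
        "user": user,
        "password": password,
        "partitionColumn": partition_column,  # must be numeric, date, or timestamp
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
        "fetchsize": "10000",  # illustrative value, tune for your rows
    }


def dump_table_to_parquet(spark, options, hdfs_path):
    """Read one table in parallel over JDBC and write it to HDFS as Parquet.

    `spark` is an existing SparkSession supplied by the caller.
    """
    df = spark.read.format("jdbc").options(**options).load()
    df.write.mode("overwrite").parquet(hdfs_path)
```

A call might look like `dump_table_to_parquet(spark, jdbc_read_options("jdbc:postgresql://gp-master:5432/db", "public.sales", "id", 1, 1000000, 16, user, password), "hdfs:///dumps/sales")`, where the host, table, bounds, and path are placeholders.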

Original Q&A

There are 2 answers

**Sung Yu-wei** · Answer 1 · 2016-12-21T15:40:49+00:00


For parallel export, you can try a gphdfs writable external table. GPDB segments can read from and write to external sources in parallel.

http://gpdb.docs.pivotal.io/4340/admin_guide/load/topics/g-gphdfs.html
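As a rough sketch of what this could look like, the helper below generates the two SQL statements involved: one that creates a writable external table pointing at HDFS, and one that copies the source table into it. The helper name, the table names, and the comma-delimited TEXT format are assumptions made for illustration. The gphdfs protocol must also be configured on the cluster, and the role needs permission to use it, as covered in the linked guide.

```python
def gphdfs_export_sql(source_table, ext_table, hdfs_location):
    """Return SQL statements that export source_table to HDFS via gphdfs.

    hdfs_location is "host:port/path" without the scheme. Identifiers are
    interpolated as-is, so pass trusted values only (this is a sketch).
    """
    create = (
        f"CREATE WRITABLE EXTERNAL TABLE {ext_table} (LIKE {source_table})\n"
        f"LOCATION ('gphdfs://{hdfs_location}')\n"
        "FORMAT 'TEXT' (DELIMITER ',');"
    )
    # Each segment writes its own share of the rows directly to HDFS,
    # so the data does not funnel through the master node.
    insert = f"INSERT INTO {ext_table} SELECT * FROM {source_table};"
    return [create, insert]


if __name__ == "__main__":
    # Placeholder names: adjust host, port, path, and tables for your cluster.
    for stmt in gphdfs_export_sql("public.sales", "public.sales_export",
                                  "namenode:8020/dumps/sales"):
        print(stmt)
```

The generated statements can be run with psql or any PostgreSQL-compatible client connected to the master. The output lands in HDFS as delimited text, so a separate step would be needed to convert it to Parquet.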

**Kong Yew Chan** · Answer 2 · 2017-10-13T20:27:54+00:00

You can now use the Greenplum-Spark connector to parallelize data transfer between Greenplum segments and Spark executors.

The connector speeds up data transfer because it uses parallel processing on both the Greenplum segments and the Spark workers. It is definitely faster than a JDBC connector, which transfers all data through the Greenplum master node.

Reference: http://greenplum-spark.docs.pivotal.io/100/index.html
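A minimal sketch of reading a table through the connector from PySpark might look like the following. The data source class name and the option keys are given as I recall them from the connector's 1.x documentation and should be treated as assumptions; check them against the reference above for the version you install. The helper names are hypothetical, and the connector JAR has to be on the Spark classpath.

```python
# Assumed data source class name for the 1.x connector; verify in the docs.
GSC_FORMAT = "io.pivotal.greenplum.spark.GreenplumRelationProvider"


def connector_options(url, table, partition_column, user, password):
    """Build connector options (hypothetical helper, assumed option keys)."""
    return {
        "url": url,                           # JDBC URL of the Greenplum master
        "dbtable": table,
        "partitionColumn": partition_column,  # column used to split the read
        "user": user,
        "password": password,
    }


def read_via_connector(spark, options):
    """Load a Greenplum table as a DataFrame using an existing SparkSession.

    The master is contacted for metadata; the rows themselves are moved
    between Greenplum segments and Spark executors.
    """
    return spark.read.format(GSC_FORMAT).options(**options).load()
```

The resulting DataFrame can then be written out with `df.write.parquet(...)` in the usual way.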
