I am reading 30M records from an Oracle table that has no primary key column. The Spark JDBC read hangs and never fetches any data, whereas the same query returns results in Oracle SQL Developer within a few seconds.
oracleDf = hiveContext.read().format("jdbc")
        .option("url", url)
        .option("dbtable", queryToExecute)
        .option("numPartitions", "5")   // was "numPartitions " with a trailing space; also has no effect without partitionColumn and bounds
        .option("fetchSize", "1000000")
        .option("user", use)
        .option("password", pwd)
        .option("driver", driver)
        .load()
        .repartition(5);
I cannot use a partition column because I do not have a primary key column. Can anyone advise how to improve performance?
Thanks
There are a number of things you can do to optimize the DataFrame creation. You might want to drop the repartition call, since shuffling 30M rows after a single-threaded read only adds cost, and instead use predicates to parallelize the data retrieval itself across several JDBC connections. If you cannot filter on a primary key or an indexed column, partitioning on ROWID is a possibility.
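For example, here is a minimal sketch of a ROWID-based predicate read. The connection details, credentials, and table name are placeholders, not from the question. Each predicate becomes one Spark partition, and Oracle's ORA_HASH spreads rows roughly evenly across buckets without needing a primary key. The read().jdbc(url, table, predicates, props) overload has existed since Spark 1.4, so the same call works on your hiveContext.

import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OracleRowidRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("oracle-rowid-read")
                .getOrCreate();

        // Placeholder connection details -- substitute your own.
        String url = "jdbc:oracle:thin:@//dbhost:1521/service";
        Properties props = new Properties();
        props.setProperty("user", "scott");
        props.setProperty("password", "tiger");
        props.setProperty("driver", "oracle.jdbc.OracleDriver");
        // A few thousand rows per round trip is usually plenty;
        // 1,000,000 mostly burns driver-side memory.
        props.setProperty("fetchsize", "10000");

        // One predicate per partition: ORA_HASH(ROWID) buckets the rows so
        // five connections each pull a disjoint ~1/5 of the table in parallel.
        int buckets = 5;
        String[] predicates = new String[buckets];
        for (int i = 0; i < buckets; i++) {
            predicates[i] = "MOD(ORA_HASH(ROWID), " + buckets + ") = " + i;
        }

        Dataset<Row> oracleDf = spark.read()
                .jdbc(url, "MY_SCHEMA.MY_TABLE", predicates, props);

        oracleDf.show(10);
        spark.stop();
    }
}

With this overload you do not need numPartitions, partitionColumn, or bounds at all: the number of predicates determines the read parallelism, and no repartition is required afterwards.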