Spark JDBC read tuning for a table without a primary key


I am reading 30M records from an Oracle table that has no primary key columns. The Spark JDBC read hangs and does not fetch any data, whereas Oracle SQL Developer returns the result for the same query within a few seconds.

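// numPartitions on its own does not split a JDBC read: without partitionColumn,
// lowerBound and upperBound (or predicates), Spark reads through a single connection.
oracleDf = hiveContext.read().format("jdbc").option("url", url)
                        .option("dbtable", queryToExecute)    // a query must be wrapped, e.g. "(SELECT ...) q"
                        .option("numPartitions", "5")         // no trailing space, or the key is not recognized
                        .option("fetchSize", "1000000")       // rows per round trip; very large values can exhaust memory
                        .option("user", use).option("password", pwd).option("driver", driver)
                        .load()
                        .repartition(5);                      // shuffles the data only after the single-connection read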

I cannot use partition columns because I do not have a primary key column. Can anyone advise how to improve performance?

Thanks



Sai

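There are several things you can do to optimize your DataFrame creation. You might want to drop the repartition(5) call, since it only reshuffles the data after it has already been read through a single connection. Instead, use predicates to parallelize the retrieval: Spark opens one JDBC connection per predicate and builds one partition from the rows that match it.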
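A minimal sketch of building such predicates, assuming the table has some low-cardinality column to split on; the column `STATUS` and its values are hypothetical placeholders. Each string becomes the WHERE clause of one partition, so the set must not overlap (overlapping rows are read twice) and must cover every row (uncovered rows are silently dropped). The last predicate is a catch-all for NULLs and unlisted values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ValuePredicates {
    // One equality predicate per listed value, plus a catch-all for NULLs and
    // any value not in the list, so the predicates are disjoint and complete.
    // Values are assumed to be trusted literals; no SQL escaping is done here.
    static List<String> predicatesFor(String column, List<String> values) {
        List<String> predicates = new ArrayList<>();
        if (values.isEmpty()) {
            predicates.add("1 = 1"); // nothing to split on: a single partition
            return predicates;
        }
        for (String v : values) {
            predicates.add(column + " = '" + v + "'");
        }
        String inList = values.stream()
                .map(v -> "'" + v + "'")
                .collect(Collectors.joining(", "));
        // NOT IN alone would skip NULLs, hence the explicit IS NULL branch.
        predicates.add("(" + column + " IS NULL OR " + column + " NOT IN (" + inList + "))");
        return predicates;
    }

    public static void main(String[] args) {
        // Hypothetical column and values; substitute ones from your own table.
        List<String> preds = predicatesFor("STATUS", Arrays.asList("A", "B", "C"));
        preds.forEach(System.out::println);
    }
}
```

The resulting array can then be passed to `DataFrameReader.jdbc(String url, String table, String[] predicates, Properties connectionProperties)`, e.g. `hiveContext.read().jdbc(url, queryToExecute, preds.toArray(new String[0]), props)`, where `props` is a `java.util.Properties` holding `user`, `password`, `driver` and optionally `fetchsize`.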

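If the filter is not based on the primary key or an indexed column, exploring Oracle's ROWID pseudocolumn is a possibility, since every row of a table has one even when the table has no primary key.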
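One common way to turn ROWID into predicates, sketched here as an assumption rather than the answerer's exact method, is to hash it into buckets with Oracle's `ORA_HASH(expr, max_bucket)`, which returns a value between 0 and `max_bucket`. Every row then matches exactly one predicate. If `dbtable` is a subquery rather than a plain table, ROWID may need to be exposed under an alias inside it (the alias `RID` below is hypothetical). Because the hash is not indexed, each partition likely scans the table on its own, so the bucket count is a trade-off.

```java
public class RowidPredicates {
    // Builds n disjoint predicates of the form ORA_HASH(<rowidExpr>, n-1) = i.
    // ORA_HASH maps each value into 0..n-1, so together they cover every row once.
    // rowidExpr is "ROWID" for a plain table, or a hypothetical alias such as
    // "RID" when the subquery selects t.ROWID AS RID.
    static String[] rowidPredicates(String rowidExpr, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        String[] predicates = new String[n];
        for (int i = 0; i < n; i++) {
            predicates[i] = "ORA_HASH(" + rowidExpr + ", " + (n - 1) + ") = " + i;
        }
        return predicates;
    }

    public static void main(String[] args) {
        // Five buckets, matching the five partitions in the question.
        for (String p : rowidPredicates("ROWID", 5)) {
            System.out.println(p);
        }
    }
}
```

These strings would be handed to the same `jdbc(url, table, predicates, properties)` overload, giving one Spark partition (and one Oracle connection) per bucket.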