I am trying to load data from an RDBMS into a Hive table on HDFS. I am reading the RDBMS table as follows:

val mydata = spark.read
  .format("jdbc")
  .option("url", connection)
  .option("dbtable", "select * from dev.userlocations")
  .option("user", usrname)
  .option("password", pwd)
  .option("numPartitions",20)
  .load()

I can see in the executor logs that option("numPartitions", 20) is not honored and the entire dataset is dumped into a single executor. This is expected: on a read, Spark ignores numPartitions unless a partition column and its bounds are also supplied.

There are options to provide the partition column, lower bound, and upper bound, as below:

val mydata = spark.read
  .format("jdbc")
  .option("url", connection)
  .option("dbtable", "select * from dev.userlocations")
  .option("user", usrname)
  .option("password", pwd)
  .option("partitionColumn","columnName")
  .option("lowerbound","x")
  .option("upperbound","y")
  .option("numPartitions",20).load()

The above only works if the partition column is of a numeric datatype. The table I am reading is partitioned on a column named location; it is about 5 GB overall and has 20 partitions, one per distinct location value. Is there any way I can read the table in parallel, partitioned on that column, location?

Could anyone let me know if this can be implemented at all?


1 Answer

sgungormus:

You can use the predicates option for this. It takes an array of strings, and each item in the array becomes a WHERE condition that defines one partition when reading the source table. The total number of partitions is determined by the number of conditions in the array.

val preds = Array[String]("location = 'LOC1'", "location = 'LOC2' OR location = 'LOC3'")

import java.util.Properties

val properties = new Properties()
properties.setProperty("user", usrname)
properties.setProperty("password", pwd)

val df = spark.read.jdbc(
  url = databaseUrl,
  table = tableName,
  predicates = preds,                // one Spark partition per predicate
  connectionProperties = properties
)
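
Rows that match none of the predicates are not read at all, and rows that match more than one are read multiple times, so the array should cover each of the 20 locations exactly once. If the distinct values are not known up front, you can build the predicate array with a small preliminary query. Below is a minimal sketch reusing the names above; the helper query and the assumption that location values contain no single quotes are illustrative, not part of the original answer:

// Single-partition helper read that only lists the distinct locations;
// the subquery needs an alias ("t") to be valid as a JDBC table expression.
val locations = spark.read
  .jdbc(databaseUrl, "(select distinct location from dev.userlocations) t", properties)
  .collect()
  .map(_.getString(0))

// One predicate per location => one Spark partition per location (20 total).
val locationPreds = locations.map(loc => s"location = '$loc'")

val partitioned = spark.read.jdbc(databaseUrl, "dev.userlocations", locationPreds, properties)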