I'm working with a batch Spark pipeline written in Scala (Spark 2.4). I would like to save a DataFrame into a PostgreSQL database, but instead of saving all rows into a single table, I want to write them to multiple tables based on the value of a column.
Suppose the DataFrame has a column named country; I want to write each record into the corresponding per-country table, e.g.
df.show()
+-------+----+
|country|val1|
+-------+----+
|     CN| 1.0|
|     US| 2.5|
|     CN| 3.0|
+-------+----+
Then I would like to save the records ((CN,1.0) and (CN,3.0)) into table app_CN and the record (US,2.5) into table app_US. Assume the tables already exist.
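For reference, the example above can be reproduced with the usual SparkSession boilerplate (names here are just placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("per-country-writes").getOrCreate()
import spark.implicits._

// Example data matching the show() output above
val df = Seq(("CN", 1.0), ("US", 2.5), ("CN", 3.0)).toDF("country", "val1")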
Can I achieve this with the DataFrame API, or do I need to convert to an RDD, repartition by country, and open a JDBC connection on each executor to save the records manually?
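The only DataFrame-based approach I've come up with so far is to collect the distinct country codes on the driver and issue one filtered JDBC write per table, along these lines (connection details are placeholders, and df is the example DataFrame above):

import java.util.Properties
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col

// Placeholder connection details -- substitute your own
val url = "jdbc:postgresql://localhost:5432/mydb"
val props = new Properties()
props.setProperty("user", "app_user")
props.setProperty("password", "secret")
props.setProperty("driver", "org.postgresql.Driver")

// Collect the distinct country codes on the driver, then issue
// one filtered write per target table
val countries = df.select("country").distinct().collect().map(_.getString(0))

countries.foreach { c =>
  df.filter(col("country") === c)
    .write
    .mode(SaveMode.Append)
    .jdbc(url, s"app_$c", props)
}

This works, but it filters the full DataFrame once per distinct country (caching df first only partly mitigates that), so I'm wondering whether there is a better way.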