How to create PySpark RDD from any database table?


As I am new to the Spark community, can anyone explain how to create a PySpark RDD from a database table? I can create a PySpark RDD from a CSV file using the textFile() method of SparkContext, but I don't know how to create a PySpark RDD from a database table.


1 Answer

Bala (Best Answer)

Using PySpark:

df = sqlContext.read.table("your_database.your_hive_table")

df (a DataFrame) now holds your rows, which you can work with using the Spark API. For example:

df.select("*").show()  # equivalent to: select * from your_hive_table

>>> df = sqlContext.read.table("students")
>>> df.select("*").show()
+----+---------+---+
|   a|        b|  c|
+----+---------+---+
| Jon|  English| 80|
| Amy|Geography| 70|
|Matt|  English| 90|
| Jon|     Math|100|
| Jon|  History| 60|
| Amy|   French| 90|
+----+---------+---+
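
The answer above returns a DataFrame from a Hive table. If you specifically need an RDD, every DataFrame exposes one through its .rdd attribute. If your table lives in an external relational database rather than Hive, the JDBC data source can load it into a DataFrame first. Below is a minimal sketch, assuming Spark 1.x's sqlContext, a PostgreSQL database reachable over JDBC with its driver jar on the classpath; the connection URL, table name, and credentials are placeholders.

# An RDD of Row objects from the DataFrame read earlier
rdd = df.rdd
rdd.map(lambda row: row["a"]).take(5)  # e.g. pull out the first column

# Reading a table from an external relational database via JDBC
# (all connection details below are hypothetical placeholders)
jdbc_df = sqlContext.read.format("jdbc").options(
    url="jdbc:postgresql://db-host:5432/your_database",
    dbtable="your_table",
    user="your_user",
    password="your_password",
    driver="org.postgresql.Driver"
).load()

jdbc_rdd = jdbc_df.rdd  # again, an RDD of Row objects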