How to install jars related to spark-redis in a Databricks cluster?


I am trying to connect to Azure Cache for Redis from Databricks.

I have installed the package com.redislabs:spark-redis:2.3.0 from Maven in Databricks, and I created a Spark session with the code below:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.redis.host", "my host") \
    .config("spark.redis.port", "6379") \
    .config("spark.redis.auth", "passwd") \
    .getOrCreate()

But when I run

df.write.format("org.apache.spark.sql.redis").option("table", "people").option("key.column", "name").save()

I get the error below.

Py4JJavaError: An error occurred while calling o390.save.
: java.lang.ClassNotFoundException:
Failed to find data source: org.apache.spark.sql.redis. Please find packages at
http://spark.apache.org/third-party-projects.html

Could you please let me know the detailed steps to install all the necessary libraries/jars to access Redis from Databricks?

I have seen the command below in the spark-redis Python docs, but I don't know how to run it in Databricks.

$ ./bin/pyspark --jars <path-to>/spark-redis-<version>-jar-with-dependencies.jar

Also, please let me know what the latest spark-redis version is.


1 Answer

Answered by CHEEKATLAPRADEEP:

Redis provides a Spark package, spark-redis, that you can download and attach to your cluster.
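The ClassNotFoundException in the question typically means the library is not attached to the cluster the notebook runs on, or that the artifact does not match the cluster runtime's Scala version. As a minimal sketch (the exact coordinates are an assumption; check Maven Central for the release matching your runtime's Scala version): open the cluster's Libraries tab, choose Install New > Maven, enter coordinates such as the following, and restart the cluster.

com.redislabs:spark-redis_2.12:3.1.0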

The following notebook shows how to use Redis with Apache Spark in Azure Databricks.
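For illustration, here is a minimal sketch in the spirit of that notebook, writing a small DataFrame to Redis and reading it back through the spark-redis data source. The host and password are placeholders; note that on Databricks a SparkSession already exists in the notebook, so the spark.redis.* settings are often set in the cluster's Spark config instead.

from pyspark.sql import SparkSession

# Placeholder connection details for Azure Cache for Redis.
# On Databricks, getOrCreate() returns the notebook's existing session,
# so these settings are usually applied via the cluster Spark config.
spark = SparkSession.builder \
    .appName("myApp") \
    .config("spark.redis.host", "myredis.redis.cache.windows.net") \
    .config("spark.redis.port", "6379") \
    .config("spark.redis.auth", "passwd") \
    .getOrCreate()

df = spark.createDataFrame([("John", 30), ("Peter", 45)], ["name", "age"])

# Each row is stored as a Redis hash under the "people" keyspace,
# keyed by the value of the "name" column.
df.write.format("org.apache.spark.sql.redis") \
    .option("table", "people") \
    .option("key.column", "name") \
    .save()

# Reading back: key.column restores "name" from the Redis key.
loaded = spark.read.format("org.apache.spark.sql.redis") \
    .option("table", "people") \
    .option("key.column", "name") \
    .load()
loaded.show()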

For more details, refer to Azure Databricks - Redis.