EMR PySpark write to Redshift: java.sql.SQLException: [Amazon](500310) Invalid operation: The session is read-only

Question

EMR PySpark write to Redshift: java.sql.SQLException: [Amazon](500310) Invalid operation: The session is read-only

805 views Asked by Jose Montoya At 25 May 2021 at 21:41

I got an error when trying to write data to Redshift using PySpark on an EMR cluster.

df.write.format("jdbc") \
   .option("url", "jdbc:redshift://clustername.yyyyy.us-east-1.redshift.amazonaws.com:5439/db") \
   .option("driver", "com.amazon.redshift.jdbc42.Driver") \
   .option("dbtable", "public.table") \
   .option("user", user_redshift) \
   .option("password", password_redshift) \
   .mode("overwrite") \
   .save()

The error I have got is:

py4j.protocol.Py4JJavaError: An error occurred while calling o143.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, , executor 1): 
java.sql.SQLException: [Amazon](500310) Invalid operation: The session is read-only;
    at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.handleMessage(Unknown Source)
    at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.getParameterDescription(Unknown Source)
    at com.amazon.redshift.client.PGClient.prepareStatement(Unknown Source)
    at com.amazon.redshift.dataengine.PGQueryExecutor.<init>(Unknown Source)
    at com.amazon.redshift.dataengine.PGDataEngine.prepare(Unknown Source)
    at com.amazon.jdbc.common.SPreparedStatement.<init>(Unknown Source)
    ...

I appreciate any help. Thanks!

Original Q&A

There are 1 answers

**Alokika Dash** · Answer 1 · 2021-08-27T22:15:24+00:00

We also faced the same issue for our EMR pySpark cluster. EMR with "ReleaseLabel": "emr-5.33.0" and Spark version 2.4.7

We resolved it with the following changes

Used the redshift Jar: redshift-jdbc42-2.0.0.7.jar from https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-previous-driver-version-20.html

Changed the JDBC URL to the following:

jdbc:redshift://clustername.yyyyy.us-east-1.redshift.amazonaws.com:5439/db?user=username&password=password;ReadOnly=false

You can then try to run your spark-submit with the following: spark-submit --jars s3://jars/redshift-jdbc42-2.0.0.7.jar s3://scripts/scriptname.py where scriptname.py has

df.write\
    .format('jdbc')\
    .option("driver", "com.amazon.redshift.jdbc42.Driver")\
    .option("url", jdbcUrl)\
    .option("dbtable", "schema.table")\
    .option("aws_iam_role", "XXXX") \
    .option("tempdir", f"s3://XXXXXX") \
    .mode('append')\
    .save()

TechQA.

EMR PySpark write to Redshift: java.sql.SQLException: [Amazon](500310) Invalid operation: The session is read-only

There are 1 answers

Related Questions in APACHE-SPARK

Related Questions in PYSPARK

Related Questions in AMAZON-REDSHIFT

Related Questions in AMAZON-EMR

Related Questions in SPARK-REDSHIFT

Popular Questions

Popular Tags

Trending Questions