Spark-redshift library authentication issue

I am trying to write data from my PySpark application to a Redshift cluster. After encountering dozens of strange exceptions, I think I have arrived at the correct JAR composition, which I am using in the following environment:

Environment:
- Spark 2.2.1
- Scala 2.11
- Python 2.7

JARs:
- mysql-connector-java-8.0.13.jar
- RedshiftJDBC42-1.2.10.1009.jar
- spark-redshift_2.11-3.0.0-preview1.jar
- aws-java-sdk-1.7.4.jar
- hadoop-aws-2.7.3.jar
- spark-avro_2.11-4.0.0.jar

df.write.format("com.databricks.spark.redshift") \
    .option("url", url) \
    .option("dbtable", table_name) \
    .option("tempdir", tempdir) \
    .mode(mode) \
    .save()

where tempdir = "s3a://tempdir/", so I am using the s3a filesystem.

This leads to

pyspark.sql.utils.IllegalArgumentException: u"requirement failed: You must specify a method for authenticating Redshift's connection to S3 (aws_iam_role, forward_spark_s3_credentials, or temporary_aws_*. For a discussion of the differences between these options, please see the README."
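
For reference, this is a sketch of how I understand these options are passed, based on my reading of the README (the IAM role ARN below is a hypothetical placeholder, not a real value):

# Sketch: the same write as above, with one of the named authentication
# options added. The role ARN is a hypothetical placeholder.
df.write.format("com.databricks.spark.redshift") \
    .option("url", url) \
    .option("dbtable", table_name) \
    .option("tempdir", tempdir) \
    .option("aws_iam_role", "arn:aws:iam::123456789012:role/my-redshift-role") \
    .mode(mode) \
    .save()

# Alternatively, forward the credentials Spark itself uses for S3:
#     .option("forward_spark_s3_credentials", "true")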

However, the first two methods both lead to a mix of bucket-related exceptions (see the credentials sketch after these warnings), such as:

  • 19/02/11 21:05:25 WARN Utils$: An error occurred while trying to determine the S3 bucket's region com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3
  • 19/02/11 21:05:25 WARN Utils$: An error occurred while trying to read the S3 bucket lifecycle configuration com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3
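
The 403s suggest the S3 client built by the connector cannot see the bucket at all. For what it's worth, this is the kind of credential wiring I believe s3a needs (a sketch only: the property names are the standard Hadoop s3a keys, the credential variables are placeholders, and I am assuming a SparkSession named spark):

# Sketch: make the same AWS credentials visible to the s3a filesystem
# used for tempdir. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are
# placeholder variables, not values from this post.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", AWS_ACCESS_KEY_ID)
hadoop_conf.set("fs.s3a.secret.key", AWS_SECRET_ACCESS_KEY)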

This issue is driving me crazy; any help would be appreciated!
