Connecting Python/pandas to Redshift when SSL is required

4.2k views Asked by At

My company recently changed our Redshift cluster and now they require an SSL connection. In the past I've connected Python/pandas to Redshift via the method I've detailed here: http://measureallthethin.gs/blog/connect-python-and-pandas-to-redshift/

From the SQLAlchemy documentation, looks like all I need to do is add connect_args={'sslmode':'require'} to the create_engine() call, as this thread pointed out: How do I connect to Postgresql using SSL from SqlAchemy+pg8000?

However, I now get this error:

OperationalError: (psycopg2.OperationalError) sslmode value "require" invalid when SSL support is not compiled in

I use the Anaconda distribution for a number of packages, and found I needed to update my psycopg2 package per these instructions: https://groups.google.com/a/continuum.io/d/msg/conda/Fqv93VKQXAc/mHqfNK8xZWsJ

However, even after updating psycopg2 I'm still getting the same error and am at a loss at this point on how to further debug. I'd like to figure this out so I can get our Redshift data directly into pandas.

2

There are 2 answers

2
Gonçalo Peres On

AWS has developed an Amazon Redshift connector for Python (here is the GitHub repo) that helps in the process.

In order to install it on may install from the source

git clone https://github.com/aws/amazon-redshift-python-driver.git
cd redshift_connector
pip install .

Or from the binary using PyPi

pip install redshift_connector

Or Conda

conda install -c conda-forge redshift_connector

Here is an example

import redshift_connector

# Connects to Redshift cluster using AWS credentials
conn = redshift_connector.connect(
    host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
    database='dev',
    user='awsuser',
    password='my_password'
 )

cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("create Temp table book(bookname varchar,author varchar)")
cursor.executemany("insert into book (bookname, author) values (%s, %s)",
                    [
                        ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                        ('A Brief History of Time', 'Stephen Hawking')
                    ]
                  )
cursor.execute("select * from book")

result: tuple = cursor.fetchall()
print(result)
>> (['One Hundred Years of Solitude', 'Gabriel García Márquez'], ['A Brief History of Time', 'Stephen Hawking'])

Note that one of the Connection Parameters one can pass is SSL (If SSL is enabled). The default value is TRUE.

0
Tomas G. On

How about something like this ?

pip3 install pg8000


import pg8000
host =  "***.amazonaws.com"
port = 5439

db_name = "dev"
master_username = "dev"
master_user_password = 'TestPassword1'


connection = pg8000.connect(
    host=host,
    port=port,
    database=db_name,
    user=master_username,
    password=master_user_password,
)
connection.autocommit = True
import uuid
prefix = 'table_'
with connection.cursor() as cursor:
  table_name = f"{prefix}{str(uuid.uuid4())[:4]}"
  tbl_create_sql = f"CREATE TABLE {table_name} (a int, b int);"
  cursor.execute(tbl_create_sql)
  # Step 2: insert a number of rows
  ROWS_TO_INSERT = 5