Is PyGreSQL available on AWS Glue Spark jobs?

I tried using the PyGreSQL modules:

import pg
import pgdb

but it says the modules were not found when running on AWS Glue Spark.

Their Developer Guide, https://docs.aws.amazon.com/glue/latest/dg/glue-dg.pdf, says it's available for Python Shell though.

Can anyone else confirm this? Is there a page listing the libraries that come by default with the Python environment? Is there an alternative PostgreSQL library that works on Glue Spark jobs? I know it is possible to use an external library by uploading it to S3 and adding the path in the job configuration, but I would like to avoid as many manual steps as possible.

There is 1 answer

Answer by Prabhakar Reddy (best answer)

The document you shared lists libraries that are intended only for Python shell jobs. If you want this library in a Glue Spark job, you need to package it, upload it to S3, and import it in your Glue job.
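A minimal sketch of the packaging step, under some assumptions: the helper names `zip_package` and `glue_library_arguments` and the S3 URI are hypothetical, and the upload to S3 itself is not shown. `--extra-py-files` and `--additional-python-modules` are Glue job parameters; the second one is only available on Glue 2.0 and later.

```python
import zipfile
from pathlib import Path


def zip_package(package_dir, zip_path):
    """Zip the .py files of a package so the package folder sits at the zip root."""
    package_dir = Path(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(package_dir.rglob("*.py")):
            # Store paths relative to the parent so "import <package>" works.
            arcname = path.relative_to(package_dir.parent).as_posix()
            zf.write(path, arcname)
    return zip_path


def glue_library_arguments(s3_zip_uri=None, pip_modules=None):
    """Build the job arguments that point a Glue Spark job at extra libraries."""
    args = {}
    if s3_zip_uri:
        # Hypothetical S3 location of the zip produced by zip_package().
        args["--extra-py-files"] = s3_zip_uri
    if pip_modules:
        # Glue 2.0+ only: modules are installed with pip when the job starts.
        args["--additional-python-modules"] = ",".join(pip_modules)
    return args
```

The returned dictionary is meant to be set as the job's default arguments (in the console or in whatever tool creates the job). Note that PyGreSQL includes a compiled C extension, so a zip of `.py` files alone may not be enough for it, whereas pg8000 is pure Python.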

There are alternatives such as pg8000, which can also be used as an external Python library. This and this explain in more detail how to package it; the same approach can also be used with the PyGreSQL library.
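Once pg8000 has been made available to the job, using it could look roughly like the sketch below. The helper names, the JDBC URL, and the credentials are hypothetical; `pg8000.connect`, `cursor()`, `execute()`, and `fetchall()` are the library's DB-API calls. The import sits inside the function so the module still loads where pg8000 is not installed.

```python
from urllib.parse import urlparse


def params_from_jdbc_url(jdbc_url, user, password):
    """Turn a JDBC-style PostgreSQL URL into keyword arguments for pg8000.connect."""
    # Strip the leading "jdbc:" so urlparse sees "postgresql://host:port/db".
    parsed = urlparse(jdbc_url[len("jdbc:"):] if jdbc_url.startswith("jdbc:") else jdbc_url)
    return {
        "host": parsed.hostname,
        "port": parsed.port or 5432,  # default PostgreSQL port
        "database": parsed.path.lstrip("/"),
        "user": user,
        "password": password,
    }


def fetch_rows(sql, conn_params):
    """Run a query with pg8000 and return all rows."""
    import pg8000  # external library; must be supplied to the Glue job

    conn = pg8000.connect(**conn_params)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```

Inside the job, this would be called as `fetch_rows("SELECT 1", params_from_jdbc_url(url, user, password))`, with the URL and credentials taken from wherever the job keeps its connection settings.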

Also, this has more information on how you can connect to on-premises PostgreSQL databases.