I have a bunch of existing PySpark scripts that I want to execute using AWS Glue. The scripts use APIs like SparkSession.read and various transformations on PySpark DataFrames.
I wasn't able to find docs outlining how to convert such a script. Do you have a hint or examples where I could find more information? Thanks :)
A PySpark script should run as-is on AWS Glue, since Glue is basically Spark with some custom AWS libraries added. To start, I would just paste it into Glue and try to run it.
If you need some functionality of Glue like dynamic frames or bookmarks, then you will need to modify the scripts to get GlueContext and work with that. The basic initialization is:
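A minimal sketch of that setup, assuming the standard `awsglue` library that is available inside a Glue job. The function name `init_glue` is only for illustration, and the imports sit inside the function because `awsglue` is normally not installed outside the Glue runtime:

```python
def init_glue():
    """Create a GlueContext on top of the running SparkContext.

    Returns a (glue_context, spark_session) pair. Imports are done lazily
    because awsglue is normally only importable inside a Glue job
    (or a local setup with the AWS Glue libraries installed).
    """
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    # Reuse the SparkContext that Glue has already started, or create one.
    sc = SparkContext.getOrCreate()

    # GlueContext wraps the SparkContext and adds the Glue-specific APIs.
    glue_context = GlueContext(sc)

    # The plain SparkSession is exposed as an attribute of the GlueContext.
    return glue_context, glue_context.spark_session


# Inside a Glue job you would then call:
# glueContext, spark_session = init_glue()
```

Existing code that uses SparkSession.read can keep working unchanged against the returned `spark_session`.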
From here onwards, you can use `glueContext` for Glue features or `spark_session` for plain Spark functionality.

I would, however, avoid using Glue-specific features just for the sake of it, because: