I got a job that needs to perform the following task
- Relationalize the data
- Rename the field names that contains the '.'s so that it can be imported into PostgreSQL as normal looking field name.
Here is the code
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "gluecatalog", table_name = "fcorders", transformation_ctx = "datasource0")
rootTableName = 'orders'
dfc = Relationalize.apply(frame = datasource0, staging_path = "s3://my-bucket/temp/", name = rootTableName, transformation_ctx = "dfc")
dfc.keys()
for df_name in dfc.keys():
m_df = dfc.select(df_name)
print "Writing to Postgre table: ", df_name
if (df_name <> rootTableName):
renamefields4 = m_df.rename_field("SalesDeliveryLines.val.shipped.unitDisplayCode", "shipped_unitDisplayCode")
else:
renamefields4 = RenameField.apply(frame = m_df, old_name = "vehicle.sourceReccordUID", new_name = "vehicle_sourceReccordUID", transformation_ctx = "renamefields4")
renamefields4.printSchema()
The printSchema() displays the schema as unchanged. If I write to the database, the field names still contain '.'s.
If I uses ApplyMapping.apply() to change the field name before relationalize, it makes the child table disappear. If I use ApplyMapping.apply() after relationalize it simply deletes all the fields whose name contains '.'.
The bottom line is I cannot relationalize and rename field in the same job no matter what I try.
Did I miss something or is this a AWS Glue bug?
It is confirmed the malfunction of
rename_field()
andRenameField.apply()
is a Glue bug.The work-around I have so far is to convert DynamicFrame to DataFrame -> rename the fields DataFrame -> Convert it back to DynamicFrame.
Here is the code