Unable to use BLANKSASNULL Data conversion parameter in write_dynamic_frame.from_catalog while moving data to Redshift table

664 views Asked by At

Previously, To move data to Redshift table we used "Copy" Command which has the functionality of Data Conversion parameters like BLANKSASNULL and EMPTYASNULL.

As our data contains both "Empty Strings" and "Null" values, We used to convert both to Null while moving to Redshift Table. as shown below. Example code :

COPY Database.Table 
FROM 's3:/folder/file.csv' 
IAM_ROLE 'arn:aws:iam::0000000:role/RedshiftCopyUnload'
DELIMITER ',' ESCAPE
REMOVEQUOTES
ACCEPTINVCHARS
EMPTYASNULL
BLANKSASNULL
NULL AS 'NULL'
DATEFORMAT 'auto';

Now, We had to use write_dynamic_frame.from_jdbc_conf method, we are trying to replicate the same(copy command data conversion parameters like **BLANKSASNULL and EMPTYASNULL), But we are unable to find the exact reference.

# Save data to Redshift
redshift_save_options = {
    "dbtable": "Database." + TableName,
    "database": "Schema"
}

from awsglue.dynamicframe import DynamicFrame
x = DynamicFrame.fromDF(input_data, glueContext, "dfx")

glueContext.write_dynamic_frame.from_jdbc_conf(
    frame = x, 
    catalog_connection = "Regshift-glue-connection", 
    connection_options = redshift_save_options, 
    redshift_tmp_dir = "s3:/project/RedshiftTempDirectory/")

Can someone help me in solving this.

Any suggestion is appreciated . Thankyou

1

There are 1 answers

0
Andrew Nguonly On

To replicate the functionality of BLANKSASNULL and EMPTYASNULL, replace blank and empty columns in the DataFrame (i.e. input_data) prior to converting it to a DynamicFrame.

Example:

from pyspark.sql.functions import col, when

# replace empty strings values
# calling strip() handles "blank" strings (i.e. handles new line characters, etc)
input_data = input_data.select(
    [
        when(col(c).strip() == "", None).otherwise(col(c)).alias(c) for c in input_data.columns
    ]
)

x = DynamicFrame.fromDF(input_data, glueContext, "dfx")

References:

  1. PySpark Replace Empty Value With None/null on DataFrame