AWS Glue MySQL target insert slow performance


I'm using AWS Glue as an ETL tool to build my data warehouse. In the example below I'm moving data from one MySQL database to another.

When I watch the target DB (SHOW PROCESSLIST), I notice a separate INSERT statement for each individual record. How can I achieve some kind of bulk insert to speed up performance?
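To illustrate (table name is a placeholder), the processlist is full of single-row statements like:

INSERT INTO stg_table (id, etl_insertdate) VALUES (1, '2024-01-01 10:00:00');
INSERT INTO stg_table (id, etl_insertdate) VALUES (2, '2024-01-01 10:00:00');

whereas what I'd hope to see is one multi-row statement per batch:

INSERT INTO stg_table (id, etl_insertdate) VALUES (1, ...), (2, ...), (3, ...);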

##Step 1: truncate the target staging table
source_jdbc_conf = glueContext.extract_jdbc_conf(glue_catalog_connection)

# Open a plain JDBC connection through py4j to run the truncate
from py4j.java_gateway import java_import
java_import(sc._gateway.jvm, "java.sql.DriverManager")
java_import(sc._gateway.jvm, "java.sql.SQLException")

conn = sc._gateway.jvm.DriverManager.getConnection(
    source_jdbc_conf.get('url'),
    source_jdbc_conf.get('user'),
    source_jdbc_conf.get('password'),
)
# TRUNCATE is a plain statement, not a stored procedure, so
# createStatement/executeUpdate is the appropriate JDBC call
stmt = conn.createStatement()
stmt.executeUpdate(qry_truncate)  # qry_truncate holds the TRUNCATE TABLE ... statement
stmt.close()
conn.close()

##Step 2: data flow
SRC_view_LZ = (
    glueContext.create_dynamic_frame.from_catalog(
        database=glue_catalog_name,
        table_name=glue_catalog_lz_viewname,
        transformation_ctx="SRC_view_LZ",
    )
)

DC_InsertDate = SRC_view_LZ.gs_now(
    colName="etl_insertdate"
)
 
DST_STG = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=DC_InsertDate,
    catalog_connection=glue_catalog_connection,
    connection_options={
        "dbtable": stg_tablename,
        "database": stg_database,
        "batchsize": "10000",
        "bulkSize": "1000",
    },
    transformation_ctx="DST_STG",
)
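One direction I'm wondering about: as far as I understand, MySQL Connector/J only collapses a JDBC batch into a single multi-row INSERT when rewriteBatchedStatements=true is set on the connection URL, and I don't see a way to add that to the catalog connection from within the job. A sketch of what I have in mind, switching to from_options so I can build the URL by hand (untested; assumes the extracted URL already includes the database and has no existing query string):

# Untested sketch: pass the driver flag by appending it to the URL.
# Assumption: source_jdbc_conf's url has no existing "?..." part.
url_with_rewrite = source_jdbc_conf.get('url') + "?rewriteBatchedStatements=true"

DST_STG = glueContext.write_dynamic_frame.from_options(
    frame=DC_InsertDate,
    connection_type="mysql",
    connection_options={
        "url": url_with_rewrite,
        "user": source_jdbc_conf.get('user'),
        "password": source_jdbc_conf.get('password'),
        "dbtable": stg_tablename,
        "batchsize": "10000",
    },
    transformation_ctx="DST_STG",
)

Would that actually change what shows up in the processlist, or is there a Glue-native option I'm missing?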

I also tried using the Data Catalog as the target; that gives more or less the same performance.
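For completeness, the catalog-target variant I tried looked roughly like this (the catalog table name is a placeholder):

DST_STG_CAT = glueContext.write_dynamic_frame.from_catalog(
    frame=DC_InsertDate,
    database=glue_catalog_name,
    table_name=glue_catalog_stg_tablename,  # placeholder: catalog table for the staging target
    transformation_ctx="DST_STG_CAT",
)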
