I'm using AWS Glue as an ETL tool for building my data warehouse. In the example below I'm moving data from one MySQL database to another.
When I look at the target DB (SHOW PROCESSLIST), I notice that there is a separate INSERT statement for each individual record. How can I achieve some kind of bulk insert to speed up performance? Below is the relevant part of my Glue job (variables such as glue_catalog_connection, stg_database and stg_tablename are defined earlier in the script).
## Step 1: truncate the target staging table
```
# Pull url/user/password from the Glue catalog connection
source_jdbc_conf = glueContext.extract_jdbc_conf(glue_catalog_connection)

# Open a plain JDBC connection through the JVM (py4j) to run the truncate
from py4j.java_gateway import java_import
java_import(sc._gateway.jvm, "java.sql.Connection")
java_import(sc._gateway.jvm, "java.sql.DatabaseMetaData")
java_import(sc._gateway.jvm, "java.sql.DriverManager")
java_import(sc._gateway.jvm, "java.sql.SQLException")
conn = sc._gateway.jvm.DriverManager.getConnection(source_jdbc_conf.get('url'), source_jdbc_conf.get('user'), source_jdbc_conf.get('password'))

# qry_truncate is built earlier in the job (truncate of the staging table)
cstmt = conn.prepareCall(qry_truncate)
results = cstmt.execute()
conn.close()
```
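For completeness, qry_truncate is just the truncate statement for the staging table, built earlier in the job from the same variables; a rough reconstruction (the real string lives elsewhere in my script):

```
# Rough reconstruction; the actual statement is defined earlier in the job
qry_truncate = "TRUNCATE TABLE {}.{}".format(stg_database, stg_tablename)
```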
## Step 2: data flow
```
# Read the source view from the Glue Data Catalog
SRC_view_LZ = glueContext.create_dynamic_frame.from_catalog(
    database=glue_catalog_name,
    table_name=glue_catalog_lz_viewname,
    transformation_ctx="SRC_view_LZ",
)

# Add the load timestamp column (Glue Studio gs_now transform)
DC_InsertDate = SRC_view_LZ.gs_now(colName="etl_insertdate")

# Write to the MySQL staging table via the catalog connection
DST_STG = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=DC_InsertDate,
    catalog_connection=glue_catalog_connection,
    connection_options={
        "dbtable": stg_tablename,
        "database": stg_database,
        "batchsize": "10000",
        "bulkSize": "1000"
    },
    transformation_ctx="DST_STG"
)
```
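The only workaround I can think of (untested so far) is to drop down to the plain Spark DataFrame JDBC writer and ask the MySQL driver to rewrite batched inserts into multi-row INSERTs via the rewriteBatchedStatements=true URL flag. A rough sketch, reusing source_jdbc_conf from step 1 (whether the database name needs to be appended to the URL depends on how the Glue connection is defined):

```
# Untested sketch: plain Spark JDBC write instead of write_dynamic_frame.
# rewriteBatchedStatements=true is a MySQL Connector/J flag that turns
# batched inserts into multi-row INSERT statements.
target_url = source_jdbc_conf.get('url') + "?rewriteBatchedStatements=true"  # append /<db> first if the URL lacks it

(DC_InsertDate.toDF()
    .write
    .format("jdbc")
    .option("url", target_url)
    .option("dbtable", stg_database + "." + stg_tablename)
    .option("user", source_jdbc_conf.get('user'))
    .option("password", source_jdbc_conf.get('password'))
    .option("batchsize", "10000")
    .mode("append")
    .save())
```

I would prefer to keep the DynamicFrame writer if it can be made to batch properly.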
I also tried using the Data Catalog table as the target instead of the JDBC connection options; that gives more or less the same performance.
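For reference, that attempt looked roughly like this (glue_catalog_stg_tablename is a placeholder for the staging table's catalog entry):

```
# Roughly what the Data Catalog target attempt looked like
DST_STG_catalog = glueContext.write_dynamic_frame.from_catalog(
    frame=DC_InsertDate,
    database=glue_catalog_name,
    table_name=glue_catalog_stg_tablename,
    transformation_ctx="DST_STG_catalog",
)
```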