PYODBC upsert to SQL poor performance


The following Python code works as intended but is incredibly slow, inserting at most 50 rows a second and quickly tapering off to fewer than 10 rows a second.

import datetime as dt

import requests

# cnxn and crsr are an existing pyodbc connection and cursor (setup omitted).

def test_single_date():
    date = [dt.date(2023, 1, 1)]
    yield date

def get_data(url):
    api_response = requests.get(url)
    api_data = api_response.json()
    return api_data

def process_dates():
    # Step 1: Get the list of dates and create list for values to send to SQL
    dates_to_process = test_single_date()
    upsert_list = []

    # Step 2: Generate URLs and retrieve data
    for date_list in dates_to_process:
        for single_date in date_list:

            # Step 2a: Generate URL
            base_hist_url = 'https://api_url/history/currency/{year}/{month}/{day}'
            hist_url = base_hist_url.format(year=single_date.year, month=single_date.month, day=single_date.day)

            # Step 2b: Retrieve data
            hist_api_data = get_data(hist_url)

            # Step 3: Upsert data
            for k, v in hist_api_data['conversion_rates'].items():
                upsert_list.append(['currency_code', k, v, single_date])

                # Step 3a: Execute SQL upsert statement
                upsert_sql(upsert_list)
    
    # Step 4: Get the latest conversion rates
    latest_url = 'https://api_url/latest/currency'
    latest_api_data = get_data(latest_url)
    for k, v in latest_api_data['conversion_rates'].items():
        upsert_list.append(['currency_code', k, v, dt.datetime.now(dt.UTC)])

        # Step 4a: Execute SQL statement
        upsert_sql(upsert_list)

def upsert_sql(upsert_list):
    sql = """
    MERGE INTO SQL_TABLE AS Target
    USING (
        VALUES {}
    ) AS Source (currency_from, currency_to, factor, timestamp)
    ON Target.currency_from = Source.currency_from 
    AND Target.currency_to = Source.currency_to
    AND CAST(Target.timestamp as date) = CAST(Source.timestamp as date)
    WHEN NOT MATCHED THEN
        INSERT (currency_from, currency_to, factor, timestamp) VALUES (Source.currency_from, Source.currency_to, Source.factor, Source.timestamp);
    """.format(','.join(['(?,?,?,?)' for _ in range(len(upsert_list))]))
    params = [item for sublist in upsert_list for item in sublist]

    try:
        crsr.execute(sql, params)
        crsr.commit()
    except Exception as e:
        crsr.rollback()
        print(e)
        print('Transaction rollback')

process_dates()
crsr.close()
cnxn.close()

By itself, the MERGE part is near instant with the small amount of data I am working with. However, the extra steps added for additional functionality make the whole process much slower. The goal is to generate a URL for every date that is missing from SQL_TABLE (between 1/1/2023 and today), send those requests to the API provider, and MERGE the results into SQL_TABLE. After backfilling any missing data, the function should grab the most recent conversion rates and MERGE those into SQL_TABLE as well. I don't know much about pyodbc, or Python in general for that matter, so any help with the code is very much appreciated.
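For reference, this is roughly how I imagine the batched flow should look: collect every row first and call upsert_sql() once at the end, instead of once per appended row. This is an untested sketch; the `dates` and `fetch_rates` arguments stand in for the missing-date lookup and the API call, which I haven't worked out yet:

```python
import datetime as dt

def build_upsert_rows(dates, fetch_rates):
    """Collect all rows across all dates so the MERGE can run once.

    dates: iterable of datetime.date to backfill (the dates missing from
    SQL_TABLE -- finding those is a separate problem).
    fetch_rates: callable returning {currency_code: factor} for a date; in
    my code this would wrap get_data() on the history URL.
    """
    upsert_list = []
    for single_date in dates:
        for k, v in fetch_rates(single_date).items():
            upsert_list.append(['currency_code', k, v, single_date])
    return upsert_list

# With the rows collected, a single upsert_sql(upsert_list) call would
# replace the per-row calls inside the original loops.
```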

Also, I have never been able to get pyodbc.executemany() (I tried enabling fast_executemany) to work with a T-SQL MERGE, nor have I been able to get the MERGE to work with a single (?,?,?,?) group in the VALUES clause (i.e. fully parameterized); I always have to build the placeholder list with format(). Any advice there would be great too.
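To clarify what I mean: with executemany() the statement would carry a single (?,?,?,?) group and take one parameter tuple per row, whereas my format() version pastes in N groups and takes one flat list. A sketch of the two shapes, with no database involved (sample values are made up):

```python
# Sample of the rows built in process_dates()
upsert_list = [
    ['currency_code', 'USD', 1.0, '2023-01-01'],
    ['currency_code', 'EUR', 0.9, '2023-01-01'],
]

# What I currently do: N placeholder groups pasted in with format(),
# executed once with a single flat parameter list.
sql_formatted = "VALUES {}".format(
    ','.join(['(?,?,?,?)'] * len(upsert_list)))
flat_params = [item for row in upsert_list for item in row]

# What I tried with executemany(): one (?,?,?,?) group, with pyodbc
# supplying one parameter tuple per row.
sql_single_group = "VALUES (?,?,?,?)"
row_params = [tuple(row) for row in upsert_list]
```

The difference I keep tripping over is just that parameter shape: flat_params is eight scalars, row_params is two 4-tuples.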
