I am currently using psycopg2 to bulk update a column on a Postgres table from a pandas DataFrame. When testing against a local Postgres instance in Docker, updating 100k rows takes less than a minute, but the same script against prod takes more than 60 minutes. On the same prod DB, a bulk INSERT of the same 100k rows takes less than a minute. What parameters could I modify on my prod DB to make the bulk UPDATEs as fast as they are locally? The table is indexed on id, which is used in the WHERE condition.
import pandas as pd
import psycopg2
from psycopg2.extras import execute_batch

def write_to_db(data: pd.DataFrame, ID):
    conn = psycopg2.connect(host=HOST, port=PORT, database=DBNAME, user=USER, password=PASSWORD)
    try:
        with conn.cursor() as cur:
            query = """
                UPDATE table SET column_1=%s WHERE id=%s AND name=%s
            """
            # one parameter tuple per DataFrame row; the index carries the name value
            argslist = [(row['column_1'], ID, index) for index, row in data.iterrows()]
            execute_batch(cur, query, argslist)
        conn.commit()
        logger.info("successfully updated the DB")
    except Exception as e:
        logger.error("error in writing to the DB: %s", e)
        conn.rollback()
    finally:
        conn.close()
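For comparison, the bulk INSERT that finishes in under a minute on the same prod DB looks roughly like this. This is only a sketch: it reuses the same placeholder table/column names as the UPDATE above, and I'm showing execute_values here, which may not match the real insert path exactly.

from psycopg2.extras import execute_values

def insert_to_db(data: pd.DataFrame):
    conn = psycopg2.connect(host=HOST, port=PORT, database=DBNAME, user=USER, password=PASSWORD)
    try:
        with conn.cursor() as cur:
            # execute_values folds all rows into a few multi-row INSERT statements
            query = "INSERT INTO table (name, column_1) VALUES %s"
            argslist = [(index, row['column_1']) for index, row in data.iterrows()]
            execute_values(cur, query, argslist)
        conn.commit()
    finally:
        conn.close()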
So far I have increased the size of the DB instance on AWS RDS from t3.micro to t3.xlarge, which increases the RAM from 1 GB to 8 GB, but it did not make any difference.
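If it helps to compare configurations between the two instances, here is a minimal sketch of how a handful of settings could be dumped from each one. The parameter names listed are only my assumptions about what might be relevant, not confirmed culprits.

def dump_settings(**conn_kwargs):
    # hypothetical helper: print a few pg_settings values for side-by-side comparison
    names = ('shared_buffers', 'work_mem', 'maintenance_work_mem',
             'max_wal_size', 'checkpoint_timeout', 'synchronous_commit')
    conn = psycopg2.connect(**conn_kwargs)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT name, setting, unit FROM pg_settings WHERE name IN %s", (names,))
            for name, setting, unit in cur.fetchall():
                print(f"{name} = {setting} {unit or ''}")
    finally:
        conn.close()

# run once against local and once against prod, e.g.:
# dump_settings(host=HOST, port=PORT, database=DBNAME, user=USER, password=PASSWORD)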