Bulk load data issue in Python

I want to bulk load data into a Snowflake warehouse using pandas. The full requirement is below:

  1. I have source data in a Snowflake table, which I read into a DataFrame (a simplified sketch of this step follows the list).
  2. After loading the data into the DataFrame, I transform it using some pandas functions.
  3. After these changes, I need to load the data back into Snowflake.

Data size: ~200k records
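For reference, the read step (item 1) is along these lines; the table name and connection details are placeholders:

import snowflake.connector

conn = snowflake.connector.connect(
    user='<your_username>',
    password='<your_password>',
    account='<your_account_identifier>',
)

# Fetch the source table into a pandas DataFrame; fetch_pandas_all requires
# the connector's pandas extra (pip install "snowflake-connector-python[pandas]")
cur = conn.cursor()
cur.execute('SELECT * FROM source_table')
df = cur.fetch_pandas_all()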

Things I have tried:

  1. First, I wrote a for loop that built each INSERT statement on the fly and executed it immediately. This script ran for ~4 hours and loaded only ~9k records, so this is not a viable option.
  2. Then I built the whole INSERT query up front before executing it against the database. This approach also failed, taking about as long as the one above.
  3. I tried parallel processing and also batched the data (a sketch of the batched attempt follows this list). Still no luck.
  4. Later I tried the COPY INTO approach, and it works.
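
The batched attempt from item 3 looked roughly like this; table and column names are placeholders:

rows = list(df.itertuples(index=False, name=None))
batch_size = 10000

with conn.cursor() as cur:
    for i in range(0, len(rows), batch_size):
        # The connector's default paramstyle is pyformat, so %s placeholders work
        cur.executemany(
            'INSERT INTO target_table (col1, col2, col3) VALUES (%s, %s, %s)',
            rows[i:i + batch_size],
        )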

But I do not want to use COPY INTO, as it is Snowflake-specific.

Please help me with bulk loading this data using Python.

1 Answer

Answered by VirbickasGytautas:

Try using the snowflake-connector-python library. First, create a connection:

import snowflake.connector

# Create a connection
conn = snowflake.connector.connect(
    user='<your_username>',
    password='<your_password>',
    account='<your_account_identifier>',  # the account identifier, not the full URL
    warehouse='<your_warehouse>',
    database='<your_database>',
    schema='<your_schema>'
)

The connector's pandas_tools module provides write_pandas to bulk load data efficiently. You can use it as follows:

import snowflake.connector.pandas_tools as sfpd

# Assuming you have a DataFrame called 'df' with your data; by default the
# target table must already exist
sfpd.write_pandas(conn, df, '<snowflake_table_name>')

Under the hood, write_pandas stages the DataFrame and runs a bulk COPY INTO for you, so it should be much faster than inserting rows one at a time.
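
write_pandas also returns a tuple you can use to verify the load. A quick sketch, with a placeholder table name (by default the connector quotes identifiers, so an uppercase name is the safe choice):

success, num_chunks, num_rows, _ = sfpd.write_pandas(
    conn,
    df,
    'TARGET_TABLE',            # placeholder; must already exist by default
    # auto_create_table=True,  # newer connector versions can create it for you
)
print(f'success={success}, chunks={num_chunks}, rows={num_rows}')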