I am looking for a way to use a server-side cursor with the Trino Python SDK. I have a query that returns a large dataset, and I want to keep memory usage in check.
If I use SQLAlchemy and Pandas, I can process it as a stream without any issue, like so:
```python
import pandas as pd
from sqlalchemy import create_engine
from trino.sqlalchemy import URL

engine = create_engine(
    URL(
        user=...,  # connection details elided
    )
)
conn = engine.connect().execution_options(stream_results=True)

# Single outer quotes so the double-quoted identifiers stay part of the SQL string.
query = 'SELECT * FROM "big"."dataset".table'
for chunk_dataframe in pd.read_sql(query, conn, chunksize=1000000):
    ...  # process the chunk
```
But I cannot find a way to do the same with the pure SDK. If I do:
```python
from trino.dbapi import connect

conn = connect(
    host=...,  # connection details elided
)
cursor = conn.cursor()
cursor.execute('SELECT * FROM "big"."dataset".table')
cursor.arraysize = 1000000

while True:
    rows = cursor.fetchmany()
    if not rows:
        break
    ...  # process the rows
```
It seems the entire dataset is pulled into memory before execution even reaches the fetchmany call.
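A rough way to confirm where the memory goes, sketched with the standard library's tracemalloc (which tracks only Python-level allocations; connection details elided as above):

```python
import tracemalloc

from trino.dbapi import connect

tracemalloc.start()

conn = connect(
    host=...,  # connection details elided
)
cursor = conn.cursor()
cursor.execute('SELECT * FROM "big"."dataset".table')

# Peak so far: if the client buffered the whole result set during
# execute(), it already shows up here, before any fetch call.
_, peak = tracemalloc.get_traced_memory()
print(f"peak after execute: {peak / 2**20:.1f} MiB")

cursor.fetchmany(1000)
_, peak = tracemalloc.get_traced_memory()
print(f"peak after first fetchmany: {peak / 2**20:.1f} MiB")
```

If the peak is already large right after execute(), the buffering happens inside the client rather than in the fetch loop.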
How can I keep a low memory profile without using SQLAlchemy?
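For clarity, this is the consumption pattern I am after, whichever client API makes it possible. A minimal sketch built only on the DB-API fetchmany contract; stream_batches is a hypothetical helper of mine, not part of the Trino client:

```python
from typing import Any, Iterator, List


def stream_batches(cursor, batch_size: int = 1000000) -> Iterator[List[Any]]:
    """Yield batches of rows from a DB-API cursor until it is exhausted.

    This keeps only one batch in memory at a time *if* the underlying
    driver fetches lazily; it does not by itself stop the driver from
    buffering the whole result set.
    """
    cursor.arraysize = batch_size
    while True:
        rows = cursor.fetchmany()
        if not rows:
            return
        yield rows


# Hypothetical usage, with cursor created as above:
# for batch in stream_batches(cursor):
#     ...  # process one batch of rows
```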