Trino Python SDK - Server Side Cursor

108 views Asked by At

I am looking for a way to use a server side cursor with the Trino Python SDK. I have a query that returns a large dataset and I want to keep memory usage in check.

If I use SQLAlchemy and Pandas, I am able to process it as a stream, without any issue like so:

import pandas as pd
from trino.sqlalchemy import URL
from sqlalchemy import create_engine

engine = create_engine(
    URL(
        user=...)
    )
conn = engine.connect().execution_options(stream_results=True)

query = "SELECT * FROM "big"."dataset".table"
for chunk_dataframe in pd.read_sql(query, conn, chunksize=1000000):
    <process the chunks>

But I cannot find a way to do the same with the pure SDK. If I do:

from trino.dbapi import connect

conn = connect(
host=...)
cursor = conn.cursor()
cursor.execute("SELECT * FROM "big"."dataset".table")
cursor.arraysize = 1000000
while True:
    rows = cursor.fetchmany()
    if not rows:
        break

It seems the entire dataset is pulled before I reach the fetchmany.

How can I keep a low memory profile without using SQLAlchemy?

0

There are 0 answers