I am using jaydebeapi (Mac OS X) to query a Netezza database and performing some quick/dirty timing:
t0 = time.time()
curs.execute('''select * from table1;''')
print time.time() - t0
I created the table myself and it contains 650,000 rows and 9 columns (integers and dates).
When I run the above command it takes about 1.3 minutes to complete (averaged over 10 runs).
Then, when I try to fetch the data:
t0 = time.time()
curs.execute('''select * from table1;''')
row = curs.fetchone()
while row is not None:
row = curs.fetchone()
print time.time() - t0
It takes about 10 minutes to complete (averaged over 10 runs).
Now, when I run the same SQL query using WinSQL (Windows 7, ODBC), it takes about 3 minutes to return the data. I can't seem to figure out why it is taking so much longer in Python and am not sure how or where to start looking.
You might want to use curs.fetchmany() instead of fetchone. That will optimize, somewhat, the back and forth to fetch the rows.
Something like this will even hide the fact that you are fetching many rows at a time:
However, I think that if a raw sql query tool takes 3 minutes to fetch the data, it is not entirely unreasonable to have your Python code take 3x as long.