I'm populating a PostgreSQL table with ~11.000.000 rows that have been selected before from another database. I'm using Python and psycopg2. The whole process takes an estimated 1.5 hours to complete. However, after ~30 minutes I get "connection closed unexpectedly" exception. The source code looks like this:
incursor = indb.cursor()
incursor.execute("SELECT ...")
indb.commit() # (1) close transaction
outcursor = outdb.cursor()
rows = 0
for (col1, col2, col3) in incursor: # incursor contains ~11.000.000 rows
outcursor.execute("INSERT ...", (col1, col2, col3)) # This fails after ~30 minutes
row += 1
if row % 100 == 0: # (2) Write data every 100 rows
outcursor.close()
outdb.commit()
outcursor = outdb.cursor()
incursor.close()
outcursor.close()
outdb.commit()
I inserted (1)
and (2)
after the first tries that failed, assuming that an open transaction has an upper time limit of ~30 minutes or that a cursor has an upper limit of pending inserts. It seems that none of this assumptions are true and the error lies somewhere else.
Both databases are stored on a VirtualBox machine that I connect via port forwarding from the host. I run the program on the host machine.
Both database are just for testing purposes and they have no other connections to manage. Maybe I have to rewrite the problem to get around this, but I need very time-consuming inserts elsewhere (running approx. for days) so I'm very concerned about some hidden time limits in psycopg2
or PostgreSQL.
I don't know of any such "hidden" timeout in postgresql itself. PostgreSQL does have
statement_timeout
, but if you hit that you should get aERROR: canceling statement due to statement timeout
in the server log (and it will log the cancelled statement too). I can't speak for psycopg2. Definitely check the server log for anything that looks relevant.Maybe it's a networking issue? A long-running statement will be a TCP connection that stays idle for a long time. Perhaps your port forwarding purges connections that are idle for more than 30 minutes? Maybe your TCP connections aren't using keepalive. Postgresql has some settings for tuning TCP keepalive (tcp_keepalives_interval etc.) and you may also need to do some kernel/networking configuration to make sure they are actually enabled.
e.g. I just tried connecting to my own machine here and
tcp_keepalives_interval
defaults to 7200, which is 2 hours. If your port forwarding cuts off after 30 minutes, this default won't do. You can override the setting used in the client connection string (assuming you can twiddle the conninfo string directly), or set the GUC variable in user/database properties or postgresql.conf.See: