Pyodbc connection closing before query finishes

338 views Asked by At

Setup: I'm running Dremio on a Centos 7.6 VM connected to an HDFS with a large csv (200Gb). Now I'm querying Dremio through pyodbc however if I use the method:

cnxn = pyodbc.connect("Driver={};ConnectionType=Direct;HOST={};PORT={};AuthenticationType=Plain;UID={};PWD={}".format(driver, host,port,uid,pwd),autocommit=True)
sql = "SELECT * FROM HDFS.data.\"factTable.txt\" WHERE C > 0.0 LIMIT 100"
cursor = cnxn.cursor()
cursor.execute(sql)

it fails giving the error 'Query cancelled by user 'mpowers'. Using

cnxn = pyodbc.connect("Driver={};ConnectionType=Direct;HOST={};PORT={};AuthenticationType=Plain;UID={};PWD={}".format(driver, host,port,uid,pwd),autocommit=True)
sql = "SELECT * FROM HDFS.data.\"factTable.txt\" WHERE C > 0.0 LIMIT 100"
dataframe = pandas.read_sql(sql,cnxn)

the query succeeds however (but I don't want to read my query I just want to time it). Looking through my server.out file comparing the two queries:

2020-12-07 19:34:26,749 [UserServer-1] INFO  com.dremio.ConnectionLog - [b61585f8-5b73-4806-97e8-28e77caa4a74] Connection opened.
    Endpoint: 127.0.0.1:42862
    Protocol Version: 5
    Record Type: n/a
    Record Formats: 
    Support Complex Types: false
    Name: Dremio C++ Client
    Version: 1.17.0 (1.17.0)
    Application: Unknown
    User Properties: 
        schema=
        support_complex_types=false
        userName=mpowers

2020-12-07 19:34:26,750 [UserServer-1] WARN  c.dremio.sabot.rpc.user.UserSession - Ignoring unknown property: SUPPORT_COMPLEX_TYPES
2020-12-07 19:34:26,750 [UserServer-1] WARN  c.dremio.sabot.rpc.user.UserSession - Ignoring unknown property: USERNAME
2020-12-07 19:34:26,754 [out-of-band-observer] INFO  query.logger - {"queryId":"20317c3c-d70c-b598-20cd-f038f7647900","queryText":"NA","start":1607369666752,"finish":1607369666753,"outcome":"COMPLETED","username":"mpowers"}
2020-12-07 19:34:26,761 [out-of-band-observer] INFO  query.logger - {"queryId":"20317c3d-4ef2-bd14-6949-24781167ea00","queryText":"NA","start":1607369666756,"finish":1607369666759,"outcome":"COMPLETED","username":"mpowers"}
2020-12-07 19:34:26,841 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:1:0
2020-12-07 19:34:26,841 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:1:1
2020-12-07 19:34:26,841 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:1:2
2020-12-07 19:34:26,842 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:1:3
2020-12-07 19:34:26,842 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:1:4
2020-12-07 19:34:26,842 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:1:5
2020-12-07 19:34:26,842 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:0:0
2020-12-07 19:34:26,904 [e5 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:1:4] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:1:4: State to report: FINISHED
2020-12-07 19:34:26,911 [e0 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:0:0] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:0:0: State to report: FINISHED
2020-12-07 19:34:26,915 [e1 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:1:2] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:1:2: State to report: FINISHED
2020-12-07 19:34:26,916 [e6 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:1:1] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:1:1: State to report: FINISHED
2020-12-07 19:34:26,916 [e2 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:1:5] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:1:5: State to report: FINISHED
2020-12-07 19:34:26,919 [e4 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:1:3] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:1:3: State to report: CANCELLED
2020-12-07 19:34:26,923 [e3 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:1:0] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:1:0: State to report: CANCELLED
2020-12-07 19:34:26,927 [out-of-band-observer] INFO  query.logger - {"queryId":"20317c3d-29fa-dec0-730e-654b426bf300","queryText":"SELECT * FROM  HDFS.data.\"factTable.txt\" WHERE C > 0.0 LIMIT 100","start":1607369666763,"finish":1607369666925,"outcome":"COMPLETED","username":"mpowers"}
2020-12-07 19:34:26,943 [UserServer-1] INFO  c.d.sabot.rpc.user.UserRPCServer - [USER]: Channel closed /127.0.0.1:31010 <--> /127.0.0.1:42862 (user client)
2020-12-07 19:34:26,943 [UserServer-1] INFO  com.dremio.ConnectionLog - [b61585f8-5b73-4806-97e8-28e77caa4a74] Connection Closed

vs. the method that failed

2020-12-07 19:35:37,698 [UserServer-1] INFO  com.dremio.ConnectionLog - [30c3f4d8-241f-4f3d-a9e6-de7aea93b752] Connection opened.
    Endpoint: 127.0.0.1:42904
    Protocol Version: 5
    Record Type: n/a
    Record Formats: 
    Support Complex Types: false
    Name: Dremio C++ Client
    Version: 1.17.0 (1.17.0)
    Application: Unknown
    User Properties: 
        schema=
        support_complex_types=false
        userName=mpowers

2020-12-07 19:35:37,698 [UserServer-1] WARN  c.dremio.sabot.rpc.user.UserSession - Ignoring unknown property: SUPPORT_COMPLEX_TYPES
2020-12-07 19:35:37,699 [UserServer-1] WARN  c.dremio.sabot.rpc.user.UserSession - Ignoring unknown property: USERNAME
2020-12-07 19:35:37,705 [out-of-band-observer] INFO  query.logger - {"queryId":"20317bf5-c9fe-5e85-6e81-9ce420af1c00","queryText":"NA","start":1607369737702,"finish":1607369737703,"outcome":"COMPLETED","username":"mpowers"}
2020-12-07 19:35:37,712 [out-of-band-observer] INFO  query.logger - {"queryId":"20317bf5-b19e-866c-5da6-2f580816e500","queryText":"NA","start":1607369737708,"finish":1607369737711,"outcome":"COMPLETED","username":"mpowers"}
2020-12-07 19:35:37,763 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:1:0
2020-12-07 19:35:37,763 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:1:1
2020-12-07 19:35:37,763 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:1:2
2020-12-07 19:35:37,763 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:1:3
2020-12-07 19:35:37,764 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:1:4
2020-12-07 19:35:37,764 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:1:5
2020-12-07 19:35:37,764 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:0:0
2020-12-07 19:35:37,824 [e5 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:1:0] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:1:0: State to report: FINISHED
2020-12-07 19:35:37,826 [e0 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:1:3] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:1:3: State to report: FINISHED
2020-12-07 19:35:37,830 [e0 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:0:0] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:0:0: State to report: FINISHED
2020-12-07 19:35:37,834 [e4 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:1:4] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:1:4: State to report: CANCELLED
2020-12-07 19:35:37,835 [UserServer-1] INFO  c.d.sabot.rpc.user.UserRPCServer - [USER]: Channel closed /127.0.0.1:31010 <--> /127.0.0.1:42904 (user client)
2020-12-07 19:35:37,835 [UserServer-1] INFO  com.dremio.ConnectionLog - [30c3f4d8-241f-4f3d-a9e6-de7aea93b752] Connection Closed
2020-12-07 19:35:37,835 [e3 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:1:5] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:1:5: State to report: CANCELLED
2020-12-07 19:35:37,838 [e2 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:1:2] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:1:2: State to report: CANCELLED
2020-12-07 19:35:37,840 [e6 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:1:1] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:1:1: State to report: CANCELLED
2020-12-07 19:35:37,844 [out-of-band-observer] INFO  query.logger - {"queryId":"20317bf6-1026-5a5e-5c04-03d012fa4300","queryText":"SELECT * FROM  HDFS.data.\"factTable.txt\" WHERE C > 0.0 LIMIT 100","start":1607369737716,"finish":1607369737841,"outcome":"CANCELED","outcomeReason":"Query cancelled by user 'mpowers'","username":"mpowers"}

Notice that the difference between the two is:

2020-12-07 19:34:26,943 [UserServer-1] INFO  c.d.sabot.rpc.user.UserRPCServer - [USER]: Channel closed /127.0.0.1:31010 <--> /127.0.0.1:42862 (user client)
2020-12-07 19:34:26,943 [UserServer-1] INFO  com.dremio.ConnectionLog - [b61585f8-5b73-4806-97e8-28e77caa4a74] Connection Closed

appears before the query has been logged on the failing method, whereas on the successful method it appears after the query has been logged.

EDIT: To make it clear I don't want to use read_sql as the reading time takes too long for the queries I want to run so I can't accurately measure db speeds.

1

There are 1 answers

0
Sarvesh On

Pyodbc/JDBC is inefficient to stream huge datasets. Instead use arrow flight connector which is most efficient way.

https://dremio-client.readthedocs.io/en/stable/dremio_client.flight.html#dremio_client.flight.query

https://www.dremio.com/is-time-to-replace-odbc-jdbc/