I have been using SQLAlchemy with OracleDB as driver to build my applications.
I observed one of the process being killed unexpectedly while being inside a try except Exception block.
And after trial and error, I found the reason to be the handler being used to read data from CLOB and BLOB to string.
def output_type_handler(cursor, metadata):
if metadata.type_code is oracledb.DB_TYPE_CLOB:
return cursor.var(oracledb.DB_TYPE_LONG, arraysize=cursor.arraysize)
if metadata.type_code is oracledb.DB_TYPE_BLOB:
return cursor.var(oracledb.DB_TYPE_LONG_RAW, arraysize=cursor.arraysize)
if metadata.type_code is oracledb.DB_TYPE_NCLOB:
return cursor.var(oracledb.DB_TYPE_LONG_NVARCHAR, arraysize=cursor.arraysize)
I tried using OracleDB alone without SqlAlchemy and successfully fetched the data and read the CLOB with read().
However, after I enabled the outputtypehandler, even fetchone() cause Jupyter to crash with ExitCode: 3221225477.
I have tried other settings like oracledb.defaults.fetch_lobs = False and it also caused the same problem.
Are there any configs I can use with SQLAlchemy to fix this?
Edit: I found the problem happen when I access a table with multiple CLOB columns across DBLink. It works fine without DBLink.
- DB Version: 19.6c
- Client Version: 21c
- Python: 3.9.5
- oracledb: 1.4.2
- SQLAlchemy: 2.0.23
- Connection Type: Thick Mode
Table Schema:
CREATE TABLE AUTOMAIL_JOB_LIST (
ID NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY START WITH 1 INCREMENT BY 1,
ID_DEFINITION NUMBER,
SERVICE_GROUP VARCHAR2 (100 Char),
SUBJECT VARCHAR2 (300 Char),
EMAIL_CONTENT CLOB,
ATTACHMENT CLOB,
SEND_TO CLOB,
SEND_CC CLOB,
SEND_BCC CLOB,
STATUS NUMBER(2) DEFAULT 0,
CREATE_DATE DATE DEFAULT SYSDATE,
SEND_DATE DATE DEFAULT NULL,
OWNER_EMAIL VARCHAR2(200 CHAR))
The table is storing emails to send. The columns attachment, send_to, send_cc, and send_bcc usually are just a few email/filepath. The problem came from EMAIL_CONTENT column that store the entire html of the email, which could contain tables.
Fetching them all but EMAIL_CONTENT and then EMAIL_CONTENT alone is fine, but it will cause problem when SELECT *
Sample Code:
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine("", connect_args = {
"encoding": "UTF-16",
"nencoding": "UTF-16"
},
thick_mode = True,
pool_pre_ping = True,
echo_pool = True,
pool_size = 1)
query = """
SELECT *
FROM AUTOMAIL_JOB_LIST@DBLINK
"""
df = pd.read_sql(query, engine)
OracleDB-Python Version:
import oracledb
import pandas as pd
oracledb.init_oracle_client()
def output_type_handler(cursor, metadata):
if metadata.type_code is oracledb.DB_TYPE_CLOB:
return cursor.var(oracledb.DB_TYPE_LONG, arraysize=cursor.arraysize)
if metadata.type_code is oracledb.DB_TYPE_BLOB:
return cursor.var(oracledb.DB_TYPE_LONG_RAW, arraysize=cursor.arraysize)
if metadata.type_code is oracledb.DB_TYPE_NCLOB:
return cursor.var(oracledb.DB_TYPE_LONG_NVARCHAR, arraysize=cursor.arraysize)
query = """
SELECT *
FROM AUTOMAIL_JOB_LIST@DBLINK
"""
oracle_connection_string = oracledb.makedsn(oracle_host, oracle_port, service_name = oracle_service_name)
oracle_connection = oracledb.connect(user = oracle_username, password = oracle_password, dsn = oracle_connection_string, encoding = "UTF-16", nencoding = "UTF-16")
oracle_connection.outputtypehandler = output_type_handler
df = pd.read_sql(query, oracle_connection)
# cursor = oracle_connection.cursor()
# cursor.execute(query)
# row = cursor.fetchone()
Jupyter Notebook to reproduce the problem https://github.com/TuanDang293/BugReproduce/blob/main/Code.ipynb