I am trying to connect to Teradata using SQLAlchemy so that I can use the read_sql and to_sql methods from pandas.
However, the connection is very slow. Even something as simple as pd.read_sql('select current_date') takes more than 30 seconds to complete.
I don't understand why it is this slow. If anyone has run into a similar issue before, please tell me what causes it and how you solved it. Thanks!
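For reference, a minimal sketch of what I am doing (the dialect and connection string below are placeholders, substitute your own):

# Minimal sketch of the setup; the engine URL is a placeholder,
# not my real connection string.
import time

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('teradata://user:password@tdhost')  # placeholder URL

start = time.perf_counter()
df = pd.read_sql('select current_date', engine)
print(df)
print('elapsed: %.1f s' % (time.perf_counter() - start))  # over 30 seconds for me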
Updates:
I tried cProfile and sqltap and found that the slowness comes from some of the queries the dialect generates. The has_table method runs a query against the dbc.TablesVX view, and that query alone takes around 100 seconds to finish even though the view only has about 55k rows. For pd.to_sql, has_table may be called more than once, and several other queries against system tables are issued as well (a sketch of how the per-statement timings can be captured is shown after the queries below).
-- query from has_table method
SELECT tablename
FROM dbc.tablesvx
WHERE DatabaseName=?
AND TableName=?
-- query from drop_table method
SELECT tablename
FROM dbc.TablesVX
WHERE DatabaseName = ?
AND (tablekind='T'
OR tablekind='O')
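This is roughly how the per-statement timings can be logged with SQLAlchemy's cursor-execute event hooks (the engine URL, DataFrame, and table name below are placeholders, not my real setup):

# Sketch of per-statement timing via SQLAlchemy event hooks; the engine URL,
# DataFrame, and table name are placeholders.
import time

import pandas as pd
from sqlalchemy import create_engine, event

engine = create_engine('teradata://user:password@tdhost')  # placeholder URL

@event.listens_for(engine, 'before_cursor_execute')
def before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
    conn.info.setdefault('query_start_time', []).append(time.perf_counter())

@event.listens_for(engine, 'after_cursor_execute')
def after_cursor_execute(conn, cursor, statement, parameters, context, executemany):
    elapsed = time.perf_counter() - conn.info['query_start_time'].pop()
    # Prints every statement the dialect emits (the has_table lookups included)
    # together with how long each one took.
    print('%8.2f s  %s' % (elapsed, statement.strip().splitlines()[0]))

df = pd.DataFrame({'a': [1, 2, 3]})           # placeholder DataFrame
df.to_sql('my_table', engine, index=False)    # placeholder table name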
It seems like all I need is some tuning of the system tables so that these queries run faster. However, our SQL support people told me that those system tables are already performing as well as they can. Is that possible? Has anyone done any tuning on the Teradata DBC views? Thanks.
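For completeness, a sketch of how the dbc.TablesVX lookup can be timed on its own, outside of pandas, to separate the dialect overhead from the DBC query itself (the engine URL, database, and table names are placeholders):

# Sketch of timing the has_table-style DBC lookup by itself; the engine URL,
# database, and table names are placeholders.
import time

from sqlalchemy import create_engine, text

engine = create_engine('teradata://user:password@tdhost')  # placeholder URL

query = text(
    "SELECT tablename FROM dbc.TablesVX "
    "WHERE DatabaseName = :db AND TableName = :tbl"
)

with engine.connect() as conn:
    start = time.perf_counter()
    rows = conn.execute(query, {'db': 'my_database', 'tbl': 'my_table'}).fetchall()
    print('rows: %d, elapsed: %.1f s' % (len(rows), time.perf_counter() - start))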