Using SQL Alchemy ORM to query an AWS athena database. First off, here is the raw SQL query, which is working perfectly
WITH zi AS (SELECT
trip.start_dt,
address.zone_id
FROM view_midtable_esc_trip_details_cleaned AS trip
INNER JOIN view_midtable_esc_address_expanded AS address
ON trip.start_address_ref = address.address_ref),
all_trips AS (SELECT
date_parse(
concat(
date_format(zi.start_dt, '%Y-%m-%d %H:'),
-- The following line causing issues when converting to ORM
CAST(FLOOR(extract(minute from zi.start_dt)/15)*15 AS VARCHAR),
':00'),
'%Y-%m-%d %H:%i:%s'
) AS modified_timestamp,
zi.zone_id
FROM zi)
SELECT COUNT(*) as demand,
modified_timestamp,
zone_id FROM all_trips GROUP BY modified_timestamp, zone_id;
Now, here is the relevant part of the code where I am doing the minutes extraction from the timestamp, and replacing it with the following logic
floor(original_minutes/15)*15
minutes_interval:int=15
quantised_minute:BinaryExpression=func.FLOOR(func.extract('minute',
zi.columns[original_timestamp])/minutes_interval)*minutes_interval
modified_timestamp = func.date_format(func.concat(
func.date_format(zi.columns[original_timestamp], '%Y-%m-%d %H:'),
func.CAST(quantised_minute, VARCHAR)),
text("'%Y-%m-%d %H:%i'")).label(name=original_timestamp)
query:Query=session.query(modified_timestamp,
zi.columns[zone_col],
func.count().label(name=LABEL)).group_by(modified_timestamp, zi.columns[zone_col])
When I print the raw query that is sent to the engine, this is getting translated by ORM as (comments and formatting mine)
SELECT date_format(concat(date_format(anon_1.start_dt, '%Y-%m-%d %H:'),
-- What is this extra casting?
CAST(FLOOR(EXTRACT(minute FROM anon_1.start_dt) / CAST(15 AS NUMERIC)) * 15 AS VARCHAR)),
'%Y-%m-%d %H:%i') AS start_dt, anon_1.zone_id, count(*) AS total_unique_demand
FROM ...
and the python error message is
OperationalError: (pyathena.error.OperationalError) TYPE_MISMATCH: line 1:123: Unknown type: NUMERIC
[SQL: SELECT date_format(concat(date_format(anon_1.start_dt, '%%Y-%%m-%%d %%H:'), CAST(FLOOR(EXTRACT(minute FROM anon_1.start_dt) / CAST(15 AS NUMERIC)) * 15 AS VARCHAR)), '%%Y-%%m-%%d %%H:%%i') AS start_dt, anon_1.zone_id, count(*) AS total_unique_demand
Of course, this is not the whole code and the whole query, but this incorrect translation (when forming the query from the ORM representation) is the core issue here. Is this a documented concept that I should be aware of?
Also, I tried replacing the simple number with some variations of sqlalchemy.text(text=str(minutes_interval)
, but did not help.
I am using pyathena for engine, and AWS athena database.