Event_date reference in CTE

27 views Asked by At

I've been working on some code that will extract some aggregated data from my Google Analytics 4 export.

Unfortunately I've been running into an error regarding my variable event_timestamp in my CTE SessionsPerUser. I can't solve the error and have been trying to trace back what the problem is. The variable exists in my staging model and works fine for every other CTE. There has to be an issue with the Group by statement or the nested function? Any insights or ideas are greatly appreciated!

I will reference my entire code as well as the code snippet I believe causing the issue.

WITH RECURSIVE DateDimension AS 
(
    SELECT '2024-01-01'::DATE AS date -- Start date
    UNION ALL
    SELECT DATEADD(day, 1, date)
    FROM DateDimension
    WHERE date < CURRENT_DATE() -- Automatically updates to include up to the current date
),
UserInfo AS 
(
    SELECT
        DATE(event_timestamp) AS event_date,
        user_pseudo_id,
        MAX(IFF(event_name IN ('first_visit', 'first_open'), 1, 0)) AS is_new_user,
        MAX(IFF(event_name = 'app_remove', 1, 0)) AS is_deletion,
        MAX(IFF(event_name = 'delete_account', 1, 0)) AS  is_account_deletion,
        MAX(IFF(param_exclusion = TRUE, 1, 0)) AS is_exclusion,
        SUM(param_ENGAGEMENT_TIME_MSEC) AS total_engagement_time,
        MAX(IFF(event_name = 'first_open', 1, 0)) AS is_download
    FROM 
        {{ ref('stg_google_analytics_events') }}
    GROUP BY 
        DATE(event_timestamp), user_pseudo_id
),
SessionInfo AS 
(
    SELECT
        DATE(event_timestamp) AS event_date,
        user_pseudo_id,
        param_ga_session_id,
        TIMESTAMPDIFF(SECOND, MIN(event_timestamp), MAX(event_timestamp)) AS session_duration
    FROM 
        {{ ref('stg_google_analytics_events') }}
    GROUP BY  
        DATE(event_timestamp), user_pseudo_id, param_ga_session_id
),
TotalSessions AS 
(
    SELECT
        DATE(event_timestamp) AS event_date,
        COUNT(DISTINCT param_ga_session_id) AS total_sessions
    FROM 
        {{ ref('stg_google_analytics_events') }}
    GROUP BY 
        DATE(event_timestamp)
),
SessionsPerUser AS 
(
    SELECT
        DATE(event_timestamp) AS event_date,
        AVG(session_count) AS sessions_per_user
    FROM 
        (SELECT
             DATE(event_timestamp) AS event_date,
             user_pseudo_id,
             COUNT(DISTINCT param_ga_session_id) AS session_count
         FROM 
             {{ ref('stg_google_analytics_events') }}
         GROUP BY 
             DATE(event_timestamp), user_pseudo_id) AS user_sessions
    GROUP BY 
        DATE(event_timestamp)
),
AggregatedMetrics AS 
(
    SELECT
        ui.event_date,
        COUNT(DISTINCT ui.user_pseudo_id) AS active_users,
        SUM(ui.is_new_user) AS signups,
        SUM(ui.is_deletion) AS deletions,
        SUM(ui.is_account_deletion) AS account_deletions,
        SUM(ui.is_exclusion) AS exclusions,
        SUM(ui.is_download) AS downloads,
        AVG(si.session_duration) AS average_session_duration,
        SUM(ui.total_engagement_time) / COUNT(DISTINCT si.param_ga_session_id) AS average_engagement_time,
        ts.total_sessions,
        spu.sessions_per_user
    FROM 
        UserInfo ui
    LEFT JOIN 
        SessionInfo si ON ui.user_pseudo_id = si.user_pseudo_id 
                       AND ui.event_date = si.event_date
    LEFT JOIN 
        TotalSessions ts ON ui.event_date = ts.event_date
    LEFT JOIN 
        SessionsPerUser spu ON ui.event_date = spu.event_date
    GROUP BY 
        ui.event_date, ts.total_sessions, spu.sessions_per_user
)
SELECT
    dd.date,
    COALESCE(am.downloads, 0) AS downloads,
    COALESCE(am.active_users, 0) AS active_users,
    COALESCE(am.signups, 0) AS signups,
    COALESCE(am.deletions, 0) AS deletions,
    COALESCE(am.account_deletions, 0) AS account_deletions,
    COALESCE(am.exclusions, 0) AS exclusions,
    COALESCE(am.total_sessions, 0) AS total_sessions,
    COALESCE(am.sessions_per_user, 0) AS sessions_per_user,
    COALESCE(am.average_session_duration, 0) AS average_session_duration,
    COALESCE(am.average_engagement_time, 0) AS average_engagement_time
FROM
    DateDimension dd
LEFT JOIN 
    AggregatedMetrics am ON dd.date = am.event_date
ORDER BY 
    dd.date

Error message:

000904 (42000): SQL compilation error: error line 46 at position 11 invalid identifier 'EVENT_TIMESTAMP'

Code snippet:

SessionsPerUser AS 
(
    SELECT
      DATE(event_timestamp) AS event_date,
      AVG(session_count) AS sessions_per_user
    FROM (
        SELECT
          DATE(event_timestamp) AS event_date,
          user_pseudo_id,
          COUNT(DISTINCT param_ga_session_id) AS session_count
        FROM {{ ref('stg_google_analytics_events') }}
        GROUP BY DATE(event_timestamp), user_pseudo_id
    ) AS user_sessions
    GROUP BY DATE(event_timestamp)
),

Again I can verify that event_timestamp exists and works for my other CTEs so it is not an 'invalid identifier'

Thanks for the help!

1

There are 1 answers

1
Dave Welden On BEST ANSWER

In the SessionsPerUser CTE, you have DATE(event_timestamp) AS event_date. This is selecting from a subquery where you already created event_date from event_timestamp. So it should look like:

SessionsPerUser AS (
    SELECT
      event_date,
      AVG(session_count) AS sessions_per_user
    FROM (
        SELECT
          DATE(event_timestamp) AS event_date,
          user_pseudo_id,
          COUNT(DISTINCT param_ga_session_id) AS session_count
        FROM {{ ref('stg_google_analytics_events') }}
        GROUP BY DATE(event_timestamp), user_pseudo_id
    ) AS user_sessions
    GROUP BY DATE(event_date)
),