Unionize partitioned tables matching criteria

83 views Asked by At

I am trying to make a union between partitioned tables matching this criteria:

#standardSQL 
SELECT table_name 
FROM `project.dataset`.INFORMATION_SCHEMA.TABLES
WHERE SAFE.PARSE_DATE('%Y%m%d', RIGHT(table_name, 8)) 
BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE()
ORDER BY creation_time DESC;

example output:

daily_20220104
daily_20220103
daily_20220101
daily_20211231

These are names of tables I want to union. As you may see, it happens that there are gaps in time.

I was thinking about turning this column into an array and then figure out how to iterate over it, or eventually iterate over a table, which contains one column somehow, I lack experience in doing it in SQL.

My ultimate goal is to use it as subquery. It will also be placed in Apache Airflow. I could also write some Python code to retrieve it, I'm open to suggestions. I would most appriciate solution including a SQL FOR loop like

FOR row in rows DO
    INSERT INTO temporary_table (
        SELECT *
        FROM project.dataset.row
    );
END FOR;
1

There are 1 answers

0
dariuszewski On

I managed to solve this problem as following:

DECLARE varI INT64;
DECLARE varJ INT64;
DECLARE varTbl STRING;

SET varI = 1;

SET varJ = (
    SELECT count(table_name)
    FROM `project.dataset`.INFORMATION_SCHEMA.TABLES
    WHERE SAFE.PARSE_DATE('%Y%m%d', RIGHT(table_name, 8)) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE()
);

CREATE TEMPORARY TABLE ResultOneWeek (company_name INT64, date DATE);

WHILE varI <= varJ DO
    SET varTbl = (
        WITH availableReports AS (
            SELECT table_name, ROW_NUMBER() OVER(ORDER BY creation_time DESC) AS rn
            FROM `project.dataset`.INFORMATION_SCHEMA.TABLES
            WHERE SAFE.PARSE_DATE('%Y%m%d', RIGHT(table_name, 8)) 
        BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE()
        )
        SELECT RIGHT(table_name, 8)
        FROM availableReports
        WHERE rn = varI
    );

    INSERT INTO ResultOneWeek (
        SELECT DISTINCT company_name, reportDate
        FROM `project.dataset.daily_*`
        WHERE _TABLE_SUFFIX = varTbl
    );
    
    SET varI = varI + 1;
END WHILE;

--final query
SELECT *some_tables 
FROM other_table o
INNER JOIN ResultOneWeek r
ON *some_keys;

but i don't think efficiency would be strong part of this approach.