How to write the SQL query so that it doesn't have to run for each day?

59 views Asked by At

I am using Vertica Enterprise for my warehouse.

I have three tables.

Table #1 - Students

Students Table

Table #2 - Address of Students

enter image description here

Table #3 - Homework progress of Students done for the day. (Note: some of students might be absent on that day). Homework progress column is in %.

enter image description here

Desired output

I want all the days that exists in table 3 and all the students from table 1 along with their addresses from table 2 in a single table.

enter image description here

Here is my attempt but I have to run the query for each day. Meaning I have to keep changing HOMEWORK.day = '2023-11-01' and TO_DATE(TO_CHAR(20231101), 'YYYY-MM-DD') AS DAY_ for each run.

INSERT INTO TABLE_4 
    SELECT 
        TO_DATE(TO_CHAR(20231101),'YYYY-MM-DD') AS DAY_,
        A.ID AS ID,
        A.NAME AS NAME,
        B.ADDRESS AS ADDRESS,
        NVL(C.HOMEWORK_PROGRESS,0.0) AS PROGRESS
    FROM 
        STUDENTS AS STUDENT
    INNER JOIN 
        ADDRESS AS ADDRESS ON STUDENT.ID = ADDRESS.ID
    LEFT JOIN 
        HOMEWORK AS HOMEWORK ON STUDENT.ID = HOMEWORK.ID AND HOMEWORK.DAY = '2023-11-01'

I want to rewrite it so that I can just do HOMEWORK.DAY BETWEEN '2023-11-01' AND '2023-11-30' and let it run by itself.

2

There are 2 answers

1
Guy Kastenbaum On

did you try the obvious one :

INSERT INTO TABLE_4 
SELECT C.DAY AS DAY_
,A.ID AS ID
,A.NAME AS NAME
,B.ADDRESS AS ADDRESS
,NVL(C.HOMEWORK_PROGRESS,0.0) AS PROGRESS
FROM STUDENTS AS A
INNER JOIN ADDRESS AS B ON A.ID = B.ID
LEFT JOIN HOMEWORK AS C ON A.ID = C.ID 
WHERE C.DAY BETWEEN '2023-11-01' AND '2023-11-30' 

?

0
marcothesane On

I found no better solution than cross joining all distinct id-s from the progress table with all distinct dates from the progress table in a sub select (a CTE), and then left-joining the progress table with that CTE:

WITH

-- your data, don't use in final query ...

stud(id,nam) AS (
            SELECT 1,'John'
  UNION ALL SELECT 2,'Kenny'
  UNION ALL SELECT 3,'Westly'
  UNION ALL SELECT 4,'Arthur'
  UNION ALL SELECT 5,'Tom'
)
,                                                                               
addr(id,addr) AS (
            SELECT 1,'North'
  UNION ALL SELECT 2,'West'
  UNION ALL SELECT 3,'East'
  UNION ALL SELECT 4,'South East'
  UNION ALL SELECT 5,'South West'
)
,
progress(dt,id,progress) AS (    
            SELECT DATE '2023-01-01',1,0.5
  UNION ALL SELECT DATE '2023-01-01',2,0.7
  UNION ALL SELECT DATE '2023-01-01',3,1.0
  UNION ALL SELECT DATE '2023-11-02',1,0.8
  UNION ALL SELECT DATE '2023-11-02',2,0.9
  UNION ALL SELECT DATE '2023-11-02',3,1.0
  UNION ALL SELECT DATE '2023-11-02',4,0.2
)

-- end of input; real query starts here

-- replace following comma with "WITH"

,  -- or WITH for stand-alone query
all_stud_dt AS (
  SELECT
    id
  , dt
  FROM (SELECT DISTINCT dt FROM progress) dt
  CROSS JOIN (SELECT DISTINCT id FROM progress) id
)
SELECT
  all_stud_dt.dt
, all_stud_dt.id
, addr
, NVL(progress,0) AS progress
FROM all_stud_dt
JOIN addr USING(id)
LEFT JOIN progress USING(id,dt)
ORDER BY 1,2
;
-- out      dt     | id |    addr    | progress 
-- out ------------+----+------------+----------
-- out  2023-01-01 |  1 | North      | 0.5
-- out  2023-01-01 |  2 | West       | 0.7
-- out  2023-01-01 |  3 | East       | 1.0
-- out  2023-01-01 |  4 | South East | 0.0
-- out  2023-11-02 |  1 | North      | 0.8
-- out  2023-11-02 |  2 | West       | 0.9
-- out  2023-11-02 |  3 | East       | 1.0
-- out  2023-11-02 |  4 | South East | 0.2