ATHENA/PRESTO complex query with multiple unnested tables

1.4k views Asked by At

i have i would like to create a join over several tables. table login : I would like to retrieve all the data from login table logging : calculating the Nb_of_sessions for each db & for each a specific event type by user table meeting : calculating the Nb_of_meetings for each db & for each user table live : calculating the Nb_of_live for each db & for each user

I have those queries with the right results :

SELECT db.id,_id as userid,firstname,lastname
FROM  "logins"."login",
UNNEST(dbs) AS a1 (db)

SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
       array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by db.id,userid;

SELECT dbid,userid AS userid,count(*) as nb_interviews,
      array_join(array_agg(interviewer),',') as interviewer 
FROM  "meetings"."meeting" 
group by dbid,userid;

SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat 
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id;

But when i begin to try put it all together, it seems i retrieve bad data (i have only on db retrieved) and it seems not efficient.

select a1.db.id,a._id as userid,a.firstname,a.lastname,count(rl._id) as nb_chat
FROM 
"logins"."login" a,
"loggings"."logging" b,
"meetings"."meeting" c,
"lives"."live" d,
UNNEST(dbs) AS a1 (db),
UNNEST(users) AS r1 (user)
where a._id = b.userid AND a._id = c.userid AND a._id = r1.user._id
group by 1,2,3,4

Do you have an idea ?

Regards.

1

There are 1 answers

0
Philipp Johannis On BEST ANSWER

The easiest way is to work with with to structure the subquery and then reference them.

with parameter reference:

You can use WITH to flatten nested queries, or to simplify subqueries.

The WITH clause precedes the SELECT list in a query and defines one or more subqueries for use within the SELECT query.

Each subquery defines a temporary table, similar to a view definition, which you can reference in the FROM clause. The tables are used only when the query runs.

Since you already have working sub queries, the following should work:

with logins as 
(
    SELECT db.id,_id as userid,firstname,lastname
    FROM  "logins"."login",
    UNNEST(dbs) AS a1 (db)
)
,visits as
(
    SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
           array_join(array_agg(value.from_url),',') as from_url
    FROM "loggings"."logging"
    where event='url_event'
    group by db.id,userid
)
,meetings as
(
    SELECT dbid,userid AS userid,count(*) as nb_interviews,
          array_join(array_agg(interviewer),',') as interviewer 
    FROM  "meetings"."meeting" 
    group by dbid,userid
)
,chats as 
(
    SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat 
    FROM "lives"."live",
    UNNEST(users) AS r1 (user)
    group by dbid,r1.user._id
)
select *
from logins l
left join visits v
    on l.dbid = v.dbid
    and l.userid = v.userid
left join meetings m
    on l.dbid = m.dbid
    and l.userid = m.userid
left join chats c
    on l.dbid = c.dbid
    and l.userid = c.userid;