I have a list of different channels that could potentially bring users to a website (organic, SEO, online marketing, etc.). I would like to find an efficient way to count daily active user that comes from the combination of these channels. Each channel has its own table and track its respective users.
The tables looks like the following,
channel A
date user_id
2020-08-01 A
2020-08-01 B
2020-08-01 C
channel B
date user_id
2020-08-01 C
2020-08-01 D
2020-08-01 G
channel C
date user_id
2020-08-01 A
2020-08-01 C
2020-08-01 F
I want to know the following combinations
- Only visit channel A
- Only visit channel A & B
- Only visit channel B & C
- Only visit channel B
- etc.
However, when there are a lot of channels (I have around 8 channels) the combination is a lot. What I've done roughly is as simple as this (this one includes channel A)
SELECT
a.date,
COUNT(DISTINCT IF(b.user_id IS NULL AND c.user_id IS NULL, a.user_id, NULL)) AS dau_a,
COUNT(DISTINCT IF(b.user_id IS NOT NULL AND c.user_id IS NULL, a.user_id, NULL)) AS dau_a_b,
...
FROM a LEFT JOIN b ON a.user_id = b.user_id AND a.date = b.date
LEFT JOIN c ON a.user_id = c.user_id AND a.date = c.date
GROUP BY 1
but extremely tedious when the total channels is 8 (28 variations for 2 combinations, 56 for 3, 70 for 4, and many more).
Any smart ideas to solve this? I was thinking to use FULL OUTER JOIN
but can't seem to get the grasp out of it. Answers really appreciated.
I would approach this with
union all
and two levels of aggregation: