for a project I am working on I was asked to solve this task:
Given the table
events event_id int (autoincrement) --10B distinct values event_ts datetime -- 10B event_type int (1 = impression, 2 = click, 3 = purchase...) --20 types product_id int --100K client_id int --10M client_type int --10
Q: find the amount of clients which have seen an impression of a product before purchasing it
I came up with two solutions:
1)
With cteProdsClients As (
select e1.product_id ,e1.client_id
from events as e1
where event_type = ‘3’ and
EXISTS (SELECT e2.product_id ,e2.client_id
FROM events as e2
WHERE event_type = ‘1’ and e1.product_id = e2.product_id
AND e1.client_id = e2.client_id
AND e1.event_ts <e2.event_ts )
count(distinct product_id)
From cteProdsClients
)
SELECT
count(distinct client_id)
FROM
cteProdsClients ;
2)
With cteProdsClients As (
select e1.product_id ,e1.client_id
from events as e1 left join
(
SELECT e2.product_id ,e2.client_id
FROM events as e2
WHERE event_type = ‘1’ AND e1.event_ts <e2.event_ts
)
ON e1.product_id = e2.product_id AND e1.client_id = e2.client_id
WHERE event_type = ‘3’
)
SELECT
count(distinct client_id)
FROM
cteProdsClients ;
I need to not only create a query that would get the output but also do it in the most efficient, optimized way. Out of these 2 which is better? I would appreciate it if you fixed errors (if exist) and propose better solutions Thanks
You seem to be overcomplicating things; I believe this will give you what you want. Performance would be dependent on appropriate indexing, etc.