I have a graph whitch showing me the unique visitors based on IP group by date (today every hour).
Now I want to seperate this data to new visitors and returning visitors bases on IP and session_id and group them by date (today every hour). How can I do this with a SQL query? Is it even possible?
The query should look in the tabel if an IP with session_id is already there. Then it is a returning visitor. Otherwise it is a new visitor. I don't know how to do that.
Query I have to count the unique IP's and group them by every hour today:
SELECT DISTINCT DATE_FORMAT(`date`, '%Y-%m-%d %H') as 'dates', COUNT(DISTINCT `ip`) as 'count' FROM `logging` WHERE DATE(date) = DATE(NOW()) GROUP BY `dates`
Now it is showing me:
Dates Count
2021-02-04 00 10
2021-02-04 01 8
2021-02-04 02 5
etc.
What I want is:
Dates Count new IP Count returning IP
2021-02-04 00 2 8
2021-02-04 01 4 4
2021-02-04 02 2 3
New IP: Check if IP is stored in the table with only one known session_id. Returning IP: Check if IP is stored in the table with more than one different session_id's.
Many thanks in advance!
UPDATE #1:
Now I have the following query to count the returning visitors today:
SELECT date, ip, count(distinct ip, session_id) as 'count' FROM logging GROUP BY ip HAVING count > 1 AND date(date) = date(now())
Result for example:
date ip count (returning visitors)
2021-02-05 08:24:56 62.163.91.178 2
2021-02-05 10:24:15 77.163.91.223 6
2021-02-05 08:49:51 77.173.17.157 13
How can I change this query to group them by date to get this?:
date count (returning visitors)
2021-02-05 08 15
2021-02-05 10 6
UPDATE #2:
Thanks to Tsungur I've gote the following query but it shows me different results each time I run it.
select DATE_FORMAT([date], '%Y-%m-%d %H') as [date] , count(*) from ( SELECT [date], ip, count(distinct ip, session_id) as 'count' FROM logging GROUP BY ip HAVING count(distinct ip, session_id) > 1 AND date(date) = date(now())
) as sub
group by DATE_FORMAT([date], '%Y-%m-%d %H')
Here some data to play with:
ID session_id ip date
10752 454747k5k45l23h3b5n6k432nn 44.56.123.123 2021-01-01 09:15:54
10950 kmcoq3glgm187uhsfmo3r71h9q 86.85.131.246 2021-02-11 13:19:22
10958 kmcoq3glgm187uhsfmo3r71h9q 86.85.131.246 2021-02-12 12:10:52
10960 dfh78dfh7fdh7fdh6sd55dsd88 86.85.131.246 2021-02-12 13:00:02
10967 87s97sfh57sh6sh6s6sdsd44d3 11.56.873.560 2021-02-13 13:00:00
10968 rkdrgjsd7gjsd5jskjd46kjdsk 66.35.127.435 2021-02-13 13:01:00
10977 rkdrgjsd7gjsd5jskjd46kjdsk 66.35.127.435 2021-02-13 13:03:11
10978 dfajesj9sdj0dfh78sgd57sd5d 44.56.123.123 2021-02-13 13:05:12
10979 fhdf7f7hdf6fd44fdf3ffdf321 86.85.131.246 2021-02-13 14:05:02
10980 fhdf7f7hdf6fd44fdf3ffdf321 86.85.131.246 2021-02-13 14:06:13
The above data should show me:
date count (new visitor)
2021-02-13 13 2
2021-02-13 14 0
date count (returning visitor)
2021-02-13 13 1
2021-02-13 14 1
You can find returning IP's by grouping on IP and filtering which have a count more than one.
Then you can use this query as a filter for your main query.
For the last part of your Updated Question;
Your current query is in consistent. You display the count of distinct ip and session but your having clause does not reflect this. Your Having clause counts by date and ip grouping. I am not sure if you made this in purpose. Maybe your your having clause should be
having count(distinct ip, session_id)>1
.For your final result set, without modifying your final query, use it as a subquery like
UPDATE #2 First some advice, do not use reserved words as column names if possible (ex. date). I tried to break the problem into pieces. The final query looks dirty, it can be improved and shortened. But for clarification of concerns I used multiple queries. Your first problem is counting the unique sid and ip. For simplicity (since both are string) I used sid+':'+ip as a single unique string. Another problem is grouping also date and hour. So here is it step by step:
Creating the base reference data:
Finding returning items by my unique value and date - hour:
select dt_H,uniq,count() as times from ( SELECT FORMAT(dt, 'yyyy-MM-dd HH') as dt_H, [sid]+':'+[ip] as uniq FROM [mytable] where cast(dt as date)=cast(getdate() as date) ) as helper group by dt_H,uniq having count()>1
If you are using php you can later separate uniq field by : character. But if you want it in sql we will join the master table with out distinguishing columns:
select distinct returning.dt_H, main.[sid], main.[ip],returning.times from [mytable] as main inner join ( select dt_H,uniq,count(*) as times from ( SELECT FORMAT(dt, 'yyyy-MM-dd HH') as dt_H, [sid]+':'+[ip] as uniq
) as returning on returning.uniq=main.[sid]+':'+main.[ip] and FORMAT(main.dt, 'yyyy-MM-dd HH')=returning.dt_H
You are filling the gaps with PHP you said. You can do this in sql also. I would suggest creating a user defined function (which comes in handy many other times) that will create numeric values within given range:
CREATE FUNCTION [dbo].[GetNumbers](@Start int, @Stop int) RETURNS TABLE AS RETURN WITH Numbers (N) AS ( SELECT @Start UNION ALL SELECT @Start + N-@Start+1 FROM Numbers WHERE N < @Stop ) SELECT N FROM Numbers GO
Usage:
This will return a table with all the hours in a day. You can use this table, combined with the date part and left join with the above query. So you can display all the hours and 0 for returning times if is null.
PS:sorry could not make the formatting work.