SQL: Count returning and new visitors based on IP group by date

978 views Asked by At

I have a graph whitch showing me the unique visitors based on IP group by date (today every hour).

Now I want to seperate this data to new visitors and returning visitors bases on IP and session_id and group them by date (today every hour). How can I do this with a SQL query? Is it even possible?

The query should look in the tabel if an IP with session_id is already there. Then it is a returning visitor. Otherwise it is a new visitor. I don't know how to do that.

Query I have to count the unique IP's and group them by every hour today:

SELECT DISTINCT DATE_FORMAT(`date`, '%Y-%m-%d %H') as 'dates', COUNT(DISTINCT `ip`) as 'count' FROM `logging` WHERE DATE(date) = DATE(NOW()) GROUP BY `dates`

Now it is showing me:

Dates            Count
2021-02-04 00    10
2021-02-04 01    8
2021-02-04 02    5

etc.

What I want is:

Dates            Count new IP      Count returning IP
2021-02-04 00    2                 8
2021-02-04 01    4                 4
2021-02-04 02    2                 3

New IP: Check if IP is stored in the table with only one known session_id. Returning IP: Check if IP is stored in the table with more than one different session_id's.

Many thanks in advance!

UPDATE #1:

Now I have the following query to count the returning visitors today:

SELECT date, ip, count(distinct ip, session_id) as 'count' FROM logging GROUP BY ip HAVING count > 1 AND date(date) = date(now())

Result for example:

date                    ip                   count (returning visitors)
2021-02-05 08:24:56     62.163.91.178        2
2021-02-05 10:24:15     77.163.91.223        6
2021-02-05 08:49:51     77.173.17.157        13

How can I change this query to group them by date to get this?:

date                    count (returning visitors)
2021-02-05 08           15
2021-02-05 10           6

UPDATE #2:

Thanks to Tsungur I've gote the following query but it shows me different results each time I run it.

select DATE_FORMAT([date], '%Y-%m-%d %H') as [date] , count(*) from ( SELECT [date], ip, count(distinct ip, session_id) as 'count' FROM logging GROUP BY ip HAVING count(distinct ip, session_id) > 1 AND date(date) = date(now())
) as sub
group by DATE_FORMAT([date], '%Y-%m-%d %H') 

Here some data to play with:

ID        session_id                   ip               date
10752     454747k5k45l23h3b5n6k432nn   44.56.123.123    2021-01-01 09:15:54   
10950     kmcoq3glgm187uhsfmo3r71h9q   86.85.131.246    2021-02-11 13:19:22
10958     kmcoq3glgm187uhsfmo3r71h9q   86.85.131.246    2021-02-12 12:10:52
10960     dfh78dfh7fdh7fdh6sd55dsd88   86.85.131.246    2021-02-12 13:00:02
10967     87s97sfh57sh6sh6s6sdsd44d3   11.56.873.560    2021-02-13 13:00:00
10968     rkdrgjsd7gjsd5jskjd46kjdsk   66.35.127.435    2021-02-13 13:01:00
10977     rkdrgjsd7gjsd5jskjd46kjdsk   66.35.127.435    2021-02-13 13:03:11
10978     dfajesj9sdj0dfh78sgd57sd5d   44.56.123.123    2021-02-13 13:05:12
10979     fhdf7f7hdf6fd44fdf3ffdf321   86.85.131.246    2021-02-13 14:05:02
10980     fhdf7f7hdf6fd44fdf3ffdf321   86.85.131.246    2021-02-13 14:06:13

The above data should show me:

date             count (new visitor)
2021-02-13 13    2
2021-02-13 14    0

date             count (returning visitor)
2021-02-13 13    1
2021-02-13 14    1
1

There are 1 answers

8
TSungur On

You can find returning IP's by grouping on IP and filtering which have a count more than one.

SELECT [ip]
FROM [logging]
group by [ip]
having count(*)>1

Then you can use this query as a filter for your main query.

SELECT [ip], [date],count(*) as [Count]
FROM [logging]
where [ip] in
(
SELECT [ip]
FROM [logging]
group by [ip]
having count(*)>1
)
group by [ip],[date]

For the last part of your Updated Question;

  1. Your current query is in consistent. You display the count of distinct ip and session but your having clause does not reflect this. Your Having clause counts by date and ip grouping. I am not sure if you made this in purpose. Maybe your your having clause should be having count(distinct ip, session_id)>1.

  2. For your final result set, without modifying your final query, use it as a subquery like

    select DATE_FORMAT([date], '%Y-%m-%d %H') as [date] , count(*) from (
    SELECT [date], ip, count(distinct ip, session_id) as 'count' FROM logging GROUP BY ip HAVING count > 1 AND date(date) = date(now())
        ) as sub
        group by DATE_FORMAT([date], '%Y-%m-%d %H') 
    

UPDATE #2 First some advice, do not use reserved words as column names if possible (ex. date). I tried to break the problem into pieces. The final query looks dirty, it can be improved and shortened. But for clarification of concerns I used multiple queries. Your first problem is counting the unique sid and ip. For simplicity (since both are string) I used sid+':'+ip as a single unique string. Another problem is grouping also date and hour. So here is it step by step:

  1. Creating the base reference data:

         SELECT FORMAT(dt, 'yyyy-MM-dd HH') as dt_H,
           [sid]+':'+[ip] as uniq
    
       FROM [mytable]
       where cast(dt as date)=cast(getdate() as date)
    
  2. Finding returning items by my unique value and date - hour:

    select dt_H,uniq,count() as times from ( SELECT FORMAT(dt, 'yyyy-MM-dd HH') as dt_H, [sid]+':'+[ip] as uniq FROM [mytable] where cast(dt as date)=cast(getdate() as date) ) as helper group by dt_H,uniq having count()>1

  3. If you are using php you can later separate uniq field by : character. But if you want it in sql we will join the master table with out distinguishing columns:

    select distinct returning.dt_H, main.[sid], main.[ip],returning.times from [mytable] as main inner join ( select dt_H,uniq,count(*) as times from ( SELECT FORMAT(dt, 'yyyy-MM-dd HH') as dt_H, [sid]+':'+[ip] as uniq

           FROM [mytable]
           where cast(dt as date)=cast(getdate() as date)
     ) as helper
       group by dt_H,uniq
       having count(*)>1
    

    ) as returning on returning.uniq=main.[sid]+':'+main.[ip] and FORMAT(main.dt, 'yyyy-MM-dd HH')=returning.dt_H

  4. You are filling the gaps with PHP you said. You can do this in sql also. I would suggest creating a user defined function (which comes in handy many other times) that will create numeric values within given range: CREATE FUNCTION [dbo].[GetNumbers](@Start int, @Stop int) RETURNS TABLE AS RETURN WITH Numbers (N) AS ( SELECT @Start UNION ALL SELECT @Start + N-@Start+1 FROM Numbers WHERE N < @Stop ) SELECT N FROM Numbers GO

Usage:

SELECT N FROM [dbo].[GetNumbers] (0,23)

This will return a table with all the hours in a day. You can use this table, combined with the date part and left join with the above query. So you can display all the hours and 0 for returning times if is null.

PS:sorry could not make the formatting work.