MySQL: For each id, using sequential dates to calculate space in between

I am struggling to find a solution to a problem I need to solve, so I decided to ask instead of just looking at other questions that weren't quite what I was looking for.

I am using MySQL and have data like the following:

| id | date       |
|  1 | 2015-06-05 |
|  1 | 2015-06-07 |
|  1 | 2015-06-08 |
|  2 | 2015-06-04 |
|  2 | 2015-06-06 |
|  3 | 2015-06-03 |
|  3 | 2015-06-08 |
|  4 | 2015-06-02 |

The id represents a specific user in the database, and date represents the date that an event took place for that user. My goal is to calculate the average amount of time between one event for a user and the next event for that same user, averaged over all users.

So for example, using the example data I provided, for the id 1 user, I want to start with the first event for this user and find the gap of time between this event and the next event: 2015-06-07 - 2015-06-05 = 2 days. Next is the gap of time between the second and third events: 2015-06-08 - 2015-06-07 = 1 day. As we proceed down the list, we add together the days and track how many spaces of time we've calculated, so that we can take an average at the end. After user 1, we have 3 days total and 2 spaces added.

For the id 2 user, we get 2 days (2015-06-06 - 2015-06-04 = 2), so 5 days total and 3 spaces added. After the id 3 user, we have 10 days total with 4 spaces (2015-06-08 - 2015-06-03 = 5). For the id 4 user, there is only one event, so no spaces are added and we stay at 10 days total with 4 spaces (ids with only one event are essentially ignored). So our answer is 10/4 = 2.5 as the average for this particular small set of data.
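The walk-through above can be sketched in a few lines of Python as a reference value to test any SQL against. This is only an illustration of the arithmetic, not a MySQL solution; the names `events` and `average_gap_days` are made up for the example.

```python
from datetime import date
from itertools import groupby

# Example rows from the question: (id, event date).
events = [
    (1, date(2015, 6, 5)), (1, date(2015, 6, 7)), (1, date(2015, 6, 8)),
    (2, date(2015, 6, 4)), (2, date(2015, 6, 6)),
    (3, date(2015, 6, 3)), (3, date(2015, 6, 8)),
    (4, date(2015, 6, 2)),
]

def average_gap_days(rows):
    """Average days between consecutive events of the same id, over all ids."""
    total_days = 0
    gap_count = 0
    # Sort by id, then date, so each id's events are grouped and in order.
    for _, group in groupby(sorted(rows), key=lambda row: row[0]):
        dates = [d for _, d in group]
        # Pair each event with the next one for the same id.
        for earlier, later in zip(dates, dates[1:]):
            total_days += (later - earlier).days
            gap_count += 1
    # If every id has a single event there are no gaps, so no average.
    return total_days / gap_count if gap_count else None

print(average_gap_days(events))  # 2.5
```

The id 4 user produces no pairs in the inner loop, which is how single-event ids end up ignored.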


I need to be able to calculate this result without knowing in advance the number of distinct ids, the number of events for each id, or the range of event dates.

I have found that, for the actual date subtraction, the function TIMESTAMPDIFF(DAY, date1, date2) works with DAY as the unit (it returns date2 - date1 in whole days). The part I'm having trouble with is applying this function across multiple ids, each with its own range of events.

How can I calculate the desired result in MySQL? The dates will always be in the correct order. The example displays them from earliest to latest, but I can rearrange them from latest to earliest if needed. Additionally, I can use code to remove the ids that have only one event if need be. Thank you for any insight you can give.

There is 1 answer

Mark Jack, BEST ANSWER

Of course, in less than an hour after asking the question, I come up with my own solution even though I've been working on this for several hours.

I realized that for a single user, the sum of the amount of space in between each of the events for that user, is simply the space between the first and last event (For id 1 user, we had 2 + 1 = 3 for the total, but we could have just found the difference between the first and last date which is also 3).
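As a quick sanity check of that observation, here is a small Python sketch using the id 1 dates from the question (the variable names are only for illustration):

```python
from datetime import date

# Events for the id 1 user from the question, earliest to latest.
dates = [date(2015, 6, 5), date(2015, 6, 7), date(2015, 6, 8)]

# Sum of the gaps between consecutive events: 2 + 1.
sum_of_gaps = sum((b - a).days for a, b in zip(dates, dates[1:]))

# Single subtraction: last event minus first event.
first_to_last = (max(dates) - min(dates)).days

print(sum_of_gaps, first_to_last)  # 3 3
```

The intermediate dates cancel out in the sum, so the same equality holds for any number of events.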

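This means that only the first and last dates are needed to find the desired result, which makes everything much easier. For each user, the total space is the last date minus the first date, and the number of spaces is the number of rows for that user minus 1. Summing both over all users and dividing total days by total spaces gives the overall average (10/4 = 2.5 for the example data). A user with only one event contributes 0 days and 0 spaces, so it drops out on its own. Note that dividing per user first and then averaging those results gives the average of the per-user averages instead (about 2.83 for the example data), so the division has to happen once, over the totals. The GROUP BY clause together with the MIN, MAX, and COUNT functions gives us the first date, the last date, and the number of spaces for each user.

The following code should be what is needed (where data is a table like the given example). MySQL requires an alias on a derived table, hence per_user.

SELECT SUM(TIMESTAMPDIFF(DAY, first_date, last_date)) / SUM(gaps) AS avg_gap_days
FROM (
    -- one row per user: first event, last event, number of spaces
    SELECT id, MIN(date) AS first_date, MAX(date) AS last_date, COUNT(*) - 1 AS gaps
    FROM data
    GROUP BY id
    ) AS per_user;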
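
To try a first/last/count query against the example data without a MySQL server, one option is Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite has no TIMESTAMPDIFF, so the difference of two julianday() values stands in for it, and the query divides total days by total spaces so that it can be compared with the 2.5 expected in the question. The aliases first_date, last_date, gaps, and per_user are chosen for the example.

```python
import sqlite3

# In-memory database holding the example rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        (1, "2015-06-05"), (1, "2015-06-07"), (1, "2015-06-08"),
        (2, "2015-06-04"), (2, "2015-06-06"),
        (3, "2015-06-03"), (3, "2015-06-08"),
        (4, "2015-06-02"),
    ],
)

# julianday(b) - julianday(a) is SQLite's stand-in for TIMESTAMPDIFF(DAY, a, b).
query = """
SELECT SUM(julianday(last_date) - julianday(first_date)) / SUM(gaps)
FROM (
    SELECT id, MIN(date) AS first_date, MAX(date) AS last_date,
           COUNT(*) - 1 AS gaps
    FROM data
    GROUP BY id
) AS per_user
"""
(avg_gap_days,) = conn.execute(query).fetchone()
print(avg_gap_days)  # 2.5
```

ISO-formatted date strings sort chronologically, which is why MIN and MAX work on the TEXT column here; in MySQL the column would normally be a DATE.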