I would like to GROUP BY rows in SQL based on "time of inactivity", that is, detect large gaps between timestamps in order to recognize user sessions.
Assume this actions table in a SQLite database:
_actions_
| id | userID | timestamp | actionType |
Now I can view all user actions and their types grouped by hour, month, year, etc.:
SELECT
userID, strftime('%H', timestamp), group_concat(actionType)
FROM
actions
GROUP BY userID, strftime('%H', timestamp);
However, that is not completely what I want. Consider this:
| 1 | 333 | 13.11.2014-20:59 | action6 |
| 2 | 333 | 13.11.2014-21:01 | action3 |
| 3 | 333 | 13.11.2014-21:47 | action5 |
My example groups the rows into row 1 and rows 2+3. However, if we assume a session timeout of 30 minutes, we should get rows 1+2 and row 3. Any SQL ideas?
Sure, one could load the database in R, Pandas, etc. and check for each user whether current_timestamp - last_timestamp > 30min, but is this really the way to go?
To compute whether a row is the first row in a session, check whether any other row of the same user exists in the previous 30 minutes:
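A minimal sketch of such a check, assuming `timestamp` is stored as ISO-8601 text (`YYYY-MM-DD HH:MM:SS`) so that SQLite's `datetime()` can parse it and string comparison matches time order. The aliases `a` and `prev` and the column name `isFirst` are illustrative names, not part of the original schema:

```sql
-- Assumption: actions.timestamp holds text like '2014-11-13 20:59:00'.
-- isFirst is 1 when the same user has no earlier action within the
-- preceding 30 minutes, and 0 otherwise.
SELECT a.*,
       NOT EXISTS (SELECT 1
                   FROM actions AS prev
                   WHERE prev.userID = a.userID
                     AND prev.timestamp < a.timestamp
                     AND prev.timestamp >= datetime(a.timestamp, '-30 minutes')
                  ) AS isFirst
FROM actions AS a;
```

For the three sample rows from the question (rewritten in that ISO format), rows 1 and 3 would be flagged as session starts and row 2 would not, since 20:59 falls within 30 minutes before 21:01 while nothing falls within 30 minutes before 21:47.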
But this does not help directly with grouping; we need some value that is the same for all rows in a group and different between groups.
Let's take the ID of the first row in the group. To find this first row of the group of the current row, we take the last row that is the first row of some group and that is not after the current row:
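One way this lookup could be written, and then used for the grouping asked about in the question, is sketched below. It again assumes ISO-8601 text timestamps (`YYYY-MM-DD HH:MM:SS`); the aliases `a`, `s`, and `prev` and the output names `sessionID`, `sessionStart`, and `actionTypes` are hypothetical. The extra `s.id` in the `ORDER BY` is an added tie-breaker for rows sharing a timestamp, not something the text above requires:

```sql
-- Assumption: actions.timestamp holds text like '2014-11-13 20:59:00'.
SELECT userID,
       sessionID,
       min(timestamp)           AS sessionStart,
       group_concat(actionType) AS actionTypes
FROM (SELECT a.*,
             -- ID of the latest session-start row of this user that is
             -- not after the current row
             (SELECT s.id
              FROM actions AS s
              WHERE s.userID = a.userID
                AND s.timestamp <= a.timestamp
                -- s starts a session: no action by the same user in the
                -- 30 minutes before it
                AND NOT EXISTS (SELECT 1
                                FROM actions AS prev
                                WHERE prev.userID = s.userID
                                  AND prev.timestamp < s.timestamp
                                  AND prev.timestamp >= datetime(s.timestamp, '-30 minutes')
                               )
              ORDER BY s.timestamp DESC, s.id
              LIMIT 1
             ) AS sessionID
      FROM actions AS a)
GROUP BY userID, sessionID;
```

With the three sample rows from the question, rows 1 and 2 would share `sessionID` 1 and row 3 would get `sessionID` 3, which gives the desired grouping of rows 1+2 and row 3.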
To make this query efficient, the timestamp column must be indexed.
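As a sketch, such an index could be created as follows. The index names are made up for illustration. The second, composite variant is a suggestion beyond what is stated above: because every subquery also filters on `userID`, an index covering both columns may let SQLite narrow the search further, which is worth confirming with `EXPLAIN QUERY PLAN` on real data:

```sql
-- Index on the timestamp column alone, as described above
-- (the index name is hypothetical).
CREATE INDEX IF NOT EXISTS actions_timestamp_idx
    ON actions (timestamp);

-- Optional alternative (an assumption, not from the answer above):
-- a composite index matching the userID equality test plus the
-- timestamp range test used in the subqueries.
CREATE INDEX IF NOT EXISTS actions_user_timestamp_idx
    ON actions (userID, timestamp);
```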