So, I was trying to accomplish this in SQL but was advised there would be a simple way to do this in Pandas... I would appreciate your help/hints!
I currently have the table on the left with two columns (begin subsession and end subsession), and I would like to add two new columns, "session start" and "session end". I know how to simply add the columns, but I can't figure out the query that would let me identify the continuous values in the two original columns (i.e. the end subsession value is the same as the next row's begin subsession value) and then add the first begin subsession value and the last end subsession value (for continuous rows) to the respective rows in my new columns. Please refer to the image. For example, for the first three rows the "end subsession" value is the same as the next row's "begin subsession" value, so the first three "session start" and "session end" values would be the same: the minimum of the "begin subsession" values and the maximum of the "end subsession" values.
I was trying something along these lines in SQL. It obviously didn't work, and I realized the aggregate function doesn't work in this case...
SELECT
FROM viewershipContinuous =
CASE
WHEN endSubsession.ROWID = beginSubession.ROWID+1
THEN MIN(beginSubsession)
ELSE beginSubsession.ROWID+1
END;
The table on the left is what I have, the table on the right is what I want to achieve
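You can first compare each value in column `bsub` with the `shift`ed column `esub` (that is, the previous row's `esub`) to check whether they are not equal (`!=`), and then create groups from the result with `cumsum`.

Then `groupby` by the resulting Series `s` and use `transform` with `min` and `max`.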
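The two steps above can be sketched roughly as follows. The original table was only shown as an image, so the sample values here are made up for illustration. The column names `bsub` and `esub` come from the answer, and the new column names follow the question's wording.

```python
import pandas as pd

# Hypothetical sample data; the real values were only shown in an image.
df = pd.DataFrame({
    "bsub": [1, 5, 8, 20, 25, 40],
    "esub": [5, 8, 12, 25, 30, 45],
})

# A new session starts wherever bsub differs from the previous row's esub.
# The first row is compared against NaN, so it always starts a session.
# The cumulative sum of those flags gives one group id per session.
s = df["bsub"].ne(df["esub"].shift()).cumsum()

# Broadcast each session's min bsub and max esub back onto its rows.
df["session start"] = df.groupby(s)["bsub"].transform("min")
df["session end"] = df.groupby(s)["esub"].transform("max")

print(df)
```

With this sample data, the first three rows form one session (1 to 12), the next two form another (20 to 30), and the last row is a session on its own (40 to 45). This assumes the rows are already sorted so that continuous subsessions are adjacent.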