I have a DataFrame which have a column with statuses like this:
datetime | session | try | status
2020-09-17 10:00:01 | '1a' | '1a_1' | 'success'
2020-09-17 10:00:02 | '2a' | '2a_1' | 'fail'
2020-09-17 10:00:03 | '2a' | '2a_2' | 'success'
2020-09-17 10:00:03 | '3a' | '3a_1' | 'interrupted'
2020-09-18 13:00:04 | '4a' | '4a_1' | 'fail'
I want to resample data by day with counting status types by condition in sessions (not tries).
I can resample it by tries easyly like this:
df['date'] = df['datetime'].dt.date
df['ones'] = np.ones(df.shape[0])
piv = df.pivot_table(index='date', columns='status', values='ones', aggfunc=len).fillna(0)
And have:
day | success | fail | interrupted
2020-09-17 | 2 | 2 | 1
2020-09-18 | 0 | 1 | 0
But I want to aggregate it by session with condition no matter how much tries in session.:
- if 'success' status in session tries, then success +1, fail +0, interrupted +0;
- if 'interrupted' status in session AND NO 'success' status in session, then success +0, fail +0, interrupted +1;
- if NO 'interrupted' AND 'success' statuses in session, then then success +0, fail +1, interrupted +0.
So I shoul get something like this:
day | success | fail | interrupted
2020-09-17 | 2 | 0 | 1
2020-09-18 | 0 | 1 | 0
I stuck with function and all I come up with ends with "ValueError: The truth value of a Series is ambiguous". I will be very greatfull for any ideas.
My idea is convert values of statust to ordered categories, sorting and get row only with most important value which is defined in list passed to parameter
categories
: