I'm looking for a way in order to take subset with a condition on the first row obtain by a groupby
, let's see an example to be more clear:
ID Date Diff Class
0 010101 11/12/2008 NaN Good
1 010102 02/07/2008 NaN Very Good
2 010102 08/07/2008 6.0 Bad
3 010103 12/02/2009 NaN Very Bad
4 010104 07/05/2007 NaN Very Good
So I'm looking for a way to obtain the following subset, due to the first class per ID, here, Very Good
for the first time it appears:
ID Date Diff Class
010102 02/07/2008 NaN Very Good
010102 08/07/2008 6.0 Bad
010104 07/05/2007 NaN Very Good
Thanks for your help,
import pandas as pd
df = pd.DataFrame(
[
{"ID": 010101, "Date": '11/12/2008', "Class" : "Good"},
{"ID": 010102, "Date": '02/07/2008', "Class" : "Very Good"},
{"ID": 010102, "Date": '08/07/2008', "Diff": 6.0 , "Class" : "Bad"},
{"ID": 010103, "Date": '12/02/2009', "Class" : "Very Bad"},
{"ID": 010104, "Date": '07/05/2007', "Class" : "Very Good"},
],
columns=["ID", "Date", "Diff", "Class"],
)
Based on comments, it seems as though you want to filter to any ID where the first dated entry for that ID is "Very Good". I'm going to assume your dataframe is called
df
.First, we should convert your
Date
column to a properpandas
datetime, to ensure it is properly ordered (I am assuming your dates are dd/mm/yyyy here):Then, we sort by
ID
andDate
, group onID
and return the first row we get (which, due to our sorting, will be the first date for thatID
).From this, we can work out which of the first rows has a
Class
ofVery Good
, and select those ID values:And finally, we can use those ID values to filter the original dataframe.
This gives us: