Extract data from a dataframe


I have a list based upon which I want to retrieve data from a dataset.

Here is the list:

packed = [1, 5, 8, 2, 3, 3, 7, 3, 7, 7, 4, 6, 3]

The dataset was posted as an image, which is not reproduced here; it has a Pid column.

Two items, 3 and 7, appear with a quantity greater than one.

I want to extract the rows that are not accounted for in the packed list. In this case that is two rows with Pid 7 (the other three are already in the list). How can I do that? I tried the following, but it doesn't work:

new_df = data[~data["Pid"].isin(packed)].reset_index(drop=True)

1 Answer

jezrael

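Use GroupBy.cumcount to number the repeats of each Pid, both in a helper DataFrame built from the list and in the data. Then merge with a left join and indicator=True, and finally keep only the rows marked left_only with boolean indexing: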

import pandas as pd

packed = [1, 5, 8, 2, 3, 3, 7, 3, 7, 7, 4, 6, 3]
# helper DataFrame: g numbers each repeat of a Pid (0, 1, 2, ...)
df1 = pd.DataFrame({'Pid': packed})
df1['g'] = df1.groupby('Pid').cumcount()
print(df1)
    Pid  g
0     1  0
1     5  0
2     8  0
3     2  0
4     3  0
5     3  1
6     7  0
7     3  2
8     7  1
9     7  2
10    4  0
11    6  0
12    3  3

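data['g'] = data.groupby('Pid').cumcount()
# left join on Pid and g; rows with no partner in df1 are flagged 'left_only'
mask = data.merge(df1, on=['Pid', 'g'], how='left', indicator=True)['_merge'].eq('left_only')
new_df = data[mask.to_numpy()]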
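
To try this end to end, here is a minimal sketch of the same approach. The contents of data are an assumption, since the original dataset was only posted as an image: a single Pid column in which 3 occurs four times and 7 occurs five times. With the real data, only the construction of data should need to change.

```python
import pandas as pd

packed = [1, 5, 8, 2, 3, 3, 7, 3, 7, 7, 4, 6, 3]

# Hypothetical stand-in for the dataset from the question (assumed values)
data = pd.DataFrame({"Pid": [1, 2, 3, 3, 3, 3, 4, 5, 6, 7, 7, 7, 7, 7, 8]})

# Number each occurrence of a Pid: 0 for the first, 1 for the second, ...
df1 = pd.DataFrame({"Pid": packed})
df1["g"] = df1.groupby("Pid").cumcount()
data["g"] = data.groupby("Pid").cumcount()

# Left join on (Pid, g): rows of data without a partner in df1 get 'left_only'
merged = data.merge(df1, on=["Pid", "g"], how="left", indicator=True)
mask = merged["_merge"].eq("left_only").to_numpy()

# Keep the unmatched rows and drop the helper column again
new_df = data[mask].drop(columns="g").reset_index(drop=True)
print(new_df)
```

With this assumed data, new_df contains the two surplus rows with Pid 7. The isin attempt from the question returns nothing here, because isin only tests whether a value is present in the list and ignores how many times it occurs.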