I have 2 pandas DataFrames: users and interactions.
First I need to filter users so that only rows whose user_id also appears in interactions['user_id'] remain:
users = users[users.user_id.isin(interactions['user_id'])]
I get this DataFrame:
        Unnamed: 0  user_id         age        income sex  kids_flg
0                0   973171   age_25_34  income_60_90   М         1
1                1   962099   age_18_24  income_20_40   М         0
3                3   721985   age_45_54  income_20_40   Ж         0
4                4   704055   age_35_44  income_60_90   Ж         0
5                5  1037719   age_45_54  income_60_90   М         0
...            ...      ...         ...           ...  ..       ...
818672      840184   529394   age_25_34  income_40_60   Ж         0
818674      840186    80113   age_25_34  income_40_60   Ж         0
818676      840188   312839  age_65_inf  income_60_90   Ж         0
818677      840189   191349   age_45_54  income_40_60   М         1
818678      840190   393868   age_25_34  income_20_40   М         0
[566772 rows x 6 columns]
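To double-check the row count, the kept and dropped rows can be counted directly from the boolean mask that isin returns. A minimal sketch, assuming small made-up frames (the values below are hypothetical, not my real data):

```python
import pandas as pd

# Hypothetical toy data, not the real tables from this question.
users = pd.DataFrame({"user_id": [1, 2, 3, 4, 5]})
interactions = pd.DataFrame({"user_id": [2, 2, 4, 6, 7]})

# Boolean mask: True where the user's ID occurs in interactions.
mask = users["user_id"].isin(interactions["user_id"])

kept = int(mask.sum())        # rows that survive the filter
dropped = int((~mask).sum())  # rows removed by the filter

print(kept, dropped, len(users))
```

By construction, kept + dropped always equals the number of rows in users, so the mask gives a count that does not depend on what else interactions contains.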
Now I count the number of user_id values that are not in interactions['user_id']:
print(users['user_id'].size - interactions['user_id'].unique().size)
>> 98359
print(users['user_id'].size)  # number of values in users['user_id']
>> 818683
Notice that 818683 - 98359 = 720324, which is not equal to 566772.
What am I doing wrong?
I don't know where the problem is. Can you help me?