I have two pandas DataFrames: users and interactions.
First I need to filter users so that only rows whose users['user_id'] value also appears in interactions['user_id'] are kept:
users = users[users.user_id.isin(interactions['user_id'])]
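To make the filter reproducible, here is a minimal sketch of the same operation on toy data. The values are made up for illustration and are not my real data; the name `filtered` is only used here.

```python
import pandas as pd

# Toy data (hypothetical values, only to illustrate the filter).
users = pd.DataFrame({"user_id": [1, 2, 3, 4], "kids_flg": [1, 0, 0, 1]})
interactions = pd.DataFrame({"user_id": [2, 2, 4, 9]})

# Keep only the users whose user_id occurs somewhere in interactions.
filtered = users[users["user_id"].isin(interactions["user_id"])]
print(filtered["user_id"].tolist())  # [2, 4]
```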
I get the following DataFrame:
Unnamed: 0 user_id age income sex kids_flg
0 0 973171 age_25_34 income_60_90 М 1
1 1 962099 age_18_24 income_20_40 М 0
3 3 721985 age_45_54 income_20_40 Ж 0
4 4 704055 age_35_44 income_60_90 Ж 0
5 5 1037719 age_45_54 income_60_90 М 0
... ... ... ... ... .. ...
818672 840184 529394 age_25_34 income_40_60 Ж 0
818674 840186 80113 age_25_34 income_40_60 Ж 0
818676 840188 312839 age_65_inf income_60_90 Ж 0
818677 840189 191349 age_45_54 income_40_60 М 1
818678 840190 393868 age_25_34 income_20_40 М 0
[566772 rows x 6 columns]
Now let's count the number of values which are not in interactions['user_id']:
print(users['user_id'].size - interactions['user_id'].unique().size)
>> 98359
print(users['user_id'].size)  # number of values in users['user_id']
>> 818683
Notice that 818683 - 98359 = 720324, which is not equal to the 566772 rows left after filtering.
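To make the comparison concrete, here is a minimal sketch on toy data (made-up values, not my real data) that computes both quantities side by side: the size difference used above, and a direct count of users via the `isin` mask. In this toy case the two disagree because interactions contains ids that are absent from users; whether that is also true of my real data is an assumption I have not verified.

```python
import pandas as pd

# Toy data (hypothetical): ids 9 and 10 occur only in interactions.
users = pd.DataFrame({"user_id": [1, 2, 3, 4]})
interactions = pd.DataFrame({"user_id": [2, 2, 4, 9, 10]})

# The subtraction from the question: total users minus unique interaction ids.
size_diff = users["user_id"].size - interactions["user_id"].unique().size

# Direct counts based on the same mask that the filter uses.
in_mask = users["user_id"].isin(interactions["user_id"])
kept = int(in_mask.sum())        # users that survive the filter
missing = int((~in_mask).sum())  # users with no interactions

print(size_diff, missing, kept)  # 0 2 2
```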
What am I doing wrong?
I can't see where the problem is. Can you help me?