Given a defined set of valid values, every value in a pandas data frame column that is not in the set should be replaced with a given value, e.g. NaN. The values in the set and in the data frame can be assumed to be numeric.
Given the following set of valid values and data frame:
import pandas as pd
valid = {5, 22}
df = pd.DataFrame({'a': [5, 1, 7, 22], 'b': [12, 3, 10, 9]})
    a   b
0   5  12
1   1   3
2   7  10
3  22   9
Applying the set of valid values to column a should result in:
     a   b
0    5  12
1  NaN   3
2  NaN  10
3   22   9
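You can use pd.Series.where with a pd.Series.isin mask: values for which the mask is True are kept, and all others are replaced with NaN.
A few points to note:
- pd.Series.isin will work more efficiently with a list versus a set. See also "Pandas pd.Series.isin performance with set versus array".
- The series is upcast to float, since NaN is considered float.
- The series is modified in place when inplace=True is used.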
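The approach described above can be sketched roughly as follows. This is a minimal illustration rather than the exact code from the answer: it reuses the valid and df names from the question, converts the set to a list as suggested in the notes, and assigns the result back to the column instead of passing inplace=True, which is a choice made here for the sketch and not something the answer prescribes.

```python
import pandas as pd

valid = {5, 22}
df = pd.DataFrame({'a': [5, 1, 7, 22], 'b': [12, 3, 10, 9]})

# Boolean mask: True where the value in column 'a' is one of the valid values.
# The set is converted to a list, following the note on isin performance.
mask = df['a'].isin(list(valid))

# where() keeps the values where the mask is True and replaces the rest
# with NaN, which is its default replacement value.
# Assumption for this sketch: assign the result back instead of inplace=True.
df['a'] = df['a'].where(mask)

print(df)
```

Because of the upcast to float, column a prints as 5.0, NaN, NaN, 22.0 rather than as integers. To replace invalid values with something other than NaN, pass that value as the second argument to where.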