Given a defined set of valid values, every value in a pandas data frame column that is not in that set should be replaced with a given value, e.g. NaN. The values in the set and in the data frame can be assumed to be numeric.
Given the following set of valid values and data frame:
import pandas as pd
valid = {5, 22}
df = pd.DataFrame({'a': [5, 1, 7, 22], 'b': [12, 3, 10, 9]})
    a   b
0   5  12
1   1   3
2   7  10
3  22   9
Applying the valid values to column a would result in:
     a   b
0    5  12
1  NaN   3
2  NaN  10
3   22   9
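You can use pd.Series.where with a pd.Series.isin mask: values in the valid set are kept, and all others are replaced with NaN.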
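A minimal sketch of this approach, using the df and valid values from the question. It assigns the result back to the column rather than passing inplace=True, on the assumption that an in-place call on a selected column may not propagate to the data frame in newer pandas versions with copy-on-write behaviour.

```python
import pandas as pd

valid = [5, 22]  # a list rather than a set
df = pd.DataFrame({'a': [5, 1, 7, 22], 'b': [12, 3, 10, 9]})

# True where the value in column 'a' is one of the valid values
mask = df['a'].isin(valid)

# where() keeps entries where the mask is True and, by default,
# replaces the remaining entries with NaN
df['a'] = df['a'].where(mask)

print(df)
```

After this, column a holds 5.0, NaN, NaN, 22.0 as floats, and column b is unchanged.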
A few points to note:
pd.Series.isin will work more efficiently with a list versus a set. See also "Pandas pd.Series.isin performance with set versus array".
The column will be converted to float, since NaN is considered float.
No reassignment of the result is needed when inplace=True is used.
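The question allows for a replacement value other than NaN. As a hedged sketch, the second argument of pd.Series.where sets that value. The sentinel -1 below is only an illustrative choice, not something from the original post. Since -1 fits in an integer column, the column should keep its integer dtype instead of being converted to float.

```python
import pandas as pd

valid = [5, 22]
df = pd.DataFrame({'a': [5, 1, 7, 22], 'b': [12, 3, 10, 9]})

# Replace values outside the valid list with -1 (hypothetical sentinel)
# instead of the default NaN
df['a'] = df['a'].where(df['a'].isin(valid), -1)

print(df['a'].tolist())
```

This variant may be useful when a float conversion of the column is undesirable.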