I have
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "x": ["red", "blue", np.nan, np.nan, np.nan, np.nan, np.nan],
    "y": [np.nan, np.nan, np.nan, "cold", "warm", np.nan, np.nan],
    "z": [np.nan, np.nan, np.nan, np.nan, np.nan, "charm", "strange"],
}).astype("category")
giving
      x     y        z
0   red   NaN      NaN
1  blue   NaN      NaN
2   NaN   NaN      NaN
3   NaN  cold      NaN
4   NaN  warm      NaN
5   NaN   NaN    charm
6   NaN   NaN  strange
I would like to add a new categorical column with the unordered values red, blue, hot, cold, warm, charm, strange, filled in appropriately. I have many such columns, not just three.
Some possibilities:
- using astype(str), concatenating, and then re-creating a categorical
- creating a new categorical type using union_categoricals, then casting each column to that type, and then serially calling fillna() on them?
I can't make those or anything else work.
Notes:
Using .astype(pd.CategoricalDtype(ordered=True)) in place of .astype("category") when defining df also works with the answer below.
New Solution
For large datasets, the following solution may be more efficient:
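One way this could look, as a minimal sketch rather than a definitive implementation: cast every column to one shared categorical dtype, then work on the integer codes with NumPy instead of on the strings. The names `source_cols`, `shared`, and `combined` are illustrative assumptions, and the sketch takes the first non-missing value in each row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": ["red", "blue", np.nan, np.nan, np.nan, np.nan, np.nan],
    "y": [np.nan, np.nan, np.nan, "cold", "warm", np.nan, np.nan],
    "z": [np.nan, np.nan, np.nan, np.nan, np.nan, "charm", "strange"],
}).astype("category")

# Assumption: these are the columns to merge (hypothetical name).
source_cols = ["x", "y", "z"]

# Union of all categories in column order; dict keys keep insertion order.
all_categories = list(dict.fromkeys(
    cat for col in source_cols for cat in df[col].cat.categories
))
shared = pd.CategoricalDtype(all_categories, ordered=False)

# Integer codes of each column under the shared dtype; -1 marks a missing value.
codes = np.column_stack(
    [df[col].astype(shared).cat.codes.to_numpy() for col in source_cols]
)

# Column position of the first non-missing code in each row.
# For a row with no value at all, argmax returns 0 and the code there is -1.
first = (codes >= 0).argmax(axis=1)
row_codes = codes[np.arange(len(df)), first]

# A code of -1 becomes NaN in the resulting categorical.
df["combined"] = pd.Categorical.from_codes(row_codes, dtype=shared)
```

Because only small integer arrays are compared and indexed, no intermediate string or object columns are created, which is where the saving on large frames would come from.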
Edited answer
As specified by the OP, in case there are rows where all values are np.nan, we could try the following solution:
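A possible sketch of such a solution, under the assumption that the first non-missing value in a row is the one wanted: find that value per row with a boolean mask, and leave rows with no value at all as NaN. The names `source_cols`, `present`, `has_value`, and `combined` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": ["red", "blue", np.nan, np.nan, np.nan, np.nan, np.nan],
    "y": [np.nan, np.nan, np.nan, "cold", "warm", np.nan, np.nan],
    "z": [np.nan, np.nan, np.nan, np.nan, np.nan, "charm", "strange"],
}).astype("category")

# Assumption: these are the columns to merge (hypothetical name).
source_cols = ["x", "y", "z"]

# Plain object array, so columns with different categories can be mixed.
values = df[source_cols].astype(object).to_numpy()

present = pd.notna(values)        # True where a cell holds a value
has_value = present.any(axis=1)   # False for rows that are entirely NaN
first = present.argmax(axis=1)    # position of the first value in each row

picked = values[np.arange(len(df)), first]
picked[~has_value] = np.nan       # make the all-NaN rows explicit

# Categories in order of first appearance; NaN is never a category.
categories = pd.unique(picked[has_value])
df["combined"] = pd.Categorical(picked, categories=categories)
```

With the example frame, row 2 has no value in any column, so it stays NaN in the new column while every other row picks up its single value.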