Let's say we have the following dataframe. In the real case it is a comparison of columns after melting, which is why the types are mixed.
df = pd.DataFrame({'value':[0.0, 0.0, pd.Timedelta(hours=1), pd.Timedelta(0)]})
value
0 0
1 0
2 0 days 01:00:00
3 0 days 00:00:00
What I want to do is check whether each value is equal to 0 and, based on that, create a conditional column.
So first we have to get a boolean mask marking which rows are 0.
Simply using eq or == won't work:
df['value'].eq(0)
0 True
1 True
2 False
3 False
Name: value, dtype: bool
This is probably because of the Timedelta type, so I thought: let's convert the timedeltas to seconds. First I checked which rows hold a Timedelta with:
df['value'].apply(type) == pd._libs.tslibs.timedeltas.Timedelta
0 False
1 False
2 True
3 True
Name: value, dtype: bool
Which works.
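The suspicion about the Timedelta type can also be checked directly on scalars. This is just a minimal sketch (the variable names are made up for illustration): a zero Timedelta does not compare equal to the plain number 0, while its total_seconds() does.

```python
import pandas as pd

zero_td = pd.Timedelta(0)

# A Timedelta is not comparable to a plain number, so == falls back to False
td_equals_int = zero_td == 0

# Comparing against another Timedelta behaves as expected
td_equals_td = zero_td == pd.Timedelta(0)

# Converting to seconds first gives a float that can be compared to 0
seconds_equal_zero = zero_td.total_seconds() == 0

print(td_equals_int, td_equals_td, seconds_equal_zero)
```

This matches the eq(0) output above, where row 3 came out as False even though it holds a zero duration.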
Then I tried the following, which did not work:
np.where(df['value'].apply(type) == pd._libs.tslibs.timedeltas.Timedelta,
df['value'].total_seconds(),
df['value'])
'Series' object has no attribute 'total_seconds'
Finally, this works:
df['value'].apply(lambda x: x.total_seconds() if type(x) == pd._libs.tslibs.timedeltas.Timedelta else x).eq(0)
0 True
1 True
2 False
3 True
Name: value, dtype: bool
But it's quite slow and does not look "pandas-like".
So my question is: is there a faster, more idiomatic solution?
I will 'upgrade' the int to timedelta
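One way this 'upgrade' could look, as a sketch rather than a definitive solution: convert the whole column with pd.to_timedelta and compare against a zero Timedelta. The unit='s' argument is an assumption that the plain numbers represent seconds; since only equality with zero is tested here, the choice of unit does not change the result. The names as_timedelta and is_zero are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'value': [0.0, 0.0, pd.Timedelta(hours=1), pd.Timedelta(0)]})

# Assumption: the plain numbers are seconds, hence unit='s'.
# Existing Timedelta objects are kept as they are; numbers are converted.
as_timedelta = pd.to_timedelta(df['value'], unit='s')

# Every element is now a timedelta, so one vectorized comparison is enough
is_zero = as_timedelta.eq(pd.Timedelta(0))

print(is_zero)
```

This avoids the Python-level apply with a lambda, and the resulting mask can be passed to np.where to build the conditional column.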