How to transform a DataFrame into a Darts TimeSeries while keeping the DatetimeIndex?

My DataFrame contains temperature measurements over time:

[My Data1]

df.info()

 <class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 17545 entries, 2020-01-01 00:00:00+00:00 to 2022-01-01 00:00:00+00:00
Data columns (total 1 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   T (degC)  17545 non-null  float64
dtypes: float64(1)
memory usage: 274.1 KB

After transforming the DataFrame into a Darts TimeSeries with

df_series = TimeSeries.from_dataframe(df)
df_series

the result looks like:

[Screenshot: the resulting TimeSeries]

Because of this, I can't plot the series:

TypeError: Plotting requires coordinates to be numeric, boolean, or dates of type numpy.datetime64, datetime.datetime, cftime.datetime or pandas.Interval. Received data of type object instead.

I expected something like this from the darts doc (https://unit8co.github.io/darts/):

[Screenshot: example plot from the Darts documentation]

df
    The DataFrame
time_col
    The time column name. If set, the column will be cast to a pandas DatetimeIndex.
If not set, the DataFrame index will be used. In this case the DataFrame must contain an index that is
either a pandas DatetimeIndex or a pandas RangeIndex. If a DatetimeIndex is
used, it is better if it has no holes; alternatively setting fill_missing_dates can in some cases solve
these issues (filling holes with NaN, or with the provided fillna_value numeric value, if any).

Given the method description above (from the from_dataframe documentation), I don't understand why my DatetimeIndex was changed to object.

Any suggestions on that?

Thanks.

There is 1 answer

Answer by kwladyka

I had the same issue. Darts doesn't work with a timezone-aware index of dtype datetime64[ns, UTC], but it works with a timezone-naive datetime64[ns] index. Darts doesn't recognise datetime64[ns, UTC] as a datetime type.

I fixed it by converting datetime64[ns, UTC] to datetime64[ns]:

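import pandas as pd

def set_index(df):
    # Parse the column as datetimes, then drop the timezone (datetime64[ns, UTC] -> datetime64[ns])
    df['open_time'] = pd.to_datetime(df['open_time']).dt.tz_localize(None)
    # Use the timezone-naive column as the index
    df.set_index(keys='open_time', inplace=True, drop=True)
    return df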
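
In the question the timestamps are already the DataFrame's DatetimeIndex rather than a column named open_time, so the same idea can be applied to the index directly. The following is only a minimal sketch under that assumption; the sample data is made up to stand in for the temperature DataFrame, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data:
# hourly temperatures with a timezone-aware (UTC) DatetimeIndex.
index = pd.date_range("2020-01-01", periods=48, freq="h", tz="UTC")
df = pd.DataFrame({"T (degC)": np.linspace(-2.0, 5.0, len(index))}, index=index)
print(df.index.tz)  # timezone-aware index

# Work on a copy and drop the timezone from the index.
# tz_localize(None) removes the timezone and keeps the wall-clock
# times, which for a UTC index are the UTC times themselves.
df_naive = df.copy()
df_naive.index = df_naive.index.tz_localize(None)
print(df_naive.index.tz)  # None, i.e. timezone-naive
```

The timezone-naive df_naive can then be passed to TimeSeries.from_dataframe(df_naive) as in the question. If the index were in a timezone other than UTC, tz_convert(None) would convert to UTC before dropping the timezone, whereas tz_localize(None) keeps the local times; which one is appropriate depends on the data.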