Python: convert string to datetime with optional milliseconds

I'm using the following code to convert strings in a dataframe column to datetime and add 60 days for each row.

pd.to_datetime(df['datetime_string'], format="%Y-%m-%dT%H:%M:%S.%fZ") + timedelta(days=60)

I load the data from an external API and write it to df['datetime_string'], so the values come in two shapes, for example "2023-11-24T09:34:18Z" and "2023-11-24T09:35:19.130122Z". Sometimes the fractional-seconds part is missing from the string.

This raises ValueError: time data "2023-11-24T09:34:18Z" doesn't match format "%Y-%m-%dT%H:%M:%S.%fZ"

I want to fill the missing fractional part with .000000 so that every value matches the %Y-%m-%dT%H:%M:%S.%fZ format.

What is the best way to do this, especially when dealing with a large amount of data?

There are 3 answers

Keren_H

First parse the strings without a fixed format and add the timedelta (zero milliseconds plus the 60 days), then convert again with the required format:

pd.to_datetime(pd.to_datetime(df['datetime_string']) + timedelta(milliseconds=0, days=60), format="%Y-%m-%dT%H:%M:%S.%fZ")

Tim Biegeleisen

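You could use np.where and explicitly check for the fractional-seconds component. Both branches are evaluated over the whole column, so errors="coerce" is needed to keep the rows that don't match a given format from raising:

(np.where(df['datetime_string'].str.contains(r'\.\d+Z$', regex=True),
          pd.to_datetime(df['datetime_string'], format="%Y-%m-%dT%H:%M:%S.%fZ", errors="coerce"),
          pd.to_datetime(df['datetime_string'], format="%Y-%m-%dT%H:%M:%SZ", errors="coerce"))
 + timedelta(days=60))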
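
In the same spirit of checking for the fractional part explicitly, the strings could also be padded with .000000 before parsing, which is what the question literally asks for. This is only a rough sketch: the sample frame and the names normalized, parsed and shifted are made up for illustration, and the regex assumes every value ends in seconds followed by "Z".

```python
import pandas as pd

# Sample data mirroring the two shapes shown in the question (illustrative only).
df = pd.DataFrame({"datetime_string": ["2023-11-24T09:34:18Z",
                                       "2023-11-24T09:35:19.130122Z"]})

# Insert ".000000" only where the trailing "Z" directly follows ":SS",
# i.e. where there is no fractional part yet.
normalized = df["datetime_string"].str.replace(
    r"(?<=:\d{2})Z$", ".000000Z", regex=True
)

# Every string now has the same shape, so a single fixed format is enough.
parsed = pd.to_datetime(normalized, format="%Y-%m-%dT%H:%M:%S.%fZ")

# Add the 60 days from the question.
shifted = parsed + pd.Timedelta(days=60)
```

Since the replacement is a single vectorized string operation followed by one fixed-format parse, it should scale reasonably to large columns, though that is worth measuring on real data.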

FObersteiner

The clean way to parse mixed ISO8601 compatible formats is to use pandas v2's format="ISO8601" keyword argument:

import pandas as pd

df = pd.DataFrame({"datetime_string": ["2023-11-24T09:34:18Z", "2023-11-24T09:35:19.130122Z"]})

df["datetime"] = pd.to_datetime(df["datetime_string"], format="ISO8601")

# gives
df
               datetime_string                         datetime
0         2023-11-24T09:34:18Z        2023-11-24 09:34:18+00:00
1  2023-11-24T09:35:19.130122Z 2023-11-24 09:35:19.130122+00:00

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   datetime_string  2 non-null      object
 1   datetime         2 non-null      datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1), object(1)
memory usage: 164.0+ bytes
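
To get from there to what the question asks for (60 days added, and always the %Y-%m-%dT%H:%M:%S.%fZ shape), something along these lines might work. It is a sketch that assumes pandas 2.0 or later for format="ISO8601"; the names parsed, shifted and the shifted_string column are made up for illustration.

```python
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({"datetime_string": ["2023-11-24T09:34:18Z",
                                       "2023-11-24T09:35:19.130122Z"]})

# format="ISO8601" requires pandas 2.0 or later.
parsed = pd.to_datetime(df["datetime_string"], format="ISO8601")

# Add the 60 days from the question.
shifted = parsed + pd.Timedelta(days=60)

# Render back to strings; %f always writes six digits, so rows that had
# no fractional part come out with ".000000".
df["shifted_string"] = shifted.dt.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
```

If only the datetime values are needed, the strftime step can be skipped; the missing fractional part only matters once the values are turned back into strings.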