Verifying timestamps in a time series


I am working with time series data and I would like to know if there is an efficient, Pythonic way to verify that the sequence of timestamps associated with the series is valid. In other words, I would like to know whether the timestamps are in correct ascending order, with no missing or duplicated values.

I suppose that verifying the order and checking for duplicated values should be fairly straightforward, but I am not so sure about detecting missing timestamps.
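Both of the "straightforward" checks can be sketched in a few lines of NumPy. The helper names `is_strictly_increasing` and `duplicated_timestamps` are hypothetical, and the sketch assumes the timestamps can be converted to `numpy.datetime64` at one-second resolution (anything finer would be truncated):

```python
import numpy as np


def is_strictly_increasing(ts):
    """Return True if every timestamp is later than the one before it.

    Strictly increasing means correctly ordered AND free of duplicates,
    so this single check covers both properties.
    """
    # hypothetical helper; assumes one-second resolution is enough
    ts = np.asarray(ts, dtype="datetime64[s]")
    return bool(np.all(np.diff(ts) > np.timedelta64(0, "s")))


def duplicated_timestamps(ts):
    """Return the sorted timestamps that occur more than once."""
    ts = np.asarray(ts, dtype="datetime64[s]")
    # np.unique sorts the values and counts how often each one appears
    values, counts = np.unique(ts, return_counts=True)
    return values[counts > 1]
```

A sequence that never repeats a value and never steps backwards is exactly a strictly increasing one, so the first function alone answers the order-and-duplicates question; the second is only needed to report which values repeat.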

Best answer, by Stephen Rauch:

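numpy.diff can be used to compute the difference between consecutive timestamps. Each difference can then be compared with the expected time step: one that is too small (zero for a duplicate, negative for an out-of-order value) or too large (a gap, typically left by a missing value) marks an errant timestamp: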

import numpy as np
import datetime as dt

def errant_timestamps(ts, expected_time_step=None, tolerance=0.02):
    # get the time delta between subsequent time stamps
    ts_diffs = np.array([tsd.total_seconds() for tsd in np.diff(ts)])

    # get the expected delta
    if expected_time_step is None:
        expected_time_step = np.median(ts_diffs)

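    # find the index of each timestamp whose preceding difference does not match
    # the expected step; a difference that is too small covers duplicates (zero)
    # and out-of-order values (negative)
    ts_slow_idx = np.where(ts_diffs < expected_time_step * (1-tolerance))[0] + 1
    # a difference that is too large usually means one or more missing timestamps
    ts_fast_idx = np.where(ts_diffs > expected_time_step * (1+tolerance))[0] + 1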

    # find the errant timestamps
    ts_slow = ts[ts_slow_idx]
    ts_fast = ts[ts_fast_idx]

    # if the timestamps appear valid, return None
    if len(ts_slow) == 0 and len(ts_fast) == 0:
        return None

    # return any errant timestamps
    return ts_slow, ts_fast


sample_timestamps = np.array(
    [dt.datetime.strptime(sts, "%d%b%Y %H:%M:%S") for sts in (
        "05Jan2017 12:45:00",
        "05Jan2017 12:50:00",
        "05Jan2017 12:55:00",
        "05Jan2017 13:05:00",
        "05Jan2017 13:10:00",
        "05Jan2017 13:00:00",
    )]
)

print(errant_timestamps(sample_timestamps))
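For the sample data the median step is 300 seconds, so `errant_timestamps` returns 13:00 in its first array (its preceding difference is -600 s, an out-of-order value) and 13:05 in its second (a 600 s gap). A long gap alone does not say which timestamps are missing, though. The following is a minimal sketch, assuming the series should lie on a regular grid that starts at its earliest timestamp and advances by a known step; `missing_timestamps` is a hypothetical helper, not part of NumPy:

```python
import numpy as np


def missing_timestamps(ts, step_seconds):
    """Return grid points between min(ts) and max(ts) that are absent from ts.

    Hypothetical helper: assumes the series should be sampled every
    `step_seconds` seconds starting at its earliest timestamp, and that
    one-second resolution is enough.
    """
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    ts = np.asarray(ts, dtype="datetime64[s]")
    step = np.timedelta64(int(step_seconds), "s")
    # arange excludes its stop value, so stop one second past the latest
    # timestamp to keep max(ts) on the grid when it falls on a step boundary
    grid = np.arange(ts.min(), ts.max() + np.timedelta64(1, "s"), step)
    # setdiff1d returns the sorted grid values that never occur in ts
    return np.setdiff1d(grid, ts)


if __name__ == "__main__":
    gappy = ["2017-01-05T12:45:00", "2017-01-05T12:50:00", "2017-01-05T13:00:00"]
    print(missing_timestamps(gappy, 300))  # reports the empty 12:55 slot
```

Run on the question's sample with a 300-second step, it returns an empty array: every 5-minute slot from 12:45 to 13:10 is present, and the only problem is that 13:00 arrives out of order. Combining it with an order check therefore distinguishes "missing" from "misplaced".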