Why Doesn't Pandas Produce an Ambiguous Time Error?


In 2016, daylight saving time for US Eastern time started at 2am on 2016-03-13 and ended at 2am on 2016-11-06. So 2016-03-13 02:30:00 is not a valid timestamp, and 2016-11-06 01:30:00 occurred twice.

I would expect this code to throw errors about ambiguous and non-existent times, but it doesn't:

from pandas import Timestamp

no_such_time = "2016-03-13 02:30:00"
ambiguous_time = "2016-11-06 01:30:30"
est = 'US/Eastern'
utc = 'UTC'

ts1 = Timestamp(no_such_time, tz=est).tz_convert(utc)
ts2 = Timestamp(ambiguous_time, tz=est).tz_convert(utc)

Why does Pandas consider both of these to be valid times?

I'm using Pandas 0.14.1.

Answer by Stephen Rauch (best answer)

In this code:

import pandas as pd

ts1 = pd.Timestamp(no_such_time, tz=est)
ts2 = pd.Timestamp(ambiguous_time, tz=est)

pandas converts both of these strings into timezone-aware timestamps. It seems to do so without checking for potential problems (i.e., it is very permissive). After construction, each timestamp is already stored internally as a UTC instant together with its associated timezone, so a subsequent call to tz_convert works fine:

ts1 = pd.Timestamp(no_such_time, tz=est).tz_convert(utc)
ts2 = pd.Timestamp(ambiguous_time, tz=est).tz_convert(utc)
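
To illustrate the point about internal UTC storage, here is a minimal sketch. The summer date is my own choice (it is not from the question) and was picked because it is far from any DST transition, so the result should not depend on how a given pandas version handles edge cases. It only shows that tz_convert changes the displayed zone, not the instant.

```python
import pandas as pd

# Hypothetical example time, chosen to be well away from any DST transition.
eastern = pd.Timestamp("2016-07-01 12:00:00", tz="US/Eastern")
utc_ts = eastern.tz_convert("UTC")

# Both objects describe the same instant; only the displayed zone differs.
same_instant = eastern == utc_ts
same_epoch_seconds = eastern.timestamp() == utc_ts.timestamp()

print(eastern)  # 2016-07-01 12:00:00-04:00
print(utc_ts)   # 2016-07-01 16:00:00+00:00
print(same_instant, same_epoch_seconds)
```

Because the conversion never has to interpret a wall-clock time, tz_convert has nothing to be ambiguous about.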

If you want to detect whether the timestamps are invalid, build a naive timestamp first and then localize it with tz_localize:

ts1 = pd.Timestamp(no_such_time).tz_localize(est)
ts2 = pd.Timestamp(ambiguous_time).tz_localize(est)

In these cases pandas raises a NonExistentTimeError and an AmbiguousTimeError, respectively (both exception classes are defined in pytz.exceptions).
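
As a rough sketch of how this check might be wrapped up, the helper below (try_localize is a hypothetical name, not a pandas function) reports whether localization fails. It assumes a reasonably recent pandas release; the exact exception class can differ between versions, so the sketch catches Exception broadly and just reports the class name. The ambiguous and nonexistent keyword arguments shown at the end are not mentioned in the answer and, as far as I know, are not available in pandas 0.14.1.

```python
import pandas as pd

def try_localize(naive_string, tz="US/Eastern"):
    """Hypothetical helper: return (timestamp, None) or (None, error name)."""
    try:
        return pd.Timestamp(naive_string).tz_localize(tz), None
    except Exception as exc:  # exception class varies across pandas versions
        return None, type(exc).__name__

no_such_time = "2016-03-13 02:30:00"
ambiguous_time = "2016-11-06 01:30:30"

missing, missing_err = try_localize(no_such_time)
doubled, doubled_err = try_localize(ambiguous_time)
print(missing_err, doubled_err)

# Resolving the problem explicitly instead of raising (newer pandas only):
# ambiguous=True picks the DST reading, ambiguous=False the standard-time one.
first_pass = pd.Timestamp(ambiguous_time).tz_localize("US/Eastern", ambiguous=True)
second_pass = pd.Timestamp(ambiguous_time).tz_localize("US/Eastern", ambiguous=False)

# nonexistent="shift_forward" moves the time to the first valid instant.
shifted = pd.Timestamp(no_such_time).tz_localize(
    "US/Eastern", nonexistent="shift_forward"
)
print(first_pass, second_pass, shifted)
```

The two readings of the ambiguous time are one hour apart in UTC, which is exactly the information a plain wall-clock string cannot carry on its own.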