Python timezone aware to local string (ditching UTC offset)


I have incoming strings that are timezone-aware and formatted with a UTC offset, such as:

'2014-11-25 01:01:00+00:00'

and wish to show these in the local timezone, WITHOUT the UTC offset at the end.

For instance, the example above should display in US/Eastern as:

'2014-11-24 20:01:00'

I've written a small function that takes an input string and returns the value I want, but it seems horribly inefficient. I'm using pandas for data manipulation, and this function gets applied to a whole column of time strings in the format above. Calling apply from an interactive shell finishes in ~2 sec, but strangely, running the same code as a script on the same dataframe takes more like 15-20 seconds. Why is that? This is how I'm calling it on the dataframe/series:

df['created_at'] = df['created_at'].apply(timeremap)

I am self-taught and clearly not the best programmer. Please tell me what I can do to streamline this process. Judging from Google searches, there appear to be 5000 ways of converting time in Python. I am open to any module/package, but would prefer to do this with stock Python or pandas. What is "The Right Way" to do this?

Here's my little doodle:

from pandas.tseries.tools import parse_time_string
from pytz import timezone
import calendar
import datetime

def timeremap(intimestr, tz=timezone('US/Eastern')):
    temp = parse_time_string(intimestr)[0]
    loc = temp.astimezone(tz)
    return str(dt(ut(loc)))

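def dt(u):
    # Build a naive datetime from a POSIX timestamp, read as UTC.
    return datetime.datetime.utcfromtimestamp(u)

def ut(d):
    # timegm() treats the local wall-clock fields as if they were UTC,
    # which is what drops the offset in the round trip above.
    return calendar.timegm(d.timetuple())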

There is 1 answer

jfs On

If you are given a csv file with data indexed by time:

time,value
2015-11-01 08:30:00+03:00,0
2015-11-01 08:45:00+03:00,1
2015-11-01 09:00:00+03:00,2
2015-11-01 09:15:00+03:00,3
2015-11-01 09:30:00+03:00,4
2015-11-01 09:45:00+03:00,5
2015-11-01 10:00:00+03:00,6
2015-11-01 10:15:00+03:00,7

You could use read_csv() to read the file and parse the time strings, and tz_convert() to convert the input to the destination timezone:

#!/usr/bin/env python
import sys
import pandas
import pytz

filename = 'dataframe'
local_tz = pytz.timezone('America/New_York')

df = pandas.read_csv(filename, parse_dates=True, index_col=0)
df.index = df.index.tz_localize(pytz.utc).tz_convert(local_tz)
df.head().to_csv(sys.stdout)
df.head().to_csv(sys.stdout, date_format='%Y-%m-%d %H:%M:%S')

Here each index value is initially stored as UTC time with no associated timezone (before the conversion):

print(repr(df.index[0]))
# -> Timestamp('2015-11-01 05:30:00', tz=None)
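
For a column of strings rather than an index, as in the question, the same parse-then-convert idea can be sketched without a per-row apply(). This is a minimal sketch, assuming a reasonably recent pandas in which to_datetime() accepts utc=True. The helper name to_local_strings and the sample values are made up for illustration; only the column name created_at comes from the question.

```python
import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"


def to_local_strings(series, tz="America/New_York", fmt=FMT):
    """Convert offset-aware time strings to local wall-clock strings."""
    # utc=True parses the offsets and normalizes every value to UTC.
    utc_times = pd.to_datetime(series, utc=True)
    # tz_convert() shifts to the target zone; strftime() drops the offset.
    return utc_times.dt.tz_convert(tz).dt.strftime(fmt)


# Made-up sample data using the column name from the question.
df = pd.DataFrame({"created_at": ["2014-11-25 01:01:00+00:00",
                                  "2014-07-01 12:00:00+00:00"]})
df["created_at"] = to_local_strings(df["created_at"])
print(df["created_at"].tolist())
```

The whole column is parsed and converted in one vectorized pass, which is usually much cheaper than calling a Python function once per row.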

Or you could convert the time during reading:

from dateutil.parser import parse

def parse_datetime(time_string, tz=local_tz):
    return tz.normalize(parse(time_string).astimezone(tz))

df = pandas.read_csv(filename, date_parser=parse_datetime, index_col=0)
df.head().to_csv(sys.stdout)
df.head().to_csv(sys.stdout, date_format='%Y-%m-%d %H:%M:%S')

Here each index value has an associated timezone:

print(repr(df.index[0]))
# -> Timestamp('2015-11-01 01:30:00-0400', tz='America/New_York')
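
Both snippets above depend on the pandas version in use: date_parser is deprecated as of pandas 2.0, and newer releases may already return an offset-aware index from parse_dates=True, in which case the tz_localize() call would not be needed. A sketch that avoids both points, assuming a recent pandas, is to read the index as plain strings and convert it explicitly. The inline CSV_TEXT stands in for the file and holds the first rows of the sample data.

```python
import io

import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"

# Stand-in for the csv file: the first rows of the sample data above.
CSV_TEXT = """time,value
2015-11-01 08:30:00+03:00,0
2015-11-01 08:45:00+03:00,1
2015-11-01 09:00:00+03:00,2
"""

# No parse_dates here, so the index is read as plain strings.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

# Parse the offsets explicitly, normalize to UTC, then convert the zone.
df.index = pd.to_datetime(df.index, utc=True).tz_convert("America/New_York")

# Local wall-clock strings with the offset dropped.
local_strings = df.index.strftime(FMT).tolist()
print(local_strings)

# The same format can be handed to to_csv() through date_format.
csv_without_offset = df.to_csv(date_format=FMT)
```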

Output

time,value
2015-11-01 01:30:00-04:00,0
2015-11-01 01:45:00-04:00,1
2015-11-01 01:00:00-05:00,2
2015-11-01 01:15:00-05:00,3
2015-11-01 01:30:00-05:00,4
time,value
2015-11-01 01:30:00,0
2015-11-01 01:45:00,1
2015-11-01 01:00:00,2
2015-11-01 01:15:00,3
2015-11-01 01:30:00,4

Both methods produce the same output.

Note how date_format is used to "ditch" the UTC offset that disambiguates the time strings: the end-of-DST transition in the America/New_York timezone falls on 1 Nov 2015, so without the offset 01:30:00 appears twice in the output.
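
The ambiguity can be reproduced with the standard library alone. This is a small sketch, assuming Python 3.9+ (for zoneinfo) and an available tz database. The two UTC instants correspond to the first and fifth rows of the sample data.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

FMT = "%Y-%m-%d %H:%M:%S"
new_york = ZoneInfo("America/New_York")

# Two instants one hour apart, on either side of the end of DST.
before = datetime(2015, 11, 1, 5, 30, tzinfo=timezone.utc).astimezone(new_york)
after = datetime(2015, 11, 1, 6, 30, tzinfo=timezone.utc).astimezone(new_york)

# With the offset the two values are distinguishable; without it they are not.
print(before.isoformat(sep=" "), "->", before.strftime(FMT))
print(after.isoformat(sep=" "), "->", after.strftime(FMT))
```

If the local strings have to be mapped back to exact instants later, the offset (or the original UTC value) needs to be kept somewhere alongside them.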