How to clean a column and convert it to datetime using Python


I'm parsing my data from JSON into the following DataFrame, but I'm not able to remove the extra characters from the readingtime column and convert it to datetime format:

                        readingtime                      deviceId  
0  {u'$date': u'2014-11-04T17:27:50.000+0000'}           1224EG12

I tried using replace and lstrip/rstrip, but I'm not able to remove the extra characters from the readingtime column:

da2['readingtime2'] = da2['readingtime'].str.replace('date', '') 


data['readingtime'] = data['readingtime'].map(lambda x: str(x)[13:])

I tried loc as well, without success.
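The string methods don't work because each readingtime cell holds a Python dict, not a string, so there are no characters to replace. Since the data comes from JSON, one option is to flatten the nested `$date` key when building the DataFrame, using `pandas.json_normalize`, and then convert it with `pd.to_datetime`. This is a minimal sketch that assumes the raw records are available as a list of dicts. The `records` list below is hypothetical sample data modeled on the row shown above.

```python
import pandas as pd

# Hypothetical sample records shaped like the JSON in the question
records = [
    {"deviceId": "1224EG12", "readingtime": {"$date": "2014-11-04T17:27:50.000+0000"}},
]

# Built directly, each readingtime cell is a dict, not a string
df_raw = pd.DataFrame(records)
print(type(df_raw.loc[0, "readingtime"]))  # <class 'dict'>

# json_normalize flattens nested keys into "parent.child" column names
df = pd.json_normalize(records)

# Parse the ISO string including its +0000 offset; utc=True gives UTC timestamps
df["readingtime"] = pd.to_datetime(df["readingtime.$date"], utc=True)
df = df.drop(columns="readingtime.$date")
print(df)
```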

EDIT:

I want the final readingtime value to be '2014-11-04 17:27:50.000 +0000', which I then want to convert to a datetime of the form yyyy-mm-dd hh:mm:ss.milliseconds +UTC offset.
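The target string in the edit can be produced from a parsed datetime. Here is a minimal standard-library sketch, assuming Python 3, where `strptime`'s `%z` accepts offsets written as `+0000`. It parses the full value including its offset, then formats it back with `strftime`, trimming `%f` (which always renders six digits) down to milliseconds. The helper name `to_display` is hypothetical.

```python
from datetime import datetime


def to_display(raw: str) -> str:
    """Turn '2014-11-04T17:27:50.000+0000' into '2014-11-04 17:27:50.000 +0000'."""
    # %f accepts the 3-digit fraction; %z keeps the +0000 offset as tzinfo
    parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")
    # %f renders microseconds (6 digits); drop the last 3 to get milliseconds
    return parsed.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3] + parsed.strftime(" %z")


print(to_display("2014-11-04T17:27:50.000+0000"))  # 2014-11-04 17:27:50.000 +0000
```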


There are 3 answers

Alexander (best answer)

You can apply a lambda function to the column of the DataFrame: extract the date string from each dictionary via x['$date'], then keep only the date/time portion and drop the trailing '+0000' offset. Use this stripped date/time string (e.g. '2014-11-04T17:27:50.000') as the input to strptime. Parsing this way produces a 'naive' datetime with no timezone attached, so the offset is discarded rather than applied.

import datetime as dt
import pandas as pd

df = pd.DataFrame({'deviceId': {0: '1224EG12', 1: '1224EG13'},
 'readingtime': {0: {u'$date': u'2014-11-04T17:27:50.000+0000'},
  1: {u'$date': u'2014-11-04T17:27:50.000+0000'}}})

>>> df
   deviceId                                  readingtime
0  1224EG12  {u'$date': u'2014-11-04T17:27:50.000+0000'}
1  1224EG13  {u'$date': u'2014-11-04T17:27:50.000+0000'}


>>> df.readingtime.apply(lambda x: dt.datetime.strptime(x['$date'][:-5],  # drop '+0000'
                                                        '%Y-%m-%dT%H:%M:%S.%f'))
0   2014-11-04 17:27:50
1   2014-11-04 17:27:50
Name: readingtime, dtype: datetime64[ns]
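If you would rather keep the `+0000` offset than slice it off, you can pull out the `$date` strings and pass them to `pd.to_datetime` with `utc=True`. This yields timezone-aware timestamps in UTC. The sketch below is an alternative to the answer above, not part of it, and reuses the same sample frame.

```python
import pandas as pd

df = pd.DataFrame({'deviceId': ['1224EG12', '1224EG13'],
                   'readingtime': [{'$date': '2014-11-04T17:27:50.000+0000'},
                                   {'$date': '2014-11-04T17:27:50.000+0000'}]})

# Extract the '$date' string from each dict cell
raw = df['readingtime'].map(lambda d: d['$date'])

# Parse the full string including its offset; utc=True normalizes to UTC
df['readingtime'] = pd.to_datetime(raw, utc=True)
print(df)
```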
zalkap

Try the ast module. If the readingtime cells are strings rather than dicts, use ast.literal_eval() to convert each one into a dict, then look up the "$date" key in the dict you've just created.

import ast

readingtime = "{u'$date': u'2014-11-04T17:27:50.000+0000'}"
da2 = ast.literal_eval(readingtime)
dat = da2['$date']

print(dat)

dat now contains the plain date string, ready to be converted with datetime.

MarcinZ
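ast.literal_eval is needed here instead of json.loads because the text uses single quotes and u prefixes, which are not valid JSON. The sketch below applies the same idea to a whole column, assuming the readingtime cells are strings. The `da2` frame is hypothetical sample data.

```python
import ast

import pandas as pd

da2 = pd.DataFrame({'deviceId': ['1224EG12'],
                    'readingtime': ["{u'$date': u'2014-11-04T17:27:50.000+0000'}"]})

# Turn each string into a dict, then pull out the '$date' value
dates = da2['readingtime'].map(lambda s: ast.literal_eval(s)['$date'])

# Parse the ISO strings (with offset) into timezone-aware UTC timestamps
da2['readingtime'] = pd.to_datetime(dates, utc=True)
print(da2)
```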

Akshay

Assuming that each cell of da2['readingtime'] holds a dict, you can look up the '$date' key on a row's value:

da2['readingtime'][0]['$date']

This returns the value for that row, i.e. '2014-11-04T17:27:50.000+0000'.
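The lookup above handles a single row. The sketch below does the same lookup for every row with `Series.map` and the standard-library `operator.itemgetter`, then parses each string with `datetime.strptime` and `%z` so the offset is kept. The `da2` frame and its second row are hypothetical sample data.

```python
from datetime import datetime
from operator import itemgetter

import pandas as pd

da2 = pd.DataFrame({'deviceId': ['1224EG12', '1224EG13'],
                    'readingtime': [{'$date': '2014-11-04T17:27:50.000+0000'},
                                    {'$date': '2014-11-05T08:00:00.500+0000'}]})

# Same '$date' lookup as above, applied to every dict in the column
raw = da2['readingtime'].map(itemgetter('$date'))

# strptime with %z keeps the +0000 offset, giving timezone-aware values
da2['readingtime'] = raw.map(lambda s: datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f%z'))
print(da2)
```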

Another approach could be:

text = str(da2['readingtime'][0])   # string form of the first row's dict
start_index = text.index(':') + 3   # skip past ": '" after the key
end_index = text.index('}') - 1     # stop before the closing quote
date = text[start_index:end_index]
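Slicing at fixed character offsets depends on how str() renders the dict. Python 2 adds u prefixes, for example, which shifts the positions. A less position-dependent alternative is to search the text for the ISO timestamp with a regular expression. This is a different technique from the answer above. The pattern and the helper name `extract_date` are hypothetical.

```python
import re

# Matches e.g. 2014-11-04T17:27:50.000+0000 (date, time, optional fraction, offset)
ISO_TS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?[+-]\d{4}")


def extract_date(text: str) -> str:
    """Return the first ISO timestamp found in text, or raise ValueError."""
    match = ISO_TS.search(text)
    if match is None:
        raise ValueError(f"no timestamp found in {text!r}")
    return match.group(0)


print(extract_date("{u'$date': u'2014-11-04T17:27:50.000+0000'}"))  # 2014-11-04T17:27:50.000+0000
print(extract_date("{'$date': '2014-11-04T17:27:50.000+0000'}"))    # 2014-11-04T17:27:50.000+0000
```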