I have a txt file with METAR (weather) data, recorded at uneven intervals. I'm trying to use matplotlib to make some graphs from this data, and I think to do that I need to use the file's 'valid' time in UTC. How would I go about indexing my pandas DataFrame by the valid time and having it recognized as a date and time in UTC?

I've tried parsing by date, but I don't think this would be the correct approach.

KORD = pd.read_table('ORD.txt', parse_dates=['valid'], delimiter=',', index_col=1)

ORD.txt looks like:

station,valid,tmpf,dwpf,relh,drct,sknt,p01i,alti,mslp,vsby,gust,skyc1,skyc2,skyc3,skyc4,skyl1,skyl2,skyl3,skyl4,wxcodes,ice_accretion_1hr,ice_accretion_3hr,ice_accretion_6hr,peak_wind_gust,peak_wind_drct,peak_wind_time,feel,metar
ORD,2011-01-30 00:51,32.00,24.08,72.24,0.00,0.00,null,30.05,1018.20,10.00,null,BKN,OVC,null,null,3800.00,5000.00,null,null,null,null,null,null,null,null,null,32.00,KORD 300051Z 00000KT 10SM BKN038 OVC050 00/M04 A3005 RMK AO2 SLP182 T00001044
ORD,2011-01-30 

I have a data frame where the index is by date, but I'm not sure it's by date and time in UTC.

The output looks like:

    station tmpf    dwpf    relh    drct    sknt    p01i    alti    mslp    vsby    ... skyl4   wxcodes ice_accretion_1hr   ice_accretion_3hr   ice_accretion_6hr   peak_wind_gust  peak_wind_drct  peak_wind_time  feel    metar
valid                                                                                   
2011-01-30 00:51    ORD 32.00   24.08   72.24   0.0 0.0 null    30.05   1018.20 10.0    ... null    null    null    null    null    null    null    null    32.00   KORD 300051Z 00000KT 10SM BKN038 OVC050 00/M04...
2011-01-30 01:51    ORD 30.92   24.98   78.35   260.0   4.0 0.00    30.04   1018.10 10.0    ... null    null    null    null    null    null    null    null    26.16   KORD 300151Z 26004KT 10SM BKN070 OVC095 

2 Answers

DataPsycho On Best Solutions

Hopefully I have understood your question correctly. You can read the data and use pandas' default datetime conversion.

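import pandas as pd

# Read the comma-separated file; "valid" is still plain text at this point.
data = pd.read_csv("datalake/ORD.txt", sep=',')
# Convert "valid" to datetime; values that cannot be parsed become NaT.
data["valid"] = pd.to_datetime(data.valid, errors='coerce')
# Drop rows whose timestamp failed to parse, then index by time.
data = data.dropna(subset=["valid"])
data = data.set_index("valid")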

Output of valid column:

data.index
DatetimeIndex(['2011-01-30 00:51:00', '2011-01-30 00:00:00'], dtype='datetime64[ns]' ...

So by default, the value in each row is converted to the datetime64[ns] dtype.
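The datetime64[ns] index above carries no timezone information. If the timestamps should be explicitly marked as UTC, one option is to pass utc=True to pd.to_datetime. The following is a minimal sketch, assuming the valid values really are UTC; the two-row inline sample is a hypothetical stand-in for ORD.txt with only a few of its columns.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for ORD.txt with only a few of its columns.
sample = io.StringIO(
    "station,valid,tmpf,dwpf\n"
    "ORD,2011-01-30 00:51,32.00,24.08\n"
    "ORD,2011-01-30 01:51,30.92,24.98\n"
)

data = pd.read_csv(sample, sep=",")
# utc=True interprets the timezone-naive strings as UTC and returns
# timezone-aware timestamps; the clock values themselves are not shifted.
data["valid"] = pd.to_datetime(data["valid"], errors="coerce", utc=True)
data = data.dropna(subset=["valid"]).set_index("valid")

print(data.index.tz)  # UTC
```

With a timezone-aware index, a call such as data.index.tz_convert("America/Chicago") could later be used if local time is wanted on a plot axis.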

Valdi_Bo On

I think the valid column is already in UTC, so there is no need to convert it.

Look at the source row you provided. Its valid column contains 2011-01-30 00:51.

Then look at the start of the METAR data: KORD 300051Z.

KORD is the aerodrome code and 300051Z contains:

  • 30 - day of month,
  • 0051 - hour and minute,
  • Z - Zulu, i.e. UTC.

So the day and the hour / minute part of the valid column match the METAR time group described above, which means valid is in UTC.
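The same comparison can be sketched in code. The helper below is hypothetical (it is not part of pandas or any METAR library) and assumes the valid string uses the 'YYYY-MM-DD HH:MM' layout shown in ORD.txt.

```python
import re
from datetime import datetime, timezone


def metar_time_matches(valid: str, metar: str) -> bool:
    """Return True if the METAR DDHHMMZ group agrees with the 'valid' string."""
    # The time group is six digits followed by Z, e.g. 300051Z.
    match = re.search(r"\b(\d{2})(\d{2})(\d{2})Z\b", metar)
    if match is None:
        return False
    day, hour, minute = (int(group) for group in match.groups())
    # Assumption: 'valid' uses the 'YYYY-MM-DD HH:MM' layout from ORD.txt.
    stamp = datetime.strptime(valid, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    return (stamp.day, stamp.hour, stamp.minute) == (day, hour, minute)


print(metar_time_matches(
    "2011-01-30 00:51",
    "KORD 300051Z 00000KT 10SM BKN038 OVC050 00/M04 A3005 RMK AO2 SLP182 T00001044",
))  # True
```

Running a check like this over every row would show whether any valid value disagrees with its own METAR report.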

Note also parse_dates=['valid'] and index_col=1 in your code sample.

They mean that:

  • The valid column should be parsed as datetime (so the type is right).
  • Column No 1 (numbering starts from 0, so the column in question is valid) should be the index column.

And the output you provided confirms what I wrote above:

  • The first row (station, tmpf, ...) contains names of "regular" columns.
  • The next row contains just valid - the name of the index column.
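Rather than reading this off the printed output, the index can also be inspected directly. This is a minimal sketch using a hypothetical two-row sample in place of ORD.txt, and pd.read_csv instead of pd.read_table with delimiter=','; the parse_dates and index_col arguments are the ones from the question.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for ORD.txt with only a few of its columns.
sample = io.StringIO(
    "station,valid,tmpf,dwpf\n"
    "ORD,2011-01-30 00:51,32.00,24.08\n"
    "ORD,2011-01-30 01:51,30.92,24.98\n"
)

KORD = pd.read_csv(sample, parse_dates=["valid"], index_col=1)

# The index should be a DatetimeIndex named after the source column.
print(isinstance(KORD.index, pd.DatetimeIndex))  # True
print(KORD.index.name)  # valid
# tz is None for a timezone-naive index: the values are parsed as dates,
# but pandas has not been told which timezone they belong to.
print(KORD.index.tz)  # None
```

If the first check prints False on the real file, some valid values probably failed to parse and the index was left as plain strings.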