How can I keep string data with aggregation when resampling data?

64 views Asked by At

I'd like to average my minute data to 10-min averages of those 10 timepoints, however, I have important data in columns.

This is the first thing I tried df1.resample('10min', on='Date_Time').mean(d).dropna()

I used dropna() to remove the added rows of 10-min data by the resample function. Using the dropna() globally however, results in this results in lots of missing data.

I have tried:

numerics = (df1.select_dtypes(include=['datetime','number']).resample('10min', on='Date_Time')).mean(d).dropna()

strings = df1.select_dtypes(include=['datetime','object']).resample("10min", on='Date_Time').first().dropna()

df_10m = numerics.join(strings)

this is the top 5 rows of the df

{'Date_Time': {0: Timestamp('2020-02-04 10:00:00'), 1: Timestamp('2020-02-04 10:01:00'), 2: Timestamp('2020-02-04 10:02:00'), 3: Timestamp('2020-02-04 10:03:00'), 4: Timestamp('2020-02-04 10:04:00')}, 'Tg (ºC)': {0: 35.58856153846154, 1: 35.72681666666667, 2: 35.94865000000001, 3: 36.24185000000001, 4: 36.51074166666667}, 'Tg (C) HSRI': {0: 46.03922307692308, 1: 46.179199999999994, 2: 46.330241666666666, 3: 46.40623333333333, 4: 46.40038333333332}, 'Ta (C)': {0: 31.03922307692308, 1: 31.179199999999998, 2: 31.330241666666666, 3: 31.406233333333333, 4: 31.400383333333327}, 'RH (%)': {0: 35.54615384615385, 1: 35.074999999999996, 2: 35.083333333333336, 3: 35.208333333333336, 4: 35.25833333333333}, 'WS': {0: 2.3515384615384614, 1: 2.3275, 2: 2.454166666666666, 3: 2.3666666666666667, 4: 2.5333333333333328}, 'Tr (C)': {0: 86.03059100103826, 1: 85.92113760389992, 2: 87.10761095323471, 3: 86.41824958826783, 4: 87.81529421945606}, 'm': {0: 80, 1: 80, 2: 80, 3: 80, 4: 80}, 'Ht': {0: 1.8, 1: 1.8, 2: 1.8, 3: 1.8, 4: 1.8}, 'PB': {0: 101.9, 1: 101.9, 2: 101.9, 3: 101.9, 4: 101.9}, 'vcomb': {0: 2.651538461538461, 1: 2.6275, 2: 2.754166666666666, 3: 2.6666666666666665, 4: 2.8333333333333326}, 'vself': {0: 0.3, 1: 0.3, 2: 0.3, 3: 0.3, 4: 0.3}, 'VO2': {0: 14, 1: 14, 2: 14, 3: 14, 4: 14}, 'VO2.1': {0: 1.12, 1: 1.12, 2: 1.12, 3: 1.12, 4: 1.12}, 'RER': {0: 0.8, 1: 0.8, 2: 0.8, 3: 0.8, 4: 0.8}, 'Tcl': {0: 36, 1: 36, 2: 36, 3: 36, 4: 36}, 'Tsk': {0: 36, 1: 36, 2: 36, 3: 36, 4: 36}, 'ε': {0: 0.95, 1: 0.95, 2: 0.95, 3: 0.95, 4: 0.95}, 'Ar/AD': {0: 0.35, 1: 0.35, 2: 0.35, 3: 0.35, 4: 0.35}, 'Icl': {0: 1.03, 1: 1.03, 2: 1.03, 3: 1.03, 4: 1.03}, 'Re,cl': {0: 0.023, 1: 0.023, 2: 0.023, 3: 0.023, 4: 0.023}, 'AD': {0: 1.9917607971689137, 1: 1.9917607971689137, 2: 1.9917607971689137, 3: 1.9917607971689137, 4: 1.9917607971689137}, 'AD:m': {0: 248.9700996461142, 1: 248.9700996461142, 2: 248.9700996461142, 3: 248.9700996461142, 4: 248.9700996461142}, 'Ta': {0: 304.18922307692304, 1: 304.32919999999996, 2: 304.48024166666664, 3: 304.5562333333333, 4: 304.5503833333333}, 'Pa': {0: 1.600406759479592, 1: 1.5918245414256496, 2: 1.60593338043295, 3: 1.6186268793904253, 4: 1.6203871372827314}, 'Pa.1': {0: 12.00305069609694, 1: 11.938684060692372, 2: 12.044500353247125, 3: 12.139701595428189, 4: 12.152903529620486}, 'Psa': {0: 4.5023345321866906, 1: 4.538345093159372, 2: 4.577482319523848, 3: 4.5972834444225095, 4: 4.595756475394181}, 'PB.1': {0: 764.25, 1: 764.25, 2: 764.25, 3: 764.25, 4: 764.25}, 'Abs Humid': {0: 0.01141684979152761, 1: 0.011350403625066736, 2: 0.011445325373048706, 3: 0.011532912296143744, 4: 0.011545676119064242}, 'v': {0: 2.651538461538461, 1: 2.6275, 2: 2.754166666666666, 3: 2.6666666666666665, 4: 2.8333333333333326}, 'M (W)': {0: 376.959744, 1: 376.959744, 2: 376.959744, 3: 376.959744, 4: 376.959744}, 'M(w/m2)': {0: 189.25954589316655, 1: 189.25954589316655, 2: 189.25954589316655, 3: 189.25954589316655, 4: 189.25954589316655}, 'H (W)': {0: 376.959744, 1: 376.959744, 2: 376.959744, 3: 376.959744, 4: 376.959744}, 'H (W/m2)': {0: 189.25954589316655, 1: 189.25954589316655, 2: 189.25954589316655, 3: 189.25954589316655, 4: 189.25954589316655}, 'fcl': {0: 1.3193000000000001, 1: 1.3193000000000001, 2: 1.3193000000000001, 3: 1.3193000000000001, 4: 1.3193000000000001}, 'hc': {0: 10.429773835114272, 1: 10.372937644718489, 2: 10.670143570681152, 3: 10.465437098406353, 4: 10.853123250589423}, 'hr': {0: 2.8152285937257235, 1: 2.8138458658081724, 2: 2.8288587842794275, 3: 2.820129539046607, 4: 2.8378387547428283}, 'h': {0: 13.245002428839996, 1: 13.186783510526661, 2: 13.499002354960579, 3: 13.285566637452959, 4: 13.690962005332251}, 'to': {0: 42.727652144400366, 1: 42.86024221094555, 2: 43.01897961346187, 3: 43.083643584372176, 4: 43.09396778675954}, 'σ': {0: 5.67e-08, 1: 5.67e-08, 2: 5.67e-08, 3: 5.67e-08, 4: 5.67e-08}, 'Rcl': {0: 0.15965000000000001, 1: 0.15965000000000001, 2: 0.15965000000000001, 3: 0.15965000000000001, 4: 0.15965000000000001}, 'C + R (W)': {0: -61.7854635996826, 1: -62.92983381965766, 2: -64.78260666774776, 3: -65.10727499145821, 4: -65.71445913620975}, 'W/m2)': {0: -31.020523994399518, 1: -31.595076029765245, 2: -32.52529458348094, 3: -32.688300263767424, 4: -32.99314818808373}, 'Ps,sk': {0: 5.94114568997784, 1: 5.94114568997784, 2: 5.94114568997784, 3: 5.94114568997784, 4: 5.94114568997784}, 'he': {0: 172.09126827938547, 1: 171.15347113785506, 2: 176.057368916239, 3: 172.67971212370483, 4: 179.07653363472548}, 'LR': {0: 16.5, 1: 16.5, 2: 16.5, 3: 16.5, 4: 16.5}, 'λ': {0: 2427, 1: 2427, 2: 2427, 3: 2427, 4: 2427}, 'Ereq': {0: 409.36381964806634, 1: 410.5260936917001, 2: 412.55058725207795, 3: 412.99813912411855, 4: 413.6137153206719}, 'Emax': {0: 283.93655875039144, 1: 284.24761905934344, 2: 284.6055046911687, 3: 282.899686582697, 4: 284.41267301639743}, 'Ereq.1': {0: 205.5286057592536, 1: 206.1121467373097, 2: 207.12858082078773, 3: 207.353282437908, 4: 207.66234374558525}, 'Emax.1': {0: 142.55555142664647, 1: 142.71172495380603, 2: 142.89140799221803, 3: 142.03497075793953, 4: 142.79459331695918}, 'ωreq': {0: 1.4417439636856977, 1: 1.4442551710731937, 2: 1.4495523819883425, 3: 1.4598748556880825, 4: 1.4542731550391408}, 'H.S.I.': {0: 144.17439636856977, 1: 144.42551710731937, 2: 144.95523819883425, 3: 145.98748556880824, 4: 145.4273155039141}, 'r': {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5}, 'Sreq': {0: 411.0572115185072, 1: 412.2242934746194, 2: 414.25716164157546, 3: 414.706564875816, 4: 415.3246874911705}, 'Sreq.1': {0: 1214.4291312179967, 1: 1217.8771629914465, 2: 1223.8830771384264, 3: 1225.210795918275, 4: 1227.0369799377163}, 'Slimit': {0: 1214.4291312179967, 1: 1217.8771629914465, 2: 1223.8830771384264, 3: 1225.210795918275, 4: 1227.0369799377163}, 'Cres + Eres': {0: 29.381387951616322, 1: 29.36348412795755, 2: 29.19176341566985, 3: 29.06887986733963, 4: 29.060487815537876}, 'Cres + Eres.1': {0: 14.75146412831249, 1: 14.742475185622073, 2: 14.656259655859772, 3: 14.594563719025947, 4: 14.590350335665013}, 'Heat': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}, 'Sweat Rate': {0: 2.1442913121799667, 1: 2.1787716299144644, 2: 2.2388307713842637, 3: 2.2521079591827498, 4: 2.270369799377163}, 'Black Globe Temp': {0: 6.539223076923079, 1: 6.6791999999999945, 2: 6.830241666666666, 3: 6.906233333333333, 4: 6.900383333333323}, 'HSS Score': {0: 8.683514389103046, 1: 8.85797162991446, 2: 9.069072438050929, 3: 9.158341292516083, 4: 9.170753132710487}, 'Sunny Factor': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4}, 'Td (C)': {0: 18.14845384615385, 1: 18.194199999999995, 2: 18.346908333333335, 3: 18.4479, 4: 18.452049999999993}, 'WBGT (C)': {0: 26.6959487578496, 1: 26.780697622069766, 2: 26.9482556650566, 3: 27.051211098622655, 4: 27.052240739589937}, 'e (hPa)': {0: 13.121397641817339, 1: 13.135092168116461, 2: 13.343533435258518, 3: 13.495869716597092, 4: 13.506929744503669}, 'Category': {0: 'High Risk', 1: 'High Risk', 2: 'High Risk', 3: 'High Risk', 4: 'High Risk'}, 'Stadium': {0: 'WACA', 1: 'WACA', 2: 'WACA', 3: 'WACA', 4: 'WACA'}, 'City': {0: 'Perth', 1: 'Perth', 2: 'Perth', 3: 'Perth', 4: 'Perth'}, 'State': {0: 'WA', 1: 'WA', 2: 'WA', 3: 'WA', 4: 'WA'}, 'BOM_station_id': {0: 9225, 1: 9225, 2: 9225, 3: 9225, 4: 9225}, 'BOM_station': {0: 'Perth Metro', 1: 'Perth Metro', 2: 'Perth Metro', 3: 'Perth Metro', 4: 'Perth Metro'}, 'BOM2_station_id': {0: 9021, 1: 9021, 2: 9021, 3: 9021, 4: 9021}, 'BOM2_station': {0: 'Perth Airport', 1: 'Perth Airport', 2: 'Perth Airport', 3: 'Perth Airport', 4: 'Perth Airport'}, 'Measurement_device': {0: 'EMU', 1: 'EMU', 2: 'EMU', 3: 'EMU', 4: 'EMU'}, 'Longitude': {0: 115.880637, 1: 115.880637, 2: 115.880637, 3: 115.880637, 4: 115.880637}, 'Latitude': {0: -31.959454, 1: -31.959454, 2: -31.959454, 3: -31.959454, 4: -31.959454}, 'Day': {0: 'Day 1', 1: 'Day 1', 2: 'Day 1', 3: 'Day 1', 4: 'Day 1'}, 'measurement_type': {0: 'EMU', 1: 'EMU', 2: 'EMU', 3: 'EMU', 4: 'EMU'}, 'WS_conv': {0: 2.3515384615384614, 1: 2.3275, 2: 2.454166666666666, 3: 2.3666666666666667, 4: 2.5333333333333328}, 'WS_unconverted': {0: 2.3515384615384614, 1: 2.3275, 2: 2.454166666666666, 3: 2.3666666666666667, 4: 2.5333333333333328}, 'Numeric_Category': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3}, 'venue_type': {0: 'Major', 1: 'Major', 2: 'Major', 3: 'Major', 4: 'Major'}, 'time': {0: datetime.time(10, 0), 1: datetime.time(10, 1), 2: datetime.time(10, 2), 3: datetime.time(10, 3), 4: datetime.time(10, 4)}}

However, this too seems to result in lost data.

1

There are 1 answers

1
jezrael On

You can use this alternative with custom aggregate function mean for numeric else first value if exist. Last remove only rows with all missing values by DataFrame.dropna with how='all' parameter:

f = lambda x: x.mean() if np.issubdtype(x.dtype, np.number) else next(iter(x), None)
out = df1.resample('10min', on='Date_Time').agg(f).dropna(how='all')