I am new to Stack Overflow and pandas, but I appreciate this platform and have an interesting question. I have a pandas DataFrame built from NOAA rainfall data. The CSV files list only the hours that had rainfall, cover consecutive years, and have some data missing. My code replaces the NaNs with zeros and writes a clean hourly data file for our water/plumbing engineers covering every year NOAA has available (the span varies completely). However, the engineer would like a single 8760-hour file (8760 being the number of hours in a non-leap year) in which each hour is the average of that same hour across all of the available years.
For example, I have hourly NOAA data from 1:00 AM July 1, 1987 to 12:00 AM December 31, 2001. I can build one huge hourly DataFrame, but now I need an annual 8760-hour DataFrame holding the average for each hour of the year: the average of January 1st at 1:00 AM across all years, the average of January 1st at 2:00 AM across all years, ..., the average of December 31st at 12:00 AM across all years. This has to account for where the data starts AND for the leap years! Any insight into how to do this?
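A minimal sketch of the cleaning step described above. It assumes a hypothetical CSV layout with one row per rainy hour and placeholder columns named datetime and rain; the real NOAA column names will likely differ. Reindexing onto a complete hourly range turns every unrecorded hour into NaN, and fillna(0.0) then sets those hours, along with any NaNs already in the file, to zero.

```python
import io

import pandas as pd

# Hypothetical CSV: one row per hour that recorded rainfall.
# "datetime" and "rain" are placeholder column names, not NOAA's.
csv_text = """datetime,rain
1987-07-01 03:00,0.02
1987-07-01 04:00,0.10
1987-07-02 15:00,0.01
"""
records = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["datetime"], index_col="datetime"
)

# Complete hourly index for the period of interest (toy start/end values).
full_index = pd.date_range("1987-07-01 01:00", "1987-07-02 23:00", freq="h")

# Hours without a record become NaN after reindexing; treat them as zero rain.
hourly = records["rain"].reindex(full_index).fillna(0.0)
print(len(hourly))  # 47
```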
Pandas is great for this kind of thing. What you need to do is use the groupby method to create a mapping of grouped rows, with one group per hour of the year, and then take the mean of each group. Here is a snippet which creates a dummy dataset and calculates the mean of each group:
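A minimal sketch of that approach, assuming the cleaned data is an hourly pandas Series with a DatetimeIndex. The random values stand in for real NOAA data. Dropping February 29 is one assumed way of handling leap years, and the engineer might prefer a different rule. Because groupby averages each (month, day, hour) group over only the years that actually contain it, a partial first year such as July to December 1987 is handled automatically.

```python
import numpy as np
import pandas as pd

# Dummy hourly rainfall (random placeholder values, NOT real NOAA data)
# roughly covering the period in the question.
index = pd.date_range("1987-07-01 01:00", "2001-12-31 23:00", freq="h")
rng = np.random.default_rng(seed=0)
rain = pd.Series(rng.exponential(0.1, size=len(index)), index=index, name="rain")

# Assumed leap-year rule: drop February 29 so every year maps onto
# the same 365 * 24 = 8760 calendar slots.
is_leap_day = (rain.index.month == 2) & (rain.index.day == 29)
rain = rain[~is_leap_day]

# One group per calendar position (month, day, hour); the mean of each
# group is taken over however many years contain that position.
typical_year = rain.groupby(
    [rain.index.month, rain.index.day, rain.index.hour]
).mean()
typical_year.index.names = ["month", "day", "hour"]

print(len(typical_year))  # 8760
```

groupby sorts its keys by default, so typical_year runs from January 1 hour 0 to December 31 hour 23. Calling typical_year.reset_index() gives a flat 8760-row DataFrame that can be written out with to_csv. The same groupby call works unchanged on a DataFrame with several rainfall columns.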