I am working with a time series of precipitation data and attempting to use median imputation to replace every 0 value with the median of all data points for the month in which that 0 value was recorded.
I have two data frames, one with the original precipitation data:
> head(df.m)
prcp date
1 121.00485 1975-01-31
2 122.41667 1975-02-28
3 82.74026 1975-03-31
4 104.63514 1975-04-30
5 57.46667 1975-05-31
6 38.97297 1975-06-30
And one with the median monthly values:
> medians
Group.1 x
1 01 135.90680
2 02 123.52613
3 03 113.09841
4 04 98.10044
5 05 75.21976
6 06 57.47287
7 07 54.16667
8 08 45.57653
9 09 77.87740
10 10 103.25179
11 11 124.36795
12 12 131.30695
Below is the current solution that I have come up with utilizing the 1st answer here:
df.m[,"prcp"] <- sapply(df.m[,"prcp"], function(y) ifelse(y==0, medians$x,y))
This has not worked, as it only applies the first value of medians$x, which is the median for January (Group.1 of 01). How can I get the correct median applied for the corresponding month?
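A minimal sketch of why only the first median appears, using a hypothetical three-value stand-in for medians$x: ifelse() returns a result with the same length as its test, and inside sapply() the test y == 0 has length one, so only the first element of the "yes" vector is ever used.

```r
# Hypothetical stand-in for medians$x (first three months only)
x <- c(135.90680, 123.52613, 113.09841)

# Scalar test, as happens inside sapply(): the result has length 1,
# so only x[1] (the January median) is returned
scalar_result <- ifelse(0 == 0, x, 5)

# Vector test: the result has one element per element of the test.
# Note this pairs values by position, not by month; it only shows the length rule.
prcp <- c(0, 5, 0)
vector_result <- ifelse(prcp == 0, x, prcp)
```

So the replacement vector has to be lined up with df.m row by row (one median per row) before ifelse() is called on the whole column.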
Another way I have attempted a solution is via the below:
df.m[,"prcp"] <- sapply(medians$Group.1, function(y)
ifelse(df.m[format.Date(df.m$date, "%m") == y &
df.m$prcp == 0, "prcp"], medians[medians$Group.1 == y,"x"],
df.m[,"prcp"]))
Description of the above function: it tests for and returns the number of zeros for every month in which there is a zero value in df.m[,"prcp"].
Same issue here as the 1st solution, but it does return all of the 0 values by month (if just executing the sapply() portion).
How can I replace all 0 values in df.m$prcp with their corresponding medians from the medians df, based on the month of the data?
Apologies if this is a basic question, I'm somewhat of a newbie here. Any and all help would be greatly appreciated.
Consider merging the two dataframes by month/group and then calculating with ifelse:
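One way this could look, as a sketch rather than a definitive implementation. The data below are small hypothetical stand-ins for df.m and medians (including made-up zero rows), and the helper column name month and the names merged and result are assumptions introduced for illustration.

```r
# Small stand-in data (hypothetical values, not the full series)
df.m <- data.frame(
  prcp = c(121.00485, 0, 82.74026, 0),
  date = as.Date(c("1975-01-31", "1975-02-28", "1975-03-31", "1976-02-29"))
)
medians <- data.frame(
  Group.1 = c("01", "02", "03"),
  x       = c(135.90680, 123.52613, 113.09841),
  stringsAsFactors = FALSE
)

# Month key in the same "01".."12" format used by medians$Group.1
df.m$month <- format(df.m$date, "%m")

# Left join so every row of df.m carries its month's median in column x
merged <- merge(df.m, medians, by.x = "month", by.y = "Group.1", all.x = TRUE)

# Replace zeros with the monthly median; leave all other values untouched
merged$prcp <- ifelse(merged$prcp == 0, merged$x, merged$prcp)

# merge() sorts by the key, so restore chronological order and drop helper columns
result <- merged[order(merged$date), c("prcp", "date")]
rownames(result) <- NULL
```

Because merge() reorders rows by the join key, the final order() step matters for a time series; the ifelse() call now works because merged$x has one median per row, aligned with merged$prcp.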