There's probably a better way, but since I'm new to R and already had the IDW code set up, I've been trying to get the median of all points within 2000 meters by tweaking the IDW code, setting the weighting power (idp) near zero so closer points are weighted the same as far ones.
I'm guessing it says NA when I run the code below with maxdist=2000 because some points don't have any neighbors within 2000 meters. The smallest maxdist I can get it to work with is ~40,000, even if I set nmin to zero.
Is there a way to tell it to ignore points without neighbors within 2000 meters, or does someone know a better way to do this?
Here's my code:
library(gstat)
clean3145 = read.csv("clean3145.csv")
#Set up the k-fold validation
set.seed(88)
groups <- sample(1:5, nrow(clean3145), replace=TRUE)
#res=result=R=Pearson's correlation between predicted and actual arsenic concentration
MEDres<- rep(NA, 5)
r <- list()
for (k in 1:5) {
print(k)
flush.console()
train <- clean3145[groups!=k, ]
test <- clean3145[groups==k, ]
med <- gstat(formula = As1~1, locations = ~UTMNM+UTMEM, data=train, nmin=0, maxdist=40000, set=list(idp = .01))
medpred <- predict(med, test)$var1.pred
MEDres[k] <- cor(test$As1, medpred)
}
#Show the mean correlation for the 5 different training-test dataset pairs in K-fold validation
mean(MEDres)
Thanks for your help!
I can't see how your code helps answering your original question, but for a local median I would try
If a neighbourhood contains no data, you may define it by the number of nearest points,
nmax
, in which case, of course, distance is no longer controlled.