EDIT: OK, it has something to do with the data.all.filtered object. The filtered object is created from data.all.raw, which works fine with any of the lapply calls below. The weird thing is that I can't figure out how the two differ...
data.selectedFeatures <- sapply(data.train.raw, FUN = sf.getGoodFeaturesVector, treshold = 5)
data.train.filtered <- lapply(seq(1, 8), FUN = function(i) sf.filterFeatures(data.train.raw[[i]], data.selectedFeatures[[i]]))
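# Keep a feature if it has at least `treshold` positive and `treshold` non-positive values;
# non-numeric columns are always kept.
st.testFeature <- function(featureVector, treshold) {
  if (!is.numeric(featureVector)) {return(TRUE)}
  numberOfNonZero <- sum(featureVector > 0)
  numberOfZero <- length(featureVector) - numberOfNonZero
  return(min(numberOfNonZero, numberOfZero) >= treshold)
}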
# Logical vector over the columns of `data`: TRUE for features that pass the test
# or are whitelisted ("id", "tp").
sf.getGoodFeaturesVector <- function(data, treshold) {
  selectedFeatures <- sapply(data, FUN = st.testFeature, treshold = treshold)
  whitelistedFeatures <- names(data) %in% c("id", "tp")
  return(selectedFeatures | whitelistedFeatures)
}
sf.filterFeatures <- function(data, selectedFeatures) {
  return(data[, selectedFeatures])
}
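Two R behaviours are worth ruling out in the filtering code above. This is a sketch on made-up data, not necessarily the error that was eventually found: `sapply` simplifies its result to a matrix when every call returns a vector of the same length, and `[[i]]` on that matrix then picks a single element rather than dataset i's selection vector. Separately, `data[, j]` drops to a plain vector when `j` selects only one column unless `drop = FALSE` is given. The `raw` list and the `keep_cols` helper are hypothetical stand-ins for data.train.raw and sf.getGoodFeaturesVector.

```r
# Hypothetical toy data standing in for data.train.raw: two datasets, same columns
raw <- list(
  data.frame(id = 1:4, a = c(0, 1, 0, 2), tp = c(1, 0, 1, 0)),
  data.frame(id = 1:4, a = c(3, 0, 0, 1), tp = c(0, 1, 1, 0))
)

# Hypothetical selector: one logical per column (stand-in for sf.getGoodFeaturesVector)
keep_cols <- function(d) vapply(d, is.numeric, logical(1))

# Every call returns a length-3 vector, so sapply() simplifies to a 3 x 2 matrix
s <- sapply(raw, keep_cols)
dim(s)          # 3 2
length(s[[1]])  # 1: [[1]] on a matrix is element [1, 1], not dataset 1's selection

# lapply() never simplifies: one named logical vector per dataset
l <- lapply(raw, keep_cols)
length(l[[1]])  # 3

# Selecting a single column with [, j] drops to a plain vector by default
one_col <- raw[[1]][, c(FALSE, FALSE, TRUE)]
one_col_df <- raw[[1]][, c(FALSE, FALSE, TRUE), drop = FALSE]
is.data.frame(one_col)     # FALSE
is.data.frame(one_col_df)  # TRUE
```

Whether either case applies to the real data depends on whether all datasets have the same number of columns and on how many columns survive the filter.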
Any idea what I am doing wrong when manipulating the data that causes the subsequent lapply to stop working?
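One way to see how the raw and filtered lists differ is to summarise what each element actually is. A minimal sketch: `describe_list` is a hypothetical helper, and the toy list stands in for data.train.raw / data.train.filtered.

```r
# Hypothetical helper: one row per list element describing what it actually is
describe_list <- function(x) {
  data.frame(
    class  = vapply(x, function(el) class(el)[1], character(1)),
    is_df  = vapply(x, is.data.frame, logical(1)),
    ncol   = vapply(x, NCOL, integer(1)),
    has_tp = vapply(x, function(el) "tp" %in% names(el), logical(1))
  )
}

# Toy example: the second element has silently become a plain vector
x <- list(data.frame(id = 1:3, tp = c(0, 1, 0)), c(0, 1, 0))
describe_list(x)
```

Running `describe_list(data.train.raw)` next to `describe_list(data.train.filtered)` would show whether any element stopped being a data frame or lost the `tp` column.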
Original post:
I have a list of datasets called data.train.filtered and want to get a list of models (for predicting a feature called tp) trained by rpart on them. The easiest solution I could think of was using lapply, but it doesn't work for some reason.
lapply(data.train.filtered, function(dta) rpart(tp ~ ., data = dta))
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
The problem is probably not in the data, as running it on just one (any) dataset works fine:
rpart(tp ~ ., data = data.train.filtered[[1]])
Even though accessing just one dataset via its index works fine (as shown above), using lapply through the indexes fails in exactly the same way as the first example:
lapply(1:8, function(i) rpart(tp ~ ., data = data.train.filtered[[i]]))
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
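For comparison, the lapply pattern itself works whenever every element is a data frame containing `tp`. The sketch below uses `stats::lm` purely as a stand-in for `rpart` so that it needs no extra package (both build their model frame from the formula through `model.frame`); the toy datasets are made up. It also uses `seq_along()` instead of a hard-coded `1:8`.

```r
# Made-up datasets; every element is a data frame containing the response `tp`
datasets <- list(
  data.frame(x1 = c(1, 2, 3, 4, 5), x2 = c(2, 1, 4, 3, 5), tp = c(1.1, 1.9, 3.2, 3.8, 5.1)),
  data.frame(x1 = c(5, 3, 1, 2, 4), x2 = c(1, 1, 2, 2, 3), tp = c(4.9, 3.1, 0.8, 2.2, 4.1))
)

# lm() stands in for rpart() here; the formula/data call shape is the same
models <- lapply(datasets, function(dta) lm(tp ~ ., data = dta))

# Index-based variant: seq_along() follows the list length instead of a hard-coded 1:8
models_by_index <- lapply(seq_along(datasets), function(i) lm(tp ~ ., data = datasets[[i]]))
```

If this shape works on toy data but not on data.train.filtered, the difference is in the list elements rather than in lapply.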
The traceback for the index version is as follows:
10 terms.formula(formula, data = data)
9 terms(formula, data = data)
8 model.frame.default(formula = tp ~ ., data = data.train.filtered[[i]],
na.action = function (x)
{
Terms <- attr(x, "terms") ...
7 stats::model.frame(formula = tp ~ ., data = data.train.filtered[[i]],
na.action = function (x)
{
Terms <- attr(x, "terms") ...
6 eval(expr, envir, enclos)
5 eval(expr, p)
4 eval.parent(temp)
3 rpart(tp ~ ., data = data.train.filtered[[i]])
2 FUN(X[[i]], ...)
1 lapply(1:8, function(i) rpart(tp ~ ., data = data.train.filtered[[i]]))
I'm quite sure I'm missing something extremely trivial here, but being quite new to R I just can't find the problem.
PS: I know that I could iterate through all the datasets with a for loop, but that feels really dirty and I'd prefer an idiomatic R solution.
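On the for-loop point: pairing each dataset with its own column selection is what `Map()` is for, and it is as idiomatic as lapply. A sketch with made-up inputs, comparing an explicit loop with `Map`; `drop = FALSE` is added (an assumption about the desired behaviour) so that a selection keeping only one column still yields a data frame.

```r
# Hypothetical toy inputs: a list of datasets and, per dataset, the columns to keep
raw <- list(
  data.frame(id = 1:3, a = c(0, 1, 2), b = c(5, 5, 5), tp = c(1, 0, 1)),
  data.frame(id = 1:3, a = c(2, 0, 1), b = c(1, 2, 3), tp = c(0, 0, 1))
)
selected <- list(
  c(id = TRUE, a = TRUE, b = FALSE, tp = TRUE),
  c(id = FALSE, a = FALSE, b = FALSE, tp = TRUE)  # only one column survives
)

# Explicit loop version
filtered_loop <- vector("list", length(raw))
for (i in seq_along(raw)) {
  filtered_loop[[i]] <- raw[[i]][, selected[[i]], drop = FALSE]
}

# Map() walks both lists in parallel; drop = FALSE keeps one-column results as data frames
filtered_map <- Map(function(d, keep) d[, keep, drop = FALSE], raw, selected)
```

Both versions produce the same list; `Map` simply removes the index bookkeeping.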
OK, I finally managed to find the answer. The problem was that data.train.all was actually not what I thought it was. I had an error in the filtering process which corrupted everything (silently, thanks R). The fix was to use:
instead of
Thanks for all the other answers, though.