How to determine which data point causes an error?


I have R code that reads through a data.frame one row at a time and, if a certain set of conditions is met, changes the value of one of the variables in the data.frame. In pseudocode:

for(i in 1:nrow(data)) {

 if (conditions on data[i,]) { change value } else {do nothing}

}

While the code is running, at a certain point it stops and throws the following error message: Error in if (condition : missing value where TRUE/FALSE needed

I understand that the error message means that, at a certain point, when the condition in the if statement is evaluated, the result is NA rather than TRUE or FALSE.

However, when I try the condition in R by using the value of i that is "stored" in R (and which I assume to be the row of the data set that throws the error) I get an answer of TRUE. Do I understand correctly that the value of i allows me to identify which line of the data frame is throwing the error? If not, should I look for some other way to identify which row of the data set is causing the error?

There are 3 answers

0
jeremycg On BEST ANSWER

As long as your for loop is not inside a function, i will be equal to the final value it hit before the error. Thus, after the error,

 data[i, ]

should give you the pathological row.
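A minimal sketch of this, using a made-up data frame and condition (both are assumptions, not the asker's real data). The try() wrapper is only there so the snippet keeps running past the error; the loop is still evaluated at top level, so i survives:

```r
# Hypothetical data: the NA in row 3 makes the condition evaluate to NA
data <- data.frame(x = c(5, 12, NA, 20), flag = "low",
                   stringsAsFactors = FALSE)

# The loop is evaluated in the global environment, so `i` remains
# available after the error is raised
outcome <- try(
  for (i in seq_len(nrow(data))) {
    if (data$x[i] > 10) data$flag[i] <- "high"
  },
  silent = TRUE
)

i                      # index of the row being processed when the loop stopped
data[i, ]              # the row that triggered the error
is.na(data$x[i] > 10)  # check whether the condition is NA for that row
```

If re-evaluating the condition for data[i, ] gives TRUE rather than NA, it is worth checking whether the loop body changed that row, or whether the condition tested by hand is exactly the one inside the if.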

If the loop runs inside a function, then due to scoping rules i is local to that function and is gone once the function exits with the error. In that case, I would modify your code to print every row (or i) until it fails:

 for (i in seq_len(nrow(data))) {
   print(i)  # or print(data[i, ])
   if (conditions on data[i, ]) { change value } else { do nothing }
 }
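If printing every row is too noisy, one alternative is to catch the error and re-raise it with the row number attached. This is only a sketch: flag_rows, the column names, and the condition are hypothetical stand-ins for the real function.

```r
# Hypothetical function: the loop lives inside it, so `i` is not visible
# from the outside after a failure
flag_rows <- function(data) {
  for (i in seq_len(nrow(data))) {
    tryCatch({
      if (data$x[i] > 10) data$flag[i] <- "high"
    }, error = function(e) {
      # Re-raise the same error, prefixed with the current row number
      stop("row ", i, ": ", conditionMessage(e), call. = FALSE)
    })
  }
  data
}

data <- data.frame(x = c(5, 12, NA, 20), flag = "low",
                   stringsAsFactors = FALSE)

# Capture the message instead of stopping, just to inspect it here
msg <- tryCatch(flag_rows(data), error = function(e) conditionMessage(e))
msg
```

The message now starts with the index of the failing row, so nothing has to be printed for the rows that work.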
1
drsh1 On

1) replacing values

Wouldn't it be better to use replace?

Some examples here: replace function examples

In your case:

 df$column <- replace(df$column, your_condition, value)  # replace() returns a new vector, so assign it back
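A small sketch of what that could look like with concrete values. The data frame, the threshold of 10, and the replacement value 0 are all made up for illustration; the point is that the condition is computed for the whole column at once, with missing values explicitly treated as "condition not met":

```r
# Hypothetical data with one missing value
df <- data.frame(column = c(5, 12, NA, 20))

# Build the condition for all rows at once; !is.na() keeps NAs out of the index
cond <- !is.na(df$column) & df$column > 10

# replace() returns a modified copy, so assign the result back
df$column <- replace(df$column, cond, 0)
df$column
```

Because no if statement is evaluated row by row, the "missing value where TRUE/FALSE needed" error cannot occur here, and the NA row is left untouched.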

2) filtering

If you're sure your data contains only TRUE, FALSE, or NA values, you can:

a) Subset the rows with NAs in a specific column

df[(is.na(df$column)), ]

b) Filter rows using filter from dplyr

library("dplyr")
filter(df, is.na(column)) # rows where column is NA; in dplyr you don't need $ to refer to a column
filter(df, !is.na(column) & column!="FALSE") # everything other than NA and FALSE
filter(df, column!="TRUE" & column!="FALSE") # careful with this one, it won't return NAs

3) selecting row numbers

Finally, when you need the specific row numbers where NAs occur, use which:

which(is.na(df$column)) # row numbers with NAs
which(df$column!="TRUE") # row numbers other than TRUEs
which(df$column!="TRUE" & df$column!="FALSE") # again, won't return NAs
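The same idea can be applied to the condition from the loop itself rather than to a single column: evaluate it for every row at once and ask which results are NA. The columns a and b and the condition below are hypothetical.

```r
# Hypothetical data with missing values in two different columns
df <- data.frame(a = c(1, NA, 3, 4), b = c(2, 2, NA, 1))

# The test the loop would apply row by row, evaluated for all rows at once
cond <- df$a > 2 & df$b < 3

# Rows where the condition is NA are the ones that would break if()
bad_rows <- which(is.na(cond))
bad_rows
df[bad_rows, ]
```

Note that an NA in one column does not always make the whole condition NA: FALSE & NA is FALSE, so checking the condition itself is more reliable than checking each column separately.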
1
Ben Bolker On

I think the answer is "yes":

 print(i) ## Error: doesn't exist yet
 for (i in 1:10) {
     if (i==4) stop("simulated error")
 }
 print(i)  ## 4

The try() function can also be useful. Here we make a function f that simulates the error, then use try() so that we can run all the way through the loop. We don't stop when we hit the error, but instead fill in a value (10000 in this case) that stands for an error code. (We could also just make the error behaviour a no-op, i.e. just go on to the next iteration of the loop; in this case that would leave an NA in the error position.)

 f <- function(x) {
     if (x==4) stop("simulated error")
     return(x)
 }
 results <- rep(NA,10)
 for (i in 1:10) {
     res <- try(f(i))
     if (inherits(res, "try-error")) {
         results[i] <- 10000
     } else {
         results[i] <- res
     }
 }
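A variation on the same loop, sketched under the assumption that you mainly want to know which iterations failed: instead of writing an error code into results, collect the failing indices in a separate vector (failed is a name introduced here for illustration).

```r
# Same simulated failure as above
f <- function(x) {
  if (x == 4) stop("simulated error")
  x
}

results <- rep(NA_real_, 10)
failed <- integer(0)   # indices of the iterations that raised an error

for (i in 1:10) {
  res <- try(f(i), silent = TRUE)   # silent = TRUE suppresses the error printout
  if (inherits(res, "try-error")) {
    failed <- c(failed, i)          # remember where it went wrong, leave NA in results
  } else {
    results[i] <- res
  }
}

failed
```

After the loop, failed lists every problematic index in one go, which is handy when more than one row of the real data frame produces an NA condition.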