I have a data frame with several factor columns containing NaN's that I would like to convert to NA's (the NaN seems to be a problem for using linear regression objects to predict on new data).
> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
> tester1[is.nan(tester1)] = NA
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
> tester1[is.nan(tester1)] = "NA"
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
Here's the problem: your vector is character in mode, so of course it's "not a number". That last element got interpreted as the string "NaN". Using is.nan only makes sense if the vector is numeric. If you want to make a value missing in a character vector (so that it gets handled properly by regression functions), then use NA_character_ (without any quotes). Neither "NA" nor "NaN" is really missing in a character vector. If for some reason there were values in a factor variable that were "NaN", then you would be able to just use logical indexing:
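A minimal sketch of that logical indexing, assuming a factor built from the same values as in the question (the name tester2 is hypothetical and not from the original post):

```r
# Hypothetical factor that contains a literal "NaN" level
tester2 <- factor(c("2", "2", "3", "4", "2", "3", "NaN"))

# Logical indexing: set every element equal to "NaN" to a real missing value
tester2[tester2 == "NaN"] <- NA

tester2
# [1] 2    2    3    4    2    3    <NA>
# Levels: 2 3 4 NaN
```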
That last result might be surprising. There is a remaining "NaN" level, but none of the elements is "NaN". Instead, the element that was "NaN" is now a real missing value, signified in print as <NA>.
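For the character vector from the question, a comparable sketch might look like the following. It assumes the goal is simply to turn the string "NaN" into a real missing value; the comparison against "NaN" replaces the is.nan call, which returns FALSE for every element of a character vector.

```r
# The question's vector: NaN is coerced to the string "NaN" by c()
tester1 <- c("2", "2", "3", "4", "2", "3", NaN)

# Compare against the string and assign the character-typed NA
tester1[tester1 == "NaN"] <- NA_character_

is.na(tester1)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
```

If the leftover "NaN" level in a factor is unwanted, droplevels() applied to that factor should remove it.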