Using is.na with character vector in R


Trying to run a function that will run a calculation IF the two vectors have values (aka if they are not empty).

Basically I want the "answer" to be 3 in the example below but if VAR1 or VAR2 was empty/had no value I would want the answer to be "limited".

I am using the function is.na, but it currently won't work since one of my variables is a character.

VAR1 = 1

VAR2 = "Yes"


if (!is.na(VAR1 | VAR2)) {
  Answer = VAR1 + 2
} else (Answer = "limited")

I get the error:

Error in VAR1 & VAR2 : 
  operations are possible only for numeric, logical or complex types

There are 3 answers

AlexB

You can use something like:

VAR1 = 1
VAR2 = "Yes"

# Run the calculation only when neither value is NA
if (all(!is.na(c(VAR1, VAR2)))) {
  Answer = VAR1 + 2
} else {
  Answer = "limited"
}
Onyambu

Use either

Answer <- if (anyNA(c(VAR1, VAR2))) 'limited' else VAR1 + 2

or

Answer <- if (all(!is.na(c(VAR1, VAR2)))) VAR1 + 2 else 'limited'

r2evans

The order of operations is important here.

Your expression is.na(VAR1 | VAR2) has as its first operation VAR1 | VAR2, which is a vectorized logical "OR". | works for numbers, where 0 equates to FALSE, and anything else (negative or positive) equates to TRUE. Interestingly, 1 | NA and NA | 1 both resolve to TRUE, which is clearly not what you're going after here.

| is not defined for strings, which is why you got the error.
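Both behaviours are easy to reproduce at the console. This is only a small sketch; the tryCatch wrapper and the name res are my additions so the script keeps running after the error, not part of the original question:

```r
# `|` coerces numbers to logical: 0 is FALSE, anything else is TRUE.
0 | 0      # FALSE
-3 | 0     # TRUE

# TRUE | NA is TRUE, so the NA is gone before is.na() ever sees it.
1 | NA     # TRUE
NA | 1     # TRUE

# With a string operand, `|` raises an error instead of returning a value.
# `res` (hypothetical name) captures the error message as a string.
res <- tryCatch(1 | "Yes", error = function(e) conditionMessage(e))
res
```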

If you had tried with two numbers, you would not have received the error. In fact, it may have given you a completely incorrect answer:

is.na(1 | NA)
# [1] FALSE
is.na(NA | 1)
# [1] FALSE

The same caveats apply to &, the vectorized logical "AND" operator: it is not defined for strings either, and it can hide an NA as well.
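For &, the masking shows up when one side is 0 (FALSE), since FALSE & NA is FALSE. A short illustration:

```r
# TRUE & NA stays NA, so is.na() happens to flag it...
is.na(1 & NA)   # TRUE

# ...but FALSE & NA is FALSE, so a zero on either side hides the NA.
is.na(0 & NA)   # FALSE
is.na(NA & 0)   # FALSE
```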

As has been mentioned several times, you need to call is.na(.) on each variable individually, then use the | "OR" operator.

Or you can use anyNA(c(VAR1, VAR2)), though I find that using c(VAR1, VAR2) here is a bit sloppy and memory-inefficient: since the two are not the same class, the non-strings are converted to strings (and R is not particularly efficient at storing or handling strings, even if you never save this intermediate value). The call effectively becomes anyNA(c("1", "Yes")), which gets you the correct result but with some completely unnecessary casting.
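The coercion is easy to inspect. In this sketch the name combined is mine, used only for illustration:

```r
VAR1 <- 1
VAR2 <- "Yes"

# Mixing a number and a string in c() coerces everything to character.
combined <- c(VAR1, VAR2)
class(combined)   # "character"
combined          # "1"   "Yes"

# NA survives the coercion (as NA_character_), so anyNA() still detects it.
anyNA(combined)       # FALSE
anyNA(c(NA, "Yes"))   # TRUE
```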

So the best way, if both are length-1 vectors, is is.na(VAR1) | is.na(VAR2) (as has been stated).
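Checking each variable separately avoids both the error and the coercion. A minimal illustration with the values from the question (the names ok_case and na_case are mine):

```r
VAR1 <- 1
VAR2 <- "Yes"

# Each is.na() call sees only its own variable, so classes never mix.
ok_case <- is.na(VAR1) | is.na(VAR2)
ok_case   # FALSE: neither value is missing

VAR2 <- NA
na_case <- is.na(VAR1) | is.na(VAR2)
na_case   # TRUE: VAR2 is missing
```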

A note about the use of | here: using | in an if statement is bad practice unless you wrap it in some form of aggregator. The reason is that if (cond) expects cond to always be length-1, whereas the | operator is intended to return a result of the same length as its input vectors. In this case you may be using vectors of length 1 (so it can work), but if you ever think to extend this to (say) columns of a frame, then this will at best warn (older versions of R) and otherwise stop with an error (R 4.2.0 and later). To remedy this, use either || (length-1, employs short-circuit logic) or any(is.na(.) | is.na(.)) (or all(..), some form of aggregator).
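As a rough sketch of both remedies: the helper answer_for and the vectors x and y below are hypothetical names introduced for illustration, not anything from the question.

```r
# Hypothetical helper: `||` short-circuits and is meant for length-1 conditions.
answer_for <- function(var1, var2) {
  if (is.na(var1) || is.na(var2)) "limited" else var1 + 2
}

answer_for(1, "Yes")   # 3
answer_for(1, NA)      # "limited"

# For longer vectors, `|` works element-wise...
x <- c(1, NA, 3)
y <- c("a", "b", NA)
is.na(x) | is.na(y)        # FALSE  TRUE  TRUE

# ...so aggregate before handing the result to `if`.
any(is.na(x) | is.na(y))   # TRUE
```

Note that answer_for still returns either a number or a string, exactly as the question asked for.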

See ?Syntax for a discussion of the order (precedence) of operators, recognizing that things inside the parens are evaluated before the wrapping function attempts to determine a result.

Another side note: having an if statement return either a number or a string is likely to get you in trouble. When you do that, then all follow-on expressions in your code/script will have to pre-check if Answer is a string or a number. If you don't, at best you'll get a clear error, at worst you'll get silently incorrect results.
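One possible way around that, offered only as a suggestion and not what the question asked for, is to keep Answer numeric and use NA_real_ to mean "limited". A sketch under that assumption:

```r
VAR1 <- 1
VAR2 <- NA

# Assumption: NA_real_ stands in for "limited", so Answer is always numeric.
Answer <- if (is.na(VAR1) || is.na(VAR2)) NA_real_ else VAR1 + 2

is.numeric(Answer)   # TRUE in both branches
Answer * 10          # NA: the missing value propagates instead of erroring

# With the string version, arithmetic such as "limited" * 10 would stop with
# a "non-numeric argument to binary operator" error.
```

Downstream code can then test is.na(Answer) once instead of checking the type of Answer everywhere.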