naniar::replace_with_na_all changes factor variables to integers?

I have a dataset where some missing values are coded as -99, and I tried to use the naniar function replace_with_na_all to replace those values with NA. The function does this, but it also seems to convert my factor variables to integers, so the factor labels are lost.

This happens whether the factor itself already has some true (NA) missing values or not, which you can see in the example below (in tibble1 the factor has a missing value from the start, in tibble2 it does not).

library(tidyverse)
library(naniar)

# Example factor with missing values
tibble1 <- tribble(
  ~x, ~y,
  "a", 1,
  "-99", 2,  # quoted so the x column is consistently character
  "c", -99
)

tibble1$x <- as.factor(tibble1$x) 


levels(tibble1$x) <- list("A" = "a",
                          "C" = "c")

# Showing original tibble and then after replace_with_na_all is used
tibble1
tibble1 %>% naniar::replace_with_na_all(condition = ~.x == -99) 




# Example factor without missing values
tibble2 <- tribble(
  ~x, ~y,
  "a", 1,
  "b", 2,
  "c", -99
)

tibble2$x <- as.factor(tibble2$x) 


levels(tibble2$x) <- list("A" = "a",
                          "B" = "b",
                          "C" = "c")

# Showing original tibble and then after replace_with_na_all is used
tibble2
tibble2 %>% naniar::replace_with_na_all(condition = ~.x == -99)  

There is no error message. I just did not expect this behavior and can't find a reason for it (or a way around it) in the documentation. Is this a bug? A feature?

Help.

There is 1 answer

mysteRious

Is there a specific reason to use naniar, or can you use dplyr? dplyr preserves the data types of your columns:

> dplyr::mutate_all(tibble1, funs(replace(., . == -99, NA)))
# A tibble: 3 x 2
  x         y
  <fct> <dbl>
1 a         1
2 NA        2
3 c        NA

> dplyr::mutate_all(tibble2, funs(replace(., . == -99, NA)))
# A tibble: 3 x 2
  x         y
  <fct> <dbl>
1 a         1
2 b         2
3 c        NA
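
Note that funs() was deprecated in dplyr 0.8.0, and the scoped verbs such as mutate_all() were superseded by across() in dplyr 1.0.0. On newer versions, a rough equivalent might look like the sketch below. The data frame `df` is a stand-in for tibble1 from the question, and it assumes the sentinel appears as the string "-99" inside the factor column; adjust as needed for your data.

```r
library(dplyr)

# Stand-in for tibble1 from the question (assumption: the sentinel is
# stored as the level "-99" in the factor column and as -99 in the numeric one).
df <- tibble::tibble(
  x = factor(c("a", "-99", "c")),
  y = c(1, 2, -99)
)

# Replace the sentinel in every column. replace() keeps each column's class,
# and which() skips NA comparisons so existing missing values are left alone.
cleaned <- df %>%
  mutate(across(everything(), ~ replace(.x, which(.x == -99), NA)))

cleaned
sapply(cleaned, class)
```

The factor column stays a factor here, although the now-unused "-99" level is still listed in levels(cleaned$x); droplevels() can remove it if that matters.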