R - changing values to labels permanently in labelled data


I have been using haven and sjlabelled to work with the value labels stored in SPSS .sav files.

Here is some example data (the real data is much larger with many more variables, values, labels, etc., and all values occur numerous times):

library(sjlabelled)
col1 <- c("a", "b", "c")
col2 <- c(1, 2, 3)
df <- data.frame(col1, col2)
labels <- c("x", "y", "z")
df <- set_labels(df, col2, labels = labels)

I know I can use as_label to manipulate the data frame using labels, subsetting using these labels, etc. However, I want to replace the values with the labels because some functions/processes revert the data to values and drop the labels entirely. I haven't been able to pin down when this will occur.

Using the example data, I want the original data frame to end up as follows, with the values overwritten by the labels in place rather than by defining a new data frame:

col1 <- c("a", "b", "c")
col2 <- c("x", "y", "z") # these were the labels but are now the values
df <- data.frame(col1, col2)

There are 2 answers

JWilliman (best answer)

The get_labels(x)[x] approach can cause problems when not all labels are included as values in the dataset, or if all values are missing (which can happen in survey data).
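
To see why, note that get_labels(x)[x] uses each stored value as a position in the label vector. A minimal base R sketch contrasts that with matching on the stored values. The vector x and its "labels" attribute are made up for illustration, assuming labels are stored as a named vector in a "labels" attribute:

```r
# Hypothetical labelled vector: values 1..3, but labels defined for 0..3
x <- c(1, 2, 3)
attr(x, "labels") <- c(w = 0, x = 1, y = 2, z = 3)

labs <- attr(x, "labels")

# Positional indexing: value 1 picks the first label, which belongs to value 0
by_position <- names(labs)[x]

# Value matching: look up each stored value among the labelled values
by_value <- names(labs)[match(x, labs)]

by_position
#> [1] "w" "x" "y"
by_value
#> [1] "x" "y" "z"
```

The positional lookup is only right when the labelled values happen to be exactly 1, 2, ..., n in label order.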

By default, sjlabelled::read_spss converts all atomic variables with value labels to factors. Since these represent labelled categorical variables, it makes sense to return them as factors. Atomic variables without value labels are assumed to be continuous and are returned as is.

sjlabelled::copy_labels can be used to restore value and variable labels after they have been dropped.

library(sjlabelled)

# Create test data
df <- data.frame(
  col1 = c("a", "b", "c"),
  col2 = c(1, 2, 3),
  col3 = c(NA, NA, NA)
)

df <- set_labels(df, col2, col3, labels = c("0" = "w", "1" = "x", "2" = "y", "3" = "z")) |>
  var_labels(
    col1 = "Var 1",
    col2 = "Var 2",
    col3 = "var 3"
  )


## Function to convert labelled variables to ordinary R factors
labels_to_values <- function(x, ...) {
  
  if(!is.null(attr(x, "labels"))) {
    x <- factor(x, levels = attr(x, "labels"), labels = names(attr(x, "labels")))
  }
  
  return(x)
  
}

# This approach produces incorrect results: col2 is shifted by one label and col3 gains an element
lapply(df[, 2:3], \(x) get_labels(x)[x])
#> $col2
#> [1] "w" "x" "y"
#> 
#> $col3
#> [1] NA NA NA NA

# This approach returns expected results
df <- lapply(df, labels_to_values) |>
  data.frame() |>
  copy_labels(df)  

df
#>   col1 col2 col3
#> 1    a    x <NA>
#> 2    b    y <NA>
#> 3    c    z <NA>

str(df)
#> 'data.frame':    3 obs. of  3 variables:
#>  $ col1: chr  "a" "b" "c"
#>   ..- attr(*, "label")= chr "Var 1"
#>  $ col2: Factor w/ 4 levels "w","x","y","z": 2 3 4
#>   ..- attr(*, "label")= chr "Var 2"
#>   ..- attr(*, "labels")= Named num [1:4] 0 1 2 3
#>   .. ..- attr(*, "names")= chr [1:4] "w" "x" "y" "z"
#>  $ col3: Factor w/ 4 levels "w","x","y","z": NA NA NA
#>   ..- attr(*, "label")= chr "var 3"
#>   ..- attr(*, "labels")= Named num [1:4] 0 1 2 3
#>   .. ..- attr(*, "names")= chr [1:4] "w" "x" "y" "z"

Created on 2023-10-31 with reprex v2.0.2

akrun

We can use get_labels and index the labels by the values:

df$col2 <- get_labels(df$col2)[df$col2]

Output:

> df
  col1 col2
1    a    x
2    b    y
3    c    z
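
If the stored values do not run 1, 2, ..., n in label order, matching on the values avoids the positional lookup. The following is a sketch in base R only, not an sjlabelled function. The helper name replace_with_labels is hypothetical, and it assumes the value labels sit in a "labels" attribute as a named vector. It overwrites every labelled column of the data frame in place and leaves unlabelled columns alone:

```r
# Hypothetical helper: swap stored values for their labels by value matching
replace_with_labels <- function(x) {
  labs <- attr(x, "labels")
  if (is.null(labs)) return(x)   # no value labels: leave the column as is
  names(labs)[match(x, labs)]    # values without a label become NA
}

# Example data with a label defined for a value (0) that never occurs
df <- data.frame(col1 = c("a", "b", "c"), col2 = c(1, 2, 3))
attr(df$col2, "labels") <- c(w = 0, x = 1, y = 2, z = 3)

# df[] keeps the data frame structure while replacing the columns
df[] <- lapply(df, replace_with_labels)

df$col2
#> [1] "x" "y" "z"
```

The result is a plain character column, so the labels can no longer be dropped by later processing. Note that any value without a matching label is turned into NA.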