Restructure Multiple Response Survey Items


Data from multiple-response survey items are often structured in a way that makes tidying difficult. Specifically, I have a survey question in which respondents pick one or more of 8 categorical options. The resulting column holds up to 8 strings separated by commas, so a cell might contain two, four, or none of the 8 options. The eighth option is "Other", which respondents can fill in with custom text.

Incidentally, this is a typical format for Google Forms multiple-response data.

Below are example data. The third and last rows include a unique response for the eighth "other" option:

structure(list(actvTypes = c(NA, NA, "Data collection, Results / findings /     learnings, ate ants and milkweed", 
NA, "Discussion of our research question, Planning for data collection", 
"Data analysis, Collected data, apples are yummy")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

I'd like to create eight new columns, one per response option, that record each response as either 0 (not selected) or 1 (selected). How can this be done efficiently?

I have a solution, but it is cumbersome. I started by creating an empty column for each response option:

library(stringr)
atypes <- paste0("atype", 1:8)  # "atype1", "atype2", ..., "atype8"
log[atypes] <- NA

Next, I wrote an ifelse statement for each of the first seven options; they all follow the format below:

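# fixed() matches the label literally (the parentheses are not a regex group); the !is.na() guard codes empty responses as 0
log$atype7 <- ifelse(!is.na(log$actvTypes) & str_detect(log$actvTypes, fixed("Met with non-DASA team member (not data collection)")), 1, 0)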
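Because the seven statements differ only in the option label and the column name, they can be generated from a single vector of labels. A minimal sketch, assuming the stringr package, a data frame named log with a character column actvTypes, made-up sample responses, and that the option order maps to atype1 through atype7. It uses lapply over the labels so that each result stays a separate column vector:

```r
library(stringr)

# Hypothetical sample responses; NA means the respondent selected nothing
log <- data.frame(
  actvTypes = c(NA,
                "Data analysis, Collected data, apples are yummy",
                "Discussion of our research question, Planning for data collection"),
  stringsAsFactors = FALSE
)

# The seven predefined options from the question, in atype1..atype7 order (assumed)
alloptions <- c("Discussion of our research question",
                "Planning for data collection",
                "Data analysis",
                "Discussion of results | findings | learnings",
                "Mid-course corrections to our project",
                "Collected data",
                "Met with non-DASA team member (not data collection)")

# Build one 0/1 vector per option and assign them all at once:
# fixed() matches each label literally, so "|" and "()" lose their regex meaning,
# and the !is.na() guard turns an empty response into 0 rather than NA
log[paste0("atype", seq_along(alloptions))] <- lapply(alloptions, function(opt)
  as.integer(!is.na(log$actvTypes) & str_detect(log$actvTypes, fixed(opt))))
```

With these sample rows, atype3 and atype6 are 1 only in the second row, and atype1 and atype2 are 1 only in the third.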

For the "Other" response option, I used a vector of the seven predefined option strings together with sapply:

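alloptions <- c('Discussion of our research question', 'Planning for data collection', 'Data analysis', 'Discussion of results | findings | learnings', 'Mid-course corrections to our project', 'Collected data', 'Met with non-DASA team member (not data collection)')
log$atype8 <- sapply(log$actvTypes, function(x) {
  if (is.na(x)) return(0)  # no response at all
  # strip every known option (fixed() = literal match), then flag any leftover text as "Other"
  leftover <- x
  for (opt in alloptions) leftover <- str_remove_all(leftover, fixed(opt))
  as.integer(str_detect(leftover, "[[:alnum:]]"))
}, USE.NAMES = FALSE)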

How could this coding scheme be made more elegant? Perhaps with sapply and an index?

Answer by JasonAizkalns:

Depending on what you're ultimately trying to do, the following could be helpful:

library(tidyverse)

df %>%
  rownames_to_column(var = "responder") %>%
  separate_rows(actvTypes, sep = ",") %>%
  mutate(actvTypes = fct_explicit_na(actvTypes)) %>%
  count(actvTypes)

# # A tibble: 9 x 2
#   actvTypes                                 n
#   <fct>                                 <int>
# 1 " apples are yummy"                       1
# 2 " ate ants and milkweed"                  1
# 3 " Collected data"                         1
# 4 " Planning for data collection"           1
# 5 " Results / findings /     learnings"     1
# 6 Data analysis                             1
# 7 Data collection                           1
# 8 Discussion of our research question       1
# 9 (Missing)                                 3

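Look at the data just before the call to count(): if you know the non-"other" categories in advance, collapsing everything else into a single "other" category is straightforward. It is also worth inspecting the result right after the call to separate_rows(). Note that splitting on "," leaves a leading space on every item after the first in a cell, as the output above shows; because sep is a regular expression, a pattern such as sep = ",\\s*" avoids that.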
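Building on this, the long data can be pivoted back into one 0/1 column per option, with anything outside the seven known labels collapsed into "Other". A minimal sketch, assuming tidyr 1.2.0 or later (for names_expand), made-up sample data, and that the seven labels listed in the question are the complete set of predefined options:

```r
library(tidyverse)

# Hypothetical sample data shaped like the question's (NA = nothing selected)
df <- tibble(actvTypes = c(NA,
                           "Data analysis, Collected data, apples are yummy",
                           "Discussion of our research question, Planning for data collection"))

# The seven predefined options from the question; any other text counts as "Other"
known <- c("Discussion of our research question", "Planning for data collection",
           "Data analysis", "Discussion of results | findings | learnings",
           "Mid-course corrections to our project", "Collected data",
           "Met with non-DASA team member (not data collection)")

wide <- df %>%
  rownames_to_column(var = "responder") %>%
  separate_rows(actvTypes, sep = ",\\s*") %>%        # split and drop the space after each comma
  mutate(
    selected  = if_else(is.na(actvTypes), 0L, 1L),   # an empty response selects nothing
    actvTypes = if_else(actvTypes %in% known, actvTypes, "Other"),
    actvTypes = factor(actvTypes, levels = c(known, "Other"))
  ) %>%
  pivot_wider(names_from = actvTypes, values_from = selected,
              values_fn = max, values_fill = 0L, names_expand = TRUE)
```

Converting the labels to a factor and setting names_expand = TRUE keeps a column even for options nobody selected, and values_fn = max collapses a respondent with several custom "Other" entries into a single 1.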