Looking for a negative lookahead for whitespace when using the cSplit_e function in the splitstackshape package

I'm looking to separate a column containing multiple comma-delimited responses into multiple columns. I'm using the cSplit_e function in the splitstackshape package. Unfortunately, some items within the column contain commas inside a single item (always followed by a space), so I want to split only at commas that are not followed by whitespace.

This is the syntax that I've got right now:

cSplit_e(data=df,split.col="question",sep=",",type="character")

Which takes this:

Behavior; green, pink, blue,Sleep; indigo, violet, puce

And creates separate columns for:

question_Behavior; green
question_pink
question_blue
question_Sleep; indigo
question_violet
question_puce

But I want it to split into this:

question_Behavior; green, pink, blue
question_Sleep; indigo, violet, puce

I'm not sure how to tell cSplit_e to split only at commas that are immediately followed by a non-whitespace character, and would appreciate assistance!

An example dataframe:

id_num <- c("1","2","3","4","5")
question <- c("Behavior; green, pink, blue,Sleep; indigo, violet, puce","Behavior; green, pink, blue","","Sleep; indigo, violet, puce","Behavior; green, pink, blue,Sleep; indigo, violet, puce")

df <- data.frame(id_num,question)
1 Answer

Lucca Nielsen

If you don't mind using the tidyr package (together with dplyr), here is one possible solution. It may not be as simple or elegant as a splitstackshape approach, but I'm not familiar with that package.

Note that the row with an empty question (id_num = 3) is dropped, because it has no response left after the split.

My code:

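library(dplyr)
library(tidyr)

df %>%
  # split into rows only at commas with non-whitespace on both sides
  separate_rows(question, sep = "(?<=\\S),(?=\\S)", convert = FALSE) %>%
  # split each item into its label and its responses at the first ";"
  separate(question, into = c("question", "response"), sep = ";", extra = "merge") %>%
  # drop rows without a response (the empty question for id_num 3)
  filter(!is.na(response)) %>%
  pivot_wider(names_from = question, values_from = response) %>%
  rename_all(~gsub("\\.", "_", .))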

Output:

# A tibble: 4 × 3
  id_num Behavior             Sleep                  
  <chr>  <chr>                <chr>                  
1 1      " green, pink, blue" " indigo, violet, puce"
2 2      " green, pink, blue"  NA                    
3 4       NA                  " indigo, violet, puce"
4 5      " green, pink, blue" " indigo, violet, puce"
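
If you would rather stay close to the original cSplit_e call, here is a minimal sketch of the negative lookahead itself in base R. I have not verified whether the sep argument of cSplit_e accepts Perl-style lookaheads, so the sketch sidesteps that question: it first recodes the "real" separators to a different character and then splits on that character literally. The replacement delimiter "|" and the names parts, recoded and df2 are my own choices for illustration, and the cSplit_e call at the end is an untested assumption that only runs if splitstackshape is installed.

```r
question <- c("Behavior; green, pink, blue,Sleep; indigo, violet, puce",
              "Behavior; green, pink, blue",
              "",
              "Sleep; indigo, violet, puce",
              "Behavior; green, pink, blue,Sleep; indigo, violet, puce")

# ",(?!\\s)" matches a comma that is NOT followed by whitespace.
# Lookaheads need perl = TRUE in base R.
parts <- strsplit(question, ",(?!\\s)", perl = TRUE)
parts[[1]]
# [1] "Behavior; green, pink, blue"  "Sleep; indigo, violet, puce"

# Recode only those commas to "|" (assumed not to occur in the data),
# leaving the commas inside each item untouched.
recoded <- gsub(",(?!\\s)", "|", question, perl = TRUE)

# Assumption: with the separators recoded, cSplit_e can split on the
# literal "|" without needing any regular expression support.
if (requireNamespace("splitstackshape", quietly = TRUE)) {
  df2 <- data.frame(id_num = as.character(1:5), question = recoded)
  print(splitstackshape::cSplit_e(data = df2, split.col = "question",
                                  sep = "|", type = "character", fixed = TRUE))
}
```

The recoding step keeps the regular expression in gsub, where perl = TRUE is available, so the split itself can stay a plain fixed-string split.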