Separating strings in R when they are different lengths/contents

I'm running into trouble creating a new column (or columns) that separates out the contents of a character column in R. Each observation holds one or more delimited text entries of varying length (populated from a drop-down list in a questionnaire). The problem seems to stem from the values having different lengths and different entries. I only care about 2 possible entries and want to know whether each is TRUE/FALSE (or 1/0) per observation; alternatively, I'd like to split the column's contents into multiple columns. When I try an ifelse statement, it sometimes works, but most of the time it does not. Examples below:

I want to return values that flag whether an observation includes one or both of the specific reasons "Disabled/unable to work|" and "Lack of eligibility documentation/identification" (note: this could be in separate columns, or in a single column if either one is included).

Something like this

I have tried this to create 2 binary columns if it meets the conditions, but it doesn't always return the correct value.

df %>%
  mutate(
    EligibilityDocs = ifelse(Reason == "Lack of eligibility documentation/identification", 1, 0),
    Disabled = ifelse(Reason == "Disabled/Unable to Work", 1, 0)
  )

I have also tried separating into multiple columns, but that returns only single letters in the columns, so I know it's definitely not correct.

Any help is greatly appreciated!

There is 1 answer

mfg3z0 On BEST ANSWER

You need to find strings that contain your desired string, even if they contain other text as well. There are a few ways to do this, but I will use a tidyverse approach since that is what your example shows.

The stringr package in the tidyverse contains lots of helpful functions for parsing strings. str_detect checks if a string contains your given string or regular expression.

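library(dplyr)
library(stringr)

df %>%
  mutate(
    # 1 if Reason contains the text anywhere in the string, 0 otherwise
    EligibilityDocs = ifelse(
      str_detect(Reason, "Lack of eligibility documentation/identification"),
      1,
      0
    ),
    # str_detect is case-sensitive, so the pattern must match the data's capitalization
    Disabled = ifelse(
      str_detect(Reason, "Disabled/Unable to Work"),
      1,
      0
    )
  )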
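
Here is a minimal, self-contained sketch of the same idea. The sample data is made up, and both the "|" separator and the lowercase "unable to work" spelling are assumptions based on the wording in the question. Because str_detect is case-sensitive, the sketch wraps each pattern in fixed(..., ignore_case = TRUE) so that capitalization differences between the pattern and the data do not cause misses.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data: the values and the "|" separator are assumptions.
df <- data.frame(
  Reason = c(
    "Disabled/unable to work|Lack of transportation",
    "Lack of eligibility documentation/identification",
    "Other",
    NA
  ),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    # fixed() matches the text literally; ignore_case = TRUE ignores capitalization
    EligibilityDocs = as.integer(str_detect(
      Reason,
      fixed("Lack of eligibility documentation/identification", ignore_case = TRUE)
    )),
    Disabled = as.integer(str_detect(
      Reason,
      fixed("Disabled/Unable to Work", ignore_case = TRUE)
    ))
  )

print(result)
```

Note that str_detect returns NA for a missing Reason, so both new columns are NA on that row rather than 0.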

If you are interested in accomplishing this in base R, you can use grepl("Disabled/Unable to Work", df$Reason), which returns a logical (TRUE/FALSE) vector with one value per row.
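
A base R sketch along the same lines follows, again using made-up sample data and assuming a "|" separator. It wraps grepl in as.integer to get 1/0 columns and uses ignore.case = TRUE in case the capitalization in the data differs from the pattern. The last step shows one possible way to split the entries: "|" is a regular-expression metacharacter, so strsplit needs fixed = TRUE to treat it as a literal separator.

```r
# Hypothetical sample data: the values and the "|" separator are assumptions.
df <- data.frame(
  Reason = c(
    "Disabled/unable to work|Lack of transportation",
    "Lack of eligibility documentation/identification",
    "Other",
    NA
  ),
  stringsAsFactors = FALSE
)

# grepl returns TRUE/FALSE per row; as.integer converts that to 1/0
df$EligibilityDocs <- as.integer(
  grepl("Lack of eligibility documentation/identification", df$Reason, ignore.case = TRUE)
)
df$Disabled <- as.integer(
  grepl("Disabled/Unable to Work", df$Reason, ignore.case = TRUE)
)

# Split each row into its individual entries on a literal "|"
parts <- strsplit(df$Reason, "|", fixed = TRUE)

print(df)
print(parts[[1]])
```

Unlike str_detect, grepl returns FALSE for a missing value, so the NA row gets 0 in both columns here.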