I am running into trouble trying to create a new column (or columns) that separates out the contents of a column (character strings of different lengths) in R. The column contains separated characters/text per observation (populated with the contents of a drop-down list in a questionnaire). The problem seems to stem from the fact that the values are all different lengths with different entries. Specifically, I only care about 2 possible entries, and I want to find out whether each one is TRUE/FALSE (or 1/0) per observation, or alternatively separate out the contents of the column into multiple columns. When I try an ifelse statement, it sometimes works but most of the time it does not. Examples below:
I would like to return a value for each observation if it has one or both of the specific reasons listed, "Disabled/unable to work|" and "Lack of eligibility documentation/identification" (note: this could be in separate columns, or within the same column if either one is included).
Something like this
I have tried this to create 2 binary columns if it meets the conditions, but it doesn't always return the correct value.
df %>%
  mutate(
    EligibilityDocs = ifelse(Reason == "Lack of eligibility documentation/identification", 1, 0),
    Disabled = ifelse(Reason == "Disabled/Unable to Work", 1, 0)
  )
I have also tried separating into multiple columns, but that returns only letters in the columns, so I know it's definitely not correct. Any help is greatly appreciated!
You need to find strings that contain your desired string, even if they contain other text as well. Comparing with == only returns TRUE when the whole value matches exactly, which is why your ifelse works for observations with a single entry and fails for the rest. There are a few ways to do this, but I will use a tidyverse approach since that is what your example shows. The stringr package in the tidyverse contains lots of helpful functions for parsing strings. str_detect checks if a string contains your given string or regular expression.

If you are interested in accomplishing this in base R, you can use grepl("Disabled/Unable to Work", df$Reason).
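As a minimal sketch of the str_detect approach described above, the following builds the two binary columns from the question. The sample data frame and the "|" separator between entries are assumptions made for illustration, since the original example data is not shown; the column names Reason, EligibilityDocs and Disabled come from the question. Wrapping the pattern in fixed() is a choice made here so the text is matched literally rather than as a regular expression, and ignore_case = TRUE is added in case capitalisation varies between entries.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data: the real column contents and the "|" separator
# are assumptions, not taken from the original post.
df <- data.frame(
  Reason = c(
    "Disabled/Unable to Work",
    "Lack of eligibility documentation/identification|Other",
    "Other|Disabled/Unable to Work|Lack of eligibility documentation/identification",
    "Other"
  ),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    # str_detect() returns TRUE when the pattern appears anywhere in the
    # string; as.integer() converts TRUE/FALSE to 1/0.
    EligibilityDocs = as.integer(str_detect(
      Reason,
      fixed("Lack of eligibility documentation/identification", ignore_case = TRUE)
    )),
    Disabled = as.integer(str_detect(
      Reason,
      fixed("Disabled/Unable to Work", ignore_case = TRUE)
    ))
  )

# Base R equivalent for one of the columns (case-sensitive literal match).
disabled_base <- as.integer(grepl("Disabled/Unable to Work", df$Reason, fixed = TRUE))

print(result)
```

If you would rather keep logical TRUE/FALSE columns, drop the as.integer() calls. Note that str_detect returns NA for missing values in Reason, so those rows would need separate handling if your data contains them.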