I have a categorical variable with over 1000 levels. I want to group levels together so that I can reduce the dimensionality and end up with just 5 general levels. I want to use the level names to group similar values together.
For example, all levels that contain the word "immune" I want to group into a new group called "immune group". All levels that contain the word "eyes" I want to group into a new group called "eye group", etc.
I've tried str_detect and grepl in R with little success. Are there other methods that could do this efficiently?
Maybe using case_when from dplyr with str_detect. But it would help to have a reproducible example.
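Since the question has no data, here is a minimal sketch of the case_when plus str_detect idea on made-up values. The data frame `df`, the column names `level` and `group`, the example level strings, and the "other" fallback are all assumptions for illustration, not taken from the original post.

```r
library(dplyr)
library(stringr)

# Hypothetical data: the level names below are invented for illustration.
df <- tibble(
  level = c("immune response", "Autoimmune disorder",
            "dry eyes", "eyes - cataract", "skin rash")
)

df <- df %>%
  mutate(
    # case_when checks the conditions in order; the first match wins.
    group = case_when(
      str_detect(level, regex("immune", ignore_case = TRUE)) ~ "immune group",
      str_detect(level, regex("eyes", ignore_case = TRUE)) ~ "eye group",
      TRUE ~ "other"  # assumed catch-all for levels matching no keyword
    ),
    group = factor(group)
  )

print(df)
```

Because the conditions are checked in order, a level containing both "immune" and "eyes" would land in "immune group" here. Reorder the conditions if a different priority makes more sense for your data.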