I have a long dataset of sequences. Every column (from t1 to tn) has the same levels (categories). There are more than 200 levels and 144 columns (variables) in total.
id t1 t2 t3 t...n
"1" "eating" "tv" "conversation" "..."
"2" "sleep" "driving" "relaxing" "..."
"3" "drawing" "kissing" "knitting" "..."
"..." "..." "..." "..." "..."
Variable t1 has the same levels as t2, and so on. What I need is to apply the same recoding to every column, preferably without writing an explicit loop.
I would like to avoid the usual:
seq$t1[seq$t1== "drawing"] <- 'leisure'
seq$t1[seq$t1== "eating"] <- 'meal'
seq$t1[seq$t1== "sleep"] <- 'personal care'
seq$t1[seq$t1== "..."] <- ...
The most convenient recoding style would be something like
c('leisure') = c('drawing', 'tv', ...)
That would help me group the levels into broader categories.
Are there any newer, easier recoding methods in R? What would you advise me to use?
This is a sample of my real dataset: 6 repeated observations (in columns) for 10 respondents (in rows).
dtaSeq = structure(c("Wash and dress", "Eating", "Various arrangements", "Cleaning dwelling", "Ironing", "Activities related to sports",
"Eating", "Eating", "Other specified construction and repairs",
"Other specified physical care & supervision of a child", "Wash and dress",
"Filling in the time use diary", "Food preparation", "Wash and dress",
"Ironing", "Travel related to physical exercise", "Eating", "Eating",
"Other specified construction and repairs", "Other specified physical care & supervision of a child",
"Wash and dress", "Filling in the time use diary", "Food preparation",
"Wash and dress", "Food preparation", "Wash and dress", "Eating",
"Eating", "Other specified construction and repairs", "Other specified physical care & supervision of a child",
"Wash and dress", "Filling in the time use diary", "Baking",
"Teaching the child", "Food preparation", "Wash and dress", "Eating",
"Eating", "Other specified construction and repairs", "Other specified physical care & supervision of a child",
"Dish washing", "Unspecified TV watching", "Reading periodicals",
"Teaching the child", "Food preparation", "Reading periodicals",
"Eating", "Eating", "Other specified construction and repairs",
"Feeding the child", "Laundry", "Unspecified TV watching", "Cleaning dwelling",
"Teaching the child", "Eating", "Eating", "Eating", "Eating",
"Other specified construction and repairs", "Feeding the child"),
.Dim = c(10L, 6L), .Dimnames = list(c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10"), c("act1.050", "act1.051", "act1.052",
"act1.053", "act1.054", "act1.055")))
You don't seem to have fully specified recoding rules for your real data, so I made some up:
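For illustration, a many-to-one mapping can be stored as a named list, where each name is the new category and each element holds the old levels it absorbs. The groupings below are assumptions made up for the example (only some of the activities in the sample are covered), and the name `recodes` is arbitrary:

```r
# Hypothetical recoding rules: new category = c(old levels).
# The groupings are illustrative assumptions, not rules from the question.
recodes <- list(
  "meal"       = c("Eating"),
  "leisure"    = c("Reading periodicals", "Unspecified TV watching"),
  "child care" = c("Feeding the child", "Teaching the child"),
  "house care" = c("Food preparation", "Dish washing",
                   "Cleaning dwelling", "Ironing", "Laundry", "Baking")
)
```

This is close to the `c('leisure') = c('drawing', 'tv', ...)` style asked for: adding a level to a category means adding one string to one vector.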
Here's a general-purpose recoding function.
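A minimal sketch of such a function in base R, assuming the data are character (not factor) and the rules are a named list of the form `new = c(old1, old2)`. The name `recode_many` and the small matrix `m` are hypothetical, used only to keep the example self-contained:

```r
# Hypothetical helper (not from any package): many-to-one recoding.
recode_many <- function(x, rules) {
  # Flatten list(new = c(old1, old2)) into a lookup vector: names are old
  # levels, values are the new categories.
  lookup <- setNames(rep(names(rules), lengths(rules)),
                     unlist(rules, use.names = FALSE))
  hit <- x %in% names(lookup)   # which elements have a rule
  x[hit] <- lookup[x[hit]]      # elements without a rule stay unchanged
  x
}

# Illustrative rules and data (assumptions for the example).
rules <- list("meal" = "Eating", "house care" = c("Ironing", "Laundry"))
m <- matrix(c("Eating", "Ironing", "Baking", "Eating"), nrow = 2,
            dimnames = list(c("1", "2"), c("act1.050", "act1.051")))

out <- recode_many(m, rules)
print(out)
```

Because a character matrix is a vector with dimensions, one call recodes every column at once. For a data frame of character columns, something like `df[] <- lapply(df, recode_many, rules = rules)` should apply the same rules column by column; factor columns would need converting with `as.character()` first.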
car::recode does work, but I find it a little clumsy. There's also plyr::revalue, but it's one-to-one, not many-to-one.