Recent developments in recoding repeated variables in R?


So I have a long dataset of sequences. Every column (from t1 to t...n) has the same levels or categories. There are more than 200 categories (levels) and 144 columns (variables) in total.

 id    t1        t2        t3             t...n
"1"   "eating"  "tv"      "conversation" "..."
"2"   "sleep"   "driving" "relaxing"     "..."
"3"   "drawing" "kissing" "knitting"     "..."
"..." "..."     "..."     "..."          "..."

Variable t1 has the same levels as t2, and so on. What I need is to apply the same recoding to every column, ideally without writing an explicit loop.

I would like to avoid the usual value-by-value assignments:

seq$t1[seq$t1== "drawing"] <- 'leisure'
seq$t1[seq$t1== "eating"] <- 'meal'
seq$t1[seq$t1== "sleep"] <- 'personal care' 
seq$t1[seq$t1== "..."] <- ... 

The most convenient recoding style would be something like

c('leisure') = c('drawing', 'tv', ...) 

That would help me group the values into broader categories.
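The `c('leisure') = c('drawing', 'tv', ...)` style can be approximated in base R by writing the groups as a named list, inverting it into a lookup vector (original value -> new category), and indexing that vector with the data. A minimal sketch, assuming the grouping below (hypothetical, made up for illustration) and that values without a rule should be kept unchanged:

```r
# Hypothetical grouping rules: one name per target category,
# listing every original value that should map to it
groups <- list(
  "leisure"       = c("drawing", "tv", "knitting", "relaxing"),
  "meal"          = c("eating"),
  "personal care" = c("sleep")
)

# Invert the list into a named lookup vector:
# names are the original values, elements are the new categories
lookup <- setNames(rep(names(groups), lengths(groups)),
                   unlist(groups, use.names = FALSE))

# Toy data shaped like the question (rows = respondents, columns = t1..t3)
seqm <- matrix(c("eating", "sleep", "drawing",
                 "tv", "driving", "kissing",
                 "conversation", "relaxing", "knitting"),
               nrow = 3, dimnames = list(1:3, c("t1", "t2", "t3")))

# Recode every cell of every column in one vectorised step;
# cells without a rule keep their original value
recoded <- seqm
hit <- seqm %in% names(lookup)
recoded[hit] <- lookup[seqm[hit]]
recoded
```

Because the rules live in one list, adding a category or moving a value between categories means editing a single line, and the same `lookup` works for all 144 columns at once.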

Are there any new, easier recoding methods in R that have appeared recently? What would you advise me to use?

This is a sample of my real dataset: 6 repeated observations (in columns) for 10 respondents (in rows).

dtaSeq = structure(c("Wash and dress", "Eating", "Various arrangements", "Cleaning dwelling", "Ironing", "Activities related to sports", 
 "Eating", "Eating", "Other specified construction and repairs", 
"Other specified physical care & supervision of a child", "Wash and dress", 
"Filling in the time use diary", "Food preparation", "Wash and dress", 
"Ironing", "Travel related to physical exercise", "Eating", "Eating", 
"Other specified construction and repairs", "Other specified physical care & supervision of a child", 
"Wash and dress", "Filling in the time use diary", "Food preparation", 
"Wash and dress", "Food preparation", "Wash and dress", "Eating", 
"Eating", "Other specified construction and repairs", "Other specified physical care & supervision of a child", 
"Wash and dress", "Filling in the time use diary", "Baking", 
"Teaching the child", "Food preparation", "Wash and dress", "Eating", 
"Eating", "Other specified construction and repairs", "Other specified physical care & supervision of a child", 
"Dish washing", "Unspecified TV watching", "Reading periodicals", 
"Teaching the child", "Food preparation", "Reading periodicals", 
"Eating", "Eating", "Other specified construction and repairs", 
"Feeding the child", "Laundry", "Unspecified TV watching", "Cleaning dwelling", 
"Teaching the child", "Eating", "Eating", "Eating", "Eating", 
"Other specified construction and repairs", "Feeding the child"), 
.Dim = c(10L, 6L), .Dimnames = list(c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10"), c("act1.050", "act1.051", "act1.052", 
"act1.053", "act1.054", "act1.055")))
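Because every column shares the same set of categories, a useful first step before writing any rules is to pool all columns and list the distinct values with their frequencies. A minimal base R sketch; the small matrix `m` is a stand-in built from a few values of the sample (with the real data you would use `dtaSeq` instead):

```r
# Small stand-in for dtaSeq: rows = respondents, columns = time slots
m <- matrix(c("Wash and dress", "Eating", "Ironing",
              "Eating", "Food preparation", "Eating"),
            nrow = 3,
            dimnames = list(c("1", "2", "3"), c("act1.050", "act1.051")))

# Pool every column into one vector; as.vector() drops the matrix shape
all_values <- as.vector(m)

# Distinct categories, sorted, to draft the recoding rules from
categories <- sort(unique(all_values))
categories

# How often each category occurs across all columns, most frequent first
counts <- sort(table(all_values), decreasing = TRUE)
counts
```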

There are 2 answers

Ben Bolker (best answer)

You don't seem to have fully specified recoding rules for your real data, so I made some up:

# values must match the data exactly (%in% is case-sensitive)
recodes <- list("meals"      = c("Eating"),
                "leisure"    = c("Reading periodicals",
                                 "Unspecified TV watching"),
                "child care" = c("Feeding the child", "Teaching the child"),
                "house care" = c("Food preparation", "Dish washing",
                                 "Cleaning dwelling", "Ironing"))
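Because `%in%` matches exactly, including case, a rule entry spelled differently from the data (e.g. "Reading Periodicals" vs "Reading periodicals") silently recodes nothing. A minimal sketch of a sanity check, assuming the illustrative rules and values below: list rule entries that never occur in the data, and observed values that no rule covers.

```r
# Illustrative rules; note the capital "P" in "Reading Periodicals"
recodes <- list("meals"   = c("Eating"),
                "leisure" = c("Reading Periodicals", "Unspecified TV watching"))

# A few illustrative data values
m <- matrix(c("Eating", "Reading periodicals",
              "Unspecified TV watching", "Laundry"), nrow = 2)

observed    <- unique(as.vector(m))
rule_values <- unlist(recodes, use.names = FALSE)

# Rule entries that never occur in the data (typos, case mismatches)
unmatched_rules <- setdiff(rule_values, observed)
unmatched_rules

# Observed values that no rule covers; these would be left unchanged
uncovered <- setdiff(observed, rule_values)
uncovered
```

With 200+ categories, running this check before recoding makes it easy to spot rules that will have no effect.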

Here's a general-purpose recoding function. car::recode does work, but I find it a little clumsy. There's also plyr::revalue, but it's one-to-one, not many-to-one.

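# Replace every value listed in a rule with that rule's name;
# values not covered by any rule are left unchanged
recodeFun <- function(x, recodes) {
    for (i in seq_along(recodes)) {
        x[x %in% recodes[[i]]] <- names(recodes)[i]
    }
    return(x)
}
d2 <- recodeFun(dtaSeq, recodes)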
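`recodeFun` works on a character matrix such as `dtaSeq`. If the data is a data frame instead, `%in%` is not applied cell by cell, so one option is to apply the same rule loop to each column with `lapply()`. A minimal sketch; `recode_vec` is a hypothetical variant of `recodeFun` that takes the rules as an argument, and the rules and data are illustrative:

```r
# Many-to-one rules (illustrative)
recodes <- list("meals"      = c("Eating"),
                "child care" = c("Feeding the child", "Teaching the child"))

# Same loop-over-rules idea as recodeFun, with the rules passed in
recode_vec <- function(x, rules) {
  for (i in seq_along(rules)) {
    x[x %in% rules[[i]]] <- names(rules)[i]
  }
  x
}

# Columns must be character: assigning a new label into a factor gives NA
df <- data.frame(act1.050 = c("Eating", "Laundry"),
                 act1.051 = c("Feeding the child", "Eating"),
                 stringsAsFactors = FALSE)

# Apply the rules column by column; `df[] <-` keeps the data.frame structure
df[] <- lapply(df, recode_vec, rules = recodes)
df
```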
Daniel

As far as I know, the car package's recode() function can also handle strings (character vectors), but I'm not sure. An alternative is the sjmisc package, which takes a detour: convert the strings to numeric values, recode those, and set value labels back afterwards:

library(sjmisc)
# convert the character matrix to a data frame of factors
# (stringsAsFactors = TRUE is needed in R >= 4.0, where the default became FALSE);
# note that each column gets its own factor levels
dtaSeq <- as.data.frame(dtaSeq, stringsAsFactors = TRUE)
# convert to values
dtaSeq.values <- to_value(dtaSeq)
# random recode example, use your own values for clustering here
dtaSeq.values <- rec(dtaSeq.values, "1:3=1; 4:6=2; else=3")
# set value labels, these will be added as attributes
dtaSeq.values <- set_val_labels(dtaSeq.values, c("meal", "leisure", "personal care"))
# replace numeric values with the associated label attributes
dtaSeq.values <- to_label(dtaSeq.values)

Result:

> head(dtaSeq.values)
       act1.050      act1.051 act1.052      act1.053      act1.054      act1.055
1 personal care personal care  leisure personal care          meal       leisure
2          meal          meal     meal          meal personal care personal care
3 personal care          meal     meal          meal       leisure          meal
4          meal personal care  leisure personal care personal care       leisure
5       leisure       leisure     meal       leisure       leisure          meal
6          meal personal care  leisure personal care       leisure          meal

An advantage of the sjmisc rec() function is that, if your data frame's variables share a similar "structure", you can recode the complete data frame with just one call to rec().
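One caveat with the numeric detour: `as.data.frame()` builds each column's factor levels separately, so the same numeric code can stand for different activities in different columns. A base R sketch of the same idea with one shared set of levels for all columns, applying the example rule `1:3=1; 4:6=2; else=3` with `findInterval()`; the data and the labels are placeholders, as in the answer:

```r
# Illustrative data: rows = respondents, columns = time slots
m <- matrix(c("Eating", "Ironing", "Baking",
              "Laundry", "Eating", "Dish washing",
              "Wash and dress", "Food preparation", "Eating"),
            nrow = 3, dimnames = list(c("1", "2", "3"), c("a", "b", "c")))

# One shared, sorted set of levels for all columns
lvls <- sort(unique(as.vector(m)))

# Numeric code of every cell against the shared levels,
# so the same code means the same activity in every column
codes <- match(m, lvls)

# Example rule: codes 1-3 -> group 1, 4-6 -> group 2, 7 and above -> group 3
grp <- findInterval(codes, c(1, 4, 7))

# Replace the group numbers with labels, keeping the matrix shape
labels <- c("meal", "leisure", "personal care")
recoded <- m
recoded[] <- labels[grp]
recoded
```

Here "Eating" gets the same code, and therefore the same label, in every column.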

Does this help you?