Using gsub in R for multiple changes


I have a data.frame where I want to 'clean' the names of the columns:

>names(Data)
[1] "tBodyAcc.mean...X" 
[2] "angle.X.gravityMean."
[3] "fBodyBodyGyroJerkMag.mean.."
[4] "fBodyAccMag.meanFreq.."
             .
             .

I am using the following code:

names(Data)<-gsub('[mM]ean',' Mean ',names(Data))
names(Data)<-gsub('[Ff]req',' Frequency ',names(Data))
names(Data)<-gsub('^t','Time  ',names(Data))
names(Data)<-gsub('\\.',' ',names(Data))

to get the following:

[1] "Time  BodyAcc  Mean    X"       
[2] "angle X gravity Mean  "         
[3] "fBodyBodyGyroJerkMag  Mean   "  
[4] "fBodyAccMag  Mean  Frequency   "

Is there a way to implement this in one line, or some other way that is more elegant than this one?
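If you are on R 4.1 or later (an assumption; the question doesn't say), the native pipe `|>` lets the same four `gsub()` calls run as a single expression. This is only a sketch: it performs the same substitutions in the same order, so the output should match the one shown above. `nms` is a stand-in for `names(Data)`, and `cleaned` is a hypothetical name.

```r
# Sample column names from the question, standing in for names(Data)
nms <- c("tBodyAcc.mean...X", "angle.X.gravityMean.",
         "fBodyBodyGyroJerkMag.mean..", "fBodyAccMag.meanFreq..")

# The same four substitutions, in the same order, as one piped expression.
# Requires R >= 4.1 for |>. Because pattern and replacement are passed by
# name, the piped value is matched to gsub()'s remaining argument, x.
cleaned <- nms |>
  gsub(pattern = "[mM]ean", replacement = " Mean ") |>
  gsub(pattern = "[Ff]req", replacement = " Frequency ") |>
  gsub(pattern = "^t", replacement = "Time  ") |>
  gsub(pattern = "\\.", replacement = " ")

print(cleaned)
```

On the real data frame, you would write `names(Data) <- names(Data) |> gsub(...) |> ...`. This is still four passes over the vector, just without the repeated assignments.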


There are 3 answers

Answer by lukeA (best answer)

You could also try stri_replace_all_regex from the stringi package:

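library(stringi)
# vectorize_all = FALSE applies every pattern/replacement pair to every name, in order
stri_replace_all_regex(names(Data), c("mean", "freq", "^t", "\\."),
                       c(" Mean ", " Frequency ", "Time  ", " "), vectorize_all = FALSE,
                       opts_regex = stri_opts_regex(case_insensitive = TRUE))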
# [1] "Time  BodyAcc  Mean    X"        "angle X gravity Mean  "         
# [3] "fBodyBodyGyroJerkMag  Mean   "   "fBodyAccMag  Mean  Frequency   "
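The key argument above is `vectorize_all` (given as `F`/`FALSE`). According to the stringi documentation, `FALSE` applies each pattern/replacement pair to every string in turn. The default, `TRUE`, instead pairs pattern *i* with string *i*, recycling as needed. The sketch below does not call stringi. It uses base R's `mapply()` to roughly imitate that element-wise pairing, which shows why the default would not give the result you want. The names `p`, `r`, `nms` and `paired` are illustrative.

```r
nms <- c("tBodyAcc.mean...X", "angle.X.gravityMean.",
         "fBodyBodyGyroJerkMag.mean..", "fBodyAccMag.meanFreq..")
p <- c("[mM]ean", "[Ff]req", "^t", "\\.")       # patterns
r <- c(" Mean ", " Frequency ", "Time  ", " ")  # matching replacements

# Element-wise pairing: string i only sees pattern i and replacement i,
# so each name gets at most one of the four edits.
paired <- mapply(function(pat, repl, s) gsub(pat, repl, s),
                 p, r, nms, USE.NAMES = FALSE)
print(paired)
```

Here only the first and last names change, and each gets just one edit. That is why the sequential mode (`vectorize_all = FALSE`) is needed.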
Answer by G. Grothendieck

What you have is already pretty good, but the first two regular expressions can be simplified a bit by using ignore.case = TRUE. Also, since we only want one occurrence replaced in every step except the last, it is better to use sub instead of gsub:

nms <- c("tBodyAcc.mean...X", "angle.X.gravityMean.", 
            "fBodyBodyGyroJerkMag.mean..", "fBodyAccMag.meanFreq..")

nms <- sub('mean', ' Mean ', nms, ignore.case = TRUE)
nms <- sub('freq', ' Frequency ', nms, ignore.case = TRUE)
nms <- sub('^t', 'Time  ', nms)
nms <- gsub('\\.', ' ', nms)
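For the four sample names, each pattern matches at most once, so `sub()` and `gsub()` produce the same result. The two differ only when a name contains a pattern more than once. The sketch below uses a made-up name (hypothetical, not taken from the question) to show that difference, which is worth checking against the real column names before choosing one.

```r
# Hypothetical column name containing "mean" twice
x <- "angle.tBodyAccMean.gravityMean."

first_only  <- sub("mean", " Mean ", x, ignore.case = TRUE)  # first match only
every_match <- gsub("mean", " Mean ", x, ignore.case = TRUE) # every match

print(first_only)   # "angle.tBodyAcc Mean .gravityMean."
print(every_match)  # "angle.tBodyAcc Mean .gravity Mean ."
```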
Answer by Thomas

Since every regular expression has to be applied to the full vector in turn, you can't avoid looping over the patterns in some form. In the example below, n stands in for your names(Data) vector:

n <- c("tBodyAcc.mean...X", "angle.X.gravityMean.", "fBodyBodyGyroJerkMag.mean..", "fBodyAccMag.meanFreq..")
p <- c('[mM]ean', '[Ff]req', '^t', '\\.')        # patterns, applied in order
r <- c(' Mean ', ' Frequency ', 'Time  ', ' ')   # matching replacements
for (i in seq_along(p)) {  # iterate over the patterns, not over n
  n <- gsub(p[i], r[i], n)
}

Result:

> n
[1] "Time  BodyAcc  Mean    X"        "angle X gravity Mean  "         
[3] "fBodyBodyGyroJerkMag  Mean   "   "fBodyAccMag  Mean  Frequency   "
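You can also write the same loop as a fold with base R's `Reduce()`. This gives the single line the question asked for, although it still iterates over the patterns internally. The sketch below assumes the same `p` and `r` vectors. `nms` and `cleaned` are illustrative names.

```r
nms <- c("tBodyAcc.mean...X", "angle.X.gravityMean.",
         "fBodyBodyGyroJerkMag.mean..", "fBodyAccMag.meanFreq..")
p <- c("[mM]ean", "[Ff]req", "^t", "\\.")
r <- c(" Mean ", " Frequency ", "Time  ", " ")

# Reduce() starts from nms and, for each pattern index i, replaces the
# running result with gsub(p[i], r[i], acc), just like the for loop.
cleaned <- Reduce(function(acc, i) gsub(p[i], r[i], acc), seq_along(p), nms)
print(cleaned)
```

Applied to the data frame, this becomes `names(Data) <- Reduce(function(acc, i) gsub(p[i], r[i], acc), seq_along(p), names(Data))`.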