Reverse Scoring Items

34.1k views Asked by At

I have a survey of about 80 items, primarily the items are valanced positively (higher scores indicate better outcome), but about 20 of them are negatively valanced, I need to find a way to reverse score the ones negatively valanced in R. I am completely lost on how to do so. I am definitely an R beginner, and this is probably a dumb question, but could someone point me in an direction code-wise?

7

There are 7 answers

3
eipi10 On BEST ANSWER

Here's an example with some fake data that you can adapt to your data:

# Fake data: Three questions answered on a 1 to 5 scale
set.seed(1)
dat = data.frame(Q1=sample(1:5,10,replace=TRUE), 
                 Q2=sample(1:5,10,replace=TRUE),
                 Q3=sample(1:5,10,replace=TRUE))

dat
   Q1 Q2 Q3
1   2  2  5
2   2  1  2
3   3  4  4
4   5  2  1
5   2  4  2
6   5  3  2
7   5  4  1
8   4  5  2
9   4  2  5
10  1  4  2

# Say you want to reverse questions Q1 and Q3
cols = c("Q1", "Q3")

dat[ ,cols] = 6 - dat[ ,cols]

dat
   Q1 Q2 Q3
1   4  2  1
2   4  1  4
3   3  4  2
4   1  2  5
5   4  4  4
6   1  3  4
7   1  4  5
8   2  5  4
9   2  2  1
10  5  4  4

If you have a lot of columns, you can use tidyverse functions to select multiple columns to recode in a single operation.

library(tidyverse)

# Reverse code columns Q1 and Q3
dat %>% mutate(across(matches("^Q[13]"), ~ 6 - .))

# Reverse code all columns that start with Q followed by one or two digits
dat %>% mutate(across(matches("^Q[0-9]{1,2}"), ~ 6 - .))

# Reverse code columns Q11 through Q20
dat %>% mutate(across(Q11:Q20, ~ 6 - .))

If different columns could have different maximum values, you can (adapting @HellowWorld's suggestion) customize the reverse-coding to the maximum value of each column:

# Reverse code columns Q11 through Q20 
dat %>% mutate(across(Q11:Q20, ~ max(.) + 1 - .))
0
Oscar Kjell On

Another example is to use recode in library(car).

 #Example data
 data = data.frame(Q1=sample(1:5,10, replace=TRUE))

 # Say you want to reverse questions Q1
 library(car)
 data$Q1reversed <- recode(data$Q1, "1=5; 2=4; 3=3; 4=2; 5=1")
 data
0
Zoë Turner On

Just converting @eipi10's answer using tidyverse:

# Create same fake data: Three questions answered on a 1 to 5 scale
set.seed(1)
dat <- data.frame(Q1 = sample(1:5,10, replace=TRUE), 
                  Q2 = sample(1:5,10, replace=TRUE),
                  Q3 = sample(1:5,10, replace=TRUE))

# Reverse scores in the desired columns (Q2 and Q3)

dat <- dat %>% 
  mutate(Q2Reversed = 6 - Q2,
         Q3Reversed = 6 - Q3)
0
Jeff Parker On

Here is an alternative approach using the psych package. If you are working with survey data this package has lots of good functions. Building on @eipi10 data:

# Fake data: Three questions answered on a 1 to 5 scale
set.seed(1)
original_data = data.frame(Q1=sample(1:5,10,replace=TRUE), 
                 Q2=sample(1:5,10,replace=TRUE),
                 Q3=sample(1:5,10,replace=TRUE))
original_data

# Say you want to reverse questions Q1 and Q3. Set those keys to -1 and Q2 to 1.
# install.packages("psych") # Uncomment this if you haven't installed the psych package
library(psych)
keys <- c(-1,1,-1)

# Use the handy function from the pysch package
# mini is the minimum value and maxi is the maimum value
# mini and maxi can also be vectors if you have different scales
new_data <- reverse.code(keys,original_data,mini=1,maxi=5)
new_data

The pro to this approach is that you can recode your entire survey in one function. The con to this is you need a library. The stock R approach is more elegant as well.

FYI, this is my first post on stack overflow. Long time listener, first time caller. So please give me feedback on my response.

0
GSA On

Here is another attempt that will generalize to any number of columns. Let's use some made up data to illustrate the function.

# create a df
{
A = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
B = c(9, 2, 3, 2, 4, 0, 2, 7, 2, 8)
C = c(2, 4, 1, 0, 2, 1, 3, 0, 7, 8)

df1 = data.frame(A, B, C)
print(df1)
}
   A B C
1  3 9 2
2  3 2 4
3  3 3 1
4  3 2 0
5  3 4 2
6  3 0 1
7  3 2 3
8  3 7 0
9  3 2 7
10 3 8 8

The columns to reverse code

# variables to reverse code
vtcode = c("A", "B")

The function to reverse-code the selected columns

reverseCode <- function(data, rev){
  
  # get maximum value per desired col: lapply(data[rev], max)
  # subtract values in cols to reverse-code from max value plus 1
  data[, rev] = mapply("-", lapply(data[rev], max), data[, rev]) + 1
  
  return(data)
  
}


reverseCode(df1, vtcode)

   A  B C
1  1  1 2
2  1  8 4
3  1  7 1
4  1  8 0
5  1  6 2
6  1 10 1
7  1  8 3
8  1  3 0
9  1  8 7
10 1  2 8

This code was inspired by another response a response from @catastrophic-failure relating to subtract max of column from all entries in column R

0
J.Sabree On

The psych package has the intuitive reverse.code() function that can be helpful. Using the dataset started by @eipi10 and the same goal or reversing q1 and q2:

set.seed(1)
dat <- data.frame(q1 =sample(1:5,10,replace=TRUE), 
                 q2=sample(1:5,10,replace=TRUE),
                 q3 =sample(1:5,10,replace=TRUE))

You can use the reverse.code() function. The first argument is the keys. This is a vector of 1 and -1. -1 means that you want to reverse that item. These go in the same order as your data.

The second argument, called items, is simply the name of your dataset. That is, where are these items located?

Last, the mini and maxi arguments are the smallest and largest values that a participant could possibly score. You can also leave these arguments to NULL and the function will use the lowest and highest values in your data.

library(psych)
keys <- c(-1, 1, -1)
dat1 <- reverse.code(keys = keys, items = dat, mini = 1, maxi = 5)

dat1

Alternatively, your keys can also contain the specific names of the variables that you want to reverse score. This is helpful if you have many variables to reverse score and yields the same answer:

library(psych)
keys <- c("q1", "q3")
dat2 <- reverse.code(keys = keys, items = dat, mini = 1, maxi = 5)

dat2

Note that, after reverse scoring, reverse.code() slightly modifies the variable name to have a - behind it (i.e., q1 becomes q1- after being reverse scored).

0
Paul On

The solutions above assume wide data (one score per column). This reverse scores specific rows in long data (one score per row).

library(magrittr)
max <- 5
df <- data.frame(score=sample(1:max, 20, replace=TRUE))
df <- mutate(df, question = rownames(df))
df
df[c(4,13,17),] %<>% mutate(score = max + 1 - score)
df