Correlate by levels of a variable in R


I would like to correlate two variables and have the output reported separately for levels of a third variable.

My data are similar to this example:

var1 <- c(7, 8, 9, 10, 11, 12)
var2 <- c(18, 17, 16, 15, 14, 13)
categories <- c(1, 2, 3, 1, 2, 3)

I want to correlate var1 with var2 within each category, so that the output reports the correlation between var1 and var2 for category 1, category 2, and category 3 separately.
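To make "within each category" concrete, here is a minimal base-R sketch that subsets both vectors with the same logical index, first for one category and then for all of them. The helper names (`in_cat1`, `cats`, `r_by_cat`) are illustrative only. Note that in this example data each category has exactly two pairs, and two distinct points always give a Pearson correlation of +1 or -1, so the sample data cannot show anything in between.

```r
# Example data from the question
var1 <- c(7, 8, 9, 10, 11, 12)
var2 <- c(18, 17, 16, 15, 14, 13)
categories <- c(1, 2, 3, 1, 2, 3)

# One category at a time: index both vectors with the same logical
# vector so the (var1, var2) pairs stay aligned
in_cat1 <- categories == 1
r_cat1 <- cor(var1[in_cat1], var2[in_cat1])

# Repeat for every category present in the data
cats <- sort(unique(categories))
r_by_cat <- vapply(
  cats,
  function(k) cor(var1[categories == k], var2[categories == k]),
  numeric(1)
)
names(r_by_cat) <- cats  # label results by category
print(r_by_cat)
```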

In SAS, I would do:

/* BY-group processing needs the data sorted by CATEGORY */
PROC SORT DATA=x;
  BY CATEGORY;
RUN;
PROC CORR DATA=x;
  BY CATEGORY;
  VAR VAR1;
  WITH VAR2;
RUN;
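PROC CORR reports more than r for each BY group: it also shows the number of observations and a p-value. A rough base-R analogue, sketched here as an assumption rather than an exact reproduction of the SAS output, runs `cor.test()` on each group and stacks the results. The data below are hypothetical (four rows per category), because `cor.test()` needs at least three complete pairs and the question's sample has only two per category.

```r
# Hypothetical data: four observations per category, since cor.test()
# needs at least three complete pairs to compute a test
set.seed(42)
df <- data.frame(
  categories = rep(1:3, times = 4),
  var1 = rnorm(12)
)
df$var2 <- -df$var1 + rnorm(12, sd = 0.5)

# One row per category: n, Pearson r, and two-sided p-value for H0: rho = 0
per_group <- lapply(split(df, df$categories), function(d) {
  ct <- cor.test(d$var1, d$var2)
  data.frame(
    categories = d$categories[1],
    n = nrow(d),
    r = unname(ct$estimate),
    p_value = ct$p.value
  )
})
result <- do.call(rbind, per_group)
rownames(result) <- NULL
print(result)
```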

There are 2 answers

MrFlick (best answer)

You can put your vectors into a data.frame, split it by category, and then compute the correlation for each category:

sapply(
    split(data.frame(var1, var2), categories),  # one data frame per category
    function(x) cor(x[[1]], x[[2]])             # correlate its two columns
)
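A slightly more defensive variant of the same split-apply idea, offered as a sketch: it keeps the category column in the data frame, refers to columns by name instead of position, and uses `vapply()` so that R raises an error if any group fails to return a single number. The column names `var1`, `var2`, and `categories` are assumed to match the question's vectors.

```r
var1 <- c(7, 8, 9, 10, 11, 12)
var2 <- c(18, 17, 16, 15, 14, 13)
categories <- c(1, 2, 3, 1, 2, 3)

df <- data.frame(var1 = var1, var2 = var2, categories = categories)

# split() gives a named list of data frames, one per category ("1", "2", "3")
groups <- split(df, df$categories)

# vapply() errors if any group does not return exactly one number;
# selecting columns by name does not depend on column order
res <- vapply(groups, function(d) cor(d$var1, d$var2), numeric(1))
print(res)
```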

This looks cleaner with the dplyr package:

library(dplyr)
data.frame(var1 = var1, var2 = var2, categories = categories) %>%
    group_by(categories) %>%
    summarize(cor = cor(var1, var2))
akrun

You could also use by():

# cor() returns a 2 x 2 matrix per category; `[`, 2 picks its off-diagonal element
sapply(by(cbind(var1, var2), categories, FUN = cor), `[`, 2)
#  1  2  3
# -1 -1 -1
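Why `[`, 2 works: for a two-column input, `cor()` returns a 2 x 2 correlation matrix, and R stores matrices column-major, so element 2 is the `[2, 1]` entry, the var2-vs-var1 correlation (equal to var1-vs-var2 by symmetry). The sketch below checks that on the full vectors. With more than two columns, position 2 would still be `[2, 1]`, so indexing by row and column name is the safer habit.

```r
var1 <- c(7, 8, 9, 10, 11, 12)
var2 <- c(18, 17, 16, 15, 14, 13)

# cor() on a two-column matrix returns a 2 x 2 correlation matrix
m <- cor(cbind(var1, var2))
print(m)

# Column-major storage: m[2] is m[2, 1], the off-diagonal entry
off_diag <- m[2]

# The same value selected explicitly by name
by_name <- m["var1", "var2"]
```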