# Choose variables for correlation matrix

I've started using R recently, and I want to compute a correlation matrix for a specific set of variables. My dataset has over 150 variables, but I only need a few of them. How can I choose which ones to include? Thanks in advance!

## 2 Answers

I like using the `dplyr` package. For instance, if your dataset is called `dataset`, first load the package:

```
library(dplyr)
```

Now let's pretend your dataset is:

```
dataset <- data.frame(x = c(1, 2, 3),
                      y = c(4, 5, 6),
                      z = c(100, 50, 20))
```

Then:

```
dataset %>%
  as.data.frame() %>%   # works whether dataset is a data frame or a matrix
  select(x, z) %>%      # select the variables
  as.matrix() %>%
  cor()                 # the correlation matrix
#            x          z
# x  1.0000000 -0.9897433
# z -0.9897433  1.0000000
```

This method is foolproof. We don't know whether your dataset is currently a *data frame* or a *matrix*, which affects which code you would use; the `as.data.frame()` step converts it first, so the rest of the pipeline works either way.
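To see the conversion step at work, here is a minimal sketch that runs the same pipeline on a matrix. It assumes dplyr 1.0 or later (for the `all_of()` selection helper), and the names `m`, `vars` and `cm` are only illustrative. Keeping the wanted column names in a character vector is one convenient option when the dataset has 150+ variables:

```r
library(dplyr)

# The same toy data as above, but stored as a matrix instead of a data frame
m <- cbind(x = c(1, 2, 3), y = c(4, 5, 6), z = c(100, 50, 20))

# Illustrative vector of the columns you want to correlate
vars <- c("x", "z")

cm <- m %>%
  as.data.frame() %>%      # select() has no method for plain matrices, so convert first
  select(all_of(vars)) %>% # pick the columns named in vars
  cor()                    # cor() accepts a data frame of numeric columns
print(cm)
```

Dropping the `as.data.frame()` line makes `select()` fail on the matrix, which is exactly the case the conversion guards against.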

This computes the correlation matrix of the 2nd, 3rd and 4th variables of the built-in data frame `anscombe`:
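A minimal sketch of selecting columns by position, using only base R (no packages). `anscombe` ships with R's `datasets` package; `cm_pos` is just an illustrative name:

```r
# anscombe is a built-in data frame with columns x1, x2, x3, x4, y1, y2, y3, y4
print(names(anscombe))

# Single-bracket indexing with a vector of positions keeps columns 2 to 4
# (x2, x3, x4) and returns a data frame, which cor() accepts directly
cm_pos <- cor(anscombe[2:4])
print(cm_pos)
```

Because `x2` and `x3` hold identical values in `anscombe`, their correlation comes out as 1.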

So does this (assuming they have the indicated names):
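A minimal sketch of the by-name version, again in base R only; `vars` and `cm_names` are illustrative names:

```r
# Select the same three columns by name; storing the names in a character
# vector is handy when you pick a few variables out of many
vars <- c("x2", "x3", "x4")
cm_names <- cor(anscombe[vars])
print(cm_names)
```

Indexing with `anscombe[vars]` (one pair of brackets, no comma) always returns a data frame, so this also works when `vars` holds a single name.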