I'm trying to correlate sulfate and nitrate values in my dataset (a) by ID values and specific conditions (specified below). The dataset contains three columns (ID, sulfate, nitrate). The code works when I run each ID value individually but now I'm trying to set up a loop to run through all the ID values and then print out all the correlations by ID value into a single vector. The loop is not printing out the correlation values as I'm sure I am not saving them correctly. How can I modify the code below to print out a vector of correlation values according to each ID value?
    for (i in 1:5) {
      if (a$ID==i && length(a$ID==i) > 10) {
        cor(a$sulfate[a$ID==i], a$nitrate[a$ID==i])
      }
    }
Try instead:
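The original snippet for this step did not survive, so what follows is only a minimal sketch of one way to fix the loop, not necessarily the code first posted here. The toy data frame is made up for illustration, and the names `ids`, `cors` and `rows` are assumptions rather than anything from the question.

```r
# Toy stand-in for the asker's data frame `a` (values are made up):
# IDs 1 and 2 have 12 rows each, ID 3 has only 5.
set.seed(1)
a <- data.frame(
  ID      = rep(1:3, times = c(12, 12, 5)),
  sulfate = rnorm(29),
  nitrate = rnorm(29)
)

ids  <- sort(unique(a$ID))
cors <- rep(NA_real_, length(ids))  # pre-allocate one slot per ID
names(cors) <- ids

for (k in seq_along(ids)) {
  rows <- a$ID == ids[k]            # logical vector, one entry per row
  if (sum(rows) > 10) {             # sum() counts the TRUEs: a single number
    cors[k] <- cor(a$sulfate[rows], a$nitrate[rows])  # save, don't just compute
  }
}

cors  # named vector; IDs with 10 or fewer rows stay NA
```

Two things change compared with the loop in the question. The result of `cor()` is assigned into `cors` instead of being discarded, and the condition uses `sum(rows)`, which counts the matching rows, whereas `length(a$ID == i)` is always the length of the whole column. If the real data contain missing values, `cor()` also accepts `use = "complete.obs"`.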
Explanation
We attempt a logical test. Return the output of 'yes' if ID equals 1:
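As a rough sketch of that test (the small data frame is a made-up stand-in for `a`, and `msg` is just an illustrative name), wrapped in `tryCatch()` so that it runs on any R version:

```r
# Hypothetical stand-in for the asker's data frame `a`
a <- data.frame(
  ID      = c(1, 1, 2, 2),
  sulfate = c(3.1, 2.4, 5.0, 4.2),
  nitrate = c(1.2, 0.9, 2.1, 1.7)
)

# The comparison is vectorized: it yields one TRUE/FALSE per row
a$ID == 1

# if() expects a single TRUE/FALSE. R before 4.2.0 evaluates the branch
# using the first element and warns; R 4.2.0 and later raises an error.
# tryCatch() captures whichever message this R version produces.
msg <- tryCatch(
  if (a$ID == 1) "yes",
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)
msg
```

Without the `tryCatch()` wrapper, the bare call `if (a$ID == 1) "yes"` is what produces the behaviour described next.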
In R versions before 4.2.0 we get the result 'yes', but we also get a warning; from R 4.2.0 onward the same call stops with an error. The reason:
The test checks whether each element of `a$ID` is equal to `1`. That's a problem for the `if` statement. How does R know which `TRUE` or `FALSE` value to use for the test? Before R 4.2.0 it simply used the first.

In your code, you are passing vectors like that to your `if` statement. You want the condition in your `if` statement to evaluate to a single `TRUE` or `FALSE`. Or avoid the `if` altogether.

Vectorization
As you become more advanced, you can avoid this loop with a vectorized function call.
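One possible base R version, sketched under the assumption that the data look like the toy frame below (the names `by_id` and `cors` are illustrative only): `split()` breaks the data frame into one piece per ID and `sapply()` applies the same calculation to each piece.

```r
# Toy stand-in for the asker's data frame `a` (values are made up)
set.seed(1)
a <- data.frame(
  ID      = rep(1:3, times = c(12, 12, 5)),
  sulfate = rnorm(29),
  nitrate = rnorm(29)
)

# One data frame per ID, named by ID
by_id <- split(a, a$ID)

# One correlation per piece; pieces with 10 or fewer rows get NA
cors <- sapply(by_id, function(d) {
  if (nrow(d) > 10) cor(d$sulfate, d$nitrate) else NA_real_
})

cors  # named numeric vector, one entry per ID
```

Inside the function, `nrow(d) > 10` is a single `TRUE` or `FALSE`, so the `if` is safe here.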
Some R users have written great packages to deal with these types of problems. You will need `dplyr` and `data.table`. Here are two quick alternatives.
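The two alternatives might look roughly like this. It is a sketch that assumes both packages are installed and uses a made-up toy data frame; `res_dplyr`, `res_dt` and the column name `r` are illustrative choices, not part of the question.

```r
library(dplyr)
library(data.table)

# Toy stand-in for the asker's data frame `a` (values are made up)
set.seed(1)
a <- data.frame(
  ID      = rep(1:3, times = c(12, 12, 5)),
  sulfate = rnorm(29),
  nitrate = rnorm(29)
)

# dplyr: keep only groups with more than 10 rows,
# then compute one correlation per remaining ID
res_dplyr <- a %>%
  group_by(ID) %>%
  filter(n() > 10) %>%
  summarise(r = cor(sulfate, nitrate))

# data.table: .N is the number of rows in the current group;
# groups that fail the test return NULL and drop out of the result
dt <- as.data.table(a)
res_dt <- dt[, if (.N > 10) .(r = cor(sulfate, nitrate)), by = ID]

res_dplyr
res_dt
```

Both versions return one row per ID that has more than 10 observations, rather than a vector padded with `NA`; use `res_dplyr$r` or `res_dt$r` if you want the plain vector of correlations.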