I'm trying to figure out how to calculate the Pearson correlation coefficient using SQL. Here is the formula I'm using: and here is the table I'm using:
This is what I have so far for a query, but it's giving me this error message: Invalid use of group function
select first_id, second_id, movie_id, first_score, second_score, count(*) as n,
sum((first_score-avg(first_score))*(second_score-avg(second_score)))/
(
sqrt(sum(first_score-avg(first_score)))*
sqrt(sum(second_score-avg(second_score))))
as pearson
from connections
group by second_id
Thanks for helping
Here is a query that does the calculation in the formula:
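A minimal sketch of such a query, assuming MySQL, the `connections` table from the question, and that the correlation is wanted per (`first_id`, `second_id`) pair. The aliases `c`, `a`, `avg_first`, and `avg_second` are illustrative names, not anything from the question:

```sql
-- Assumes a table connections(first_id, second_id, movie_id,
-- first_score, second_score), as in the question.
-- avg_first / avg_second are illustrative aliases.
select c.first_id, c.second_id,
       count(*) as n,
       -- numerator: sum of the products of the deviations from the means
       sum((c.first_score - a.avg_first) * (c.second_score - a.avg_second)) /
       -- denominator: product of the square roots of the summed squared deviations
       (sqrt(sum(pow(c.first_score - a.avg_first, 2))) *
        sqrt(sum(pow(c.second_score - a.avg_second, 2)))) as pearson
from connections c
join (select first_id, second_id,
             avg(first_score) as avg_first,
             avg(second_score) as avg_second
      from connections
      group by first_id, second_id) a
  on a.first_id = c.first_id and a.second_id = c.second_id
group by c.first_id, c.second_id;
```

If all of a pair's scores on one side are identical, the denominator is zero; with MySQL's default behavior for `select`, the division should then yield `NULL` rather than an error.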
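There are several issues with your attempt. The error itself comes from nesting aggregate functions: MySQL does not allow avg() inside sum() at the same query level. The denominator also leaves out the squares, so each sum of deviations from the mean is always zero. Finally, the select list contains columns (movie_id, first_score, second_score) that are neither grouped nor aggregated, and the grouping is on second_id alone. This query precalculates the average values for the two scores in a subquery. It then applies the formula pretty much as written.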
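Written out, the mean-centered form of Pearson's r is shown here for reference. I am assuming this is the formula in the question, with x standing for first_score, y for second_score, and n for the number of rows in the group:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The squares under each root are what the attempt in the question is missing.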