How to find correlation between two values


I have a table with two columns, emailid and keyword, and I am pivoting the values in SQL into a kind of matrix: the columns are the distinct keywords and the rows are the distinct users. The value at [emailid][keyword] is 1 if the user searched for that keyword and null if not. I am trying to find the correlation between keywords, i.e. if several users have searched for the same two keywords, then there is a correlation between those two keywords. How can I achieve this?


There is 1 answer

dhanush-ai1990 On

To begin, replace the null values with 0 so that every cell of the matrix is numeric. You can then apply a correlation measure such as Pearson or Spearman correlation to each pair of keyword columns.
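As a minimal sketch of that idea, the pivot and the pairwise correlations could also be done in pandas instead of SQL. The sample rows and the names `searches`, `matrix` and `keyword_corr` are made up for illustration; only the column names emailid and keyword come from the question.

```python
import pandas as pd

# Hypothetical sample data: one row per (emailid, keyword) search.
searches = pd.DataFrame({
    "emailid": ["a@x.com", "a@x.com", "b@x.com", "b@x.com", "c@x.com", "d@x.com"],
    "keyword": ["python", "pandas", "python", "pandas", "sql", "sql"],
})

# Users as rows, keywords as columns. crosstab counts occurrences and
# fills missing combinations with 0 (no nulls); clip caps repeated
# searches at 1 so the matrix stays binary.
matrix = pd.crosstab(searches["emailid"], searches["keyword"]).clip(upper=1)

# Pairwise Pearson correlation between the keyword columns.
keyword_corr = matrix.corr(method="pearson")
print(keyword_corr)
```

Each cell of `keyword_corr` is the correlation between two keywords across users. Passing `method="spearman"` to `corr` would give the Spearman variant instead.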

This is a page on Pearson Correlation: http://learntech.uwe.ac.uk/da/Default.aspx?pageid=1442

from scipy.stats import pearsonr

a = [1.0001345, 0.000656]
b = [1.00001345, 0.000656]
print(pearsonr(a, b)[0])  # first element is the correlation coefficient

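This gives an output of 1.0, which means a perfect positive correlation. Note that with only two data points per list the coefficient is always +1 or -1, so real data needs longer vectors to be informative. The Pearson coefficient ranges from -1.0 (perfect negative correlation) to 1.0 (perfect positive correlation). A value of 0 means there is no linear correlation between the two variables.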
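To make the range of the coefficient concrete, here is a small sketch with hand-picked lists (the values are illustrative, not taken from the question) that produce the two extremes and zero:

```python
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5]

# y rises in step with x: perfect positive correlation.
r_pos, _ = pearsonr(x, [2, 4, 6, 8, 10])

# y falls as x rises: perfect negative correlation.
r_neg, _ = pearsonr(x, [10, 8, 6, 4, 2])

# y is symmetric around the middle of x: no linear relationship,
# even though y clearly depends on x.
r_zero, _ = pearsonr(x, [2, -1, -2, -1, 2])

print(r_pos, r_neg, r_zero)
```

The last case shows why 0 should be read as "no linear correlation" rather than "no relationship at all".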

More information can be found in the SciPy documentation: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.pearsonr.html