Edited: I have 3 raters, each of whom rated 5 test samples on a Likert scale (0-10). I want to measure the inter-rater reliability. What is the best method for doing so? I am writing my code in Python.
I tried Pearson's correlation, but the results come out horrible, which they shouldn't, because the ratings are not very different.
rater1 = [1,8,9,8,8]
rater2 = [8,8,6,7,8]
rater3 = [10,5,9,8,9]
[Image: Pearson's correlation matrix]
This is my code:
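import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# One list per rater; position i holds that rater's score for test sample i
rater1 = [1,8,9,8,8]
rater2 = [8,8,6,7,8]
rater3 = [10,5,9,8,9]

# One column per rater, one row per test sample
df = pd.DataFrame({'R1':rater1, 'R2':rater2, 'R3':rater3})
# Pairwise Pearson correlations between the rater columns
corr = df.corr()
# Show the correlation matrix as an annotated heatmap
heatmap = sns.heatmap(corr, annot=True)
plt.show()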
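
For comparison, one measure that is commonly used for several raters scoring the same samples is the intraclass correlation coefficient (ICC). Below is a minimal sketch, not a definitive answer to which method is best. It assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), and computes it from the ANOVA mean squares with NumPy only. The function name icc_2_1 is hypothetical and chosen just for this illustration.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per test sample and one column per rater.
    (Hypothetical helper, written for illustration only.)
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape                      # n samples, k raters
    grand_mean = x.mean()

    # Sums of squares from a two-way ANOVA without replication
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()  # between samples
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()  # between raters
    ss_err = ss_total - ss_rows - ss_cols                     # residual

    # Mean squares
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

rater1 = [1, 8, 9, 8, 8]
rater2 = [8, 8, 6, 7, 8]
rater3 = [10, 5, 9, 8, 9]

# Stack the raters as columns so each row is one test sample
ratings = np.column_stack([rater1, rater2, rater3])
icc = icc_2_1(ratings)
print(f"ICC(2,1) = {icc:.3f}")
```

On the ratings above this comes out at roughly -0.37, so the ICC is also low for this data. With only 5 samples, the first sample (scored 1, 8 and 10 by the three raters) has a large influence on any agreement statistic.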