How can I find the best resemblance between one particular row and the rest of the rows in a dataframe?
Let me explain what I mean. Take a look at this data frame:
df <- structure(list(person = 1:5, var1 = c(1L, 5L, 2L, 2L, 5L), var2 = c(4L,
4L, 3L, 2L, 2L), var3 = c(5L, 4L, 4L, 3L, 1L)), .Names = c("person",
"var1", "var2", "var3"), class = "data.frame", row.names = c(NA,
-5L))
How can I find the best resemblance between person 1 (row 1) and the rest of the rows (persons) in the data frame? The output should be something like this: person 1 stays in row 1, and the remaining rows follow in order of best resemblance. The similarity measure I want to use is cosine or Pearson. I tried to solve my problem with functions from the arules package, but they didn't match my needs well.
Any ideas?
Another idea is to define the cosine function manually and apply it to your data frame, i.e.
which gives,
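One way this could look, as a minimal sketch rather than the answerer's original code: it assumes the similarity is computed over `var1` to `var3` only (the `person` id is left out), and the helper name `cosine_sim` is hypothetical.

```r
# Data from the question, rebuilt with data.frame() for readability
df <- data.frame(person = 1:5,
                 var1 = c(1L, 5L, 2L, 2L, 5L),
                 var2 = c(4L, 4L, 3L, 2L, 2L),
                 var3 = c(5L, 4L, 4L, 3L, 1L))

# Hypothetical helper: cosine similarity between two numeric vectors
cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Assumption: only the var* columns enter the comparison
vars <- as.matrix(df[, c("var1", "var2", "var3")])
ref  <- vars[1, ]

# Similarity of every row to row 1 (row 1 compared with itself scores 1)
sims <- apply(vars, 1, cosine_sim, y = ref)

# Keep person 1 on top, then the other rows by decreasing similarity
others <- setdiff(seq_len(nrow(df)), 1)
ord    <- c(1, others[order(sims[others], decreasing = TRUE)])
result <- cbind(df, similarity = sims)[ord, ]
print(result)
```

With the data from the question, this should order the persons as 1, 3, 4, 2, 5, with cosine similarities to person 1 of roughly 0.974, 0.936, 0.838 and 0.507 for persons 3, 4, 2 and 5. For Pearson instead of cosine, replacing the body of the helper with `cor(x, y)` should work the same way.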