How can I create a distance matrix containing the mean absolute scores between each row?


Given the data frame,

df <- read.table(text="
 X1 X2 X3 X4 X5
  1  2  3  2  1
  2  3  4  4  3
  3  4  4  6  2
  4  5  5  5  4
  2  3  3  3  6
  5  6  2  8  4", header=TRUE)

I want to create a distance matrix containing the mean absolute difference between each pair of columns, where the differences are taken row by row. For example, the distance between X1 and X3 should be 1.67, given that:

(abs(1-3) + abs(2-4) + abs(3-4) + abs(4-5) + abs(2-3) + abs(5-2)) / 6 = 10 / 6 = 1.67

I have tried using the designdist() function in the vegan package this way:

designdist(t(df), method = "abs(A-B)/6", terms = "minimum")

The resulting distance for columns X1 and X3 is 0.666. The problem is that this call sums all the values in each column and then takes the difference of the two totals. What I need is the sum of the absolute differences taken row by row, divided by the number of rows N.
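The two quantities can be compared by hand in base R. This is only a sketch: it does not call vegan, and it assumes that A and B in the designdist() formula stand for the totals of the two columns being compared, which is consistent with the 0.666 reported above (4/6 is about 0.667). The vectors x1 and x3 are columns X1 and X3 typed out for illustration.

```r
# Columns X1 and X3 from the example data, as plain vectors
x1 <- c(1, 2, 3, 4, 2, 5)
x3 <- c(3, 4, 4, 5, 3, 2)
n  <- length(x1)

# Difference of the column totals, divided by N: |17 - 21| / 6
from_totals <- abs(sum(x1) - sum(x3)) / n

# Mean of the row-by-row absolute differences: (2+2+1+1+1+3) / 6
row_by_row <- sum(abs(x1 - x3)) / n

round(c(from_totals = from_totals, row_by_row = row_by_row), 2)
```

The first value is the one obtained from the totals; the second is the 1.67 worked out in the example.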

Best answer, by Josh O'Brien:

Here's a one-line solution. It uses dist()'s method argument to compute the Manhattan distance (also known as the L1 or city-block distance) between each pair of columns in your data.frame. Because dist() compares the rows of its input, the data frame is transposed with t() first; dividing by nrow(df) then turns each sum of absolute differences into a mean.

as.matrix(dist(t(df), "manhattan", diag=TRUE, upper=TRUE)/nrow(df))

To make it reproducible:

df <- read.table(text="
 X1 X2 X3 X4 X5
  1  2  3  2  1
  2  3  4  4  3
  3  4  4  6  2
  4  5  5  5  4
  2  3  3  3  6
  5  6  2  8  4", header=TRUE)

# t(df) puts the columns X1..X5 in rows, so dist() compares columns
dmat <- as.matrix(dist(t(df), "manhattan", diag=TRUE, upper=TRUE)/nrow(df))
print(dmat, digits=3)
#      X1    X2   X3    X4   X5
# X1 0.00 1.000 1.67 1.833 1.17
# X2 1.00 0.000 1.00 0.833 1.50
# X3 1.67 1.000 0.00 1.500 1.83
# X4 1.83 0.833 1.50 0.000 2.33
# X5 1.17 1.500 1.83 2.333 0.00

The X1/X3 entry is 1.67, matching the example in the question.
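One way to cross-check a result like this is to compute the mean absolute difference for every pair of columns explicitly and compare it with what dist() returns. The sketch below assumes the goal is column-to-column distances, as in the question's X1/X3 example; since dist() compares the rows of its input, the data frame is transposed with t() before the call. The name check is made up for this illustration.

```r
df <- read.table(text="
 X1 X2 X3 X4 X5
  1  2  3  2  1
  2  3  4  4  3
  3  4  4  6  2
  4  5  5  5  4
  2  3  3  3  6
  5  6  2  8  4", header=TRUE)

# Explicit version: for each pair of columns (a, b), average |a - b| over rows
check <- sapply(df, function(a) sapply(df, function(b) mean(abs(a - b))))

# dist()-based version on the transposed data, scaled by the number of rows
dmat <- as.matrix(dist(t(df), method = "manhattan") / nrow(df))

# TRUE if the two approaches agree
all.equal(check, dmat, check.attributes = FALSE)

# The pair used as the example in the question
round(check["X1", "X3"], 2)
```

The nested sapply() is slower than dist() on large data, but it spells out exactly which differences are being averaged.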