I have a tsv that looks like this (long-form):
one two value
a b 30
a c 40
a d 20
b c 10
b d 05
c d 30
I'm trying to get this into a dataframe for R (or pandas) that looks like this:
   a  b  c  d
a 00 30 40 20
b 30 00 10 05
c 40 10 00 30
d 20 05 30 00
The problem is that my tsv only defines a,b and not b,a, so I get a lot of NAs in my dataframe.
The final goal is to get a distance matrix to use in clustering. Any help would be appreciated.
An igraph solution: read in the dataframe as an undirected graph, with the value column assumed to be the edge weights. You can then convert this to an adjacency matrix, which is symmetric, so both a,b and b,a get filled in.
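If you would rather stay in pandas, here is a minimal sketch of the same idea without igraph (this is not the igraph code, just an equivalent approach): mirror every pair so b,a exists alongside a,b, then pivot to wide form. It assumes the columns are named `one`, `two` and `value` as in the question, that each pair appears only once, and that the diagonal should be 0. The inline `tsv` string and the variable names are only for illustration.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the example is self-contained.
tsv = (
    "one\ttwo\tvalue\n"
    "a\tb\t30\n"
    "a\tc\t40\n"
    "a\td\t20\n"
    "b\tc\t10\n"
    "b\td\t05\n"
    "c\td\t30\n"
)
df = pd.read_csv(io.StringIO(tsv), sep="\t")

# Every label that occurs in either column, in a fixed order.
labels = sorted(set(df["one"]) | set(df["two"]))

# Swap the two key columns to create the missing (b, a) rows.
mirrored = df.rename(columns={"one": "two", "two": "one"})
both = pd.concat([df, mirrored], ignore_index=True)

# Long -> wide; reindex so rows and columns share the same order.
dist = (
    both.pivot(index="one", columns="two", values="value")
    .reindex(index=labels, columns=labels)
    .fillna(0)  # assumption: distance from an item to itself is 0
    .rename_axis(index=None, columns=None)
)

print(dist)
```

`dist` is then a square, symmetric DataFrame. If the file can contain the same pair more than once, `pivot` will raise an error, so you would need to deduplicate or aggregate first.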