Create adjacency matrix for SNA from dataframe

I want to create an adjacency matrix for social network analysis (likely with graph_from_adjacency_matrix in igraph) from a CSV file that is structured like this, but much larger:

name vote1 vote2 vote3
Joe  1     0     1
Jane 0     0     1
Jill 1     0     1

For the network analysis, each name will be a node, and the edges between nodes will be weighted by how frequently the two people vote together (1 or 0). Something like:

    Joe Jane Jill
Joe  0    2    3
Jane 2    0    2
Jill 3    2    0

As simple as this seems, I haven't been able to successfully convert this dataframe into an adjacency matrix that can be used to create an igraph graph object. as.matrix and data.matrix do convert it to a matrix, but not an adjacency matrix, and not one that preserves the characters in the "name" variable. My matrix algebra is not strong, so I know I'm likely missing something obvious, but I don't know enough to know what it is. I'm open to other solutions that get me to my end goal of network analysis.

Answer by lmo:

I think you want some version of the cross product.

# construct the matrix
myMat <- as.matrix(df[-1])

# same output as myMat %*% t(myMat)
resultMat <- tcrossprod(myMat)
# add names
dimnames(resultMat) <-  list(df$name, df$name)

resultMat
     Joe Jane Jill
Joe    2    1    2
Jane   1    1    1
Jill   2    1    2

The off-diagonal entries count the votes in which both individuals voted (both have a 1), and the diagonal counts how many times each individual voted with themselves (i.e., their total vote count).
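To see why the cross product gives these counts, it may help to compute one entry by hand. The sketch below rebuilds the small example as a plain data frame (the rownames on myMat and the name joe_jill are added here only for illustration) and checks that multiplying two rows element-wise and summing gives the same number as tcrossprod.

```r
# Same values as the example df, rebuilt here so the block runs on its own
df <- data.frame(
  name  = c("Joe", "Jane", "Jill"),
  vote1 = c(1, 0, 1),
  vote2 = c(0, 0, 0),
  vote3 = c(1, 1, 1),
  stringsAsFactors = FALSE
)

myMat <- as.matrix(df[-1])
rownames(myMat) <- df$name  # illustration only, so rows can be picked by name

# Joe and Jill by hand: a product is 1 only where both rows have a 1
joe_jill <- sum(myMat["Joe", ] * myMat["Jill", ])

# The matrix product does this for every pair of rows at once
resultMat <- tcrossprod(myMat)

joe_jill == resultMat["Joe", "Jill"]
all.equal(resultMat, myMat %*% t(myMat))
```

Both comparisons should come back TRUE, since tcrossprod(x) is documented as equivalent to x %*% t(x).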

Since you don't want the total vote counts of each individual, you can replace the diagonal with 0s.

# remove diagonal
diag(resultMat) <- 0

resultMat
     Joe Jane Jill
Joe    0    1    2
Jane   1    0    1
Jill   2    1    0
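Note that these counts include only votes where both people have a 1. The matrix sketched in the question (Joe-Jill = 3, Joe-Jane = 2) appears to also count votes where both people have a 0. If that is what "voting together" should mean, one possible variant is to add the cross product of the flipped matrix. This is a sketch that assumes every entry is exactly 0 or 1 with no missing values; agreeMat is a name made up here.

```r
# Same values as the example df, rebuilt here so the block runs on its own
df <- data.frame(
  name  = c("Joe", "Jane", "Jill"),
  vote1 = c(1, 0, 1),
  vote2 = c(0, 0, 0),
  vote3 = c(1, 1, 1),
  stringsAsFactors = FALSE
)

myMat <- as.matrix(df[-1])

# shared 1s + shared 0s = number of votes on which two people match
# (assumes a strictly 0/1 matrix, so 1 - myMat flips every entry)
agreeMat <- tcrossprod(myMat) + tcrossprod(1 - myMat)
dimnames(agreeMat) <- list(df$name, df$name)

# remove diagonal, as before
diag(agreeMat) <- 0

agreeMat
```

For the three-person example this gives 2 for Joe-Jane, 3 for Joe-Jill, and 2 for Jane-Jill, which matches the matrix in the question.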

To check this on a larger example, df1 below adds two votes and two voters. One of the new voters, Sal, votes only once, in vote2, and is the only person who votes there.

df1
  name vote1 vote2 vote3 vote4 vote5
1  Joe     1     0     1     0     1
2 Jane     0     0     1     1     0
3 Jill     1     0     1     1     0
4  Bob     1     0     1     1     0
5  Sal     0     1     0     0     0

Running the same steps on this larger data frame, we get

resultMat
     Joe Jane Jill Bob Sal
Joe    0    1    2   2   0
Jane   1    0    2   2   0
Jill   2    2    0   3   0
Bob    2    2    3   0   0
Sal    0    0    0   0   0

This shows 0s in all of Sal's slots and 3s in the Bob-Jill and Jill-Bob slots, since Bob and Jill both voted in the same 3 votes.
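Since the end goal is an igraph object, here is a rough sketch of the whole path from df1 to a graph. It assumes the igraph package is installed and rebuilds df1 as a plain data frame so the block runs on its own. The choices mode = "undirected" and weighted = TRUE are my guess at what suits this kind of co-voting network, not the only reasonable settings.

```r
library(igraph)  # assumption: igraph is installed

# Same values as df1, rebuilt here so the block runs on its own
df1 <- data.frame(
  name  = c("Joe", "Jane", "Jill", "Bob", "Sal"),
  vote1 = c(1, 0, 1, 1, 0),
  vote2 = c(0, 0, 0, 0, 1),
  vote3 = c(1, 1, 1, 1, 0),
  vote4 = c(0, 1, 1, 1, 0),
  vote5 = c(1, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# same steps as above
myMat <- as.matrix(df1[-1])
resultMat <- tcrossprod(myMat)
dimnames(resultMat) <- list(df1$name, df1$name)
diag(resultMat) <- 0

# weighted = TRUE stores the counts in the "weight" edge attribute;
# zero entries produce no edge
g <- graph_from_adjacency_matrix(resultMat, mode = "undirected", weighted = TRUE)

E(g)$weight  # one co-vote count per edge
degree(g)    # number of edges per person
```

Sal never votes alongside anyone, so Sal stays in the graph as an isolated node with degree 0. Whether to keep or drop such nodes depends on the analysis.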

data

df <-
structure(list(name = structure(c(3L, 1L, 2L), .Label = c("Jane", 
"Jill", "Joe"), class = "factor"), vote1 = c(1L, 0L, 1L), vote2 = c(0L, 
0L, 0L), vote3 = c(1L, 1L, 1L)), .Names = c("name", "vote1", 
"vote2", "vote3"), class = "data.frame", row.names = c(NA, -3L))

df1 <- 
structure(list(name = structure(c(4L, 2L, 3L, 1L, 5L), .Label = c("Bob", 
"Jane", "Jill", "Joe", "Sal"), class = "factor"), vote1 = c(1L, 
0L, 1L, 1L, 0L), vote2 = c(0L, 0L, 0L, 0L, 1L), vote3 = c(1L, 
1L, 1L, 1L, 0L), vote4 = c(0L, 1L, 1L, 1L, 0L), vote5 = c(1L, 
0L, 0L, 0L, 0L)), .Names = c("name", "vote1", "vote2", "vote3", 
"vote4", "vote5"), class = "data.frame", row.names = c(NA, -5L))