NetworkX - Create graph from node and attributes

I'm trying to build a network graph with NetworkX from a list of nodes and their attributes. Each node is unique, but it can share attributes with other nodes. A shared attribute should act as an edge between all the nodes that have it.

An example of the input (node and attributes):

Name1   2-s2.0-84905590088, 2-s2.0-84901477890
Name2   2-s2.0-84941169876
Name3   2-s2.0-84958012773
Name4   2-s2.0-84960796474
Name5   2-s2.0-84945302996, 2-s2.0-84953281823, 2-s2.0-84944268402, 2-s2.0-84949478621, 2-s2.0-84947281259, 2-s2.0-84947759580, 2-s2.0-84945265895, 2-s2.0-84945247800, 2-s2.0-84946541351, 2-s2.0-84946051072, 2-s2.0-84942573284, 2-s2.0-84942280140, 2-s2.0-84937715425, 2-s2.0-84943751990, 2-s2.0-84957729558, 2-s2.0-84938844501, 2-s2.0-84934761065
Name6   2-s2.0-84908333808
Name7   2-s2.0-84925879816
Name8   2-s2.0-84940447040, 2-s2.0-84949534001
Name9   2-s2.0-84899915556, 2-s2.0-84922392381, 2-s2.0-84905079505, 2-s2.0-84940931972, 2-s2.0-84893682063, 2-s2.0-84954285577, 2-s2.0-84934934228, 2-s2.0-84926624187
Name10  2-s2.0-84907065810

So Name5 would have a lot of edges connecting it to the other names with the same identifier.

I'm not sure if this is the right idea behind NetworkX, or if this kind of input can even be used to build a graph. If it can't, how should I format the input to make this graph? I haven't been able to find any documentation or videos on using NetworkX this way.

There is 1 answer

Answered by edo

What you ask is possible. I stored your data in a CSV file; note that I added a comma after the node names and removed all whitespace.

Name1,2-s2.0-84905590088,2-s2.0-84901477890
Name2,2-s2.0-84941169876
Name3,2-s2.0-84958012773
Name4,2-s2.0-84960796474
Name5,2-s2.0-84945302996,2-s2.0-84953281823,2-s2.0-84944268402,2-s2.0-84949478621,2-s2.0-84947281259,2-s2.0-84947759580,2-s2.0-84945265895,2-s2.0-84945247800,2-s2.0-84946541351,2-s2.0-84946051072,2-s2.0-84942573284,2-s2.0-84942280140,2-s2.0-84937715425,2-s2.0-84943751990,2-s2.0-84957729558,2-s2.0-84938844501,2-s2.0-84934761065
Name6,2-s2.0-84908333808
Name7,2-s2.0-84925879816
Name8,2-s2.0-84940447040,2-s2.0-84949534001
Name9,2-s2.0-84899915556,2-s2.0-84922392381,2-s2.0-84905079505,2-s2.0-84940931972,2-s2.0-84893682063,2-s2.0-84954285577,2-s2.0-84934934228,2-s2.0-84926624187
Name10,2-s2.0-84907065810
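
If editing the file by hand is not practical, the original layout (a name, some whitespace, then comma-separated identifiers) can probably be parsed directly. The following is a minimal sketch, assuming node names never contain whitespace; `parse_line` is a hypothetical helper name, not part of NetworkX.

```python
def parse_line(line):
    """Split 'Name1   id1, id2' into ('Name1', ['id1', 'id2'])."""
    # maxsplit=1 separates the node name from the rest of the line
    parts = line.split(None, 1)
    name = parts[0]
    rest = parts[1] if len(parts) > 1 else ''
    # identifiers are comma-separated; strip whitespace and drop empty pieces
    attributes = [a.strip() for a in rest.split(',') if a.strip()]
    return name, attributes


# two lines copied from the question's input
sample = """Name1   2-s2.0-84905590088, 2-s2.0-84901477890
Name2   2-s2.0-84941169876"""

parsed = dict(parse_line(line) for line in sample.splitlines() if line.strip())
print(parsed)
```

The resulting dictionary maps each node name to its list of attributes, which is the same information the CSV rows carry.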

One observation: you say that Name5 would have a lot of edges, but its attributes are unique. Moreover, when I run my code on your data, it turns out that all of the attributes are unique, so there are no edges in the graph.

I tweaked your data so that I use only the first 12 characters of each attribute (I do that with the line new_attributes = [x[:12] for x in new_attributes]). That way I get some matching attributes.
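
A quick way to check whether any attribute is shared at all, before building a graph, is to count how many nodes carry each attribute. This is only a sketch: `shared_attributes` and its `prefix_len` parameter are hypothetical names, and the dictionary holds a small excerpt of the data above.

```python
from collections import Counter

# excerpt of the question's data: node name -> list of attributes
node_attributes = {
    'Name1': ['2-s2.0-84905590088', '2-s2.0-84901477890'],
    'Name8': ['2-s2.0-84940447040', '2-s2.0-84949534001'],
    'Name9': ['2-s2.0-84899915556', '2-s2.0-84940931972'],
}


def shared_attributes(node_attributes, prefix_len=None):
    """Return the attributes (optionally truncated) held by more than one node."""
    counts = Counter()
    for attrs in node_attributes.values():
        # a set, so an attribute repeated within one node is counted once;
        # a[:None] keeps the whole string
        counts.update({a[:prefix_len] for a in attrs})
    return {a for a, n in counts.items() if n > 1}


full_matches = shared_attributes(node_attributes)          # full identifiers
prefix_matches = shared_attributes(node_attributes, 12)    # first 12 characters
print(full_matches)
print(prefix_matches)
```

With full identifiers the result is empty for this excerpt, which is why no edges appear; with 12-character prefixes Name8 and Name9 share one value.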

Now the code:

import csv

import networkx as nx

G = nx.Graph()

with open('data.csv', newline='') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=',')
    for row in csv_reader:
        if not row:
            continue  # skip blank lines

        new_node = row[0]  # first element in row
        new_attributes = row[1:]  # whole row except the first element
        new_attributes = [x[:12] for x in new_attributes]  # remove this for your data!
        # add the node and its attributes to the graph
        G.add_node(new_node, my_attributes=new_attributes)  # attributes are stored as a list

        # add edges based on existing nodes
        for node, attrs in G.nodes(data=True):
            # skip node we just added
            if node != new_node:
                for attr in attrs['my_attributes']:
                    # check if any of the attributes for `node` are also in the `new_attributes` list
                    if attr in new_attributes:
                        G.add_edge(node, new_node)

# use G.nodes[...] here; the older G.node[...] accessor was removed in newer NetworkX releases
for edge in G.edges():
    print('EDGE:', edge, '| COMMON ATTRIBUTES:', set(G.nodes[edge[0]]['my_attributes']) & set(G.nodes[edge[1]]['my_attributes']))

For each CSV row, I add a node (with its attributes) to the graph and, based on the nodes already in the graph (and their attributes), I add the edges. Note that the node attributes are stored in a list and can be accessed with the my_attributes key. At the end I also print each edge together with the attributes its two nodes have in common (I use set and & to get the intersection of the two attribute lists).

Output for the tweaked data:
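
The loop above compares every new node against every existing node. For larger inputs, a different technique may be cheaper: build an inverted index from attribute to nodes, then connect every pair of nodes listed under the same attribute. This is a sketch of that alternative, not the code used for the output shown here; `edges_from_attributes` is a hypothetical name and the sample data is a truncated excerpt.

```python
from collections import defaultdict
from itertools import combinations


def edges_from_attributes(node_attributes):
    """Return {(node_a, node_b): set of attributes the two nodes share}."""
    # invert the mapping: attribute -> set of nodes that carry it
    nodes_by_attribute = defaultdict(set)
    for node, attrs in node_attributes.items():
        for attr in attrs:
            nodes_by_attribute[attr].add(node)

    # every pair of nodes under the same attribute becomes an edge
    edges = defaultdict(set)
    for attr, nodes in nodes_by_attribute.items():
        for a, b in combinations(sorted(nodes), 2):
            edges[(a, b)].add(attr)
    return dict(edges)


# excerpt of the truncated data: node name -> list of attributes
node_attributes = {
    'Name1': ['2-s2.0-84905'],
    'Name8': ['2-s2.0-84940', '2-s2.0-84949'],
    'Name9': ['2-s2.0-84905', '2-s2.0-84940'],
}

edges = edges_from_attributes(node_attributes)
for (a, b), common in sorted(edges.items()):
    print('EDGE:', (a, b), '| COMMON ATTRIBUTES:', common)
```

The keys of the returned dictionary are plain node pairs, so they can be passed to `G.add_edges_from(edges.keys())` on a NetworkX graph.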

EDGE: ('Name5', 'Name9') | COMMON ATTRIBUTES: {'2-s2.0-84934'}
EDGE: ('Name5', 'Name8') | COMMON ATTRIBUTES: {'2-s2.0-84949'}
EDGE: ('Name8', 'Name9') | COMMON ATTRIBUTES: {'2-s2.0-84940'}
EDGE: ('Name1', 'Name9') | COMMON ATTRIBUTES: {'2-s2.0-84905'}

One final note: if you need multiple edges between two nodes, use a MultiGraph.
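
As a rough illustration of the MultiGraph idea, the sketch below adds one parallel edge per shared attribute and uses the attribute as the edge key. The two nodes and their attribute lists are made up for the example (they are not the question's data), and using the attribute as the key is an assumption about what would be convenient.

```python
import networkx as nx

G = nx.MultiGraph()
# illustrative data only: two nodes that share two attributes
G.add_node('Name8', my_attributes=['2-s2.0-84940', '2-s2.0-84949'])
G.add_node('Name9', my_attributes=['2-s2.0-84940', '2-s2.0-84949', '2-s2.0-84905'])

common = set(G.nodes['Name8']['my_attributes']) & set(G.nodes['Name9']['my_attributes'])
for attr in sorted(common):
    # key=attr identifies each parallel edge; adding the same key again
    # updates the existing edge instead of creating a duplicate
    G.add_edge('Name8', 'Name9', key=attr)

print(G.number_of_edges('Name8', 'Name9'))
print(sorted(G['Name8']['Name9']))  # the edge keys, i.e. the shared attributes
```

Keying the edges this way lets you look up which attribute a given edge stands for without recomputing the intersection.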