python graph-tool load csv file


I'm loading a directed weighted graph from a csv file into a graph-tool graph in Python. The input csv file is organized as follows:

1,2,300
2,4,432
3,89,1.24
...

The first two entries of each line identify the source and target of an edge, and the third number is the weight of the edge.

Currently I'm using:

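import graph_tool.all as gt

delimiter = ','  # field separator used in the csv file
g = gt.Graph()
e_weight = g.new_edge_property("float")
# in_file_directory and network_input are defined elsewhere
with open(in_file_directory + '/' + network_input, 'r') as csv_network:
    for line in csv_network:
        edge = line.strip().split(delimiter)
        # add one edge at a time, then store its weight
        e = g.add_edge(int(edge[0]), int(edge[1]))
        e_weight[e] = float(edge[2])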

However, it takes quite long to load the data (my network has 10 million nodes and loading takes about 45 minutes). I tried to speed it up with g.add_edge_list, but that works only for unweighted graphs. Any suggestions on how to make it faster?


There are 2 answers

Tiago Peixoto (best answer)

This has been answered in graph-tool's mailing list:

http://lists.skewed.de/pipermail/graph-tool/2015-June/002043.html

In short, you should use the function g.add_edge_list(), as you said, and set the weights separately via the array interface for property maps:

e_weight.a = weight_list

The weight list should have the same ordering as the edges you passed to g.add_edge_list().
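A minimal sketch of that approach, assuming the csv layout from the question (source, target, weight) and a freshly created graph, so that edge indices follow the order of the edge list. The helper names read_edges_and_weights and build_graph are hypothetical and only serve the illustration; the graph-tool calls are g.add_edge_list() and the .a array interface mentioned above.

```python
import csv
import io


def read_edges_and_weights(lines, delimiter=","):
    """Split rows of 'source,target,weight' into two parallel lists."""
    edges, weights = [], []
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        edges.append((int(row[0]), int(row[1])))
        weights.append(float(row[2]))
    return edges, weights


def build_graph(edges, weights):
    """Bulk-load the edges, then set all weights through the array interface."""
    # Imported here so the parsing helper can be used without graph-tool.
    import graph_tool.all as gt

    g = gt.Graph(directed=True)
    g.add_edge_list(edges)  # adds all edges in one call
    e_weight = g.new_edge_property("double")
    # Assumption: the graph was empty before add_edge_list, so the i-th
    # weight belongs to the i-th edge passed in.
    e_weight.a = weights
    return g, e_weight


# Small in-memory sample standing in for the real file.
sample = io.StringIO("1,2,300\n2,4,432\n3,89,1.24\n")
edges, weights = read_edges_and_weights(sample)
# g, e_weight = build_graph(edges, weights)  # requires graph-tool
```

For the real data, an open file object can be passed to read_edges_and_weights in place of the sample. Keeping the edges and the weights in two parallel lists is what lets the weights be assigned in a single step instead of once per edge.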

Martin Evans

I suggest you measure the performance you get with the csv library. In this example, each row is read into edge as a list of its 3 fields, and edge_float holds the same values converted to floats.

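import csv

# in_file_directory and network_input are defined elsewhere
with open(in_file_directory + '/' + network_input, 'r') as csv_file:
    reader = csv.reader(csv_file, delimiter=",")

    for edge in reader:
        # skip blank or malformed rows
        if len(edge) == 3:
            edge_float = [float(param) for param in edge]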

So for the first line of the file you would get the following to work with:

edge_float = [1.0, 2.0, 300.0]