Memory problems with large-scale social network visualization using R and Cytoscape


I'm relatively new to R and am trying to solve the following problem:

I work on a Windows 7 Enterprise platform with the 32-bit version of R and have about 3GB of RAM on my machine. I have large-scale social network data (c. 7,000 vertices and c. 30,000 edges) which are currently stored in my SQL database. I have managed to pull this data (omitting vertex and edge attributes) into an R dataframe and then into an igraph object. For further analysis and visualization, I would now like to push this igraph object into Cytoscape using RCytoscape. Currently, my approach is to convert the igraph object into a graphNEL object, since RCytoscape seems to work well with this object type. (The igraph plotting functions are much too slow and lack further analysis functionality.)

Unfortunately, I always run into memory issues when running this script. It has worked previously with smaller networks though.

Does anyone have an idea on how to solve this issue? Or can you recommend any other visualization and analysis tools that work well with R and can handle such large-scale data?


There are 2 answers

Sacha Epskamp

It has been a while since I used Cytoscape so I am not exactly sure how to do it, but the manual states that you can use text files as input using the "Table Import" feature.

In igraph you can use the write.graph() function to export a graph in several text formats. That lets you skip the conversion to a graphNEL object, which might be enough to avoid running out of memory.
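As a rough sketch of the text-file route: since the edges already sit in a data frame, they could also be written out as a tab-delimited file with base R and then loaded through Cytoscape's import dialog. The column names `source` and `target`, the file name, and the sample rows are assumptions for illustration only; with an igraph object, write.graph() would serve the same purpose.

```r
# Hypothetical edge list; in practice this would be the data frame pulled from SQL.
edges <- data.frame(
  source = c("a", "a", "b"),
  target = c("b", "c", "c"),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with a header row, no quoting and no row names.
edge_file <- file.path(tempdir(), "edges.tsv")
write.table(edges, file = edge_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = TRUE)

# Read the file back to confirm the round trip.
check <- read.delim(edge_file, stringsAsFactors = FALSE)
```

Going through a file keeps the large graph object out of the R-to-Cytoscape transfer entirely, which is the point of the suggestion above.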

Paul Shannon

Sorry for taking several days to get back to you.

I just ran some tests in which

1) an adjacency matrix is created in R
2) an R graphNEL is then created from the matrix
3) (optionally) node & edge attributes are added
4) a CytoscapeWindow is created, displayed, laid out, and redrawn

(all times are in seconds)

nodes   edges  attributes? matrix    graph   cw    display   layout  redraw   total
  70      35       no       0.001    0.001   0.5      5.7      2.5    0.016    9.4
  70       0       no       0.033    0.001   0.2      4.2      0.5    0.49     5.6
 700     350       no       0.198    0.036   6.0      8.3      1.6    0.037   16.7
1000     500       no       0.64     0.07   12.0      9.8      1.8    0.09    24.9
1000     500      yes       0.42    30.99   15.7     29.9      1.7    0.08    79.4
2000    1000       no       3.5      0.30   73.5     14.9      4.8    0.08    96.6
2500    1250       no       2.7      0.45  127.1     18.3     11.5    0.09   160.7
3000    1500       no       4.2      0.46  236.8     19.6     10.7    0.10   272.8
4000    2000       no       8.4      0.98  502.2     27.9     21.4    0.14   561.8

To my complete surprise, and chagrin, there is a steep, faster-than-linear slowdown in 'cw' (the new.CytoscapeWindow method), which makes no sense at all. It may be that your memory exhaustion is related to that, and that it is quite fixable.
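A quick way to see how steeply 'cw' grows is to compare its timings each time the node count doubles. The sketch below uses only the three no-attribute rows for 1000, 2000 and 4000 nodes from the table; treating the growth as a power law is an assumption, and three points are too few to pin down the growth law with any confidence.

```r
# 'cw' timings (seconds) from the no-attribute rows of the table above.
nodes <- c(1000, 2000, 4000)
cw    <- c(12.0, 73.5, 502.2)

# Ratio of the cw time each time the node count doubles.
ratio <- cw[-1] / cw[-length(cw)]

# If cw grew like nodes^k, then k = log2(ratio) for each doubling.
k <- log2(ratio)

print(round(ratio, 2))
print(round(k, 2))
```

Each doubling costs a bit more than six times as long, which under the power-law assumption corresponds to an exponent between 2.5 and 3.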

I will explore this, and probably have a fix in the next week.

By the way, did you know that you can create a graph directly from an adjacency matrix? The call below gives a graphAM, which as() can then coerce to a graphNEL:

gAM = new("graphAM", adjMat = matrix, edgemode = "directed")
g = as(gAM, "graphNEL")
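For completeness, here is a minimal base-R sketch of building such an adjacency matrix from an edge list. The names `edges`, `from`, `to` and `adjMatrix` are assumptions for illustration. One caveat worth checking first: a dense matrix for about 7,000 vertices has 49 million cells, roughly 392 MB when stored as doubles, so on a 32-bit R session this route may itself strain memory.

```r
# Hypothetical edge list with character vertex names.
edges <- data.frame(
  from = c("a", "a", "b"),
  to   = c("b", "c", "c"),
  stringsAsFactors = FALSE
)

# One row and one column per vertex, with matching dimnames.
vertices <- sort(unique(c(edges$from, edges$to)))
adjMatrix <- matrix(0, nrow = length(vertices), ncol = length(vertices),
                    dimnames = list(vertices, vertices))

# Mark each directed edge from -> to with a 1 (indexing by name pairs).
adjMatrix[cbind(edges$from, edges$to)] <- 1

# Rough size in megabytes of a dense double matrix for 7,000 vertices.
dense_mb <- 7000^2 * 8 / 1e6

# adjMatrix could then be passed as adjMat in the new("graphAM", ...) call above.
```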

Thanks, Ignacio, for your most helpful report. I should have done these timing tests long ago!

- Paul