Read Pajek partitions file (.clu format) using Networkx

I'm trying to read a Pajek partition file (in other words, a .clu file) with the NetworkX Python library, and I can't figure out how to do that. I can read a Pajek network (.net format) using the read_pajek method, but I didn't find a way to read .clu files.

Thanks a lot!

Answer by Joaquin Cabezas

A .clu file follows this format:

  • First line: *Vertices NUMBER_OF_VERTICES
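  • Second line: Partition of vertex 0 (the first vertex; Pajek itself numbers vertices from 1)
  • Third line: Partition of vertex 1

and so on, one line per vertex, until every one of the NUMBER_OF_VERTICES vertices has been assigned to a partition.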
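To make the layout concrete, here is a small made-up .clu file (five vertices in two partitions; the numbers are illustrative only) and a minimal sketch of reading its header and partition lines in plain Python. It assumes the file contains nothing but the *Vertices line and one integer per vertex.

```python
# Made-up .clu content (illustrative only): 5 vertices, one partition number per line
clu_text = """*Vertices 5
1
1
2
2
1
"""

lines = clu_text.splitlines()
# First line: "*Vertices N"
n_vertices = int(lines[0].split()[1])
# Next N lines: partition number of vertex 0, 1, ..., N-1 (in file order)
membership = [int(line) for line in lines[1:1 + n_vertices]]

print(n_vertices)  # 5
print(membership)  # [1, 1, 2, 2, 1]
```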

According to the NetworkX community detection documentation (https://networkx.github.io/documentation/stable/reference/algorithms/community.html), the preferred format for a partition in NetworkX is an iterable (e.g. a list or tuple) of groups, each holding the vertex numbers of one partition, for example:

  • [[0, 1, 2, 3, 4], [5], [6, 7, 8, 9, 10]]

This means the first partition contains vertices 0, 1, 2, 3 and 4, the second contains only vertex 5, and the third contains vertices 6 to 10.
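As a quick illustration of what "partition" means here, the sketch below checks that the example groups cover every vertex from 0 to 10 exactly once. `covers_exactly_once` is a hypothetical helper written for this answer, not a NetworkX function; NetworkX does provide `networkx.algorithms.community.is_partition(G, communities)` for the equivalent check against an actual graph.

```python
partition = [[0, 1, 2, 3, 4], [5], [6, 7, 8, 9, 10]]


def covers_exactly_once(groups, n_vertices):
    """Hypothetical helper: True if each vertex 0..n_vertices-1 is in exactly one group."""
    seen = [v for group in groups for v in group]
    # Sorting exposes both duplicates and missing vertices in one comparison
    return sorted(seen) == list(range(n_vertices))


print(covers_exactly_once(partition, 11))        # True
print(covers_exactly_once([[0, 1], [1, 2]], 3))  # False (vertex 1 appears twice)
```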

So reading a .clu file amounts to converting its per-vertex partition numbers into that structure.
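The core of that conversion is grouping vertex indices by their partition number. A minimal sketch, assuming the partition numbers have already been read into a list where position i holds the partition of vertex i (`group_by_partition` is a hypothetical name):

```python
from collections import defaultdict


def group_by_partition(membership):
    """Turn a per-vertex list of partition numbers into a list of vertex groups.

    membership[i] is the partition number of vertex i (0-based index).
    Groups come out in order of first appearance of each partition number.
    """
    groups = defaultdict(list)
    for vertex, part in enumerate(membership):
        groups[part].append(vertex)
    return list(groups.values())


print(group_by_partition([1, 1, 2, 2, 1]))  # [[0, 1, 4], [2, 3]]
```

A defaultdict keeps insertion order like a regular dict (Python 3.7+), so the groups follow the order in which partitions first appear in the file.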

I took the read_pajek function from the NetworkX 1.10 source (https://networkx.github.io/documentation/networkx-1.10/_modules/networkx/readwrite/pajek.html#read_pajek) and adapted it into the parse_pajek_clu function below (it needs defaultdict from collections):

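from collections import defaultdict


def parse_pajek_clu(lines):
    """Parse Pajek format partition from string or iterable.

    Parameters
    ----------
    lines : string or iterable
       Data in Pajek partition format.

    Returns
    -------
    communities : list of lists
       One list per partition, holding the vertex indices (0-based, in file
       order) assigned to it.

    See Also
    --------
    read_pajek_clu()
    """
    if isinstance(lines, str):
        lines = lines.split('\n')
    # Strip line endings and surrounding whitespace; skip blank lines
    lines = iter([line.strip() for line in lines if line.strip()])

    communities = defaultdict(list)  # partition number -> list of vertices
    for l in lines:
        if l.lower().startswith("*vertices"):
            nnodes = int(l.split()[1])
            # The next nnodes lines hold one partition number per vertex
            for vertex in range(nnodes):
                value = next(lines, None)
                if value is None:
                    raise ValueError("file ends before all vertices are assigned")
                communities[int(value)].append(vertex)
        else:
            break

    # Partitions in order of first appearance in the file
    return list(communities.values())


def read_pajek_clu(path):
    """Read a .clu file from disk (minimal wrapper around parse_pajek_clu)."""
    with open(path) as f:
        return parse_pajek_clu(f)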

You can check a working example at the repository:

https://github.com/joaquincabezas/networkx_pajek_util

Also, once you have the partition, a good starting point for drawing it is this approach by Paul Brodersen:
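Independently of the approach linked here, the simplest way to colour nodes by partition is to invert the list of groups into a vertex to group-index dictionary. A minimal sketch; `node_to_community` is a hypothetical helper, and the commented drawing call assumes a NetworkX graph `G` whose nodes are 0 to 10 and `matplotlib.pyplot` imported as `plt`.

```python
def node_to_community(partition):
    """Hypothetical helper: map each vertex to the index of the group containing it."""
    return {v: i for i, group in enumerate(partition) for v in group}


partition = [[0, 1, 2, 3, 4], [5], [6, 7, 8, 9, 10]]
mapping = node_to_community(partition)
print(mapping[0], mapping[5], mapping[10])  # 0 1 2

# Assumed usage with a NetworkX graph G (nodes 0..10) and matplotlib:
#   nx.draw(G, node_color=[mapping[n] for n in G], cmap=plt.cm.tab10)
```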

how to draw communities with networkx

I hope this helps!