Method to export Incidence Matrix from Grakn?


We often use GraphBLAS for graph processing, so we need the incidence matrix. I haven't been able to find a way to export this from Grakn to a CSV or any other file. Is this possible?


There are 2 answers

flyingsilverfin On BEST ANSWER

There isn't a built-in way to dump data to CSV in Grakn right now. However, we do highly encourage our community to contribute open source tooling for these kinds of tasks! Feel free to chat to us about it on our Discord.

As to how it can be done, conceptually it's pretty easy:

First, run a query that streams all hyper-relations out:

match $r isa relation;

Then, for each relation, we can pipeline another query (possibly in a new transaction if you wish to keep memory usage lower):

match $r iid <iid of $r from previous query>; $r ($x); get $x;

which will get you everything playing a role in this particular hyper-relation $r.

If you also wish to extract the attributes attached to the hyper-relation, you can use the following:

match $r iid <iid of $r from first query>; $r has $a; get $a;
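
As a minimal sketch of the pipelining step, the two follow-up queries can be generated per relation by substituting the iid returned by the first query. The helper names and the example iid below are hypothetical, and how the query strings are actually executed depends on the client version you use:

```python
def role_players_query(relation_iid: str) -> str:
    """Graql query returning every role player of one relation."""
    return f"match $r iid {relation_iid}; $r ($x); get $x;"


def owned_attributes_query(relation_iid: str) -> str:
    """Graql query returning every attribute owned by one relation."""
    return f"match $r iid {relation_iid}; $r has $a; get $a;"


# Hypothetical iid, standing in for one taken from the first query's answers.
example_iid = "0x847080018000000000000000"
print(role_players_query(example_iid))
print(owned_attributes_query(example_iid))
```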

In effect, we can use these steps to build up each column of the incidence matrix A.
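
To illustrate, here is a rough sketch of assembling the matrix once the role players of each relation have been collected. It assumes the query results have already been reduced to a plain dict mapping each relation iid to the iids of its role players. The function names and the short placeholder iids are made up for the example, and attributes are left out for brevity:

```python
import csv
import sys


def build_incidence_matrix(relation_players):
    """Build a dense 0/1 incidence matrix.

    relation_players maps a relation iid (hyper-edge) to the iids of its
    role players (vertices). Rows are vertices, columns are hyper-edges.
    """
    edges = list(relation_players)  # column order
    vertices = []                   # row order, by first appearance
    seen = set()
    for players in relation_players.values():
        for player in players:
            if player not in seen:
                seen.add(player)
                vertices.append(player)

    row_of = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(edges) for _ in vertices]
    for col, edge in enumerate(edges):
        for player in relation_players[edge]:
            matrix[row_of[player]][col] = 1
    return vertices, edges, matrix


def write_incidence_csv(vertices, edges, matrix, out):
    """Write the matrix with a header row of edge iids and a leading vertex column."""
    writer = csv.writer(out)
    writer.writerow(["vertex"] + edges)
    for vertex, row in zip(vertices, matrix):
        writer.writerow([vertex] + row)


# Placeholder data: r1 links a and b; r2 links b, c and the relation r1 itself.
relation_players = {"r1": ["a", "b"], "r2": ["b", "c", "r1"]}
vertices, edges, matrix = build_incidence_matrix(relation_players)
write_incidence_csv(vertices, edges, matrix, sys.stdout)
```

Note that r1 ends up as both a column and a row here, because it plays a role in r2. For a large database a sparse representation (row, column pairs) would be a better fit than a dense list of lists.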

There are a couple of important caveats I should bring up:

  • The result will exclude all type information: the types of the hyper-relations, of the role players in the relations, of the roles actually being played by the role players, and of the attributes owned.

==> It would be interesting to hear/discuss how one could encode type information for use in GraphBLAS.

  • In Graql, it's entirely possible to have relations participating in relations. In the worst case, this means all hyper-edges E will also be present in the set V. In practice only a few relations will play a role in other relations, so only a subset of E may be in V.
brett On

So the incidence matrix is equivalent to the nodes/edges array used in force graph visualisation. In this case it is pretty straightforward.

My approach would be slightly different from the above, as all I need to do is pull all of the things in the db (entities, relations, attributes) with

match $ting isa thing;

When the results come back, for each $ting I would pull all of the available properties using both local and remote methods if I were building a force graph viz, but for your incidence matrix I really only care about pulling 3 bits of data:

  1. The iid of the thing
  2. The attributes the thing may own.
  3. The roles of the thing and their players, if it is a relation

Essentially, one tests each returned object to find out its type (e.g. entity, attribute, relation), and then uses some of the local and remote methods to get the data one wants. In Python, the code for pulling the data for relations looks like

        # Pull relation data. This is an excerpt from the per-answer type checks;
        # `key`, `r_tx`, `res`, `layer` and `logger` come from surrounding code not shown.
        elif thing.is_relation():
            rel = {}
            rel['type'] = 'relation'
            rel['symbol'] = key
            rel['G_id'] = thing.get_iid()
            rel['G_name'] = thing.get_type().get_label().name()

            # iids of the attributes owned by this relation
            att_obj = thing.as_remote(r_tx).get_has()
            rel['has'] = [a.get_iid() for a in att_obj]

            # map each role label to the iids of the things playing that role
            links = thing.as_remote(r_tx).get_players_by_role_type()
            logger.debug(f' links are -> {links}')
            edges = {}
            for edge_key, edge_thing in links.items():
                players = list(edge_thing)  # materialise once, then log and reuse
                logger.debug(f' edge key is -> {edge_key}')
                logger.debug(f' edge_thing is -> {players}')
                edges[edge_key.get_label().name()] = [e.get_iid() for e in players]

            rel['edges'] = edges
            res.append(rel)
            layer.append(rel)
            logger.debug(f'rel -> {rel}')

This then gives us a node array, which we can easily process to build an edges array (i.e. the links joining an object to the attributes it owns, or the links joining a relation to its role players). Thus, exporting your incidence matrix is pretty straightforward.
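
For example, the node array could be turned into an edges array roughly like this. The sketch assumes every node dict has the same 'G_id', 'has' and (for relations) 'edges' keys as the relation dict above; the entity and attribute entries, the role labels and the short iids are placeholders made up for illustration:

```python
def build_edge_list(nodes):
    """Flatten a node array into a list of source/target links.

    Each node is a dict with 'G_id', an optional 'has' list of attribute
    iids and, for relations, an 'edges' dict of role label -> player iids.
    """
    edge_list = []
    for node in nodes:
        source = node["G_id"]
        # links from a thing to each attribute it owns
        for attr_iid in node.get("has", []):
            edge_list.append({"source": source, "target": attr_iid, "label": "has"})
        # links from a relation to each of its role players
        for role, player_iids in node.get("edges", {}).items():
            for player_iid in player_iids:
                edge_list.append({"source": source, "target": player_iid, "label": role})
    return edge_list


# Placeholder node array in the shape produced by the code above.
nodes = [
    {"type": "entity", "G_id": "e1", "has": ["a1"]},
    {"type": "entity", "G_id": "e2", "has": []},
    {"type": "attribute", "G_id": "a1", "has": []},
    {"type": "relation", "G_id": "r1", "has": ["a1"],
     "edges": {"employee": ["e1"], "employer": ["e2"]}},
]
for edge in build_edge_list(nodes):
    print(edge["source"], edge["label"], edge["target"])
```

Keeping the role label on each link preserves some of the type information that a plain 0/1 incidence matrix would drop.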