How to convert Gremlin query results into Pandas/Python dataframes?

924 views Asked by At

I want to prettify my results from Gremlin queries by converting them into dataframes.

Gremlin outputs results that (at least to my untrained eyes) look like the Json format. As an example I'll borrow the answer to one of my previous questions that uses the air-routes graph:

g.V().
  group().
    by('code').
    by(
      outE('route').
      order().by('dist').
      inV().
      group().
        by('code').
        by(
          outE('route').
          order().by('dist').
          inV().
          values('code').fold())).
  unfold()

with the results looking something like this:

1.  {'SHH': {'WAA': ['KTS', 'SHH', 'OME'], 'OME': ['TLA', 'WMO', 'KTS', 'GLV', 'ELI', 'TNC', 'WAA', 'WBB', 'SHH', 'SKK', 'KKA', 'UNK', 'SVA', 'OTZ', 'GAM', 'ANC']}}
2.  {'KWN': {'BET': ['WNA', 'KWT', 'ATT', 'KUK', 'TLT', 'EEK', 'WTL', 'KKH', 'KWN', 'KLG', 'MLL', 'KWK', 'PQS', 'CYF', 'KPN', 'NME', 'OOK', 'GNU', 'VAK', 'SCM', 'HPB', 'EMK', 'ANC'], 'EEK': ['KWN', 'BET'], 'TOG': ['KWN']}}
...

How can I convert this into a data frame that looks like this?

Home Stop Dest
==============
SHH  WAA  KTS 
SHH  WAA  SHH 
SHH  WAA  OME
SHH  OME  TLA
SHH  OME  WMO
SHH  OME  KTS
SHH  OME  GLV
SHH  OME  ELI
SHH  OME  TNC
SHH  OME  WAA
SHH  OME  WBB
SHH  OME  SHH
SHH  OME  SKK
SHH  OME  KKA
SHH  OME  UNK
SHH  OME  SVA
SHH  OME  OTZ
SHH  OME  GAM
SHH  OME  ANC
KWN  BET  WNA
KWN  BET  KWT
KWN  BET  ATT
...

I've been able to use a combination of list operations and pandas to achieve this, but is there a more straightforward way?

Note: It would be fine to re-write the query if that makes things easier, as long as the output is similar.

I'm running Gremlin in an Amazon Neptune environment with Neptune Python Utils.

1

There are 1 answers

4
Kelvin Lawrence On

You can easily import Gremlin maps, or lists of maps into a Pandas Data frame. For example consider the following line of Gremlin Python.

vm = g.V().has('airport','region','GB-ENG').valueMap().by(unfold()).toList()

Using the air-routes data set, the query finds all airports in England.

Having been run, the vm variable will contain a list of maps with values such as:

[{'code': 'LTN', 'type': 'airport', 'desc': 'London Luton Airport', 'country': 'UK', 'longest': 7086, 'city': 'London', 'lon': -0.368333011865616, 'elev': 526, 'icao': 'EGGW', 'region': 'GB-ENG', 'runways': 1, 'lat': 51.874698638916}, {'code': 'SOU', 'type': 'airport', 'desc': 'Southampton Airport', 'country': 'UK', 'longest': 5653, 'city': 'Southampton', 'lon': -1.35679996013641, 'elev': 44, 'icao': 'EGHI', 'region': 'GB-ENG', 'runways': 1, 'lat': 50.9502983093262},

You can then create a Data Frame using code like this

import pandas as pd
from pandas import DataFrame

df = DataFrame.from_dict(vm)
df.sort_values('code',ascending=False).head(10)

and the result will be of this form:

enter image description here