Choosing an appropriate way to use Neo4j in Python


I am currently using the embedded Python bindings for Neo4j. I have no issues so far, since my graph is very small (sparse, with up to 100 nodes). The algorithm I am developing involves a lot of traversals, specifically DFS on the whole graph as well as on different subgraphs. In the future I intend to run the algorithm on large graphs (presumably sparse, with millions of nodes).

Having read several threads about the performance of the Python/Neo4j bindings, I wonder whether I should switch to a REST API client for Python (such as bulbflow, py2neo or neo4jrestclient) now, before I am too far along to change all the code.

Unfortunately, I did not find any comprehensive source of information to compare different approaches.

Could anyone provide some further insight into this issue? Which criteria should I take into account when choosing one of the options?


There are 3 answers

1
Peter Neubauer

I am not an expert, so I am not really sure, but I think it also depends on whether you expect to use Django and on how much of a framework you need. Py2neo is very pragmatic and slim, Bulbflow seems to build up a whole mapping stack, and neo4jrestclient concentrates on Django (though I may be wrong about that).

4
espeed

The easiest way to run algorithms from Python is to use Gremlin (https://github.com/tinkerpop/gremlin/wiki).

With Gremlin you can bundle everything into one HTTP request to reduce round-trip overhead.

Here's how to execute Gremlin scripts from Bulbs (http://bulbflow.com):

>>> from bulbs.neo4jserver import Graph
>>> g = Graph()
>>> script = "g.v(id).out('knows').out('knows')"
>>> params = dict(id=3)
>>> g.gremlin.execute(script, params)

The Bulbs Gremlin API docs are here: http://bulbflow.com/docs/api/bulbs/gremlin/
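
To make the round-trip point concrete without needing a server, here is a minimal sketch in plain Python. FakeRestClient, neighbours and run_server_side are hypothetical names invented for this illustration; they are not part of Bulbs, py2neo or neo4jrestclient. The sketch only counts simulated requests: a client-side DFS that fetches neighbours node by node pays one request per visited node, while a traversal shipped to the server as one script pays a single request.

```python
from collections import defaultdict


class FakeRestClient:
    """Hypothetical stand-in for a REST client; it only counts simulated requests."""

    def __init__(self, edges):
        self.adjacency = defaultdict(list)
        for source, target in edges:
            self.adjacency[source].append(target)
        self.requests = 0

    def neighbours(self, node):
        # One simulated HTTP request per call.
        self.requests += 1
        return list(self.adjacency[node])

    def run_server_side(self, traversal, start):
        # One simulated HTTP request; the traversal then runs "on the server",
        # reading the adjacency data directly.
        self.requests += 1
        return traversal(lambda node: list(self.adjacency[node]), start)


def dfs(neighbours, start):
    """Iterative depth-first search; returns the nodes in visit order."""
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Reverse so that the first listed neighbour is explored first.
        stack.extend(reversed(neighbours(node)))
    return order


edges = [(1, 2), (1, 3), (2, 4), (3, 4)]

# Client-side DFS: every visited node costs one simulated request.
per_node_client = FakeRestClient(edges)
per_node_order = dfs(per_node_client.neighbours, 1)

# Bundled traversal: the whole DFS costs one simulated request.
bundled_client = FakeRestClient(edges)
bundled_order = bundled_client.run_server_side(dfs, 1)

print(per_node_order, per_node_client.requests)
print(bundled_order, bundled_client.requests)
```

On a graph with millions of nodes the per-node variant would issue millions of requests, which is the overhead a single Gremlin script avoids.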

0
Nigel Small

Django is an MVC web framework, so it may be of interest if yours is to be a web application.

From the point of view of py2neo (of which I am the author), I am trying to focus hard on performance by using the batch execution mechanism automatically where appropriate as well as providing strong Cypher support. I have also recently put a lot of work into providing good options for uniqueness management within indexes - specifically, the get_or_create and add_if_none methods.
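
For readers unfamiliar with uniqueness management, the idea can be sketched with a plain in-memory dictionary. UniqueIndex below is a hypothetical class written for this illustration only. The two method names are borrowed from the answer, but the signatures and return values are assumptions and should not be read as py2neo's actual API.

```python
class UniqueIndex:
    """Hypothetical in-memory index illustrating get-or-create semantics."""

    def __init__(self):
        self._entries = {}

    def get_or_create(self, key, value, properties):
        # Return the node already indexed under (key, value); otherwise
        # create it from the supplied properties and index it.
        return self._entries.setdefault((key, value), dict(properties))

    def add_if_none(self, key, value, node):
        # Index the node only when nothing is stored under (key, value).
        # Assumed convention: return the node if added, None otherwise.
        if (key, value) in self._entries:
            return None
        self._entries[(key, value)] = node
        return node


index = UniqueIndex()
alice = index.get_or_create("name", "Alice", {"name": "Alice"})
# A second call with the same key and value returns the existing node.
alice_again = index.get_or_create("name", "Alice", {"name": "Alice", "age": 30})

rejected = index.add_if_none("name", "Alice", {"name": "Impostor"})
bob = {"name": "Bob"}
added = index.add_if_none("name", "Bob", bob)
```

Against a real server the check and the creation have to happen atomically on the server side, which is why such methods belong in the client library rather than in application code.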