Hullo, hope this doesn't end up being too trivial.
The relevant parts of my stack are Gunicorn/Celery, neomodel (0.3.6), and py2neo (1.5). Neo4j version is 1.9.4, bound on 0.0.0.0:7474 (all of this is on linux, Ubuntu 13.04 I think)
So my gunicorn/celery servers are fine most of the time, except occasionally, I get the following error:
ConnectionRefusedError(111, 'Connection refused')
Stacktrace (most recent call last):
File "flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "flask/_compat.py", line 33, in reraise
raise value
File "flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "Noomsa/web/core/util.py", line 156, in inner
user = UserMixin().get_logged_in()
File "Noomsa/web/core/util.py", line 117, in get_logged_in
user = models.User.index.get(username=flask.session["user"])
File "neomodel/index.py", line 50, in get
nodes = self.search(query=query, **kwargs)
File "neomodel/index.py", line 41, in search
return [self.node_class.inflate(n) for n in self._execute(str(query))]
File "neomodel/index.py", line 28, in _execute
return self.__index__.query(query)
File "py2neo/neo4j.py", line 2044, in query
self.__uri__, quote(query, "")
File "py2neo/rest.py", line 430, in _send
raise SocketError(err)
So, as you can see, I do a call to User.index.get
(The first call in the request response), and get a socket error. Sometimes. Most of the time, it connects fine. The error occurs amongst all Flask views/Celery tasks that use the neo4j connection (and not just doing User.index.get
;)).
So far, the steps I've taken have involved moneky patching the neomodel connection function to check that the GraphDatabaseService
object is created per thread, and to automatically reconnect (and authenticate) to the neo4j server every 30 or so seconds. This may have reduced the frequency of the errors, but they still occur.
Looking for the error online, it seems to be mostly people trying to connect to the wrong interface/ip/port. However, given that the majority of my requests go through, I don't feel like that is the case here.
Any ideas? I don't think it's related, but my database seems to have 38k orphaned nodes; that's probably worthy of another question in its own right.
EDIT: I should add, this seems to disappear when running gunicorn/celery with workers=1
, instead of workers=$CPU_N
. Can't see why it should matter, as apparently neo4j is set up to handle $N_CPU*10
connections by default.
This looks like a networking or web stack configuration problem so I don't think I can help from a py2neo perspective. I'd recommend upgrading to py2neo 1.6 though as the client HTTP code has been completely rewritten and it might handle a reconnection more gracefully.