I've got a cluster of engines running. When I set them to work on a long-running calculation, after a couple of minutes they seem to die silently, one by one, until the calculation stalls indefinitely because there are no engines left to process the queue. Nothing is logged in the ipcluster output or in the Jupyter notebook.
It seems to happen on longer-running calculations. The engine processes just vanish from the process list, one by one.
I'm not sure how to go about debugging this. Any suggestions for where to begin would be very helpful.