I have a machine hosting a router socket as such:
import zmq
zmq_ctx = zmq.Context()
router = zmq_ctx.socket(zmq.ROUTER)
router.setsockopt(zmq.ROUTER_HANDOVER, 1)  # a reconnecting peer may take over an identity already in use
router.bind(url)
Then I have a number of machines making connections to it like this.
dealer = zmq_ctx.socket(zmq.DEALER)
dealer.setsockopt(zmq.IDENTITY, options.identity)
dealer.connect(url)
# send some message every 10 seconds
The dealers send messages and, after some time, shut down. The connections on the dealer end aren't closed gracefully; the machines are simply powered off (not sure if that is important). The problem is that I've noticed that messages from new dealers sometimes do not arrive at the router. How can I prevent or debug this to find out what is going wrong? Could stale connections from previous dealers prevent new messages from arriving?
First note: ZeroMQ does not guarantee ...
ZeroMQ explicitly recommends designing messaging architectures to cope with uncertainty and non-guaranteed delivery. That said, you may be interested in the explanation Pieter Hintjens published in his book.
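A minimal sketch of what designing for non-guaranteed delivery can look like on the dealer side (an illustration only, not a quotation from the book). It assumes the router answers every message with some reply frame that serves as an acknowledgement; that acknowledgement protocol and the helper name `send_with_ack` are assumptions, not part of the original setup.

```python
import zmq


def send_with_ack(dealer, payload, timeout_ms=1000, retries=3):
    """Send payload and treat any reply as an acknowledgement.

    Returns True once a reply arrives, False after every attempt timed out.
    The dealer must already be connected, otherwise send() may block.
    """
    poller = zmq.Poller()
    poller.register(dealer, zmq.POLLIN)
    for _ in range(retries):
        dealer.send(payload)
        # poll() returns an empty list if nothing arrived within timeout_ms
        if poller.poll(timeout_ms):
            dealer.recv_multipart()  # consume the acknowledgement
            return True
    return False
```

A retried message may reach the router more than once, and a late acknowledgement for an earlier attempt is counted like any other reply, so the receiving side should tolerate duplicates. A `False` result is the signal to log, alert, or reconnect instead of silently losing data.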
Second note: ZeroMQ recommends Graceful Release & Termination prior to exit
While context termination is a local concern, the graceful release of resources associated with ZeroMQ sockets is not. A ZeroMQ connection has two ends, and it is fair to expect a graceful release so as to avoid ambiguities at the remote end and the side effects that can accumulate there as your herd of machines gets powered off.
Simply put, the ZeroMQ-internal finite-state machines, both the local one (which you chose to simply power off) and the remote one (which you ignored), should not be destabilised by such behaviour, so a graceful release of resources and a clean exit can (and ought to) take place.
Yes, it is important.
If you seek stable distributed-system operation, your code is responsible for, and should always spend a few CPU cycles on, enforcing a non-blocking (yes, ENFORCING and yes, NON-BLOCKING) clean exit, i.e. taking care of any pending ZMQ socket buffers and avoiding ZMQ_LINGER dead-ends that gridlock the localhost beyond your control, et al.
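A minimal sketch of such a non-blocking clean exit with pyzmq. The helper name `clean_exit` is hypothetical, and a linger period of 0 is an assumption that suits the case where unsent messages may be dropped at shutdown; pick a small positive value in milliseconds if a short flush window is wanted.

```python
import zmq


def clean_exit(ctx, sockets, linger_ms=0):
    """Close all sockets without waiting on unsent messages, then end the context."""
    for sock in sockets:
        # LINGER bounds how long queued outgoing messages keep the socket
        # alive after close(); 0 discards them immediately.
        sock.setsockopt(zmq.LINGER, linger_ms)
        sock.close()
    # With every socket closed and no lingering messages, term() returns promptly.
    ctx.term()
```

Calling this from a `finally:` block or a signal handler on each dealer machine, for example `clean_exit(zmq_ctx, [dealer])`, lets the router see a proper disconnect instead of a connection that silently went away. It cannot help when power is cut without warning, which is one more reason to design for lost messages.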