ZeroMQ publisher not binding after forever restartall


Scenario

I have multiple Node.js scripts running under forever on Ubuntu. One of these files (start.js) imports a file that starts a ZMQ publisher by binding it to a specified port. When I start start.js on its own with forever, it binds and starts the publisher, and I can fetch the published data through a ZMQ subscriber that connects to this port.

I close the publisher gracefully by handling the exit, SIGINT and SIGUSR events.

Whenever I restart start.js alone using forever restart, the publisher binds and starts successfully. It also works if I stop it manually (using forever stop) and start it again using forever start. The same holds when I stop everything with forever stopall and then start the scripts one by one.

NOTE: All the forever stop and restart commands are run with CLI option --killSignal=SIGINT.

Problem

But the publisher fails to bind when I run forever restartall --killSignal=SIGINT. The error says the address is already in use, although netstat shows no TCP socket on that port. When I stop all the scripts and start them one by one, it binds normally and starts successfully.

I have checked that these kill signals are caught by the publisher script and that it closes the publisher socket before exiting.

Failed Attempts:

  • Lowered the TIME_WAIT timeout of the TCP sockets.

  • Enabled reuse of TIME_WAIT sockets.

  • Assuming the TCP socket was slow to leave the TIME_WAIT state, I retried the bind 1000 ms after every failure, but every retry fails as well.

  • Tried restarting all the scripts with forever using SIGINT and SIGUSR1 as kill signals, handling both in the script that binds the publisher socket.

This is how I am handling the SIG* events in the publisher:

process.stdin.resume();
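function exitHandler(options, err){
    if (options.cleanup) console.log('pub-clean');
    // Signal handlers receive the signal name and 'exit' receives the exit code,
    // so only log a stack trace when an actual Error was passed in.
    if (err instanceof Error) console.log("pub--" + err.stack);
    if (options.exit){
        socket.close();   // `socket` is the ZMQ publisher socket created elsewhere
        console.log("Publisher Closed");
        process.exit();
    }
}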

process.on('exit', exitHandler.bind(null,{cleanup:true}));
process.on('SIGINT', exitHandler.bind(null, {exit:true}));
process.on('uncaughtException', exitHandler.bind(null,{exit:true}));
process.on('SIGUSR2', exitHandler.bind(null, {exit:true}));
process.on('SIGTERM', exitHandler.bind(null, {exit:true}));

Why does restarting all the scripts with forever cause the publisher script to fail to bind?

What can be done to make the publisher bind after a forever restartall?

Answered by user3666197

ZeroMQ resources should be released in a controlled way

As discussed in the comments above, a truly graceful release of ZeroMQ resources is not achieved by system-level signals ( SIG* / *KILL ), but by executing the ZeroMQ-recommended graceful-release steps.

The code posted so far does none of that, so ZeroMQ resources may, and most probably do, remain hanging ( at least the I/O thread seems to ).

Check the ZeroMQ socket settings in your ( not yet posted ) setup code, i.e. the .setsockopt() calls made during the setup phase, and add the following:

  1. make sure every socket that was set up ( used or not ) is configured for a non-blocking .close(), e.g. with ZMQ_LINGER set to 0
  2. execute .close() on each socket only after step 1 is in place
  3. finally, explicitly execute .term() on the ZeroMQ Context instance

This sequence is considered a guaranteed graceful release of all ( internally handled ) ZeroMQ resources.
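The three steps can be sketched in Node.js roughly as follows. This is a minimal sketch, not the asker's actual setup: it assumes a binding whose sockets expose setsockopt(option, value) and close(), takes the binding's ZMQ_LINGER constant as a parameter, and treats context termination as an optional callback, since whether and how a given Node.js binding exposes it varies. The names releaseZmq, makeShutdown and terminateContext are hypothetical.

```javascript
'use strict';

// Hypothetical helper: releases ZeroMQ sockets in the order listed above.
//   sockets          - array of socket-like objects with setsockopt() and close()
//   lingerOption     - the binding's ZMQ_LINGER option constant
//   terminateContext - optional function that terminates the context
//                      (assumption: only supply it if the binding exposes one)
function releaseZmq(sockets, lingerOption, terminateContext) {
  // Step 1: make close() non-blocking by discarding any queued messages.
  for (const s of sockets) {
    s.setsockopt(lingerOption, 0);
  }
  // Step 2: close each socket only after LINGER has been set on all of them.
  for (const s of sockets) {
    s.close();
  }
  // Step 3: terminate the context last.
  if (typeof terminateContext === 'function') {
    terminateContext();
  }
}

// Wraps the release so it runs at most once, however many signals arrive.
// Returns true on the call that performed the release, false afterwards.
function makeShutdown(sockets, lingerOption, terminateContext) {
  let done = false;
  return function shutdown() {
    if (done) return false;
    done = true;
    releaseZmq(sockets, lingerOption, terminateContext);
    return true;
  };
}
```

With a real binding this might be wired into the existing handlers as `const shutdown = makeShutdown([socket], zmq.ZMQ_LINGER);` followed by `process.on('SIGINT', () => { shutdown(); process.exit(0); });`, assuming the binding exports the constant under that name. Running the release once matters because process.exit() also fires the 'exit' event, which would otherwise re-enter the cleanup.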



On a sample code request:

A graceful release

void  msLIB.deinit() {
      aComment.ADD( "msLIB.INFO: msLIB.deinit() TracePOINT.<BoPROC>|", False );

   // --------------------------------------------------------------------------------------<THANKS>
   // DO NOT EDIT: the below  IS TRULY NEEDED or else one might get some nice memory leaks!
   // ------------            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||   


   // -----------------------------------------------------------------------
   // ZMQ-IMPERATIVE ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
   // _______________________________________________________________________ msMOD: ZMQ_Safe&CleanCODE_IMPERATIVE
      zmq_setsockopt(   zmqSpeaker,  ZMQ_LINGER, 0 );                         // no Sending QUEUE .... on ZMQ_PUBLISHER end
      zmq_close(        zmqSpeaker  );                                        // Protect against memory leaks on shutdown.
                        aComment.ADD( "<1>", False );
                        aComment.ADD( " [[[ZMQ]]]<speaker_socket>.set( ZMQ_LINGER ) / .close()-ed ", True );

      zmq_setsockopt(   zmqListener, ZMQ_LINGER, 0 );                         // aKBD.PUB 
      zmq_close(        zmqListener );                                        // Protect against memory leaks on shutdown.
                        aComment.ADD( "<2>", False );
                        aComment.ADD( " [[[ZMQ]]]<listener_socket>.set( ZMQ_LINGER ) / .close()-ed ", True );      

      zmq_term(         zmqContext  );                                        // Protect against memory leaks on shutdown.
                        aComment.ADD( "<3>", False );
                        aComment.ADD( " [[[ZMQ]]]<context>.term()-ed ", True );

   // _______________________________________________________________________ msMOD: ZMQ_Safe&CleanCODE_IMPERATIVE

   // ------------
   // DO NOT EDIT: the above  IS TRULY NEEDED or else one might get some nice memory leaks!               
   // --------------------------------------------------------------------------------------<THANKS>

      aComment.ADD( "|<EoPROC>", False );
      msLIB.aSnapshot.MAKE();
      aComment.ADD( "|aSnapshot.MAKE()-<DONE>", True );
      return;
   }


On missing "in-built" controls

One may extend the architecture with one's own soft-signalling code for all situations that need gentler handling than SIGKILL et al.

a simpler case

and

a more complex case
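A soft-signalling arrangement of this kind might look like the sketch below. It is only an illustration under stated assumptions: the control channel is a plain Node.js EventEmitter standing in for whatever transport is actually used (a dedicated ZeroMQ control socket, an IPC message, and so on), and the names createControlChannel, attachSoftShutdown, the 'command' event and the 'shutdown' command are all hypothetical.

```javascript
'use strict';
const { EventEmitter } = require('events');

// Hypothetical control channel. In a real deployment, messages arriving on a
// control socket or IPC link would be re-emitted here as 'command' events.
function createControlChannel() {
  return new EventEmitter();
}

// Runs the application-level release before leaving the process, and does so
// at most once even if the shutdown command is repeated.
//   control - EventEmitter that emits 'command' events with a string payload
//   release - function performing the ZeroMQ cleanup (LINGER, close, term)
//   exit    - function called with the exit code once the release has run
function attachSoftShutdown(control, release, exit) {
  let requested = false;
  control.on('command', (cmd) => {
    if (cmd !== 'shutdown' || requested) return; // ignore unknown or repeated commands
    requested = true;
    release(); // resources are released first
    exit(0);   // only then does the process leave
  });
}
```

In the publisher this could be attached as `attachSoftShutdown(control, shutdown, (code) => process.exit(code));`, where `shutdown` is whatever routine performs the socket and context release. The point of the design is that the process decides when it is safe to exit, instead of being interrupted by a signal in the middle of its work.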