How to get nanomsg to automatically reconnect reliably?


I am having problems handling reconnects when the server goes offline, either briefly or for a longer time, and then comes back up. I can't get my clients to reconnect automatically. There also doesn't seem to be any property that shows the status of the socket (for example, whether it is disconnected), so I can't detect the problem and reconnect manually either. What am I missing?

According to the nanomsg documentation, there is a reconnect interval socket option called NN_RECONNECT_IVL, but I can't get it to work. Consider the following:

I've got a working nanomsg server:

nanocat --bind-local 1234 --bus --interval 1 --ascii --data "hello world"

Then I attach to it:

nanocat --connect-local 1234 -A --bus

and I see:

hello world
hello world
hello world

Then I kill the server and restart it, and nanocat doesn't reconnect automatically. Is there a setting I'm missing?

Next, I built a client in C# using NNanomsg:

using System;
using System.Text;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        NNanomsg.NanomsgSocket s = new NNanomsg.NanomsgSocket(NNanomsg.Domain.SP, NNanomsg.Protocol.BUS);

        // doesn't seem to do anything
        s.Options.ReconnectInterval = new TimeSpan(0, 0, 5);

        var e = s.Connect("tcp://127.0.0.1:1234");

        while (true)
        {
            byte[] ddd;
            ddd = s.ReceiveImmediate();
            if (ddd != null)
            {
                string m = UTF8Encoding.UTF8.GetString(ddd);
                Console.WriteLine(m);
            }
            else
            {
                // don't peg the CPU
                Thread.Sleep(100);
            }
        }
    }
}

and I see:

hello world
hello world
hello world

Then I kill the server and restart it, and my C# client doesn't reconnect automatically either. Is there a setting I'm missing?

Next, I built a client in C:

#include <stdio.h>
#include <tchar.h>   /* _tmain and _TCHAR (Windows / MSVC) */
#include <nanomsg/nn.h>
#include <nanomsg/bus.h>

int _tmain(int argc, _TCHAR* argv[])
{
    int sock;
    int recv;
    int reconnect_interval;
    char *buf;

    buf = NULL;
    reconnect_interval = 5000; /* NN_RECONNECT_IVL is in milliseconds: 5 s */

    sock = nn_socket (AF_SP, NN_BUS);

    nn_setsockopt(sock, NN_SOL_SOCKET, NN_RECONNECT_IVL, &reconnect_interval, sizeof(reconnect_interval));

    nn_connect(sock, "tcp://127.0.0.1:1234");

    while(1 == 1)
    {
        recv = nn_recv(sock, &buf, NN_MSG, 0);
        if(recv > 0)
        {
            printf("%s\r\n", buf);
            nn_freemsg (buf);
        }
    }

    return 0;
}

and I see:

hello world²²²²½½½½½½½½■ε■ε■
hello world²²²²½½½½½½½½■ε■ε■
hello world²²²²½½½½½½½½■ε■ε■

(I guess the junk appears because the message isn't NUL-terminated and I'm lazily printing it with printf("%s"), so it reads past the end of the buffer.)

Then I kill the server and restart it, and my C client doesn't reconnect automatically either.

What am I missing?

NOTE: I tried setting the socket option both before and after nn_connect() and s.Connect(). Neither worked.

Answer by Victor Laskin:

I made a test without nanocat. As a base, I took the BUS example from this tutorial.

This is the C code for the nodes:

#include <assert.h>
#include <stdio.h>
#include <string.h>  // strlen
#include <unistd.h>  // sleep (replaces the macOS-only <libc.h>)
#include "nanomsg-master/src/nn.h"
#include "nanomsg-master/src/bus.h"

int node (const int argc, char **argv)
{
  int sock = nn_socket (AF_SP, NN_BUS);
  assert (sock >= 0);
  // keep calls with side effects outside assert() so they still run under NDEBUG
  int rc = nn_bind (sock, argv[2]);
  assert (rc >= 0);
  sleep (1); // wait for connections
  // connect to every peer URL given after the bind URL
  for (int x = 3; x < argc; x++)
    {
      rc = nn_connect (sock, argv[x]);
      assert (rc >= 0);
    }
  sleep (1); // wait for connections

  // make nn_recv give up after 1000 ms instead of blocking forever
  int to = 1000;
  rc = nn_setsockopt (sock, NN_SOL_SOCKET, NN_RCVTIMEO, &to, sizeof (to));
  assert (rc >= 0);
  // SEND
  int sz_n = strlen (argv[1]) + 1; // '\0' too
  printf ("%s: SENDING '%s' ONTO BUS\n", argv[1], argv[1]);
  int sent = nn_send (sock, argv[1], sz_n, 0);
  assert (sent == sz_n);
  while (1)
    {
      // RECV (returns -1 on timeout, so just loop again)
      char *buf = NULL;
      int recv = nn_recv (sock, &buf, NN_MSG, 0);
      if (recv >= 0)
        {
          printf ("%s: RECEIVED '%s' FROM BUS\n", argv[1], buf);
          nn_freemsg (buf);
        }
    }
  return nn_close (sock); // not reached
}

int main (int argc, char **argv)
{
  if (argc >= 3)
    return node (argc, argv);
  fprintf (stderr, "Usage: bus <NODE_NAME> <URL> <URL> ...\n");
  return 1;
}

I modified the run script for the test:

gcc testbus.c nanomsg-master/.libs/libnanomsg.a -o testbus
./testbus node0 tcp://127.0.0.1:1234 & node0=$!
./testbus node1 tcp://127.0.0.1:1235 tcp://127.0.0.1:1234 & node1=$!
./testbus node2 tcp://127.0.0.1:1236 tcp://127.0.0.1:1234 & node2=$!
./testbus node3 tcp://127.0.0.1:1237 tcp://127.0.0.1:1234 & node3=$!
sleep 2

ps -ax | grep testbus | grep -v grep
echo "-"

kill $node0
echo "After killing server"
ps -ax | grep testbus | grep -v grep
echo "-"

echo "Running new server..."
./testbus node0 tcp://127.0.0.1:1234 & node0=$!
sleep 5
ps -ax | grep testbus | grep -v grep
echo "-"


kill $node0 $node1 $node2 $node3
ps -ax | grep testbus | grep -v grep
echo "-"

Here I made node0 the server endpoint, with no outgoing connections. Each of the other nodes has just one outgoing connection, to node0. After 2 seconds I kill node0 and then start a new node0 on the same address. Here is the log I got:

[Screenshot: script run results]

As you can see, the nodes communicate again after node0 restarts, as expected, even without setting the reconnect interval. I hope this example helps.