Some Java NIO selector keys perpetually in non-ready state


I am using Java NIO to listen for incoming packets from a bunch of IoT devices. My (standard) select loop is as follows:

    try {
        while (!shutdown) {
            // Blocks until at least one registered channel is ready
            int readyChannels = selector.select();
            if (readyChannels == 0) {
                continue;
            }
            messageProcessor.processRequest(selector);
        }
        log.info("Exited the main loop");
    } catch (IOException exception) {
        // With SLF4J-style loggers, a trailing Throwable argument prints the full stack trace
        log.error("Exception occurred while handling the request: {}", exception.getMessage(), exception);
    }

The selected keys are processed in the following snippet:

    Set<SelectionKey> selectionKeys = selector.selectedKeys();
    if (log.isDebugEnabled()) {
        log.debug("Provider:{}", selector.provider());
        log.debug("Keys_size:{}", selector.keys().size());
        log.debug("selection_keys_size: {}", selectionKeys.size());
        // Registered keys that are still valid but have no ready operations
        List<SelectionKey> nonReady = selector.keys().stream()
                .filter(k -> k.isValid() && k.readyOps() == 0)
                .collect(Collectors.toList());
        log.debug("nonReady:{}", nonReady);
        log.debug("nonReady_size:{}", nonReady.size());
    }

    Iterator<SelectionKey> keys = selectionKeys.iterator();

    while (keys.hasNext()) {
        SelectionKey key = keys.next();
        log.debug("Thread : {} iterating over key : {}", Thread.currentThread().getName(), key);
        // Remove the key from the selected set so it is not processed again on the next select()
        keys.remove();
        if (!key.isValid()) {
            log.warn("Invalid key : {}", key);
            continue;
        }
        try {
            if (key.isAcceptable()) {
                log.debug("acceptable key : {}", key);
                this.accept(key, selector);
            }

            if (key.isReadable()) {
                log.debug("readable key : {}", key);
                this.read(key);
            }
        } catch (Exception ex) {
            // With SLF4J-style loggers, a trailing Throwable argument prints the full stack trace
            log.error("Got exception {} while accepting_or_reading key: {}", ex.getMessage(), key, ex);
        }
    }

This code has worked for 3+ years. Lately, however, it has been giving trouble. Specifically, a large number of connections that are visible in the operating system (Linux) when running the command

sudo ss -o |sort | grep "my_ip_address]:"

are never printed in the keys.hasNext() while loop (that is, they are never present in selectionKeys).

After some trial and error, I was able to get a trace of those "ghost" connections by logging the keys which did not have ready ops. A small minority of the keys (1-2%) which were not ready eventually became ready. The others never did.

I am fairly certain that this is caused by a misconfiguration in the IoT devices (the person handling them told me as much). The problem is that the number of connections keeps growing and eventually crashes the server (we have mitigated this with shell scripts that kill and restart the process).

What I'd like to know is:

  1. While there is likely a misconfiguration, what exactly (or most likely) is happening at the network/transport layer that causes readyOps to remain 0 forever for a large number of keys?
  2. What is the right way to deal with this in Java? Would storing the keys in a map, and then cancelling the keys that do not become ready within some preconfigured time, be a good approach?
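To make the second question concrete, here is a minimal sketch of the map-based approach I have in mind. It is not production code. The class `IdleKeyReaper`, its methods `touch` and `reap`, and the timeout value are hypothetical names introduced only for illustration; they are not part of the code above. The sketch assumes everything runs on the selector thread, so a plain `HashMap` is used.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical helper: remembers when each key was last active and closes idle ones. */
final class IdleKeyReaper {
    // Last-activity timestamp per key; accessed only from the selector thread (assumption)
    private final Map<SelectionKey, Long> lastActivityMillis = new HashMap<>();
    private final long idleTimeoutMillis;

    IdleKeyReaper(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Record activity: call when a key is registered and after every successful read. */
    void touch(SelectionKey key, long nowMillis) {
        lastActivityMillis.put(key, nowMillis);
    }

    /** Close every tracked key that has been idle for at least the timeout; returns how many were closed. */
    int reap(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<SelectionKey, Long>> it = lastActivityMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<SelectionKey, Long> entry = it.next();
            SelectionKey key = entry.getKey();
            if (!key.isValid()) {
                // Already cancelled or closed elsewhere: just stop tracking it
                it.remove();
                continue;
            }
            if (nowMillis - entry.getValue() >= idleTimeoutMillis) {
                it.remove();
                key.cancel();
                try {
                    // Closing the channel releases the underlying socket
                    key.channel().close();
                } catch (IOException ignored) {
                    // Best effort: the connection is being discarded anyway
                }
                closed++;
            }
        }
        return closed;
    }

    /** Number of keys currently tracked. */
    int tracked() {
        return lastActivityMillis.size();
    }
}
```

In the main loop this would presumably need `selector.select(timeoutMillis)` instead of `selector.select()`, because the no-argument form blocks until at least one channel is ready and the sweep would not run during quiet periods. The idea is to call `touch(key, System.currentTimeMillis())` in `accept` and `read`, and `reap(System.currentTimeMillis())` once per loop iteration.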