Under what conditions will Linux epoll_wait return an epoll_event struct with an empty events field?


I've written an event loop to handle file descriptor read/write events. I have successfully written a version of the code that supports kqueue and a second version that supports select. I am working on my third and final version, which will support epoll.

I am experiencing a problem when I register a new descriptor for an EPOLLIN event. The descriptor in question is a listening socket, so I wait for a read event, which tells me that the next call to accept should succeed (common practice for non-blocking accept).
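Ruby's standard library has no epoll binding, so this minimal sketch swaps in IO.select (a select-style readiness call) to show the same wait-for-readable-then-accept pattern. The helper name `accept_when_ready` and the loopback setup are hypothetical, for illustration only.

```ruby
require "socket"

# Hypothetical helper: wait until the listening socket is readable, then
# accept without blocking. Readiness is only a hint (the peer may reset the
# connection in between), so accept_nonblock can still return :wait_readable.
def accept_when_ready(server, timeout)
  readable, = IO.select([server], nil, nil, timeout)
  return nil unless readable # timed out: nothing to accept yet

  client = server.accept_nonblock(exception: false)
  client == :wait_readable ? nil : client
end

server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS choose a free port
peer = TCPSocket.new("127.0.0.1", server.addr[1])

conn = accept_when_ready(server, 1.0)
puts(conn ? "accepted a connection" : "nothing to accept")

[conn, peer, server].compact.each(&:close)
```

With epoll the shape is the same: register the listening descriptor for EPOLLIN, and treat a readable event as permission to try a non-blocking accept, not as a guarantee that it will succeed.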

All file descriptors are set to non-blocking.

My call to epoll_wait returns two events for the same descriptor. The first event has its events field set to EPOLLIN. The second event structure has its events field set to 0 (empty), and its data.fd field holds the same FD number as the first struct.

Under what circumstances will epoll_wait return an event structure with a zeroed events field?

This does NOT happen every time, but it happens more than 90% of the time.

Lastly, I'd post code, but this is written in Ruby and there is a LOT of boilerplate to wrap the socket, listen, accept, etc. functions in FFI, define constants, and so on. The example code would be long and unwieldy, so I am not posting any.

Answer by Chuck Remes (accepted):

The problem above was a case of garbage in, garbage out. I had removed the Ruby tag after a complaint from a commenter, but I need to add it back. The problem stemmed from the Ruby FFI definition of the epoll_event struct. Here is the original, incorrect code:

class EPollDataUnion < FFI::Union
  layout \
    :ptr, :pointer,
    :fd,  :int,
    :u32, :uint32,
    :u64, :uint64
end

class EPollEventStruct < FFI::Struct
  layout \
    :events, :uint32,
    :data, EPollDataUnion
end

The above definition yielded an EPollEventStruct with a size of 16 bytes. On x86_64 Linux, struct epoll_event is declared packed in <sys/epoll.h>, so it should be 12 bytes.
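The 16-versus-12 arithmetic follows the usual C layout rules: each field starts at the next multiple of its alignment, and the total size is rounded up to the largest alignment. This pure-Ruby sketch applies those rules without the ffi gem. `Field`, `c_layout`, and the x86_64 sizes and alignments are assumptions for illustration, not FFI's internals.

```ruby
# Hypothetical field descriptor: byte size and required alignment.
Field = Struct.new(:name, :bytes, :align)

# Compute C-style field offsets and total size. With `pack:`, each
# field's alignment is capped at that value (pack 1 => no padding).
def c_layout(fields, pack: nil)
  offset = 0
  max_align = 1
  offsets = {}
  fields.each do |f|
    a = pack ? [f.align, pack].min : f.align
    offset = (offset + a - 1) / a * a # round up to a multiple of a
    offsets[f.name] = offset
    offset += f.bytes
    max_align = [max_align, a].max
  end
  size = (offset + max_align - 1) / max_align * max_align # trailing padding
  [offsets, size]
end

# Assumed x86_64 values: uint32 is 4 bytes, 4-aligned; the epoll_data
# union is 8 bytes, 8-aligned because it contains a pointer and a uint64.
EPOLL_EVENT_FIELDS = [Field.new(:events, 4, 4), Field.new(:data, 8, 8)].freeze

natural_offsets, natural_size = c_layout(EPOLL_EVENT_FIELDS)
packed_offsets, packed_size = c_layout(EPOLL_EVENT_FIELDS, pack: 1)

puts "natural: data at #{natural_offsets[:data]}, size #{natural_size}" # data at 8, size 16
puts "packed:  data at #{packed_offsets[:data]}, size #{packed_size}"   # data at 4, size 12
```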

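The problem was that the data field in EPollEventStruct was placed at offset 8 instead of 4. By default, Ruby's FFI lays out struct fields the way a C compiler would, aligning each field to its natural alignment. Because the union holds a pointer and a uint64, it is 8-byte aligned on x86_64, so FFI inserted 4 bytes of padding after events. The fix is to specify that the struct should be packed.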

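class EPollEventStruct < FFI::Struct
  # struct epoll_event is packed only on x86_64; on other architectures
  # the natural (unpacked) layout already matches the C definition.
  pack 1 if FFI::Platform::ARCH == "x86_64" # force alignment on 1-byte boundaries
  layout \
    :events, :uint32, # offset at byte 0
    :data, EPollDataUnion # offset at byte 4 when packed
end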
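Ruby's Array#pack never inserts padding, so it offers an FFI-free way to build and decode records in the packed x86_64 layout. In this sketch the format string, the two-record buffer, and `decode_events` are illustrative assumptions; EPOLLIN is 0x001 as defined in <sys/epoll.h>.

```ruby
# Packed x86_64 layout: events (uint32) at 0, data.fd (int32) at 4,
# then the 4 remaining bytes of the 8-byte epoll_data union.
EPOLLIN = 0x001 # value from <sys/epoll.h>
EVENT_FORMAT = "Llx4"
EVENT_SIZE = [0, 0].pack(EVENT_FORMAT).bytesize # 12 bytes per record

# Hypothetical buffer holding two records back to back, the way
# epoll_wait fills the caller's array on x86_64.
buffer = [EPOLLIN, 5].pack(EVENT_FORMAT) + [EPOLLIN, 7].pack(EVENT_FORMAT)

# Decode the first `count` records (count = epoll_wait's return value),
# stepping through the buffer `stride` bytes at a time.
def decode_events(buffer, count, stride)
  Array.new(count) do |i|
    events, fd = buffer.byteslice(i * stride, stride).unpack("Ll")
    { events: events, fd: fd }
  end
end

p decode_events(buffer, 2, EVENT_SIZE)
```

With the ffi gem loaded, `EPollEventStruct.size` and `EPollEventStruct.offset_of(:data)` report the layout FFI actually computed, which is a quick way to confirm 12 and 4 before handing memory to the kernel.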

So when my code passed heap memory to epoll_ctl and epoll_wait, it was laying out event structs that were 4 bytes too large, with data at the wrong offset. The kernel and my Ruby code disagreed about where each field, and each element of the event array, began, which produced corrupted results that made no sense (e.g. returning 2 events for the same FD, where the second struct had no event bits set).
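To illustrate why the mismatch produces nonsense rather than an error, this sketch writes two 12-byte records the way the kernel would on x86_64, then reads them back with both the packed layout and the padded 16-byte layout. The buffer is hypothetical and `read_with` is an illustrative helper; this is not a reproduction of the exact symptom in the question.

```ruby
EPOLLIN = 0x001 # value from <sys/epoll.h>

# Caller allocated room for two padded 16-byte structs (32 bytes); the
# kernel fills the front with two packed 12-byte records (events at 0,
# data.fd at 4, 4 unused union bytes).
kernel_records = [EPOLLIN, 5].pack("Llx4") + [EPOLLIN, 7].pack("Llx4")
buffer = kernel_records + ("\0".b * (32 - kernel_records.bytesize))

# Read `count` records assuming a given stride and fd offset.
def read_with(buffer, count, stride, fd_offset)
  Array.new(count) do |i|
    base = i * stride
    events = buffer.byteslice(base, 4).unpack1("L")
    fd = buffer.byteslice(base + fd_offset, 4).unpack1("l")
    { events: events, fd: fd }
  end
end

correct = read_with(buffer, 2, 12, 4) # packed layout: stride 12, fd at 4
padded  = read_with(buffer, 2, 16, 8) # padded layout: stride 16, fd at 8

p correct
p padded
```

The packed read recovers both events, while the padded read loses the first FD and reports the second record's FD (7) as if it were event bits.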