ZeroMQ performance test: what is the accurate latency?

I'm using zmq to carry messages across processes, and I want to run some performance tests to measure latency and throughput.

The official site has a guide on this: How to Run Performance Tests

For example, I tried:

local_lat tcp://*:15213 200 100000
remote_lat tcp://127.0.0.1:15213 200 100000

and got the result:

message size: 200 [B]
roundtrip count: 100000
average latency: 13.845 [us]
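For context on what this number means: remote_lat does not timestamp individual messages. It times the whole ping-pong run and divides the elapsed time by twice the roundtrip count, so the figure is an average one-way latency. A minimal sketch of that arithmetic follows; the function name and the elapsed time in the usage note are illustrative assumptions, not code or output taken from libzmq.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of libzmq): average one-way latency in
// microseconds, derived from the total elapsed time of a ping-pong run.
double average_one_way_latency_us(std::uint64_t elapsed_us,
                                  std::uint64_t roundtrip_count) {
    assert(roundtrip_count > 0);
    // Each round trip is two one-way hops, hence the factor of 2.
    return static_cast<double>(elapsed_us) /
           (static_cast<double>(roundtrip_count) * 2.0);
}
```

With 100000 round trips, a total elapsed time of 2769000 us (a value back-computed here purely for illustration) would yield the 13.845 us shown above.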

But when I tried the pub-sub example in C++, I found the interval between sending and receiving to be about 150us. (I got this figure by printing log lines with timestamps.)
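One way to take such a measurement without relying on log output is to carry the send time inside the payload and subtract it on receipt. The sketch below shows only the stamping and subtraction; the transport is omitted, all names are hypothetical, and it assumes both processes run on the same host with a clock that is comparable across processes (true of steady_clock on Linux, not guaranteed by the C++ standard).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: current time in nanoseconds since the clock's epoch.
std::int64_t now_ns() {
    using Clock = std::chrono::steady_clock;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               Clock::now().time_since_epoch()).count();
}

// Publisher side: build a payload whose first 8 bytes hold the send time.
// Payloads smaller than 8 bytes are grown to fit the timestamp.
std::vector<unsigned char> make_stamped_payload(std::size_t size,
                                                std::int64_t send_ns) {
    if (size < sizeof(send_ns)) size = sizeof(send_ns);
    std::vector<unsigned char> payload(size, 0);
    std::memcpy(payload.data(), &send_ns, sizeof(send_ns));
    return payload;
}

// Subscriber side: recover the send time and return elapsed microseconds.
double elapsed_us(const std::vector<unsigned char>& payload,
                  std::int64_t recv_ns) {
    assert(payload.size() >= sizeof(std::int64_t));
    std::int64_t send_ns = 0;
    std::memcpy(&send_ns, payload.data(), sizeof(send_ns));
    return static_cast<double>(recv_ns - send_ns) / 1000.0;
}
```

In use, the publisher would call make_stamped_payload(200, now_ns()) just before sending, and the subscriber would call elapsed_us(payload, now_ns()) right after its receive call returns, before doing any printing.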

Could anybody explain the difference between these two?

EDIT: I found the question "0mq: pubsub latency continually growing with messages?" The result there shows a nearly constant delay of 0.00015s, which is 150us, the same as in my test and about 10x higher than the official performance test. Why is there such a difference?

There is 1 answer

Willy Pell

I'm having the same problem: ZeroMQ - pub / sub latency

I ran Wireshark on my example code, which publishes a ZeroMQ message every second. Here is the Wireshark output:

145  10.900249     10.0.1.6 -> 10.0.1.6     TCP 89 5557→51723 [PSH, ACK] Seq=158 Ack=95 Win=408192 Len=33 TSval=502262367 TSecr=502261368
146  10.900294     10.0.1.6 -> 10.0.1.6     TCP 56 51723→5557 [ACK] Seq=95 Ack=191 Win=408096 Len=0 TSval=502262367 TSecr=502262367
147  11.901993     10.0.1.6 -> 10.0.1.6     TCP 89 5557→51723 [PSH, ACK] Seq=191 Ack=95 Win=408192 Len=33 TSval=502263367 TSecr=502262367
148  11.902041     10.0.1.6 -> 10.0.1.6     TCP 56 51723→5557 [ACK] Seq=95 Ack=224 Win=408064 Len=0 TSval=502263367 TSecr=502263367

As you can see, it takes about 45 microseconds to send and acknowledge each message (10.900249 to 10.900294 for the first pair). At first I thought the connection was being re-established for each message, but that's not it. So I turned my attention to the receiver:

while (true) {
    // Non-blocking receive: returns immediately if no message is queued
    if (subscriber.recv(&message, ZMQ_NOBLOCK)) {
        // print time
    }
}

By adding ZMQ_NOBLOCK and polling in a tight while loop, I got the time down to 100us. That still seems large, and it comes at the price of pegging one core. But I do feel like I understand the problem slightly better. Any insight would be appreciated.