Usage of "advice" in Cumulocity Real Time Notifications


According to http://cumulocity.com/guides/reference/real-time-notifications/ a client that initiates a handshake for receiving real time notifications can include an advice in its request body. This advice can have a timeout ("Max. time in milliseconds between sending of a connect message and response from server.") and an interval ("Period above which server will close session, if not received next connect message from client."). I don't understand these parameters or how they apply to my long polling connections.

  1. Why would the server be interested in the timeout that the client uses for its connect call? It's supposed to send notifications immediately as they become available (i.e. "real time"), and indeed it does. Even when I specify a very short timeout but then actually use a much longer timeout for my connect, I still receive the notifications that occur between these two points in time without any problem. And when I specify a long timeout, I get notifications immediately anyway. The parameter would make sense for lazy notifications, but I don't see any mention of those in the documentation. So what is the meaning of this value?
  2. What is the meaning of the interval? If I specify a short interval but wait much longer between two consecutive connect calls, the server does not "close the session", if that would mean that my clientID becomes invalid and I need to do a new handshake. Maybe it is just the guaranteed minimum time for which the server must keep the session, i.e. the maximum time that the client wants to be allowed to wait between connects (waiting longer might or might not work, at the server's discretion)? It also isn't the time after which the queues are actually purged, because if I trigger a notification while I'm not connected and reconnect after the interval has passed, that notification is still delivered fine.
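
For reference, the kind of handshake message I am talking about can be sketched as follows. This is only an illustration of the message shape: the helper name `build_handshake` and the values 60000 and 10000 are made up, and any authentication fields that Cumulocity expects are left out.

```python
import json


def build_handshake(timeout_ms, interval_ms):
    """Build a Bayeux /meta/handshake message that carries client advice.

    Hypothetical helper for illustration; the numbers passed in are
    placeholders, not recommended settings.
    """
    return {
        "channel": "/meta/handshake",
        "version": "1.0",
        "supportedConnectionTypes": ["long-polling"],
        # The two advice fields this question is about, in milliseconds.
        "advice": {"timeout": timeout_ms, "interval": interval_ms},
    }


# Bayeux messages travel as a JSON array in the request body.
body = json.dumps([build_handshake(60000, 10000)])
```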

If we compare that to the SmartREST notifications, we see that there it is supposed to work the opposite way, which IMHO makes more sense: the server sends the advice to the client, to tell it how it should configure itself. The meaning in this case could still be somewhat ambiguous, but at least the handling could be more straightforward (= just do as the server advises):

  1. timeout: don't use longer timeouts because the server doesn't want to keep connections open for longer than X. Don't use shorter timeouts because the server might need X time to produce all the notifications for the response.
  2. interval: don't reconnect faster than Y because the server's internal notification distribution doesn't even run that fast. Don't wait longer than Y before reconnecting because the internal queues don't buffer notifications for longer than Y. (In the CometD reference these two are named interval and maxInterval, so it is clear that they are independent.)

Why is the "advice direction" reversed in the two scenarios? How (if at all) should I use the advice for regular real time notification handshakes?

Thanks a lot for any clarifications on this.

Answer by sbordet

[Disclaimer: I'm the CometD lead and the Bayeux Protocol maintainer]

While the definition of timeout is correct, the definition of interval is wrong. The correct definition is in the Bayeux Protocol Specification.

For clarity, what you refer above as "connect" is actually a message on the /meta/connect channel, which is the heartbeat mechanism of the Bayeux Protocol.

The meaning of timeout is the essence of long polling. In long polling, a poll is held by the server in absence of events to relay to the client. How long the poll is held by the server (again, in absence of events) is what the timeout parameter specifies. That is why it is a timeout: it waits for events, and if none, it times out and replies to the client anyway (with an empty response).
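
As a rough illustration of that behavior (a minimal sketch, not how CometD or Cumulocity implement it), a server-side hold can be modeled with a blocking queue read. The function name `hold_connect` and the use of `queue.Queue` as the event source are assumptions made for this example.

```python
import queue


def hold_connect(events, timeout_ms):
    """Hold a /meta/connect until an event arrives or timeout_ms elapses.

    Returns the events to relay, or an empty list if the hold timed out.
    """
    try:
        # Block for at most timeout_ms; with timeout_ms == 0 this only
        # returns an event that is already queued.
        first = events.get(timeout=timeout_ms / 1000.0)
    except queue.Empty:
        return []  # timed out: reply anyway, with an empty response

    batch = [first]
    # Relay everything else that is already queued in the same reply.
    while True:
        try:
            batch.append(events.get_nowait())
        except queue.Empty:
            return batch
```

An event that is already available is returned at once, whatever the timeout is, which matches the observation in the question that notifications arrive immediately even with a long timeout.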

The timeout parameter is typically configured on the server, but the client can override it (in a transient way in every advice it sends) and the server should honor the client value. Typically this is done by the client implementation, rather than by applications - the timeout parameter is opaque for applications.

The meaning of interval is how much time the client waits after receiving a /meta/connect reply before issuing another /meta/connect request. The interval parameter may be configured on both the server and the client.

These 2 parameters work together to tune the long poll.

For example, you can simply achieve a normal poll every 3 seconds by having the pair (timeout=0, interval=3000). The server will see the client requested timeout=0 and should honor that so it will reply immediately, even if no events are available. In turn, the client will wait 3 seconds before issuing another /meta/connect request.

On the other hand, a long poll has, for example, the pair (timeout=10000, interval=0), where the server holds a /meta/connect for at most 10 seconds if there are no events to relay to the client.
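
To make the two pairs concrete, here is a small sketch that computes when an idle client would send each /meta/connect. It assumes no events ever arrive and ignores network latency; the function name `connect_send_times` is made up for this example.

```python
def connect_send_times(timeout_ms, interval_ms, count):
    """Return the times (in ms) at which an idle client sends /meta/connect.

    Assumes no events arrive, so the server holds every request for the
    full timeout, and that network latency is zero.
    """
    times = []
    now = 0
    for _ in range(count):
        times.append(now)
        now += timeout_ms   # server holds the request, then replies empty
        now += interval_ms  # client waits before sending the next request
    return times


print(connect_send_times(0, 3000, 3))   # [0, 3000, 6000]
print(connect_send_times(10000, 0, 3))  # [0, 10000, 20000]
```

In both cases the wire is quiet for most of the time; the difference is that with the long poll a request is pending on the server during that time, so an event can be relayed as soon as it occurs.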

An overloaded server may send the clients an advice with interval=500 to reduce the load it handles. All clients will then wait 500 milliseconds on the client side before issuing another /meta/connect message, giving the server time to recover.
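
On the client side, honoring such an advice amounts to overwriting the current settings with whatever fields the server sent. The sketch below shows that idea only; `apply_advice` is a hypothetical helper, and real client libraries handle further advice fields that are not discussed here.

```python
def apply_advice(current, advice):
    """Return the client's settings after applying advice from the server.

    Only the fields present in the advice are overridden; the rest keep
    their current values. The input dictionaries are not modified.
    """
    merged = dict(current)
    for key in ("timeout", "interval"):
        if key in advice:
            merged[key] = advice[key]
    return merged


settings = {"timeout": 10000, "interval": 0}
settings = apply_advice(settings, {"interval": 500})
print(settings)  # {'timeout': 10000, 'interval': 500}
```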

The timeout parameter has implications with respect to the TCP connection idle timeout: if timeout is too long, some servers (or network components) may close the TCP connection before the server has the chance to reply to the /meta/connect. Java Servlet Containers never close the TCP connection on requests that are pending (per Servlet Specification), but an Apache or Nginx in front of a Java Servlet Container, configured to reverse proxy the calls, may close the TCP connection earlier than specified by timeout.

The interval parameter has implications on how long the server should maintain in memory a session for a client that seems to be gone. If interval is too large, the server may expire the session for that client.

If the Cumulocity product is interpreting interval as they say in their documentation, then it's a violation of the Bayeux Protocol. I rather think it's a documentation mistake.