Detect a gRPC Server Failure in Node.js

Framing: Experienced engineer/developer dealing with gRPC and HTTP/2 for the first time, and with streaming programming for the first time in a long time.

What event(s) do I need to be aware of in order to detect a "failure" (server disconnects expectedly, server disconnects unexpectedly, server times out and goes away, etc.) in a gRPC server when using the @grpc/grpc-js package?

That is, we have a gRPC service that uses protocol buffers, and we can set up a stream something like this:

const grpc = require('@grpc/grpc-js')
const protoLoader = require('@grpc/proto-loader')

const packageDefinition = protoLoader.loadSync(
  __dirname + '/path/to/v1.proto',
  {
    keepCase: true,
    longs: String,
    enums: String,
    defaults: true,
    oneofs: true
  })

// Resolve the com.foo.bar.v1 package namespace from the loaded definition
const proto = grpc.loadPackageDefinition(packageDefinition).com.foo.bar.v1
const client = new proto.IngestService(
  'server.url.here.com:443',
  grpc.credentials.createSsl()
)

// Call metadata (request headers); left empty here, populate as the service requires
const metadata = new grpc.Metadata()
const stream = client.recordSpan(metadata)

At this point stream is a ClientDuplexStreamImpl object, which has Node's native Duplex as its parent class/object.

The Duplex object implements both the writable and readable stream interfaces, which means it might emit close, drain, error, finish, pipe, and unpipe events (writable), or close, data, end, error, pause, readable, and resume events (readable). There also appear to be metadata and status events on the ClientDuplexStreamImpl object.

What I want to do is set up a stream that is resilient. In my naive mind this is as simple as "If the stream disconnects for any reason, I'll destroy the object and try connecting again with a backoff algorithm".

The challenge my naive mind faces is that it's not clear what the difference between close and end is, or whether the error event is just letting me know an error happened or that an error happened and everything has gone away.

Also, because these are stream events, it's not clear whether every sort of server disconnect will be reflected in the stream, or whether I need to look at different objects (which objects would these be?) to detect the actual state of the connection to the server.

It's also worth mentioning this is for a server whose implementation I don't control.

So, restating my question: What do I, as the client/consumer of a gRPC service, need to do to make sure I detect the server has "gone away" and that I should attempt to reconnect?

1 answer

murgatroid99 (best answer)

The short answer is that when the server goes away, a gRPC call will generally end with a status whose status.code equals grpc.status.UNAVAILABLE, so you should be able to accomplish what you want by listening for a status/error with that code and re-establishing the stream when that happens.

First, I want to explain the overall lifecycle of a gRPC request. After starting a request, you will usually first get a metadata event containing response headers. Then you will perform some number of write operations and receive some number of data events.

Then the stream will end, and a few semi-redundant events will trigger. The end event indicates that there is no more data to read, but carries no additional information. The close event might also trigger here, but I never use it. The status event provides a status object that says how the stream ended. A .code equal to grpc.status.OK indicates that the stream completed successfully. In any other case, an error event will also be emitted, and the error object will additionally have all of the same fields that the status has.

You should always listen for the error event, because if you do not and one is emitted, Node will throw it as an uncaught exception.
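As a rough illustration of that lifecycle, the sketch below attaches one listener per event and reports the final status. The helper name watchCall and the local constants are made up for this example; in real code the call object would come from client.recordSpan(metadata) and the codes from grpc.status. A plain EventEmitter stands in for the call here so the snippet runs without a server.

```javascript
const { EventEmitter } = require('node:events')

// Local copies of two gRPC status code values; with @grpc/grpc-js you would
// use grpc.status.OK and grpc.status.UNAVAILABLE instead.
const STATUS_OK = 0
const STATUS_UNAVAILABLE = 14

// Hypothetical helper: attaches lifecycle listeners to a call object and
// passes a summary of the final status to onDone.
function watchCall (call, onDone) {
  call.on('metadata', (md) => { /* response headers arrived */ })
  call.on('data', (msg) => { /* one response message */ })
  call.on('end', () => { /* no more data to read; carries no detail */ })
  // Always attach an 'error' listener: an unhandled 'error' event is thrown by Node.
  call.on('error', (err) => { /* err carries the same fields as the status */ })
  call.on('status', (status) => {
    onDone({
      ok: status.code === STATUS_OK,
      code: status.code,
      details: status.details
    })
  })
}

// Usage with a stand-in call object (an assumption made for the demo):
const demoCall = new EventEmitter()
watchCall(demoCall, (summary) => {
  if (summary.code === STATUS_UNAVAILABLE) console.log('server went away')
})
```

Keeping the decision logic in the status handler, and leaving the error handler as a guard, avoids reacting twice to the same failure.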

If a stream ends for any reason, including a server disconnection, it will end with a status event. Network errors, including server disconnections, are usually indicated by the UNAVAILABLE status code. That code is also used when a connection could not be established at all.
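One way to turn this into the "reconnect with backoff" behaviour from the question is sketched below. Everything here is illustrative: keepStreamOpen and backoffDelay are hypothetical names, the delay values are arbitrary, and resetting the backoff when a metadata event arrives is a design assumption rather than something gRPC prescribes. The openStream argument is expected to return a fresh call each time, for example () => client.recordSpan(metadata).

```javascript
const { EventEmitter } = require('node:events')

const STATUS_UNAVAILABLE = 14 // same value as grpc.status.UNAVAILABLE

// Exponential backoff capped at maxMs: 100, 200, 400, ... (arbitrary values).
function backoffDelay (attempt, baseMs = 100, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt)
}

// Hypothetical helper: opens a stream and re-opens it whenever it ends with
// UNAVAILABLE. schedule is injectable so the logic can run without real timers.
function keepStreamOpen (openStream, schedule = setTimeout) {
  let attempt = 0
  function connect () {
    const call = openStream()
    // Response headers mean the server answered, so start backoff over.
    call.on('metadata', () => { attempt = 0 })
    // Guard against uncaught 'error' events; the decision is made in 'status'.
    call.on('error', () => {})
    call.on('status', (status) => {
      if (status.code === STATUS_UNAVAILABLE) {
        schedule(connect, backoffDelay(attempt++))
      }
      // Other codes (including OK) are left to the caller in this sketch.
    })
    return call
  }
  return connect()
}
```

A real client would also need to hand the newly opened call back to whatever code is writing to it, since each reconnect produces a new stream object.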

For the most part, gRPC is an abstraction over connections anyway. A single gRPC client can be backed by multiple TCP connections, and if a connection is dropped gRPC will automatically try to re-establish the connection.
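If you still want visibility into the underlying connection, the channel behind a client exposes its connectivity state. The sketch below is an assumption-laden illustration, not part of the answer above: watchChannelState is a hypothetical helper, and it expects a channel such as the one returned by client.getChannel() in @grpc/grpc-js, with shutdownState passed in as grpc.connectivityState.SHUTDOWN.

```javascript
// Hypothetical helper: reports every connectivity-state change of a channel
// until the channel reaches the shutdown state.
function watchChannelState (channel, shutdownState, onChange) {
  function poll () {
    // false: only observe; do not force a connection attempt.
    const state = channel.getConnectivityState(false)
    onChange(state)
    // Stop here: watching a channel that has been shut down is not useful.
    if (state === shutdownState) return
    // The callback fires once the state differs from `state`, or with an
    // error if the deadline passes. Infinity is used to mean "no deadline".
    channel.watchConnectivityState(state, Infinity, (err) => {
      if (!err) poll()
    })
  }
  poll()
}
```

With a real client, usage might look like watchChannelState(client.getChannel(), grpc.connectivityState.SHUTDOWN, (s) => console.log(s)). For deciding when to re-open a stream, the status code on the call remains the more direct signal.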