Framing: I'm an experienced engineer working with gRPC and HTTP/2 for the first time ever, and with streaming programming for the first time in a long time.
What event(s) do I need to be aware of in order to reliably detect a "failure" (the server disconnects expectedly, the server disconnects unexpectedly, the server times out and goes away, etc.) in a connection to a gRPC server when using the `@grpc/grpc-js` package?
That is, we have a gRPC service that uses protocol buffers, and we can set up a stream with something like this:
```javascript
const grpc = require('@grpc/grpc-js')
const protoLoader = require('@grpc/proto-loader')

const packageDefinition = protoLoader.loadSync(
  __dirname + '/path/to/v1.proto',
  {
    keepCase: true,
    longs: String,
    enums: String,
    defaults: true,
    oneofs: true
  }
)
// A new name is needed here: redeclaring `packageDefinition` with const is a SyntaxError
const v1 = grpc.loadPackageDefinition(packageDefinition).com.foo.bar.v1
const client = new v1.IngestService(
  'server.url.here.com:443',
  grpc.credentials.createSsl()
)
const metadata = new grpc.Metadata() // request headers for the call (empty here)
const stream = client.recordSpan(metadata)
```
At this point `stream` is a `ClientDuplexStreamImpl` object, which has Node's native `Duplex` as its parent class. The `Duplex` object implements both the writable and readable stream interfaces, which means it might emit `close`, `drain`, `error`, `finish`, `pipe`, and `unpipe` events (writable), or `close`, `data`, `end`, `error`, `pause`, `readable`, and `resume` events (readable). There also appear to be `metadata` and `status` events for the `ClientDuplexStreamImpl` object.
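One way to see which of these events actually fire, and in what order, is to attach a listener to each of them and log as they arrive. A minimal debugging sketch follows; the helper name `traceStreamEvents` is hypothetical and not part of `@grpc/grpc-js`. It deliberately skips `readable`, because a `readable` listener takes precedence over `data` and would change how messages are delivered.

```javascript
// Debugging aid: record every event a gRPC call stream emits, in order.
// 'readable' is deliberately left out: a 'readable' listener takes
// precedence over 'data', so it would change how messages are delivered.
// Listening for 'data' puts the stream into flowing mode, and listening
// for 'error' stops Node from throwing on an otherwise unhandled error.
const TRACED_EVENTS = [
  'metadata', 'data', 'pause', 'resume', 'end', 'drain',
  'finish', 'pipe', 'unpipe', 'error', 'status', 'close',
];

// Hypothetical helper, not part of @grpc/grpc-js.
function traceStreamEvents(stream, log = console.log) {
  const seen = [];
  for (const name of TRACED_EVENTS) {
    stream.on(name, (...args) => {
      seen.push(name);
      log(`[${seen.length}] ${name}`, ...args);
    });
  }
  return seen; // live array; grows as events arrive
}
```

Calling `traceStreamEvents(stream)` right after `client.recordSpan(metadata)` and then killing the server (or the network) shows the exact sequence the library emits for that kind of failure.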
What I want to do is set up a stream that is resilient. In my naive mind this is as simple as: "If the stream disconnects for any reason, I'll destroy the object and try connecting again with a backoff algorithm."
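The backoff part of that plan can be a single function. This sketch uses exponential backoff with "full jitter" (a random delay up to an exponentially growing cap); the function name and the 500 ms / 30 s defaults are illustrative assumptions, not values taken from gRPC or from the server.

```javascript
// Exponential backoff with "full jitter": the delay before reconnect attempt
// `attempt` (0-based) is a random value in [0, cap), where
// cap = min(maxMs, baseMs * 2^attempt).
// The defaults are illustrative, not values taken from gRPC or the server.
function backoffDelayMs(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * cap);
}
```

Usage would look like `setTimeout(reconnect, backoffDelayMs(attempt++))`, where `reconnect` is a placeholder for whatever recreates the call, and `attempt` goes back to 0 once a new stream is known to be healthy. The jitter keeps many clients that lost the same server from all reconnecting at the same instant.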
The challenge my naive mind faces is that it's not clear what the difference between `close` and `end` is, or whether an `error` event just means an error happened or means an error happened and everything has gone away.
Also, because these are stream events, it's not clear whether every kind of server disconnect will be reflected in the stream, or whether I need to watch different objects (and if so, which ones?) to detect the actual state of the connection to the server.
It's also worth mentioning this is for a server whose implementation I don't control.
So -- restating my question: What do I, as the client/consumer of a GRPC service, need to do to make sure I detect the server has "gone away" and that I should attempt to reconnect?
The short answer is that when the server goes away, a gRPC call will generally end with a `status` whose `status.code` is equal to `grpc.status.UNAVAILABLE`, so you should be able to accomplish what you want by listening for a status/error with that code and re-establishing the stream when that happens.

First, I want to explain the overall lifecycle of a gRPC request. After starting a request, you will usually first get a `metadata` event containing response headers. Then you will perform some number of `write` operations and receive some number of `data` events. Then the stream will end, and a few semi-redundant events will trigger. The `end` event indicates that there is no more data to read, but has no additional information. The `close` event might also trigger here, but I never use it. The `status` event provides a status object that says how the stream ended. A `.code` equal to `grpc.status.OK` indicates that the stream completed successfully. In any other case, an `error` event will also be emitted, and the `error` object will additionally have all of the same fields that the status has. You should always listen for the `error` event, because if you do not and one is emitted, Node will automatically bubble it up and throw it as a global exception.

If a stream ends for any reason, including a server disconnection, it will end with a `status` event. Network errors, including server disconnections, are usually indicated by the `UNAVAILABLE` status code. That code is also used when a connection could not be established at all.

For the most part, gRPC is an abstraction over connections anyway. A single gRPC client can be backed by multiple TCP connections, and if a connection is dropped gRPC will automatically try to re-establish the connection.
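Putting that advice together, a reconnect loop only needs to act on `status` (while always handling `error`). Below is a minimal sketch under some assumptions: `keepStreamAlive` is a hypothetical helper, the caller passes a factory such as `() => client.recordSpan(metadata)`, and the status codes are hard-coded as 0 and 14, which are the values of `grpc.status.OK` and `grpc.status.UNAVAILABLE`, so the sketch runs without the library. Resetting the backoff when response headers arrive is also an assumption, not something the library prescribes.

```javascript
// Numeric gRPC status codes: these equal grpc.status.OK and
// grpc.status.UNAVAILABLE in @grpc/grpc-js. Defined locally so this sketch
// runs without the library.
const STATUS_OK = 0;
const STATUS_UNAVAILABLE = 14;

// Hypothetical helper: (re)opens a streaming call whenever it ends with a
// retryable status.
//   openStream: factory returning a new call, e.g. () => client.recordSpan(metadata)
//   onStream:   receives each new call so the caller can attach 'data' handlers / write
function keepStreamAlive(openStream, {
  onStream = () => {},
  retryOn = (code) => code === STATUS_UNAVAILABLE,
  delayMs = (attempt) => Math.min(30000, 500 * 2 ** attempt),
  setTimer = setTimeout,
} = {}) {
  let attempt = 0;
  let stopped = false;
  let current = null;

  function connect() {
    if (stopped) return;
    const stream = openStream();
    current = stream;
    // Always handle 'error': Node throws unhandled 'error' events. For a
    // non-OK end, the same call also emits 'status', which drives retries.
    stream.on('error', () => {});
    // Assumption: receiving response headers means the server is reachable
    // again, so the backoff starts over.
    stream.on('metadata', () => { attempt = 0; });
    stream.on('status', (status) => {
      if (stopped || status.code === STATUS_OK) return;
      if (retryOn(status.code)) {
        setTimer(connect, delayMs(attempt++));
      }
    });
    onStream(stream);
  }

  connect();
  return {
    // Stop reconnecting and cancel the in-flight call if it supports cancel().
    stop() {
      stopped = true;
      if (current && typeof current.cancel === 'function') current.cancel();
    },
  };
}
```

With the client from the question, this might be called as `keepStreamAlive(() => client.recordSpan(metadata), { onStream: (s) => s.on('data', handleMessage) })`, where `handleMessage` is a placeholder. Only a new call is created on each retry; re-establishing the underlying TCP connection is left to the client itself. Retrying only on `UNAVAILABLE` is a conservative default; passing a different `retryOn` widens it, and a jittered function can be passed as `delayMs`.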