I just found that in C++, when using AsyncService, gRPC still reads data from the network even if I don't request a new call. This caused huge memory usage in my system.
Detailed Scenario:
I have a client that sends a lot of requests to the server.
On the server side, I don't request any calls. The server blocks in cq_->Next(&tag, &ok) but keeps consuming memory, which eventually caused an OOM in my system.
So my question is: how do I prevent the server from reading data from the network when I haven't requested a new call? In other words, how do I apply server-side backpressure so that I can save memory?
Could anyone help me? Thanks!
EDIT: Reproduce
I made a simple example that reproduces the problem. The code is based on the v1.46.3 tag of the official gRPC code base. I only modified the example so that the server never requests any calls and the client sends more requests. Check this commit for what I modified.
git clone -b v1.46.3_reproduce_oom --depth 1 https://github.com/lixin-wei/grpc.git && cd grpc
git submodule update --init
bazel build //examples/cpp/helloworld:all
- in one session, start the server:
./bazel-bin/examples/cpp/helloworld/greeter_async_server
- in another session, start the client:
./bazel-bin/examples/cpp/helloworld/greeter_async_client2
- keep running ps -aux | grep greeter_async_server; you'll notice increasing memory usage in the server.
The server code is examples/cpp/helloworld/greeter_async_server.cc, the client code is examples/cpp/helloworld/greeter_async_client.cc.
One option is to use the ResourceQuota to restrict buffer memory usage across the server. The size you specify is not an absolute system memory limit, since not all memory in gRPC core/C++ is tracked, but it will result in a cap on the total memory usage.
In the server, you can add:
Once the memory cap is reached, and if you add the error code to the client output, the clients will see something like
On my system, this happens when the server process reaches ~140MB RES memory.
Edit: another option is to set the maximum number of concurrent streams that the server is willing to accept using the GRPC_ARG_MAX_CONCURRENT_STREAMS channel argument. Each unary call is a separate RPC, and handled as a separate stream.