gRPC throws error on AWS Batch job: Exception occured <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE


I keep getting this error:

ERROR:__main__:Exception occured <_InactiveRpcError of RPC that terminated with:status = StatusCode.UNAVAILABLE details = "Socket closed" debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Socket closed", grpc_status:14}"

I am running a batch job in which I have a gRPC server and a client.

server.py

import os
from concurrent import futures

import grpc

# Runner, some_pb2_grpc and DEFAULT_PORT are not shown here.


def create_grpc_server(dir):
    # Allow messages of up to 1 GiB in both directions.
    max_size = 1024 * 1024 * 1024
    # Keepalive ping interval in milliseconds (default: 5 minutes).
    ping_interval = int(os.environ.get("GRPC_KEEPALIVE_TIME_MS", "300000"))

    options = [
        ("grpc.max_send_message_length", max_size),
        ("grpc.max_receive_message_length", max_size),
        ("grpc.keepalive_time_ms", ping_interval),
    ]

    grpc_max_workers_env: int = int(os.environ.get("GRPC_MAX_WORKERS", "1"))
    pool_shutdown_timer: int = int(os.environ.get("POOL_SHUTDOWN_TIMEOUT", "30"))

    server = grpc.server(
        futures.ThreadPoolExecutor(max_workers=grpc_max_workers_env), options=options
    )
    runner = Runner()
    some_pb2_grpc.add_someservicer_to_server(runner, server)

    print(f"Server listening on internal port {DEFAULT_PORT}", flush=True)
    print(f"Number of initialized gRPC workers {grpc_max_workers_env}", flush=True)

    server.add_insecure_port(f"[::]:{DEFAULT_PORT}")
    server.start()
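
For comparison, here is a minimal sketch of how the server options could spell out the keepalive behaviour more fully. The helper name `build_keepalive_options` and the specific values (20 s timeout, pings permitted without active calls) are assumptions for illustration only; whether they have any effect on the "Socket closed" error is not confirmed. The option keys themselves are standard gRPC channel arguments.

```python
import os


def build_keepalive_options(max_size=1024 * 1024 * 1024):
    """Return gRPC options as a list of (key, value) tuples.

    Values other than the message sizes and keepalive_time_ms are
    illustrative assumptions, not tested settings.
    """
    ping_interval = int(os.environ.get("GRPC_KEEPALIVE_TIME_MS", "300000"))
    return [
        ("grpc.max_send_message_length", max_size),
        ("grpc.max_receive_message_length", max_size),
        # How often a keepalive ping is sent (milliseconds).
        ("grpc.keepalive_time_ms", ping_interval),
        # How long to wait for the ping ack before closing the connection.
        ("grpc.keepalive_timeout_ms", 20000),
        # Allow keepalive pings even when no RPC is in flight.
        ("grpc.keepalive_permit_without_calls", 1),
        # 0 removes the limit on pings sent without data frames.
        ("grpc.http2.max_pings_without_data", 0),
    ]
```

The returned list can be passed as `options=` to `grpc.server(...)`, and the same keys can be given to the client channel so both sides agree on the ping behaviour.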

I have an AWS Batch array job with, say, an array size of 4. gRPC works fine for array indexes 0, 1 and 2, but it always fails with the above error on the last array index of the job. On that last child job it also fails midway: I can see a few requests being processed properly, and then the error suddenly comes up.
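
To help narrow down which child job and which call fails, the client call could be wrapped as in the rough sketch below. It retries with exponential backoff and logs the array index on each failure. `call_with_retry`, its parameters and the predicate are hypothetical names, not part of my actual client; `AWS_BATCH_JOB_ARRAY_INDEX` is the environment variable AWS Batch sets for array child jobs.

```python
import os
import time


def call_with_retry(rpc_call, is_retryable, max_attempts=3, base_delay=1.0,
                    sleep=time.sleep):
    """Call rpc_call(); retry with exponential backoff while is_retryable(exc)."""
    # Set by AWS Batch for array child jobs; "n/a" when run elsewhere.
    array_index = os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "n/a")
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc_call()
        except Exception as exc:
            # Give up on the last attempt or on errors not worth retrying.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(
                f"array index {array_index}: attempt {attempt} failed "
                f"({exc!r}), retrying in {delay}s",
                flush=True,
            )
            sleep(delay)
```

With gRPC, the predicate could be something like `lambda e: isinstance(e, grpc.RpcError) and e.code() == grpc.StatusCode.UNAVAILABLE`, so that only the UNAVAILABLE status is retried and other errors surface immediately.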

  1. I tried this solution, but I still get the same error.