PySpark gets Py4JNetworkError("Answer from Java side is empty") when exiting Python


Background:

  • Spark standalone cluster on Kubernetes (k8s)
  • Spark 2.2.1
  • Hadoop 2.7.6
  • code runs as a plain Python script, not in the pyspark shell
  • client mode, not cluster mode

The PySpark code runs as a plain Python script, not in the pyspark shell environment. Every job runs and completes fine. But sometimes, when the job finishes and the script exits, the error below shows up, even with a time.sleep(10) after spark.stop().
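
For reference, a minimal sketch of the kind of script that hits this; the app name and master URL here are hypothetical placeholders, not taken from the original job:

import time
from pyspark.sql import SparkSession

# Hypothetical app name and master URL, for illustration only.
spark = SparkSession.builder \
    .appName("example-job") \
    .master("spark://spark-master:7077") \
    .getOrCreate()

spark.range(100).count()  # any action completes fine

spark.stop()
time.sleep(10)  # the traceback below can still appear after this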

{{py4j.java_gateway:1038}} INFO - Error while receiving.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 1035, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
Py4JNetworkError: Answer from Java side is empty
[2018-11-22 09:06:40,293] {{root:899}} ERROR - Exception while sending command.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 883, in send_command
    response = connection.send_command(command)
  File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 1040, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
Py4JNetworkError: Error while receiving
[2018-11-22 09:06:40,293] {{py4j.java_gateway:443}} DEBUG - Exception while shutting down a socket
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 441, in quiet_shutdown
    socket_instance.shutdown(socket.SHUT_RDWR)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib64/python2.7/socket.py", line 170, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor

My guess is that the parent Python process tries to read log messages from the already-terminated child JVM process. But the weird thing is that the error is not always raised...

Any suggestions?

There is 1 answer:

Answer by Jayce Li:

The root cause is the py4j log level.

I had set the Python log level to DEBUG, which makes the py4j client log connection errors from the Java side while PySpark is shutting down.

So setting the Python log level for py4j to INFO or a higher level resolves this problem.
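
For example, a minimal sketch of the fix, assuming the standard logging module is what set the DEBUG level in the first place:

import logging

# A root logger configured at DEBUG is what surfaces the shutdown
# noise, e.g.: logging.basicConfig(level=logging.DEBUG)

# Raise only the py4j logger to INFO (or higher), so the harmless
# "Answer from Java side is empty" traceback emitted while the JVM
# connection is torn down is no longer printed.
logging.getLogger("py4j").setLevel(logging.INFO)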

References:

  • Gateway raises an exception when shut down
  • Tune down the logging level for callback server messages
  • PySpark Internals