Background:
This is PySpark code run from a plain Python process, not from the pyspark shell environment.
The code runs and returns the expected result every time. But "sometimes", after the job finishes and exits, the following error appears after spark.stop() and even after a time.sleep(10).
{{py4j.java_gateway:1038}} INFO - Error while receiving.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 1035, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
Py4JNetworkError: Answer from Java side is empty
[2018-11-22 09:06:40,293] {{root:899}} ERROR - Exception while sending command.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 883, in send_command
response = connection.send_command(command)
File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 1040, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
Py4JNetworkError: Error while receiving
[2018-11-22 09:06:40,293] {{py4j.java_gateway:443}} DEBUG - Exception while shutting down a socket
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 441, in quiet_shutdown
socket_instance.shutdown(socket.SHUT_RDWR)
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
File "/usr/lib64/python2.7/socket.py", line 170, in _dummy
raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor
My guess is that the parent Python process is trying to fetch log messages from the already-terminated 'jvm' child process. But the weird thing is that the error is not always raised...
Any suggestions?
Best answer
The root cause is the 'py4j' log level.
I had set the Python log level to DEBUG, which makes the 'py4j' client surface the connection error between it and the 'java' side when PySpark shuts down.
So setting the Python log level to INFO or higher resolves this problem.
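A minimal sketch of the fix described above: the logger name "py4j" is taken from the {{py4j.java_gateway:...}} records in the traceback, and this would go near the top of the script, before the SparkSession is created.

```python
import logging

# At DEBUG, the py4j client logs the (harmless) connection errors that
# occur while the JVM gateway is being torn down by spark.stop().
# Raising its level to INFO suppresses those shutdown-time tracebacks.
logging.getLogger("py4j").setLevel(logging.INFO)
```

Alternatively, leave the root logger at INFO overall and only drop specific application loggers to DEBUG, so third-party loggers such as py4j are not affected.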
Reference: Gateway raises an exception when shut down
Reference: Tune down the logging level for callback server messages
Reference: PySpark Internals
On python - pyspark gets Py4JNetworkError ("Answer from Java side is empty") when exiting python, there is a similar question on Stack Overflow: https://stackoverflow.com/questions/53440309/