python - Windows error when running standalone pyspark

Tags: python apache-spark localhost

I am trying to import pyspark in Anaconda and run the sample code below. However, whenever I try to run it in Anaconda, I get the following error message.

ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:53294)
Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 1021, in send_command
    self.socket.sendall(command.encode("utf-8"))
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 883, in send_command
    response = connection.send_command(command)
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 1025, in send_command
    "Error while sending", e, proto.ERROR_ON_SEND)
py4j.protocol.Py4JNetworkError: Error while sending

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 827, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it

ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:53294)
Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 827, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it

[the same ERROR / IndexError / ConnectionRefusedError retry cycle repeats three more times]

Reloaded modules: py4j.protocol, pyspark.sql.context, py4j.java_gateway, py4j.compat, pyspark.profiler, pyspark.sql.catalog, pyspark.context, pyspark.sql.group, pyspark.sql.conf, pyspark.sql.readwriter, pyspark.resultiterable, pyspark.sql, pyspark.sql.dataframe, pyspark.traceback_utils, pyspark.cloudpickle, pyspark.rddsampler, pyspark.accumulators, pyspark.broadcast, py4j, pyspark.rdd, pyspark.sql.functions, pyspark.java_gateway, pyspark.statcounter, pyspark.conf, pyspark.serializers, pyspark.files, pyspark.join, pyspark.sql.streaming, pyspark.shuffle, pyspark, py4j.version, pyspark.sql.session, pyspark.sql.column, py4j.finalizer, py4j.java_collections, pyspark.status, pyspark.sql.window, pyspark.sql.utils, pyspark.storagelevel, pyspark.heapq3, py4j.signals, pyspark.sql.types

Traceback (most recent call last):
  File "", line 1, in
    runfile('C:/Users/hlee/Desktop/pyspark.py', wdir='C:/Users/hlee/Desktop')
  File "C:\Program Files\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\Program Files\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/hlee/Desktop/pyspark.py", line 38, in
    sc = SparkContext()
  File "C:\spark\python\lib\pyspark.zip\pyspark\context.py", line 115, in __init__
    conf, jsc, profiler_cls)
  File "C:\spark\python\lib\pyspark.zip\pyspark\context.py", line 168, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "C:\spark\python\lib\pyspark.zip\pyspark\context.py", line 233, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 1401, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.net.BindException: Cannot assign requested address: bind: Service 'sparkDriver' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
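The root failure is the java.net.BindException at the end: the 'sparkDriver' service could not bind a local port, so the Python side keeps retrying against a JVM that never started. The message itself names the knobs involved; for reference, here is a minimal sketch of passing them when the context is built (these are standard Spark configuration keys, but the app name is arbitrary and the snippet is not taken from the original post):

from pyspark import SparkConf, SparkContext

# Pin the driver to the loopback address and allow extra bind retries,
# per the suggestion in the BindException message.
conf = (SparkConf()
        .setMaster("local[*]")
        .setAppName("pi-sample")
        .set("spark.driver.host", "127.0.0.1")
        .set("spark.port.maxRetries", "32"))

sc = SparkContext(conf=conf)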



Below is my sample code; I have no problem running Apache Spark itself from cmd.
import os
import sys

# Point Python at the local Spark installation before importing pyspark.
spark_path = r"C:\spark"
os.environ['SPARK_HOME'] = spark_path
sys.path.insert(0, spark_path + "/bin")
sys.path.insert(0, spark_path + "/python/pyspark/")
sys.path.insert(0, spark_path + "/python/lib/pyspark.zip")
sys.path.insert(0, spark_path + "/python/lib/py4j-0.10.3-src.zip")

from pyspark import SparkContext

sc = SparkContext()

import random

NUM_SAMPLES = 100000

# Monte Carlo estimate of pi: count random points inside the unit quarter-circle.
def sample(p):
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0

count = sc.parallelize(range(0, NUM_SAMPLES)).map(sample) \
          .reduce(lambda a, b: a + b)
print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))

I have downloaded winutils.exe and added a HADOOP_HOME environment variable, and I added export SPARK_MASTER_IP=127.0.0.1 and export SPARK_LOCAL_IP=127.0.0.1 to the spark-env.sh file. However, I still get the same error. Can someone help me and point out what I am missing?
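One detail worth checking in a setup like this (a hedged side note, not from the original question): on Windows, Spark's launch scripts read conf\spark-env.cmd rather than spark-env.sh, so exports added to the .sh file are likely never picked up. An alternative that stays in Python is to set the variable in the driver's environment before the context is created, on the assumption that the JVM pyspark launches inherits that environment:

import os

# Assumption: the Spark JVM is launched as a child of this process,
# so SPARK_LOCAL_IP set here is visible to it.
os.environ['SPARK_LOCAL_IP'] = '127.0.0.1'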

Thank you in advance,

Best Answer

In my case, I just needed to restart the kernel.

The problem was that I was creating the environment twice: every time I made a mistake, I re-ran the code from the beginning.
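If restarting the kernel each time is inconvenient, one way to avoid building the context twice is to reuse or explicitly stop the running one. A minimal sketch, assuming Spark 2.0+ (an addition for illustration, not part of the original answer):

from pyspark import SparkContext

# Reuse the context that is already running instead of trying to bind
# a second driver in the same process.
sc = SparkContext.getOrCreate()

# ... job code ...

# Stop the context before re-running the script in the same kernel.
sc.stop()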

Regarding "python - Windows error when running standalone pyspark", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41049330/
