python - PySpark Exception: Java gateway process exited before sending its port number

Tags: python apache-spark pyspark

I am running Windows 10 with Python 3 installed through Anaconda3, and I am working in a Jupyter notebook. I installed Spark from here (spark-2.3.0-bin-hadoop2.7.tgz), extracted the files, and copied them into my directory D:\Spark. I then modified my environment variables:

User variables:

Variable: SPARK_HOME

Value: D:\Spark

System variables:

Variable: Path

Value: D:\Spark\bin
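A quick way to confirm that these variables are actually visible to the Python process running the notebook (a diagnostic sketch; the values below mirror the ones set above) is:

```python
import os

# SPARK_HOME should point at the extracted Spark directory (D:\Spark above).
spark_home = os.environ.get("SPARK_HOME")
print("SPARK_HOME =", spark_home)

# D:\Spark\bin must appear on PATH so the spark-submit launcher resolves.
on_path = any("spark" in p.lower()
              for p in os.environ.get("PATH", "").split(os.pathsep))
print("Spark bin on PATH:", on_path)
```

If either value comes back empty, the variables were set after the notebook server started and a restart of the Jupyter process is needed for them to take effect.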

I have installed/updated the following modules through conda:

pandas

numpy

pyarrow

pyspark

py4j

Java is installed:

(screenshot showing the installed Java version)
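Since PySpark starts the gateway by launching a `java` process, it is worth checking from Python that the launcher is reachable on PATH (a diagnostic sketch, not part of the original question):

```python
import shutil
import subprocess

# PySpark's gateway shells out to `java`; verify the launcher is on PATH.
java_path = shutil.which("java")
if java_path is None:
    print("java not found on PATH -- Spark's gateway cannot start")
else:
    # Note: `java -version` writes its banner to stderr, not stdout.
    banner = subprocess.run([java_path, "-version"],
                            capture_output=True, text=True).stderr
    print(java_path)
    print(banner.splitlines()[0])
```

If `java` cannot be resolved here, the "Java gateway process exited before sending its port number" error is expected, regardless of what the installer GUI reported.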

I don't know whether this is relevant, but the following two variables appear in my environment variables:

(screenshot showing the two environment variables)

After doing all of this, I rebooted and ran the following code, which produces the error message I have pasted below:

import pandas as pd

import seaborn as sns

# These lines enable the run of spark commands

from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext('local')
spark = SparkSession(sc)

import pyspark

data = sns.load_dataset('iris')

data_sp = spark.createDataFrame(data)

data_sp.show()

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-1-ec964ecd39a2> in <module>()
      7 from pyspark.context import SparkContext
      8 from pyspark.sql.session import SparkSession
----> 9 sc = SparkContext('local')
     10 spark = SparkSession(sc)
     11 

C:\ProgramData\Anaconda3\lib\site-packages\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    113         """
    114         self._callsite = first_spark_call() or CallSite(None, None, None)
--> 115         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    116         try:
    117             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

C:\ProgramData\Anaconda3\lib\site-packages\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
    296         with SparkContext._lock:
    297             if not SparkContext._gateway:
--> 298                 SparkContext._gateway = gateway or launch_gateway(conf)
    299                 SparkContext._jvm = SparkContext._gateway.jvm
    300 

C:\ProgramData\Anaconda3\lib\site-packages\pyspark\java_gateway.py in launch_gateway(conf)
     92 
     93             if not os.path.isfile(conn_info_file):
---> 94                 raise Exception("Java gateway process exited before sending its port number")
     95 
     96             with open(conn_info_file, "rb") as info:

Exception: Java gateway process exited before sending its port number

How can I get PySpark to work?

Best Answer

I solved the problem by following the instructions here: https://changhsinlee.com/install-pyspark-windows-jupyter/
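The linked guide essentially boils down to pointing PySpark at a working Java installation before creating a SparkContext. A minimal sketch of that idea (the JDK path is an assumption; use your actual install location, and note that this error commonly appears when JAVA_HOME is unset or the `java` process cannot start):

```python
import os

# Assumption: a JDK installed at this path -- replace with your own.
# Paths containing spaces (e.g. "Program Files") can also trip up Spark
# on Windows, so an install location without spaces is safer.
os.environ["JAVA_HOME"] = r"C:\Java\jdk1.8.0_171"
os.environ["SPARK_HOME"] = r"D:\Spark"

# findspark (pip install findspark) adds SPARK_HOME's python/ directories
# to sys.path so that `import pyspark` picks up the D:\Spark install.
try:
    import findspark
    findspark.init()
except Exception:
    pass  # findspark not installed, or SPARK_HOME invalid on this machine
```

With JAVA_HOME resolving to a real JDK, `SparkContext('local')` should be able to launch the gateway and receive its port number.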

Regarding "python - PySpark Exception: Java gateway process exited before sending its port number", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54179205/
