python - RuntimeError: Java gateway process exited before sending its port number

Tags: python java apache-spark-sql data-analysis

I am trying to analyze this data with Python:

from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import *
spark = SparkSession.builder.getOrCreate()

# Read the twelve monthly Divvy trip files for 2021; the first row of each file is the header.
ds1 = spark.read.csv("C:\\Users\\User\\Desktop\\Trip_data\\202101-divvy-tripdata.csv", header=True)
ds2 = spark.read.csv("C:\\Users\\User\\Desktop\\Trip_data\\202102-divvy-tripdata.csv", header=True)
ds3 = spark.read.csv("C:\\Users\\User\\Desktop\\Trip_data\\202103-divvy-tripdata.csv", header=True)
ds4 = spark.read.csv("C:\\Users\\User\\Desktop\\Trip_data\\202104-divvy-tripdata.csv", header=True)
ds5 = spark.read.csv("C:\\Users\\User\\Desktop\\Trip_data\\202105-divvy-tripdata.csv", header=True)
ds6 = spark.read.csv("C:\\Users\\User\\Desktop\\Trip_data\\202106-divvy-tripdata.csv", header=True)
ds7 = spark.read.csv("C:\\Users\\User\\Desktop\\Trip_data\\202107-divvy-tripdata.csv", header=True)
ds8 = spark.read.csv("C:\\Users\\User\\Desktop\\Trip_data\\202108-divvy-tripdata.csv", header=True)
ds9 = spark.read.csv("C:\\Users\\User\\Desktop\\Trip_data\\202109-divvy-tripdata.csv", header=True)
ds10 = spark.read.csv("C:\\Users\\User\\Desktop\\Trip_data\\202110-divvy-tripdata.csv", header=True)
ds11 = spark.read.csv("C:\\Users\\User\\Desktop\\Trip_data\\202111-divvy-tripdata.csv", header=True)
ds12 = spark.read.csv("C:\\Users\\User\\Desktop\\Trip_data\\202112-divvy-tripdata.csv", header=True)

# Stack the monthly DataFrames into a single DataFrame.
ds_all = ds1.union(ds2).union(ds3).union(ds4).union(ds5).union(ds6).union(ds7).union(ds8).union(ds9).union(ds10).union(ds11).union(ds12)

# Print the shape as (row count, column count).
print((ds_all.count(), len(ds_all.columns)))

This is the error I get:

Java not found and JAVA_HOME environment variable is not set.
Install Java and set JAVA_HOME to point to the Java installation directory.
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\pythonProject\Case Study 1.py", line 4, in <module>
    spark = SparkSession.builder.getOrCreate()
  File "C:\Users\User\PycharmProjects\pythonProject\venv\lib\site-packages\pyspark\sql\session.py", line 228, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\Users\User\PycharmProjects\pythonProject\venv\lib\site-packages\pyspark\context.py", line 392, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\Users\User\PycharmProjects\pythonProject\venv\lib\site-packages\pyspark\context.py", line 144, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "C:\Users\User\PycharmProjects\pythonProject\venv\lib\site-packages\pyspark\context.py", line 339, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "C:\Users\User\PycharmProjects\pythonProject\venv\lib\site-packages\pyspark\java_gateway.py", line 108, in launch_gateway
    raise RuntimeError("Java gateway process exited before sending its port number")
RuntimeError: Java gateway process exited before sending its port number

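I googled the error, but many of the solutions I found were confusing and I could not understand or follow them. Does anyone have an idea what is causing this? Or is there a more convenient package for this kind of work in PyCharm Community? Any advice would be greatly appreciated!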

Best answer

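This problem is caused by the missing $JAVA_HOME variable, as the first line of the error output says. To set it, add the following line to your ~/.bashrc (or ~/.zshrc on Mac):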

export JAVA_HOME="/path/to/java_home/"

On Windows, add JAVA_HOME as an environment variable in the system settings instead, pointing to the Java installation directory.
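If changing the system settings is inconvenient, the variable can also be set from inside the script, as long as this happens before the Spark session is created. The following is only a minimal sketch under that assumption: the helper name `configure_java_home` and the placeholder path are hypothetical, Java must already be installed, and the setting lasts only for the current Python process.

```python
import os
from pathlib import Path


def configure_java_home(java_home: str) -> None:
    """Point this Python process (and the child processes it starts) at a JDK.

    `java_home` is expected to be the JDK root, i.e. the directory containing `bin`.
    """
    root = Path(java_home)
    if not (root / "bin").is_dir():
        raise FileNotFoundError(f"No 'bin' directory under {root}")
    os.environ["JAVA_HOME"] = str(root)
    # Put the JDK's bin directory first on PATH so that `java` resolves to it.
    os.environ["PATH"] = str(root / "bin") + os.pathsep + os.environ.get("PATH", "")


# Hypothetical usage; replace the placeholder with your own Java installation directory:
# configure_java_home(r"C:\path\to\java_home")
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
```

The helper fails early if the directory has no `bin` subfolder, which tends to catch the common mistake of pointing JAVA_HOME at the `bin` directory itself rather than at its parent.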

Note that Spark/pyspark requires Java version 8 or later. Here is how to check your Java version:

% $JAVA_HOME/bin/java -version
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)
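The same check can be done from Python, which may be handy inside PyCharm. This is a rough sketch rather than an official pyspark utility: the function names `find_java`, `parse_java_major`, and `installed_java_major` are made up for illustration, and the parser assumes the banner contains a quoted version string like the one shown above.

```python
import os
import re
import shutil
import subprocess
from typing import Optional


def find_java() -> Optional[str]:
    """Return the path to a `java` executable, preferring JAVA_HOME over PATH."""
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        candidate = shutil.which("java", path=os.path.join(java_home, "bin"))
        if candidate:
            return candidate
    return shutil.which("java")


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major version from the first quoted version string."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as "1.x" (for example "1.8.0_301").
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major() -> Optional[int]:
    """Return the major version of the Java that was found, or None if there is none."""
    java = find_java()
    if java is None:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)


# Hypothetical usage:
# major = installed_java_major()
# print("Java not found" if major is None else f"Java major version: {major}")
```

If `installed_java_major()` returns `None`, Java is either not installed or not reachable through JAVA_HOME or PATH, which matches the situation described in the error message.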

This question, python - RuntimeError: Java gateway process exited before sending its port number, corresponds to a similar question found on Stack Overflow: https://stackoverflow.com/questions/71900906/
