Python in worker has different version: environment variables are set correctly

Tags: python python-3.x apache-spark pyspark

I am running a Python script in a Jupyter notebook on Linux Mint.

The code isn't really important, but here it is (it is from the GraphFrames tutorial):

import pandas
import pyspark

from functools import reduce
from graphframes import *
from IPython.display import display, HTML
from pyspark.context import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import col, lit, when
from pyspark.sql.session import SparkSession

sc = SparkContext.getOrCreate()
sqlContext = SQLContext.getOrCreate(sc)
spark = SparkSession(sc)

vertices = sqlContext.createDataFrame(
    [
        ("a", "Alice", 34),
        ("b", "Bob", 36),
        ("c", "Charlie", 30),
        ("d", "David", 29),
        ("e", "Esther", 32),
        ("f", "Fanny", 36),
        ("g", "Gabby", 60),
    ],
    ["id", "name", "age"],
)

edges = sqlContext.createDataFrame(
    [
        ("a", "b", "friend"),
        ("b", "c", "follow"),
        ("c", "b", "follow"),
        ("f", "c", "follow"),
        ("e", "f", "follow"),
        ("e", "d", "friend"),
        ("d", "a", "friend"),
        ("a", "e", "friend"),
    ],
    ["src", "dst", "relationship"],
)

g = GraphFrame(vertices, edges)

display(g.inDegrees.toPandas())

The last line is the one causing the problem; it gives the following error:

Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

Both variables are set correctly:

printenv PYSPARK_PYTHON
-> /usr/bin/python3
printenv PYSPARK_DRIVER_PYTHON
-> /usr/bin/python3
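
One way to confirm what the workers actually run is to ask them directly; a minimal diagnostic sketch, reusing the sc from the script above:

import sys

# Interpreter running the driver (the notebook kernel)
print("driver:", sys.version)

# Interpreter running on a worker; if the mismatch is real, this job
# fails with the exact exception above, which is itself confirmation.
print("worker:", sc.parallelize([0], 1)
                   .map(lambda _: __import__("sys").version)
                   .first())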

I also added them to my spark-env.sh file, like so:

# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.

export PYSPARK_PYTHON=/usr/bin/python3       
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3   

But the error persists. Where else can I update these variables?
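
For completeness, one more place they can be set is the notebook process itself, before any SparkContext exists; a minimal sketch, assuming a fresh kernel:

import os

# Must run before SparkContext.getOrCreate(); an already-running
# context keeps whatever environment it was started with.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3"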

Edit

python --version
Python 3.7.4

pip3 list | grep jupyter
jupyter               1.0.0      
jupyter-client        5.3.4      
jupyter-console       6.0.0      
jupyter-core          4.6.1      
jupyterlab            1.1.4      
jupyterlab-server     1.0.6     

pip3 list | grep pyspark
pyspark               2.4.4

Best Answer

The problem is more likely a Python version conflict. Set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to /usr/bin/python. Alternatively, you can use a venv:

cd ~
python3 -m venv spark_test
cd spark_test
source ./bin/activate
pip3 install jupyterlab pyspark graphframes
jupyter notebook

You will have to put your Jupyter files into the newly created folder.
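
To make the interpreter choice explicit, the two variables can also point at the venv's own python; a minimal sketch, assuming the notebook kernel runs inside the ~/spark_test venv created above:

import os
import sys

# Inside the venv's kernel, sys.executable is the venv interpreter;
# exporting it keeps driver and workers on the same Python.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable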

Regarding "Python in worker has different version: environment variables are set correctly", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58556526/
