apache-spark - Environment variables in Pyspark

Tags: apache-spark pyspark centos

I have installed Hadoop in cluster mode and have now installed Spark. I want to use pyspark; this is my .bashrc:

# User specific aliases and functions
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:/opt/hadoop/spark/bin:/opt/hadoop/spark/sbin
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
# These variables are added for Spark
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export SPARK_HOME=/opt/hadoop/spark
# For pyspark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.3-src.zip:$PYTHONPATH
export PATH=$SPARK_HOME/python:$PATH
export PYSPARK_PYTHON=/usr/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7
When I run the pyspark command, the following happens:
[hadoop@nodo1 ~]$ pyspark
Python 2.7.5 (default, Nov 16 2020, 22:23:17) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "/opt/hadoop/spark/python/pyspark/shell.py", line 29, in <module>
    from pyspark.context import SparkContext
  File "/opt/hadoop/spark/python/pyspark/__init__.py", line 53, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/opt/hadoop/spark/python/pyspark/rdd.py", line 34, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/opt/hadoop/spark/python/pyspark/java_gateway.py", line 31, in <module>
    from pyspark.find_spark_home import _find_spark_home
  File "/opt/hadoop/spark/python/pyspark/find_spark_home.py", line 68
    print("Could not find valid SPARK_HOME while searching {0}".format(paths), file=sys.stderr)
                                                                                   ^
SyntaxError: invalid syntax
I am using Hadoop 3.2.3, Spark 3.1.2, Python 2.7.5, and CentOS 7. Where is the error?

Best Answer

The problem was the Python version: the traceback is a SyntaxError because print("...", file=sys.stderr) is Python 3 syntax, and Spark 3.1 no longer supports Python 2. Installing Python 3 and keeping only the following environment variables solved the problem:

export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:/opt/hadoop/spark/bin:/opt/hadoop/spark/sbin
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export SPARK_HOME=/opt/hadoop/spark

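If you want to keep the PySpark-specific variables rather than drop them, a minimal sketch of what they might look like pointed at Python 3 is below. The /usr/bin/python3 path and the py4j zip version are assumptions; check the actual file name under $SPARK_HOME/python/lib for your Spark build.

# For pyspark -- adjust the python3 path and the py4j version to your system (assumptions)
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.3-src.zip:$PYTHONPATH
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3

After editing .bashrc, run source ~/.bashrc and start pyspark again; the shell banner should now report Python 3.x instead of 2.7.5.
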
Regarding apache-spark - Environment variables in Pyspark, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/72174368/
