python - iPython notebook with Spark raises an error in SparkContext

Tags: python apache-spark ipython pyspark jupyter-notebook

I am testing Turi on my MacBook (OS X 10.10.5) with this example: https://turi.com/learn/gallery/notebooks/spark_and_graphlab_create.html

When I reach this step:

# Set up the SparkContext object
# this can be 'local' or 'yarn-client' in PySpark
# Remember if using yarn-client then all the paths should be accessible
# by all nodes in the cluster.
sc = SparkContext('local')

I get the following error:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-12-dc1befb4186c> in <module>()
      3 # Remember if using yarn-client then all the paths should be accessible
      4 # by all nodes in the cluster.
----> 5 sc = SparkContext()

/usr/local/Cellar/apache-spark/1.6.2/libexec/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    110         """
    111         self._callsite = first_spark_call() or CallSite(None, None, None)
--> 112         SparkContext._ensure_initialized(self, gateway=gateway)
    113         try:
    114             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/usr/local/Cellar/apache-spark/1.6.2/libexec/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
    243         with SparkContext._lock:
    244             if not SparkContext._gateway:
--> 245                 SparkContext._gateway = gateway or launch_gateway()
    246                 SparkContext._jvm = SparkContext._gateway.jvm
    247 

/usr/local/Cellar/apache-spark/1.6.2/libexec/python/pyspark/java_gateway.pyc in launch_gateway()
     92                 callback_socket.close()
     93         if gateway_port is None:
---> 94             raise Exception("Java gateway process exited before sending the driver its port number")
     95 
     96         # In Windows, ensure the Java child processes do not linger after Python has exited.

Exception: Java gateway process exited before sending the driver its port number

A quick Google search has not turned up anything helpful so far.

Here is my .bash_profile:

# added by Anaconda2 4.1.1 installer
export PATH="/Users/me/anaconda/bin:$PATH"

export SCALA_HOME=/usr/local/Cellar/scala/2.11.8/libexec
export SPARK_HOME=/usr/local/Cellar/apache-spark/1.6.2/libexec
export PYTHONPATH=$SPARK_HOME/python/pyspark:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH 
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH

Does anyone know how to fix this error?

Thanks

Best Answer

This usually happens for one of two reasons:

  1. The SPARK_HOME environment variable may point to the wrong path.
  2. PYSPARK_SUBMIT_ARGS is not set. Set export PYSPARK_SUBMIT_ARGS="--master local[2]"; this is the configuration you want PySpark to launch with.
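The two checks above can also be done from Python itself, before the SparkContext is created. This is a minimal sketch, assuming the Homebrew install path from the question; note that on Spark 1.4 and later the PYSPARK_SUBMIT_ARGS value generally needs to end with "pyspark-shell", which the answer's one-liner omits.

```python
import os

# 1. SPARK_HOME must point at a real Spark installation; the
#    "Java gateway process exited" error is typical when it does not.
#    (Path below is the Homebrew location from the question.)
spark_home = os.environ.get("SPARK_HOME", "")
if not os.path.isdir(os.path.join(spark_home, "bin")):
    print("SPARK_HOME looks wrong: %r" % spark_home)

# 2. Tell PySpark how to launch, before any SparkContext exists.
#    The trailing "pyspark-shell" token is required on Spark 1.4+.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"

# With the environment fixed, creating the context should succeed:
# from pyspark import SparkContext
# sc = SparkContext('local')
```

Setting these in the notebook process only helps if they are set before the first SparkContext is constructed; otherwise put the exports in .bash_profile and restart the notebook server.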

Regarding "python - iPython notebook with Spark raises an error in SparkContext", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38354580/
