dataframe - Error after applying a UDF to a DataFrame in PySpark

Tags: dataframe error-handling pyspark user-defined-functions

PySpark version: 2.3.2

I have a DataFrame (df) in PySpark with the following schema:

>>> df.printSchema
<bound method DataFrame.printSchema of
DataFrame[id: string, 
          F: string, 
          D: string, 
          T: string, 
          S: string, 
          P: string]>
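The output above is the bound method itself, because `printSchema` was referenced without parentheses. Calling `df.printSchema()` prints the actual schema tree. To reproduce the problem without the original data, the six string columns could be rebuilt explicitly. The minimal sketch below does that. The sample rows and helper names are hypothetical and do not come from the question. PySpark is imported inside the functions so that the column definitions can be loaded without Spark installed.

```python
# Column names taken from the schema printed in the question; all are strings.
COLUMNS = ["id", "F", "D", "T", "S", "P"]

# Hypothetical sample rows for a local reproduction (not from the question).
SAMPLE_ROWS = [
    ("1", "f1", "10:20:30", "t1", "s1", "p1"),
    ("2", "f2", None, "t2", "s2", "p2"),  # a null in column D
]


def make_schema():
    """Build a StructType with one nullable string field per column."""
    from pyspark.sql.types import StringType, StructField, StructType

    return StructType([StructField(name, StringType(), True) for name in COLUMNS])


def make_sample_df(spark):
    """Create a small DataFrame with the same schema as `df` in the question."""
    return spark.createDataFrame(SAMPLE_ROWS, schema=make_schema())
```

For example, `df = make_sample_df(spark)` followed by `df.printSchema()` should list the same six string columns. Running this in local mode, on a machine where the configured Python interpreter exists, can help separate a bug in the UDF from a cluster configuration problem.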

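I have the following simplified UDF:
from pyspark.sql.functions import UserDefinedFunction, col
# note: x.replace raises AttributeError if a value in D is null (None)
rep = UserDefinedFunction(lambda x: x.replace(":", ";"))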

Then I ran:
df1 = df.withColumn("occ", rep(col("D")))
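The UDF above works in principle, and the answer below attributes the failure to the environment rather than to the UDF. The minimal sketch below shows a slightly more defensive version of the same UDF. It uses the public `pyspark.sql.functions.udf` with an explicit return type and passes nulls through instead of calling `.replace` on `None`. The helper names (`replace_colons`, `add_occ_column`, `add_occ_column_builtin`) are hypothetical. The second helper uses the built-in `regexp_replace`, which performs the replacement in the JVM, so this step does not need a Python worker on the executors (other Python UDFs or RDD code in the same job still would).

```python
def replace_colons(value):
    """Replace every ':' with ';'; pass nulls (None) through unchanged."""
    if value is None:
        return None
    return value.replace(":", ";")


def add_occ_column(df):
    """Add an 'occ' column derived from column 'D' using a Python UDF."""
    # Imported here so the pure-Python helper above can be used without Spark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    rep = udf(replace_colons, StringType())
    return df.withColumn("occ", rep(col("D")))


def add_occ_column_builtin(df):
    """Same result with a built-in function; the replacement runs in the JVM."""
    from pyspark.sql.functions import col, regexp_replace

    # ':' has no special meaning in Java regular expressions, so no escaping is needed.
    return df.withColumn("occ", regexp_replace(col("D"), ":", ";"))
```

For a plain character substitution like this, the built-in variant is usually preferable, because it avoids the cost of serializing rows to Python.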

However, calling df1.show() fails with the following error:
 df1.show()
  19/08/23 23:59:15 WARN 
  org.apache.spark.scheduler.TaskSetManager: 
  Lost task 0.0 in stage 9.0 (TID 30, cluster, executor 1):
  java.io.IOException: 
  Cannot run program "/opt/conda/bin/python": 
  error=2, No such file or directory
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
  at .....
  Caused by: java.io.IOException: error=2, No such file or directory
  19/08/23 23:59:16 ERROR 
  org.apache.spark.scheduler.TaskSetManager: Task 0 in stage 9.0 failed 4 times; aborting job
  Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 350, in show
  print(self._jdf.showString(n, 20, vertical))
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
  return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
  py4j.protocol.Py4JJavaError: An error occurred while calling o339.showString.
  .......

Best answer

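The problem does not appear to be in the UDF itself. It looks like a problem with your installation: the executor cannot start the Python interpreter it is configured to use.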

  Cannot run program "/opt/conda/bin/python": 
  error=2, No such file or directory
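`error=2` is the operating system's "No such file or directory" (ENOENT). Executor 1 tried to launch `/opt/conda/bin/python` as its Python worker, and that file did not exist on its node. PySpark takes this path from the `PYSPARK_PYTHON` environment variable (or the `spark.pyspark.python` setting passed at launch), so it has to name an interpreter that exists at the same path on the driver and on every worker node. The minimal sketch below checks candidate interpreter paths on a node. The helper names are hypothetical.

```python
import os
import shutil


def interpreter_problem(path):
    """Return None if `path` is an executable regular file, else a short reason."""
    if not os.path.exists(path):
        return "missing: " + path  # what error=2 (ENOENT) reports
    if not os.path.isfile(path):
        return "not a regular file: " + path
    if not os.access(path, os.X_OK):
        return "not executable: " + path
    return None


def choose_pyspark_python(candidates):
    """Return the first candidate that resolves to a usable interpreter, or None.

    Bare names (e.g. "python3") are looked up on PATH; paths are checked as given.
    """
    for name in candidates:
        resolved = shutil.which(name)
        if resolved and interpreter_problem(resolved) is None:
            return resolved
    return None
```

For example, run `choose_pyspark_python(["/opt/conda/bin/python", "python3"])` on each node with whatever Python is available there. If the result is `None` on some node, or differs between nodes, either install the interpreter at that path on every node, or start the shell with `--conf spark.pyspark.python=<path>` (or export `PYSPARK_PYTHON=<path>` before launching) using a path that exists everywhere.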

A similar Stack Overflow question about this error (dataframe - Error after applying a UDF to a DataFrame in PySpark): https://stackoverflow.com/questions/57634181/
