apache-spark - 无法写入 EMR 内的 s3 并获取 NPE

标签 apache-spark amazon-s3 amazon-emr

有人见过这个吗?可怕的 NPE,所以不知道是什么原因造成的。

尝试通过 EMR 集群中的 Spark .write.mode(SaveMode.Overwrite).parquet 写入 s3。

Exception in thread "main" java.lang.NullPointerException
    at com.amazon.ws.emr.hadoop.fs.s3.lite.S3Errors.isHttp200WithErrorCode(S3Errors.java:57)
    at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:100)
    at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:184)
    at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.deleteObjects(AmazonS3LiteClient.java:127)
    at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.deleteAll(Jets3tNativeFileSystemStore.java:364)
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.doSingleThreadedBatchDelete(S3NativeFileSystem.java:1380)
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.delete(S3NativeFileSystem.java:663)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.delete(EmrFileSystem.java:337)
    at org.apache.spark.internal.io.FileCommitProtocol.deleteWithJob(FileCommitProtocol.scala:124)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions(InsertIntoHadoopFsRelationCommand.scala:223)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:122)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:566)
...

最佳答案

根据 AWS 的说法,这是一个已知错误,并已在 EMR 5.28.0 中修复,因此只需升级 EMR。

关于apache-spark - 无法写入 EMR 内的 s3 并获取 NPE,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59267457/

相关文章:

emr - emrfs-site.xml 在哪里?

hadoop - 具有 6 到 7 个节点硬件配置的分布式 Spark 和 HDFS 集群

python - 如何在一次传递的 Spark 中添加元组列表中第二个元素的长度?

scala - 如何在 Scala 或 Java 中模拟或 stub AWS SDK S3 存储桶调用?

Python Lambda 函数中断且 CloudWatch 中未记录任何错误

amazon-web-services - EMR 步骤的顺序

hadoop - Spark : yarn cluster mode can't read hdfs path (No such file or directory)

scala - Spark : how to zip an RDD with each partition of the other RDD

pandas - 如何将数据从 S3 存储桶加载到 Sagemaker jupyter 笔记本来训练模型?

amazon-web-services - 带有 Flink 的 AWS EMR 集群不运行任何 Jar,而是给出 java.lang.NoSuchMethodError