scala - How to save and use Spark History Server logs in AWS S3

Tags: scala apache-spark amazon-s3

I want to write Spark event logs to AWS S3 and view them in the Spark History Server.

These are the properties set in spark-defaults.conf.

spark.hadoop.fs.s3a.impl          org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.endpoint    {{endpoint}}
spark.hadoop.fs.s3a.access.key  {{accessKey}}
spark.hadoop.fs.s3a.secret.key  {{secretKey}}
spark.hadoop.fs.s3a.fast.upload true
spark.hadoop.fs.s3a.block.size  268435456
spark.eventLog.enabled            true
spark.eventLog.dir                s3a://{{bucketName}}/{{path}}
spark.history.fs.logDirectory     s3a://{{bucketName}}/{{path}}
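As a sanity check on a configuration like the one above, the following is a minimal Python sketch that parses spark-defaults.conf style text and confirms that event logging is enabled and that the write location (spark.eventLog.dir) and the read location (spark.history.fs.logDirectory) are both s3a:// URLs pointing at the same place. The function names parse_spark_defaults and check_event_log_settings are hypothetical helpers, not part of Spark, and the parser assumes whitespace-separated key/value pairs only.

```python
def parse_spark_defaults(text):
    """Parse whitespace-separated 'key value' lines into a dict.

    Assumption: only the 'key<whitespace>value' form is handled;
    blank lines and lines starting with '#' are skipped.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def check_event_log_settings(props):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if props.get("spark.eventLog.enabled", "false").lower() != "true":
        problems.append("spark.eventLog.enabled is not true")

    write_dir = props.get("spark.eventLog.dir")
    read_dir = props.get("spark.history.fs.logDirectory")
    for key, value in (
        ("spark.eventLog.dir", write_dir),
        ("spark.history.fs.logDirectory", read_dir),
    ):
        if value is None:
            problems.append(f"{key} is not set")
        elif not value.startswith("s3a://"):
            problems.append(f"{key} does not use the s3a:// scheme: {value}")

    # Applications write to one location and the History Server reads from
    # the other, so the two should refer to the same directory.
    if write_dir and read_dir and write_dir.rstrip("/") != read_dir.rstrip("/"):
        problems.append(
            "spark.eventLog.dir and spark.history.fs.logDirectory differ"
        )
    return problems
```

Calling check_event_log_settings(parse_spark_defaults(open(path).read())) on the file returns an empty list when these basic settings look consistent. It does not validate credentials or permissions, so it cannot by itself explain a 403 response.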

However, starting the Spark History Server fails with the following error.

20/10/07 14:07:14 INFO S3AFileSystem: Error Message: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: {{requestId}}, AWS Error Code: null, AWS Error Message: Forbidden
20/10/07 14:07:14 INFO S3AFileSystem: HTTP Status Code: 403
20/10/07 14:07:14 INFO S3AFileSystem: AWS Error Code: null
20/10/07 14:07:14 INFO S3AFileSystem: Error Type: Client
20/10/07 14:07:14 INFO S3AFileSystem: Request ID: {{requestId}}
20/10/07 14:07:14 INFO S3AFileSystem: Class Name: com.amazonaws.services.s3.model.AmazonS3Exception
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:296)
        at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: {{requestId}}, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: {{requestId}}
        at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:688)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:71)
        at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$startPolling(FsHistoryProvider.scala:257)
        at org.apache.spark.deploy.history.FsHistoryProvider.initialize(FsHistoryProvider.scala:211)
        at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:207)
        at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:86)
        ... 6 more

Uploads and downloads work fine when I call the API with the same credentials.

Am I missing something in the Spark History Server configuration?

Best answer

I was able to get this working. Basically, you need to add the relevant jars to the SPARK_HOME/jars directory. See my detailed answer to a similar question: https://stackoverflow.com/a/65086818/6239561
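The answer does not name the jars here, so the following is only a sketch under an assumption: that the relevant jars are the ones S3A normally depends on, namely hadoop-aws and an AWS SDK jar (aws-java-sdk or aws-java-sdk-bundle), in versions that match the Hadoop version bundled with Spark. The helper names and the prefix list are hypothetical; the script only reports whether jars with those name prefixes are present in SPARK_HOME/jars.

```python
from pathlib import Path

# Assumption: these filename prefixes identify the jars S3A usually needs.
# The original answer does not list them explicitly.
REQUIRED_PREFIXES = ("hadoop-aws-", "aws-java-sdk")


def missing_s3a_jars(jar_names):
    """Return the required prefixes that no jar file name starts with."""
    return [
        prefix
        for prefix in REQUIRED_PREFIXES
        if not any(
            name.startswith(prefix) and name.endswith(".jar")
            for name in jar_names
        )
    ]


def list_spark_jars(spark_home):
    """List jar file names under <spark_home>/jars (empty if it is absent)."""
    jars_dir = Path(spark_home) / "jars"
    if not jars_dir.is_dir():
        return []
    return sorted(p.name for p in jars_dir.glob("*.jar"))
```

For example, missing_s3a_jars(list_spark_jars(os.environ["SPARK_HOME"])) (with os imported) returns an empty list when both kinds of jar are present. Presence alone does not guarantee compatibility: a hadoop-aws version that differs from the bundled Hadoop version can still fail at runtime.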

Regarding scala - How to save and use Spark History Server logs in AWS S3, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64237678/
