amazon-web-services - In Apache Spark, how do I set environment variables for workers/executors?

Tags: amazon-web-services amazon-s3 apache-spark distributed-computing

My Spark program on EMR keeps failing with this error:

Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
    at sun.security.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:421)
    at org.apache.http.conn.ssl.AbstractVerifier.verify(AbstractVerifier.java:128)
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:397)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:573)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:334)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:942)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2148)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2075)
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1093)
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:548)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:172)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
    at org.apache.hadoop.fs.s3native.$Proxy8.retrieveMetadata(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:414)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:341)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)

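I did some research and found that, where low security is acceptable, this authentication check can be disabled by setting the environment variable:
com.amazonaws.sdk.disableCertChecking=true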

However, the only way I have found to set it is with spark-submit.sh --conf. That only affects the driver, and most of the errors happen on the workers.

Is there a way to propagate it to the workers?

Many thanks.

Best answer

I just stumbled upon this in the Spark documentation:
spark.executorEnv.[EnvironmentVariableName]

Add the environment variable specified by EnvironmentVariableName to the Executor process. The user can specify multiple of these to set multiple environment variables.
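The sketch below illustrates the naming scheme. It is a minimal, stdlib-only Java program that turns a map of variable names and values into spark-submit `--conf` arguments of the form spark.executorEnv.NAME=value, one per variable, as the documentation describes. The class and method names (ExecutorEnvArgs, executorEnvArgs) and the variable MY_OTHER_VAR are hypothetical. Inside a Spark application, SparkConf.setExecutorEnv(name, value) serves the same purpose.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds spark-submit "--conf" arguments that ask Spark
// to add each map entry to the executor process environment.
public class ExecutorEnvArgs {

    static final String PREFIX = "spark.executorEnv.";

    // Emits one "--conf" / "spark.executorEnv.NAME=value" pair per entry,
    // in the map's iteration order.
    static List<String> executorEnvArgs(Map<String, String> env) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            args.add("--conf");
            args.add(PREFIX + e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] argv) {
        Map<String, String> env = new LinkedHashMap<>(); // keeps insertion order
        env.put("com.amazonaws.sdk.disableCertChecking", "true");
        env.put("MY_OTHER_VAR", "42"); // hypothetical second variable
        // Prints the arguments space-separated, ready to append to a spark-submit call.
        System.out.println(String.join(" ", executorEnvArgs(env)));
    }
}
```

Running main prints `--conf spark.executorEnv.com.amazonaws.sdk.disableCertChecking=true --conf spark.executorEnv.MY_OTHER_VAR=42`.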



So in your case, I would set the Spark configuration option spark.executorEnv.com.amazonaws.sdk.disableCertChecking=true and see whether that helps.
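One caveat, offered tentatively. The AWS SDK for Java reads com.amazonaws.sdk.disableCertChecking as a JVM system property. spark.executorEnv.* only sets process environment variables, so the flag may reach the executors without the SDK ever seeing it. The stack trace in the question also runs through jets3t and Apache HttpClient rather than the AWS SDK, so the flag may not affect this code path at all.

The class below is a hypothetical probe (CertCheckProbe is not part of any library). You could call its describe() method inside a task to check where the flag actually arrives. If it shows up only in the environment, the usual way to set a system property on executors is spark.executor.extraJavaOptions=-Dcom.amazonaws.sdk.disableCertChecking=true.

```java
// Hypothetical probe: reports whether the flag is visible in this JVM as an
// environment variable (what spark.executorEnv.* sets) and/or as a JVM
// system property (what -D in spark.executor.extraJavaOptions sets).
public class CertCheckProbe {

    static final String KEY = "com.amazonaws.sdk.disableCertChecking";

    static String describe() {
        String fromEnv = System.getenv(KEY);       // null if absent from the environment
        String fromProp = System.getProperty(KEY); // null if not passed as -D
        return "env=" + fromEnv + ", sysprop=" + fromProp;
    }

    public static void main(String[] args) {
        // In a Spark job, call describe() inside a task (for example in a
        // mapPartitions closure) so it runs on an executor, not the driver.
        System.out.println(describe());
    }
}
```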

Regarding amazon-web-services - In Apache Spark, how do I set environment variables for workers/executors?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/29354077/
