I am running a Dataflow streaming job on SDK version 2.11.0. After a few hours it fails with the following authentication error:
File "streaming_twitter.py", line 188, in <lambda>
File "streaming_twitter.py", line 102, in estimate
File "streaming_twitter.py", line 84, in estimate_aiplatform
File "streaming_twitter.py", line 42, in get_service
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 227, in build
    credentials=credentials)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 363, in build_from_document
    credentials = _auth.default_credentials()
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_auth.py", line 42, in default_credentials
    credentials, _ = google.auth.default()
File "/usr/local/lib/python2.7/dist-packages/google/auth/_default.py", line 306, in default
    raise exceptions.DefaultCredentialsError(_HELP_MESSAGE)
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application.
This Dataflow job makes API requests to AI Platform Prediction, and it appears the authentication token is expiring.
Code snippet:
def get_service():
    # If it hasn't been instantiated yet: do it now
    return discovery.build('ml', 'v1',
                           discoveryServiceUrl=DISCOVERY_SERVICE,
                           cache_discovery=True)
I tried adding the following line to the service function:
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/tmp/key.json"
but then I get:
DefaultCredentialsError: File "/tmp/key.json" was not found. [while running 'generatedPtransform-930']
I believe that is because the file does not exist on the Dataflow worker machines.
Another option is to use the developerKey parameter of the build method, but AI Platform Prediction does not seem to support it, and I get the error:
Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project. [while running 'generatedPtransform-22624']
I would like to understand how to fix this and what the best practice is.
Any suggestions?
Best Answer
Setting os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/tmp/key.json'
only works with the local DirectRunner. Once the job is deployed to a distributed runner like Dataflow, each worker is a separate machine and will not find the local file /tmp/key.json.
If you want each worker to use a specific service account, you can tell Beam which service account the workers should identify as.
First, grant the roles/dataflow.worker role to the service account you want your workers to use. There is no need to download a service account key file :)
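Granting the role can be done with the gcloud CLI. A sketch, where PROJECT_ID and the service account email are placeholders for your own values:

```shell
# Grant the Dataflow worker role to the service account the workers should use.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:my-worker-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/dataflow.worker"
```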
Then, if you let PipelineOptions parse your command-line arguments, you can simply use the service_account_email option, specifying it as --service_account_email your-email@your-project.iam.gserviceaccount.com when running the pipeline.
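A sketch of what the launch command might look like; the project, region, bucket, and service account names here are placeholders, not values from the question:

```shell
python streaming_twitter.py \
    --runner DataflowRunner \
    --project PROJECT_ID \
    --region us-central1 \
    --temp_location gs://YOUR_BUCKET/tmp \
    --service_account_email my-worker-sa@PROJECT_ID.iam.gserviceaccount.com
```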
The service account that your GOOGLE_APPLICATION_CREDENTIALS points to is only used to launch the job; each worker uses the service account specified by service_account_email. If service_account_email is not passed, it defaults to the email of the service account from your GOOGLE_APPLICATION_CREDENTIALS file.
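That default comes from the key file itself: a service account JSON key stores the account's email in its "client_email" field. A minimal sketch (the function name is mine, not from the question) of how you could inspect which account would be used by default:

```python
import json

def key_file_email(path):
    # A service account JSON key file stores the account's
    # email address in its "client_email" field.
    with open(path) as f:
        return json.load(f)["client_email"]
```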
Regarding "python - Google Cloud DataFlow job fails after a few hours", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58723809/