python - Why do I get "Error syncing pod" with my Dataflow pipeline?

Tags: python kubernetes google-cloud-dataflow

I am running into a strange error in my Dataflow pipeline when I try to use a specific library from PyPI.

I need jsonschema in a ParDo, so I added jsonschema==3.2.0 to my requirements.txt file. I launch the pipeline with the following command line:

python -m gcs_to_all \
    --runner DataflowRunner \
    --project <my-project-id> \
    --region europe-west1 \
    --temp_location gs://<my-bucket-name>/temp/ \
    --input_topic "projects/<my-project-id>/topics/<my-topic>" \
    --network=<my-network> \
    --subnetwork=<my-subnet> \
    --requirements_file=requirements.txt \
    --experiments=allow_non_updatable_job \
    --streaming  

In the terminal, everything looks fine:

INFO:root:2020-01-03T09:18:35.569Z: JOB_MESSAGE_BASIC: Worker configuration: n1-standard-4 in europe-west1-b.
INFO:root:2020-01-03T09:18:35.806Z: JOB_MESSAGE_WARNING: The network default doesn't have rules that open TCP ports 12345-12346 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If you don't specify such a rule, any pipeline with more than one worker that shuffles data will hang. Causes: Firewall rules associated with your network don't open TCP ports 12345-12346 for Dataflow instances. If a firewall rule opens connection in these ports, ensure target tags aren't specified, or that the rule includes the tag 'dataflow'.
INFO:root:2020-01-03T09:18:48.549Z: JOB_MESSAGE_DETAILED: Workers have started successfully.

The Logs tab on the Dataflow web page shows no errors, but Stackdriver does:

message: "Error syncing pod 6515c378c6bed37a2c0eec1fcfea300c ("<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)"), skipping: [failed to "StartContainer" for "sdk0" with CrashLoopBackOff: "Back-off 10s restarting failed container=sdk0 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk1" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk1 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)"" 
message: ", failed to "StartContainer" for "sdk2" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk2 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)"" 
message: ", failed to "StartContainer" for "sdk3" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk3 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)"" 

I also found this error (logged at info level):

Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
  Installing build dependencies: started
Looking in links: /var/opt/google/staged
  Installing build dependencies: started
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
  Installing build dependencies: started
Looking in links: /var/opt/google/staged
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
  Installing build dependencies: started
  Installing build dependencies: finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python3 /usr/local/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-mdurhav9/overlay --no-warn-script-location --no-binary :none: --only-binary :none: --no-index --find-links /var/opt/google/staged -- 'setuptools>=40.6.0' wheel
       cwd: None
  Complete output (5 lines):
  Looking in links: /var/opt/google/staged
  Collecting setuptools>=40.6.0
  Collecting wheel
    ERROR: Could not find a version that satisfies the requirement wheel (from versions: none)
  ERROR: No matching distribution found for wheel

But I don't know why it can't get this dependency...

Do you have any idea how I can debug this, or why I am hitting this error?

Thanks

Best answer

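When Dataflow workers start, they perform several steps:

  1. Install the packages listed in requirements.txt
  2. Install the packages specified as extra_packages
  3. Install the workflow tarball and execute the actions provided in setup.py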
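Step 1 is the one failing in the question's log: pip runs with --no-index --find-links /var/opt/google/staged, so it can only resolve packages that were staged with the job, and that includes build dependencies such as setuptools and wheel. As a rough way to check a list of staged filenames for gaps, here is a minimal sketch. The function names are hypothetical, and the filename parsing assumes the standard wheel and sdist naming conventions.

```python
import re

# Archive suffixes commonly used for source distributions.
SDIST_SUFFIXES = (".tar.gz", ".tar.bz2", ".zip")


def normalize(name):
    """Normalize a project name as PEP 503 does: lowercase, runs of -_. become -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def staged_projects(filenames):
    """Return the normalized project names found in a list of staged filenames."""
    names = set()
    for filename in filenames:
        if filename.endswith(".whl"):
            # Wheels are "<name>-<version>-...whl"; hyphens in the name are escaped as "_".
            names.add(normalize(filename.split("-", 1)[0]))
            continue
        for suffix in SDIST_SUFFIXES:
            if filename.endswith(suffix):
                # sdists are "<name>-<version><suffix>"; the name itself may contain hyphens.
                stem = filename[: -len(suffix)]
                names.add(normalize(stem.rsplit("-", 1)[0]))
                break
    return names


def missing_projects(filenames, required):
    """Return the required project names that no staged file provides."""
    staged = staged_projects(filenames)
    return [name for name in required if normalize(name) not in staged]
```

Calling missing_projects(os.listdir(some_dir), ["setuptools", "wheel"]) on a local copy of the staged files (some_dir is a placeholder) lists the build dependencies that pip would not be able to find on the worker.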

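The "Error syncing pod" message with CrashLoopBackOff can be related to a dependency conflict. You need to verify that there are no conflicts between the libraries and versions the job uses. Please refer to the documentation on staging the dependencies a pipeline requires.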

Also, take a look at the preinstalled dependencies and at this Stack Overflow thread.
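One way to look for such conflicts is to compare the exact pins in requirements.txt with the versions already installed on the workers. The following is a minimal sketch, not an official tool: the function names are hypothetical, it only understands name==version lines, and the preinstalled versions must be supplied by you from the preinstalled dependencies list for your SDK version.

```python
import re


def parse_pins(requirements_text):
    """Return {normalized name: version} for every 'name==version' line."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.fullmatch(r"([A-Za-z0-9._-]+)\s*==\s*([A-Za-z0-9.!+*_-]+)", line)
        if match:
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


def find_conflicts(requirements_text, preinstalled):
    """Return (name, pinned, preinstalled) for each pin that differs from the worker image.

    `preinstalled` maps lowercase, hyphen-normalized names to version strings.
    """
    conflicts = []
    for name, wanted in sorted(parse_pins(requirements_text).items()):
        have = preinstalled.get(name)
        if have is not None and have != wanted:
            conflicts.append((name, wanted, have))
    return conflicts
```

A package that appears in the result is pinned to a different version than the one the worker image ships, which makes it a reasonable first candidate to change or drop from requirements.txt.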

You could try changing the version of jsonschema and running the job again. If that doesn't help, please provide the requirements.txt file.
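When retrying with a different version, it helps to look for pip's own failure lines in the worker logs instead of relying on the CrashLoopBackOff message alone. Here is a minimal sketch that extracts the unresolved requirements from exported log text. The function name is hypothetical, and the pattern is taken from the "No matching distribution found for" line in the question's log.

```python
import re

# Matches pip's final error line and captures the requirement it could not resolve.
NO_MATCH = re.compile(r"No matching distribution found for (\S+)")


def unresolved_requirements(log_text):
    """Return the requirements pip could not resolve, in first-seen order, without duplicates."""
    found = []
    for line in log_text.splitlines():
        match = NO_MATCH.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found
```

Run against the log excerpt in the question, this points at wheel rather than jsonschema, which suggests the failure happens while installing build dependencies.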

I hope this helps.

Source: the Stack Overflow question "Why do I get 'Error syncing pod' with my Dataflow pipeline?": https://stackoverflow.com/questions/59576287/
