google-cloud-platform - dbt and Google Cloud Composer PyPI dependency issues

Tags: google-cloud-platform google-cloud-composer dbt

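I am currently running Google Cloud Composer with Composer version 2.0.9 and Airflow version 2.1.4. I am trying to install the latest version of dbt (1.0.4 for core, 1.0.0 for the BigQuery plugin). Because the Cloud Composer image ships with specific packages pre-installed, I run into conflicting PyPI dependencies. When I fix one dependency, another conflict appears. Does anyone know a specific set of package versions that resolves this? I have read the following community posts, but I would like to know whether anyone has a solution that uses only Composer.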

How to run DBT in airflow without copying our repo

How to set up dbt with Google Cloud Composer?

Best answer

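I was able to reproduce the behavior you are seeing. Below are the dependency conflicts I saw in the Cloud Build logs. They arise between the dbt-core requirements and the requirements of the packages pre-installed in Composer.

Pre-installed package requirements: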

hologram 0.0.14 has requirement jsonschema<3.2,>=3.0, but you have jsonschema 3.2.0. ##=> can be installed manually
flask 1.1.4 has requirement click<8.0,>=5.1, but you have click 8.1.2.
apache-airflow 2.1.4+composer has requirement markupsafe<2.0,>=1.1.1, but you have markupsafe 2.0.1.
looker-sdk 22.4.0 has requirement typing-extensions>=4.1.1, but you have typing-extensions 3.10.0.2.

dbt-core requirements:

hologram 0.0.14 has requirement jsonschema<3.2,>=3.0, but you have jsonschema 3.2.0. ##=> can be installed manually
dbt-core 1.0.4 has requirement click<9,>=8, but you have click 7.1.2.
dbt-core 1.0.4 has requirement MarkupSafe==2.0.1, but you have markupsafe 1.1.1.
dbt-core 1.0.4 has requirement typing-extensions<3.11,>=3.7.4, but you have typing-extensions 4.1.1.
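To see why no single version can satisfy both sides, the requirement pairs from the two logs can be checked mechanically. The following is only a rough sketch: the helper names (`compare`, `satisfies`, `versions_ok_for_both`) are made up for illustration, the version comparison handles plain dotted numbers only (no pre-releases), and the candidate versions are just the ones pip reported above.

```python
import re
from itertools import zip_longest

# Map each operator to a test on the result of compare(version, bound).
OPS = {
    "<": lambda c: c < 0,
    "<=": lambda c: c <= 0,
    ">": lambda c: c > 0,
    ">=": lambda c: c >= 0,
    "==": lambda c: c == 0,
}


def compare(a, b):
    """Return -1, 0 or 1 for dotted numeric versions, padding with zeros."""
    pa = [int(x) for x in a.split(".")]
    pb = [int(x) for x in b.split(".")]
    for x, y in zip_longest(pa, pb, fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '<8.0,>=5.1'."""
    for clause in spec.split(","):
        op, bound = re.match(r"(<=|>=|==|<|>)\s*(.+)", clause.strip()).groups()
        if not OPS[op](compare(version, bound)):
            return False
    return True


# Requirement pairs copied from the logs: (pre-installed side, dbt-core side).
CONFLICTS = {
    "click": ("<8.0,>=5.1", "<9,>=8"),                # flask vs dbt-core
    "markupsafe": ("<2.0,>=1.1.1", "==2.0.1"),        # apache-airflow vs dbt-core
    "typing-extensions": (">=4.1.1", "<3.11,>=3.7.4"),  # looker-sdk vs dbt-core
}

# Versions that pip reported as installed in the two attempts.
CANDIDATES = {
    "click": ["7.1.2", "8.1.2"],
    "markupsafe": ["1.1.1", "2.0.1"],
    "typing-extensions": ["3.10.0.2", "4.1.1"],
}


def versions_ok_for_both(package):
    """Return the candidate versions accepted by both requirement specs."""
    preinstalled_spec, dbt_spec = CONFLICTS[package]
    return [
        v for v in CANDIDATES[package]
        if satisfies(v, preinstalled_spec) and satisfies(v, dbt_spec)
    ]


if __name__ == "__main__":
    for name in CONFLICTS:
        print(name, versions_ok_for_both(name))
```

Each package ends up with an empty list, because the two ranges do not overlap at all (for example, flask needs click below 8.0 while dbt-core needs click 8 or higher). That is why fixing one pin in the shared environment always breaks another.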

I tried downgrading the pre-installed packages, but subsequent package installations failed, and downgrading them is not recommended.

I therefore suggest using an external solution, as described in this thread that you linked. Quoting the workarounds given in @Ryan Yuan's answer there:

  1. Using external services to run dbt jobs, e.g. Cloud Run.
  2. Using Composer's KubernetesPodOperator(updated Composer 2 link). My colleague has put up a nice article on dbt discourse here going through the setup process.
  3. Ignoring Composer's Dependency conflicts by setting Composer's environmental variable IGNORE_PYPI_DEPENDENCY_CONFLICTS to True. However, I don't recommend this as it may cause potential issues.
  4. Creating a Python virtual environment in Composer and install the dbt packages.
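As an illustration of option 4, here is a minimal sketch of how the virtual-environment approach might look, assuming a Linux worker and using only the standard library. The function names and example paths are hypothetical; the package pins are the versions mentioned in the question. The idea is that dbt and its dependencies live in their own environment, so they never have to agree with the packages pre-installed in the Composer image.

```python
import subprocess
import sys

# Versions taken from the question; adjust to whatever you actually need.
DBT_PACKAGES = ["dbt-core==1.0.4", "dbt-bigquery==1.0.0"]


def dbt_venv_commands(venv_dir, project_dir, dbt_args=("run",)):
    """Build the argv lists that create a venv, install dbt, and invoke it.

    venv_dir and project_dir are placeholders supplied by the caller.
    """
    venv_python = f"{venv_dir}/bin/python"
    venv_dbt = f"{venv_dir}/bin/dbt"
    return [
        # 1. Create an isolated environment next to the Airflow one.
        [sys.executable, "-m", "venv", venv_dir],
        # 2. Install dbt inside it, away from Composer's pinned packages.
        [venv_python, "-m", "pip", "install", *DBT_PACKAGES],
        # 3. Run dbt with the venv's own executable.
        [venv_dbt, *dbt_args,
         "--project-dir", project_dir,
         "--profiles-dir", project_dir],
    ]


def run_dbt_in_venv(venv_dir, project_dir, dbt_args=("run",)):
    """Execute the commands in order, stopping at the first failure."""
    for argv in dbt_venv_commands(venv_dir, project_dir, dbt_args):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Only print the commands here; the paths below are example values.
    for argv in dbt_venv_commands("/tmp/dbt_venv", "/home/airflow/gcs/dags/dbt"):
        print(" ".join(argv))
```

In a DAG, something like `run_dbt_in_venv` could be passed as the `python_callable` of a `PythonOperator`. Note that the environment is created on the local disk of whichever worker runs the task, so this sketch rebuilds it on every call; whether that cost is acceptable depends on your setup.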

A similar question on Stack Overflow: https://stackoverflow.com/questions/71899387/
