google-cloud-dataproc - Cloud Composer task cannot create a Dataproc cluster

Tags: google-cloud-dataproc google-cloud-composer airflow

I am trying to create a Dataproc cluster with a Cloud Composer operator. My DAG looks like this:

from datetime import timedelta

# Imports assumed from the rest of the file; module paths can differ slightly
# between Airflow 1.10 (backport providers) and Airflow 2.
from airflow import models
from airflow.operators.bash_operator import BashOperator
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
)
from airflow.utils.dates import days_ago

# GCP_PROJECT_ID, REGION and CLUSTER_NAME are defined elsewhere in the file.

default_dag_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(1),
    'email': ['****************'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
}

CLUSTER_CONFIG = {
    "master_config": {
        "num_instances": 1,
        "machine_type_uri": "n1-standard-4",
        "disk_config": {"boot_disk_type": "pd-standard", "boot_disk_size_gb": 10},
    },
    "worker_config": {
        "num_instances": 2,
        "machine_type_uri": "n1-standard-4",
        "disk_config": {"boot_disk_type": "pd-standard", "boot_disk_size_gb": 10},
    },
}

with models.DAG(
        'PanelSettings_dag',
        schedule_interval="@daily",
        default_args=default_dag_args) as dag:

    t1 = BashOperator(
        task_id='print_date',
        bash_command='date',
    )

    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        gcp_conn_id='google-dataproc',
        project_id=GCP_PROJECT_ID,
        cluster_config=CLUSTER_CONFIG,
        region=REGION,
        cluster_name=CLUSTER_NAME,
    )

I have created the Dataproc connection in Airflow and granted the service account the Dataproc Admin and Storage Admin roles. Without this connection I get the error:

Getting connection using `google.auth.default()` since no key file is defined for hook.
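For reference, the connection was registered along these lines; this is only a sketch, and the key path and project values are placeholders (the same thing can be done from the Airflow UI or CLI):

from airflow.models import Connection
from airflow.settings import Session

# Placeholder key path and project; the service account behind the key holds
# the Dataproc Admin and Storage Admin roles mentioned above.
conn = Connection(
    conn_id='google-dataproc',
    conn_type='google_cloud_platform',
    extra='{"extra__google_cloud_platform__key_path": "/path/to/key.json", '
          '"extra__google_cloud_platform__project": "my-project"}',
)
session = Session()
session.add(conn)
session.commit()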

Now I get this error:

[2021-06-16 21:30:48,109] {taskinstance.py:1152} ERROR - 501 Received http2 header with status: 404
Traceback (most recent call last):
  File "/opt/python3.6/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 73, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/grpc/_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/python3.6/lib/python3.6/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNIMPLEMENTED
    details = "Received http2 header with status: 404"
    debug_error_string = "{"created":"@1623879048.108981125","description":"Received http2 :status header with non-200 OK status","file":"src/core/ext/filters/http/client/http_client_filter.cc","file_line":129,"grpc_message":"Received http2 header with status: 404","grpc_status":12,"value":"404"}"
>

I am new to Airflow. Could someone help me debug this? I can't figure out what I am doing wrong.

Best answer

The mistake was that I had put a zone name in the region field. I corrected it and it worked. It would have been helpful if the error had said something like "region not found / does not exist".
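In other words, region must be a Dataproc region such as us-central1, not a zone such as us-central1-a. A minimal sketch of the corrected operator call, with placeholder values:

# Sketch only: project, region and cluster name are placeholders.
create_cluster = DataprocCreateClusterOperator(
    task_id="create_cluster",
    gcp_conn_id="google-dataproc",
    project_id="my-project",
    cluster_config=CLUSTER_CONFIG,
    region="us-central1",  # a region; a zone like "us-central1-a" triggers the 404
    cluster_name="panel-settings-cluster",
)

The UNIMPLEMENTED/404 status is consistent with this: the Dataproc client appears to build its regional gRPC endpoint ({region}-dataproc.googleapis.com) from that field, so a zone name yields a hostname that does not serve the Dataproc API, hence the non-200 HTTP/2 status in the traceback.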

Regarding google-cloud-dataproc - Cloud Composer task cannot create a Dataproc cluster, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/68010295/
