airflow - Apache Airflow : Executor reports task instance finished (failed) although the task says its queued

标签 airflow executor

我们的 Airflow 安装使用 CeleryExecutor。
并发配置是

# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 16

# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 16

# Are DAGs paused by default at creation
dags_are_paused_at_creation = True

# When not using pools, tasks are run in the "default pool",
# whose size is guided by this config element
non_pooled_task_slot_count = 64

# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 16
[celery]
# This section only applies if you are using the CeleryExecutor in
# [core] section above

# The app name that will be used by celery
celery_app_name = airflow.executors.celery_executor

# The concurrency that will be used when starting workers with the
# "airflow worker" command. This defines the number of task instances that
# a worker will take, so size up your workers based on the resources on
# your worker box and the nature of your tasks
celeryd_concurrency = 16


我们有一个每天执行的 dag。它遵循一种模式并行执行一些任务,即检测数据是否存在于 hdfs 中,然后休眠 10 分钟,最后上传到 s3。

一些任务遇到了以下错误:
2019-05-12 00:00:46,212 ERROR - Executor reports task instance <TaskInstance: example_dag.task1 2019-05-11 04:00:00+00:00 [queued]> finished (failed) although the task says its queued. Was the task killed externally?
2019-05-12 00:00:46,558 INFO - Marking task as UP_FOR_RETRY
2019-05-12 00:00:46,561 WARNING - section/key [smtp/smtp_user] not found in config

这种错误在这些任务中随机发生。发生此错误时,任务实例的状态立即设置为 up_for_retry,并且工作节点中没有日志。经过一些重试,它们最终执行并完成。

这个问题有时会给我们带来很大的 ETL 延迟。有谁知道如何解决这个问题?

最佳答案

我们遇到了类似的问题,已通过
"-x, --donot_pickle"选项。

欲了解更多信息:- https://airflow.apache.org/cli.html#backfill

关于airflow - Apache Airflow : Executor reports task instance finished (failed) although the task says its queued,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/56119107/

相关文章:

python - Airflow DataProcPySparkOperator 不考虑全局区域以外的集群

apache-spark - 在 join 和 reduceByKey 中触发执行程序内存不足

java - 了解 kotlin 执行器

java - 在这种解析文本的场景中无法找出正确的数据结构和正确的方法

java - 如何在 Future 对象中有效使用 isDone()

java - Smalltalk - 可以在Smalltalk中编写Java程序吗?

python-3.x - Dag 可以读取 CSV 行作为运算符(operator)的输入

python-2.7 - Airflow : DAG marked as "success" if one task fails, 由于触发规则 ALL_DONE

python - Airflow 中使用 `BaseBranchOperator` 的多重继承

python - Airflow 快速启动不起作用