airflow - Apache Airflow : initdb vs resetdb

Tags: airflow

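What exactly is the difference between the "airflow initdb" command and the "airflow resetdb" command?

Is it really necessary to have 2 different commands?

When is it appropriate to use one rather than the other?

The docs say...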

airflow initdb: Initialize the metadata database

airflow resetdb: Burn down and rebuild the metadata database



That doesn't tell me much.

My best guess is:

airflow initdb is to be used only the first time that the database is created from the airflow.cfg
airflow resetdb is to be used if any changes to that configuration are required.



When I run them, neither one changes the timestamp on the sqlite database file, but resetdb is much noisier.

Airflow initdb :
(.sandbox) [airflow@localhost airflow]$ airflow initdb
[2020-01-01 21:49:21,603] {settings.py:252} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=24917
DB: postgresql+psycopg2://airflow@localhost:5432/airflow_mdb
[2020-01-01 21:49:22,257] {db.py:368} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Done.

Airflow resetdb :
(.sandbox) [airflow@localhost airflow]$ airflow resetdb
[2020-01-01 21:49:46,579] {settings.py:252} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=25045
DB: postgresql+psycopg2://airflow@localhost:5432/airflow_mdb
This will drop existing tables if they exist. Proceed? (y/n)y
[2020-01-01 21:49:49,984] {db.py:389} INFO - Dropping tables that exist
[2020-01-01 21:49:50,062] {migration.py:154} INFO - Context impl PostgresqlImpl.
[2020-01-01 21:49:50,063] {migration.py:161} INFO - Will assume transactional DDL.
[2020-01-01 21:49:50,070] {db.py:368} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> e3a246e0dc1, current schema
INFO  [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 1507a7289a2f, create is_encrypted
INFO  [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 13eb55f81627, maintain history for compatibility with earlier migrations
INFO  [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 338e90f54d61, More logging into task_instance
INFO  [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 52d714495f0, job_id indices
INFO  [alembic.runtime.migration] Running upgrade 52d714495f0 -> 502898887f84, Adding extra to Log
INFO  [alembic.runtime.migration] Running upgrade 502898887f84 -> 1b38cef5b76e, add dagrun
INFO  [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> 2e541a1dcfed, task_duration
INFO  [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> 40e67319e3a9, dagrun_config
INFO  [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> 561833c1c74b, add password column to user
INFO  [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, dagrun start end
INFO  [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, Add notification_sent column to sla_miss
INFO  [alembic.runtime.migration] Running upgrade bbc73705a13e -> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field in connection
INFO  [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> 1968acfc09e3, add is_encrypted column to variable table
INFO  [alembic.runtime.migration] Running upgrade 1968acfc09e3 -> 2e82aab8ef20, rename user table
INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> 211e584da130, add TI state index
INFO  [alembic.runtime.migration] Running upgrade 211e584da130 -> 64de9cddf6c9, add task fails journal table
INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> f2ca10b85618, add dag_stats table
INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 4addfa1236f1, Add fractional seconds to mysql tables
INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> 8504051e801b, xcom dag task indices
INFO  [alembic.runtime.migration] Running upgrade 8504051e801b -> 5e7d17757c7a, add pid field to TaskInstance
INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a -> 127d2bf2dfa7, Add dag_id/state index on dag_run table
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
INFO  [alembic.runtime.migration] Running upgrade cc1e65623dc7 -> bdaa763e6c56, Make xcom value column a large binary
INFO  [alembic.runtime.migration] Running upgrade bdaa763e6c56 -> 947454bf1dff, add ti job_id index
INFO  [alembic.runtime.migration] Running upgrade 947454bf1dff -> d2ae31099d61, Increase text size for MySQL (not relevant for other DBs' text types)
INFO  [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 0e2a74e0fc9f, Add time zone awareness
INFO  [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 33ae817a1ff4, kubernetes_resource_checkpointing
INFO  [alembic.runtime.migration] Running upgrade 33ae817a1ff4 -> 27c6a30d7c24, kubernetes_resource_checkpointing
INFO  [alembic.runtime.migration] Running upgrade 27c6a30d7c24 -> 86770d1215c0, add kubernetes scheduler uniqueness
INFO  [alembic.runtime.migration] Running upgrade 86770d1215c0, 0e2a74e0fc9f -> 05f30312d566, merge heads
INFO  [alembic.runtime.migration] Running upgrade 05f30312d566 -> f23433877c24, fix mysql not null constraint
INFO  [alembic.runtime.migration] Running upgrade f23433877c24 -> 856955da8476, fix sqlite foreign key
INFO  [alembic.runtime.migration] Running upgrade 856955da8476 -> 9635ae0956e7, index-faskfail
INFO  [alembic.runtime.migration] Running upgrade 9635ae0956e7 -> dd25f486b8ea, add idx_log_dag
INFO  [alembic.runtime.migration] Running upgrade dd25f486b8ea -> bf00311e1990, add index to taskinstance
INFO  [alembic.runtime.migration] Running upgrade 9635ae0956e7 -> 0a2a5b66e19d, add task_reschedule table
INFO  [alembic.runtime.migration] Running upgrade 0a2a5b66e19d, bf00311e1990 -> 03bc53e68815, merge_heads_2
INFO  [alembic.runtime.migration] Running upgrade 03bc53e68815 -> 41f5f12752f8, add superuser field
INFO  [alembic.runtime.migration] Running upgrade 41f5f12752f8 -> c8ffec048a3b, add fields to dag
INFO  [alembic.runtime.migration] Running upgrade c8ffec048a3b -> dd4ecb8fbee3, Add schedule interval to dag
INFO  [alembic.runtime.migration] Running upgrade dd4ecb8fbee3 -> 939bb1e647c8, task reschedule fk on cascade delete
INFO  [alembic.runtime.migration] Running upgrade c8ffec048a3b -> a56c9515abdc, Remove dag_stat table
INFO  [alembic.runtime.migration] Running upgrade 939bb1e647c8 -> 6e96a59344a4, Make TaskInstance.pool not nullable
INFO  [alembic.runtime.migration] Running upgrade 6e96a59344a4 -> 74effc47d867, change datetime to datetime2(6) on MSSQL tables
INFO  [alembic.runtime.migration] Running upgrade 939bb1e647c8 -> 004c1210f153, increase queue name size limit
(.sandbox) [airflow@localhost airflow]$ 

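Of course, you can move the database from sqlite to postgres.
It's not clear which of the two commands is appropriate in that situation.
It's also not clear how the webserver and scheduler know where to look for their configuration.
Maybe they first read airflow.cfg to find out where the database is, and then look in the database? That seems redundant.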
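
On that last point: in Airflow 1.10 the metadata database location is the sql_alchemy_conn setting in the [core] section of airflow.cfg, and the environment variable AIRFLOW__CORE__SQL_ALCHEMY_CONN takes precedence over the file when it is set. The "DB:" line in the logs above echoes that value. The sketch below only illustrates that lookup order. It is not Airflow's own code: the function name resolve_metadata_db and the sample sqlite path are made up for illustration, and the real resolution has more fallbacks than shown here.

```python
import configparser

# Hypothetical airflow.cfg fragment; the sqlite path is only an example.
SAMPLE_CFG = """
[core]
sql_alchemy_conn = sqlite:////home/airflow/airflow/airflow.db
"""

ENV_KEY = "AIRFLOW__CORE__SQL_ALCHEMY_CONN"


def resolve_metadata_db(cfg_text, environ):
    """Return the connection string, preferring the environment variable."""
    env_value = environ.get(ENV_KEY)
    if env_value:
        return env_value
    # interpolation=None so a '%' in a password is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get("core", "sql_alchemy_conn")


# No override: the value comes from the config file.
from_file = resolve_metadata_db(SAMPLE_CFG, {})

# Override present: the environment variable wins.
from_env = resolve_metadata_db(
    SAMPLE_CFG,
    {ENV_KEY: "postgresql+psycopg2://airflow@localhost:5432/airflow_mdb"},
)

print(from_file)
print(from_env)
```

In real use you would pass os.environ as the second argument. Every component reads the same setting, so the webserver, the scheduler and the CLI all end up pointing at the same database.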

Best answer

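resetdb deletes every entry from the metadata database. That includes all DAG runs, Variables and Connections. initdb only needs to be run once, after Airflow is installed.
Generally speaking, we don't worry much about losing DAG runs. Recreating Variables and Connections can be annoying, though, because they often contain secrets and other sensitive data that, under security best practices, may not have been copied anywhere else. initdb is also idempotent, so you can run it as often as you like without worrying about it changing the database.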
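
To make the idempotent versus destructive distinction concrete, here is a toy model that uses only Python's built-in sqlite3 module. It is not how Airflow implements these commands; the logs in the question show the real ones going through Alembic migrations. The names toy_initdb and toy_resetdb and the two-table schema are invented for this sketch.

```python
import sqlite3

# Simplified stand-in schema; Airflow's real metadata schema is much larger.
SCHEMA = [
    "CREATE TABLE IF NOT EXISTS variable (key TEXT PRIMARY KEY, val TEXT)",
    "CREATE TABLE IF NOT EXISTS connection (conn_id TEXT PRIMARY KEY, uri TEXT)",
]


def toy_initdb(conn):
    """Create any missing tables; existing tables and rows are untouched."""
    for statement in SCHEMA:
        conn.execute(statement)
    conn.commit()


def toy_resetdb(conn):
    """Drop every table, then rebuild the empty schema from scratch."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    for name in tables:
        conn.execute(f'DROP TABLE "{name}"')
    toy_initdb(conn)


def count_variables(conn):
    return conn.execute("SELECT COUNT(*) FROM variable").fetchone()[0]


conn = sqlite3.connect(":memory:")
toy_initdb(conn)
conn.execute("INSERT INTO variable VALUES ('api_key', 'secret')")
conn.commit()

toy_initdb(conn)                      # second run: nothing changes
after_init = count_variables(conn)

toy_resetdb(conn)                     # burn down and rebuild
after_reset = count_variables(conn)

print(after_init, after_reset)        # prints: 1 0
```

The stored variable survives a repeated toy_initdb and is gone after toy_resetdb, which is the behaviour described above for the real commands.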

Source: the Stack Overflow question "Apache Airflow : initdb vs resetdb": https://stackoverflow.com/questions/59556501/
