arangodb - 非索引属性 FILTER 的查询规则

标签 arangodb

我发现这两个 AQL 语句与包含大约 20 个 Mio 记录的数据库集之间存在巨大的运行时差异:

FOR e IN EAll
    FILTER e.lastname == "Kmp"  // <-- skip-index
    FILTER e.lastpaff != ""     // <-- no index
RETURN e
// runs in less than a second

并且

FOR e IN EAll
    FILTER e.lastpaff != ""     // <-- no index
    FILTER e.lastname == "Kmp"  // <-- skip-index
RETURN e
// needs about a minute to execute.

除了被索引(或不被索引)之外,这些语句的选择性也有很大不同:indexedAttribute 具有高度选择性,而 nonIndexedAttribute 仅过滤 50%。

是否可能还没有优化规则?我目前使用的是 ArangoDB 2.4.0。

详细信息:

索引属性上有一个 SKIP-Index(似乎在执行计划 1 中使用)。 以下是执行计划,其中仅更改了过滤器的顺序:

    FAST QUERY:

    arangosh [Uni]> stmt.explain()
    {
      "plan" : {
        "nodes" : [
          {
            "type" : "SingletonNode",
            "dependencies" : [ ],
            "id" : 1,
            "estimatedCost" : 1,
            "estimatedNrItems" : 1
          },
          {
            "type" : "IndexRangeNode",
            "dependencies" : [
              1
            ],
            "id" : 8,
            "estimatedCost" : 170463.32,
            "estimatedNrItems" : 170462,
            "database" : "Uni",
            "collection" : "EAll",
            "outVariable" : {
              "id" : 0,
              "name" : "i"
            },
            "ranges" : [
              [
                {
                  "variable" : "i",
                  "attr" : "lastname",
                  "lowConst" : {
                    "bound" : "Kmp",
                    "include" : true,
                    "isConstant" : true
                  },
                  "highConst" : {
                    "bound" : "Kmp",
                    "include" : true,
                    "isConstant" : true
                  },
                  "lows" : [ ],
                  "highs" : [ ],
                  "valid" : true,
                  "equality" : true
                }
              ]
            ],
            "index" : {
              "type" : "skiplist",
              "id" : "13295598550318",
              "unique" : false,
              "fields" : [
                "lastname"
              ]
            },
            "reverse" : false
          },
          {
            "type" : "CalculationNode",
            "dependencies" : [
              8
            ],
            "id" : 5,
            "estimatedCost" : 340925.32,
            "estimatedNrItems" : 170462,
            "expression" : {
              "type" : "compare !=",
              "subNodes" : [
                {
                  "type" : "attribute access",
                  "name" : "lastpaff",
                  "subNodes" : [
                    {
                      "type" : "reference",
                      "name" : "i",
                      "id" : 0
                    }
                  ]
                },
                {
                  "type" : "value",
                  "value" : ""
                }
              ]
            },
            "outVariable" : {
              "id" : 2,
              "name" : "2"
            },
            "canThrow" : false
          },
          {
            "type" : "FilterNode",
            "dependencies" : [
              5
            ],
            "id" : 6,
            "estimatedCost" : 511387.32,
            "estimatedNrItems" : 170462,
            "inVariable" : {
              "id" : 2,
              "name" : "2"
            }
          },
          {
            "type" : "ReturnNode",
            "dependencies" : [
              6
            ],
            "id" : 7,
            "estimatedCost" : 681849.3200000001,
            "estimatedNrItems" : 170462,
            "inVariable" : {
              "id" : 0,
              "name" : "i"
            }
          }
        ],
        "rules" : [
          "move-calculations-up",
          "move-filters-up",
          "move-calculations-up-2",
          "move-filters-up-2",
          "use-index-range",
          "remove-filter-covered-by-index"
        ],
        "collections" : [
          {
            "name" : "EAll",
            "type" : "read"
          }
        ],
        "variables" : [
          {
            "id" : 0,
            "name" : "i"
          },
          {
            "id" : 1,
            "name" : "1"
          },
          {
            "id" : 2,
            "name" : "2"
          }
        ],
        "estimatedCost" : 681849.3200000001,
        "estimatedNrItems" : 170462
      },
      "warnings" : [ ],
      "stats" : {
        "rulesExecuted" : 19,
        "rulesSkipped" : 0,
        "plansCreated" : 1
      }
    }



    SLOW Query:

    arangosh [Uni]> stmt.explain()
    {
      "plan" : {
        "nodes" : [
          {
            "type" : "SingletonNode",
            "dependencies" : [ ],
            "id" : 1,
            "estimatedCost" : 1,
            "estimatedNrItems" : 1
          },
          {
            "type" : "EnumerateCollectionNode",
            "dependencies" : [
              1
            ],
            "id" : 2,
            "estimatedCost" : 17046233,
            "estimatedNrItems" : 17046232,
            "database" : "Uni",
            "collection" : "EAll",
            "outVariable" : {
              "id" : 0,
              "name" : "i"
            },
            "random" : false
          },
          {
            "type" : "CalculationNode",
            "dependencies" : [
              2
            ],
            "id" : 3,
            "estimatedCost" : 34092465,
            "estimatedNrItems" : 17046232,
            "expression" : {
              "type" : "compare !=",
              "subNodes" : [
                {
                  "type" : "attribute access",
                  "name" : "lastpaff",
                  "subNodes" : [
                    {
                      "type" : "reference",
                      "name" : "i",
                      "id" : 0
                    }
                  ]
                },
                {
                  "type" : "value",
                  "value" : ""
                }
              ]
            },
            "outVariable" : {
              "id" : 1,
              "name" : "1"
            },
            "canThrow" : false
          },
          {
            "type" : "FilterNode",
            "dependencies" : [
              3
            ],
            "id" : 4,
            "estimatedCost" : 51138697,
            "estimatedNrItems" : 17046232,
            "inVariable" : {
              "id" : 1,
              "name" : "1"
            }
          },
          {
            "type" : "CalculationNode",
            "dependencies" : [
              4
            ],
            "id" : 5,
            "estimatedCost" : 68184929,
            "estimatedNrItems" : 17046232,
            "expression" : {
              "type" : "compare ==",
              "subNodes" : [
                {
                  "type" : "attribute access",
                  "name" : "lastname",
                  "subNodes" : [
                    {
                      "type" : "reference",
                      "name" : "i",
                      "id" : 0
                    }
                  ]
                },
                {
                  "type" : "value",
                  "value" : "Kmp"
                }
              ]
            },
            "outVariable" : {
              "id" : 2,
              "name" : "2"
            },
            "canThrow" : false
          },
          {
            "type" : "FilterNode",
            "dependencies" : [
              5
            ],
            "id" : 6,
            "estimatedCost" : 85231161,
            "estimatedNrItems" : 17046232,
            "inVariable" : {
              "id" : 2,
              "name" : "2"
            }
          },
          {
            "type" : "ReturnNode",
            "dependencies" : [
              6
            ],
            "id" : 7,
            "estimatedCost" : 102277393,
            "estimatedNrItems" : 17046232,
            "inVariable" : {
              "id" : 0,
              "name" : "i"
            }
          }
        ],
        "rules" : [
          "move-calculations-up",
          "move-filters-up",
          "move-calculations-up-2",
          "move-filters-up-2"
        ],
        "collections" : [
          {
            "name" : "EAll",
            "type" : "read"
          }
        ],
        "variables" : [
          {
            "id" : 0,
            "name" : "i"
          },
          {
            "id" : 1,
            "name" : "1"
          },
          {
            "id" : 2,
            "name" : "2"
          }
        ],
        "estimatedCost" : 102277393,
        "estimatedNrItems" : 17046232
      },
      "warnings" : [ ],
      "stats" : {
        "rulesExecuted" : 19,
        "rulesSkipped" : 0,
        "plansCreated" : 1
      }
    }

最佳答案

事实上,即使可以使用索引,如下所示的条件也会禁用索引:

FILTER doc.indexedAttribute != ... FILTER doc.indexedAttribute == ...

有趣的是,当两个条件放入同一个 FILTER 条件并与 && 组合时,会使用索引:

FILTER doc.indexedAttribute != ... && doc.indexedAttribute == ...

虽然这两个语句是等效的,但它们触发的代码路径略有不同。前者将对两个现有的 FILTER 范围进行 AND 组合,后者将从单个 FILTER 生成一个范围。 FILTER 范围的 AND 组合的情况过于防御性,即使只有一侧(在本例中是带有非等式运算符的一侧)不能用于索引,双方也会被拒绝扫描。

此问题已在 2.4 中修复,该修复将包含在 2.4.2 中。目前的一种解决方法是将两个 FILTER 语句合并为一个语句。

关于arangodb - 非索引属性 FILTER 的查询规则,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/28066309/

相关文章:

arangodb - 如何在 arangodb docker 容器中停止和启动 arangodb 服务器

graph-databases - 嵌入在列表中的 ArangoDB 查询属性

php - 正确格式化 php 输出

graph - 类型错误 : Object #<ShapedJson> has no method 'indexOf'

graph - 将批量数据导入 ArangoDB 的最佳方法

对象内部字段的 Arangodb AQL 更新

arangodb - 使用 AQL 变量,例如用于计数(LET sum = sum + 1)

java - 在 ArrayList 中链接两个 ArangoDB 集合时出现反序列化问题

node.js - 如果重点关注建模数据,那么哪个(内存中)图形数据库

operators - ArangoDB - 带有 "!="运算符的数据库索引