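sql - Rewriting ActiveRecord queries as recursive SQL

Tags: sql ruby-on-rails postgresql recursion activerecord

I have a tree-like structure of Active Record models built on a self-referencing object: a pod can be the parent or the child of another object of the same class. I need a way to map this structure efficiently in code. Until now I have been doing it in Ruby through the Active Record ORM, which is very inefficient.

Here is what the pod.rb model looks like: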

    has_many :pod_parents, class_name: "PodPod", dependent: :delete_all
    has_many :parents, through: :pod_parents, :foreign_key => 'parent_id', :source => 'parent'
    has_many :pod_children, class_name: "PodPod", :foreign_key => 'parent_id'
    has_many :children, through: :pod_children, :source => 'pod'

    scope :active, -> {
        where(pod_state: "active").where(pod_type: ["standard","readonly"])
    }

Here is the relevant database schema:
table "pods"
  t.string "intention"
  t.integer "user_id"
  t.string "slug"
  t.string "url_handle"
  t.index ["slug"], name: "index_pods_on_slug"
  t.index ["url_handle"], name: "index_pods_on_url_handle"

table "pod_pods"
  t.integer "parent_id"
  t.integer "pod_id"
  t.index ["parent_id", "pod_id"], name: "index_pod_pods_on_parent_id_and_pod_id", unique: true
  t.index ["parent_id"], name: "index_pod_pods_on_parent_id"
  t.index ["pod_id"], name: "index_pod_pods_on_pod_id"

Here are the specific functions I am trying to optimize:
def get_all_parents
    parents = []
    self.parents.active.each do |parent|
        parents << parent
        parents.concat(parent.get_all_parents)
    end
    return parents
end

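def get_all_children
    children = []
    self.children.each do |child|
        # Collect the child itself, mirroring get_all_parents; without this
        # line the method would always return an empty array.
        children << child
        children.concat(child.get_all_children)
    end
    return children
end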

def get_all_parents_and_children
    pod_array = self.get_all_parents
    pod_array.concat(self.get_all_children)
    return pod_array
end

def get_all_relations(inclusive = false)
    circles_array = self.get_all_parents
    circles_array.each do |parent|
        circles_array = circles_array.concat(parent.get_all_children)
    end
    circles_array = circles_array.concat(self.get_all_children)
    unique_ids = circles_array.compact.map(&:id).uniq - [self.id]
    circles = Pod.where(id: unique_ids)
end

From what I have researched, Postgres supports recursive SQL queries. I have been using these articles as a guide: 1, 2.

This is what I have so far:
def get_all_parents2
      sql =
        <<-SQL
            WITH RECURSIVE pod_tree(id, path) AS (
                SELECT id, ARRAY[id]
                FROM pods
                WHERE id = #{self.id}
            UNION ALL
                SELECT pods.id, path
                FROM pod_tree
                JOIN pods ON pods.id=pod_tree.id
                JOIN pod_pods ON pod_pods.parent_id = pods.id
                WHERE NOT pods.id = ANY(path)
            )
            SELECT * FROM pod_tree
            ORDER BY path;
        SQL
      sql.chomp
        Pod.find_by_sql(sql)
    end

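My SQL is not especially strong, and I cannot figure out how to navigate up and down the tree so that I can rewrite the functions above as recursive SQL. I would appreciate any help with this. Thank you.

Best answer

What you are trying to accomplish is definitely achievable with recursive CTEs. I will cover the first two scenarios you have, since the other two are just extensions of the first two.

In all the SQL examples I use id 1 to stand in for the value you will substitute at the model level. Since you wrote that query, I will assume you have some understanding of recursive CTEs and were already working toward a solution.
get_all_children
Let's take the method get_all_children first. This method walks down the tree, level by level, collecting the nodes we encounter.

Since pod_pods contains all the information about the hierarchy, and no scope is involved when fetching children, we can recurse on pod_pods alone for the children.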

-- Snippet #1
WITH RECURSIVE pod_tree AS (
  SELECT pod_id -- Get the pod_id of the children of the base case node
  FROM pod_pods
  WHERE parent_id = 1 -- Base case
  UNION ALL -- Recurse on this and do a union with the previous step
  SELECT p.pod_id
  FROM pod_pods p
    INNER JOIN pod_tree ptree 
      ON ptree.pod_id = p.parent_id -- Get the children nodes for nodes found at the previous recursion step.
)

SELECT * FROM pods 
WHERE id IN (SELECT DISTINCT(pod_id) FROM pod_tree);
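
To make the level-by-level evaluation concrete, here is a minimal in-memory sketch in Ruby of how Snippet #1 proceeds. It is only an illustration of the idea, not how Postgres is implemented: the POD_PODS sample rows and the descendant_ids helper are hypothetical names, and the data is assumed to contain no cycles.

```ruby
# Hypothetical sample pod_pods rows, each as [parent_id, pod_id].
POD_PODS = [[1, 2], [1, 3], [2, 4], [3, 4], [4, 5]].freeze

# Mirrors Snippet #1: start from the base case, then keep joining the rows
# produced by the previous step back onto pod_pods until nothing new appears.
# Assumes the links contain no cycles (otherwise this loop never ends).
def descendant_ids(edges, root_id)
  # Base case: SELECT pod_id FROM pod_pods WHERE parent_id = root_id
  working = edges.select { |parent_id, _| parent_id == root_id }.map(&:last)
  result = working.dup

  until working.empty?
    # Recursive step: children of every row found in the previous step.
    working = working.flat_map do |id|
      edges.select { |parent_id, _| parent_id == id }.map(&:last)
    end
    result.concat(working) # UNION ALL keeps duplicates
  end

  result.uniq # plays the role of SELECT DISTINCT in the outer query
end

p descendant_ids(POD_PODS, 1) # prints [2, 3, 4, 5]
```

Pod 4 is reached through both pod 2 and pod 3, so it shows up twice in the intermediate result; the final uniq is what the DISTINCT in the outer query takes care of.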

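Your Ruby code does not guard against infinite loops caused by cycles. If cycles can occur, the way to handle them is to keep track of the ids you have already seen.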

-- Snippet #2
WITH RECURSIVE pod_tree(pod_id, rtree) AS ( -- Extra rtree parameter to keep track of visited nodes
  SELECT pod_id, ARRAY[pod_id] -- Make the base case array with pod_id
  FROM pod_pods
  WHERE parent_id = 1 -- Base case
  UNION ALL
  SELECT p.pod_id, rtree || p.pod_id -- Add the current pod_id to array
  FROM pod_pods p
    INNER JOIN pod_tree ptree 
      ON ptree.pod_id = p.parent_id
  WHERE NOT (p.pod_id = ANY(rtree)) -- Exclude nodes which have already been seen  
)

SELECT * FROM pods 
WHERE id IN (SELECT DISTINCT(pod_id) FROM pod_tree);
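
The visited-node bookkeeping in Snippet #2 can be sketched in plain Ruby as well. This is a rough illustration under assumed sample data (CYCLIC_POD_PODS and descendant_ids_with_guard are hypothetical names): each row carries its own path, just like the rtree column, and a child is skipped when it already appears on that path.

```ruby
# Hypothetical pod_pods rows as [parent_id, pod_id]; the link 4 -> 2 closes a cycle.
CYCLIC_POD_PODS = [[1, 2], [2, 3], [3, 4], [4, 2]].freeze

def descendant_ids_with_guard(edges, root_id)
  # Base case: every row starts its own path, like ARRAY[pod_id].
  working = edges.select { |parent_id, _| parent_id == root_id }
                 .map { |_, pod_id| [pod_id, [pod_id]] }
  result = working.dup

  until working.empty?
    working = working.flat_map do |id, path|
      # Like WHERE NOT (p.pod_id = ANY(rtree)): skip ids already on this path.
      edges.select { |parent_id, pod_id| parent_id == id && !path.include?(pod_id) }
           .map { |_, pod_id| [pod_id, path + [pod_id]] }
    end
    result.concat(working)
  end

  result.map(&:first).uniq
end

p descendant_ids_with_guard(CYCLIC_POD_PODS, 1) # prints [2, 3, 4]
```

Without the path check, the 4 -> 2 link would send the loop around the cycle forever, which is the same failure mode the SQL guard prevents.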

If pod_pods can contain orphaned relations (rows pointing at pods that no longer exist) and you want to ignore them, you need a join with pods.

-- Snippet #3
WITH RECURSIVE pod_tree(id, rtree) AS (
  SELECT p1.id, ARRAY[p1.id]
  FROM pods p1 INNER JOIN pod_pods p2 ON p1.id = p2.pod_id 
  WHERE parent_id = 1
  UNION ALL
  SELECT p1.id, rtree || p1.id
  FROM pods p1 
    INNER JOIN pod_pods p2 ON p1.id = p2.pod_id
    INNER JOIN pod_tree ptree ON p2.parent_id = ptree.id
  WHERE NOT (p1.id = ANY(ptree.rtree))  
)

SELECT * FROM pods WHERE id IN (SELECT DISTINCT(id) FROM pod_tree);

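If you have no orphaned links, my advice is to use Snippet #1 or #2, since they are faster than #3, which involves an extra join.
get_all_parents
To keep things simple at first, let's apply the fields from the active scope at the very end. We first walk up the tree using only the pod_pods table to collect all the parent ids, and then apply the scope.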

-- Snippet #4
WITH RECURSIVE pod_tree AS (
  SELECT parent_id -- Get the parent_id of the parents of the base case node
  FROM pod_pods
  WHERE pod_id = 1 -- Base case
  UNION ALL -- Recurse on this and do a union with the previous step
  SELECT p.parent_id
  FROM pod_pods p
    INNER JOIN pod_tree ptree 
      ON ptree.parent_id = p.pod_id -- Get the parent nodes for nodes found at the previous recursion step.
)

SELECT * FROM pods 
WHERE 
  id IN (SELECT DISTINCT(parent_id) FROM pod_tree)
  AND pod_state = 'active'
  AND pod_type IN ('standard', 'readonly')
;

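However, this applies the active filter only after all the nodes have been fetched. That may not be ideal, because it can walk more of the tree than necessary and can even return parents of inactive nodes. To make it behave like the method in your Ruby code, we need to join with pods inside the recursion. I have also added the infinite-recursion guard here, since you are now familiar with it.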

-- Snippet #5
WITH RECURSIVE pod_tree(id, rtree) AS (
  SELECT p1.id, ARRAY[p1.id]
  FROM pods p1 
    INNER JOIN pod_pods p2 ON p1.id = p2.parent_id 
  WHERE pod_id = 1
    AND p1.pod_state = 'active' 
    AND p1.pod_type IN ('standard', 'readonly')
  UNION ALL
  SELECT p1.id, rtree || p1.id
  FROM pods p1 
    INNER JOIN pod_pods p2 ON p1.id = p2.parent_id
    INNER JOIN pod_tree ptree ON p2.pod_id = ptree.id
  WHERE p1.pod_state = 'active' 
    AND p1.pod_type IN ('standard', 'readonly')
    AND NOT (p1.id = ANY(ptree.rtree))  
)

SELECT * FROM pods WHERE id IN (SELECT DISTINCT(id) FROM pod_tree);
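
The difference between filtering at the end (Snippet #4) and filtering at every step (Snippet #5) is easiest to see on a small example. The Ruby sketch below uses hypothetical in-memory data (PODS, PARENT_LINKS, active?, ancestor_ids are all made-up names) and assumes there are no cycles: pod 2 is inactive, but its parent, pod 3, is active.

```ruby
# Hypothetical sample data. Pod 2 is inactive; its parent, pod 3, is active.
PODS = {
  1 => { pod_state: "active",   pod_type: "standard" },
  2 => { pod_state: "archived", pod_type: "standard" },
  3 => { pod_state: "active",   pod_type: "standard" },
  4 => { pod_state: "active",   pod_type: "readonly" },
  5 => { pod_state: "active",   pod_type: "standard" }
}.freeze

# pod_pods rows as [parent_id, pod_id].
PARENT_LINKS = [[2, 1], [3, 2], [4, 1], [5, 4]].freeze

def active?(pod_id)
  pod = PODS.fetch(pod_id)
  pod[:pod_state] == "active" && %w[standard readonly].include?(pod[:pod_type])
end

# Walks up the tree. When a block is given, only parents for which the block
# returns true are kept and followed further. Assumes no cycles in the links.
def ancestor_ids(pod_id)
  working = [pod_id]
  result = []
  until working.empty?
    working = working.flat_map do |id|
      PARENT_LINKS.select { |_, child_id| child_id == id }.map(&:first)
    end
    working = working.select { |id| yield(id) } if block_given?
    result.concat(working)
  end
  result.uniq
end

filter_last  = ancestor_ids(1).select { |id| active?(id) } # like Snippet #4
filter_early = ancestor_ids(1) { |id| active?(id) }        # like Snippet #5

p filter_last  # prints [4, 3, 5]
p filter_early # prints [4, 5]
```

Filtering last still returns pod 3, even though it is only reachable through the inactive pod 2. Filtering at each step stops at pod 2, which matches what self.parents.active does in the recursive Ruby method.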

In Rails, based on your stubbed method, the code for Snippet #5 would look like this:

def get_all_parents
  sql = <<~SQL
    WITH RECURSIVE pod_tree(id, rtree) AS (
      SELECT p1.id, ARRAY[p1.id]
      FROM pods p1
        INNER JOIN pod_pods p2 ON p1.id = p2.parent_id
      WHERE p2.pod_id = ?
        AND p1.pod_state = 'active'
        AND p1.pod_type IN ('standard', 'readonly')
      UNION ALL
      SELECT p1.id, rtree || p1.id
      FROM pods p1
        INNER JOIN pod_pods p2 ON p1.id = p2.parent_id
        INNER JOIN pod_tree ptree ON p2.pod_id = ptree.id
      WHERE p1.pod_state = 'active'
        AND p1.pod_type IN ('standard', 'readonly')
        AND NOT (p1.id = ANY(ptree.rtree))
    )

    SELECT * FROM pods WHERE id IN (SELECT DISTINCT(id) FROM pod_tree);
  SQL
  # IMP! Never interpolate values straight into the SQL string.
  # Passing an array lets find_by_sql sanitize the query and bind id
  # to the "?" placeholder.
  Pod.find_by_sql([sql, id])
end
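
The same pattern can be applied to get_all_children. The sketch below wraps Snippet #2 in a module meant to be included in the Pod model; the module name PodTreeQueries and the constant ALL_CHILDREN_SQL are hypothetical, and the only Rails call it relies on is the array form of find_by_sql, which sanitizes the bound value.

```ruby
# Hypothetical concern-style module; include it in the Pod model.
module PodTreeQueries
  # Snippet #2 with a "?" placeholder in place of the hard-coded id 1.
  ALL_CHILDREN_SQL = <<~SQL.freeze
    WITH RECURSIVE pod_tree(pod_id, rtree) AS (
      SELECT pod_id, ARRAY[pod_id]
      FROM pod_pods
      WHERE parent_id = ?
      UNION ALL
      SELECT p.pod_id, rtree || p.pod_id
      FROM pod_pods p
        INNER JOIN pod_tree ptree ON ptree.pod_id = p.parent_id
      WHERE NOT (p.pod_id = ANY(rtree))
    )
    SELECT * FROM pods
    WHERE id IN (SELECT DISTINCT pod_id FROM pod_tree)
  SQL

  def get_all_children
    # The array form asks find_by_sql to bind id to the placeholder.
    self.class.find_by_sql([ALL_CHILDREN_SQL, id])
  end
end
```

With `include PodTreeQueries` in the model, `pod.get_all_children` would return the distinct descendants as Pod records in a single query instead of one query per node.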

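This should cover your first two use cases. As mentioned, the other two are extensions of these, so you can build them from the queries above.

Notes:
  • If you have no cycles, you can drop the infinite-recursion guard column, since it is only extra bookkeeping.
  • If you have no orphaned links, prefer iterating over pod_pods alone for the children, since that avoids an unnecessary join.
  • rtree in the SQL queries above holds the path through the hierarchy. You can pass it back if you need that information. I skipped it because you end up flattening the result anyway.
  • I am fetching unique nodes. Your Rails code currently returns a node several times if it is visited several times. If you want that behavior, along with the order of the tree, you can get it like this: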

    -- Example for getting all parents
    WITH RECURSIVE pod_tree(id, slug, pod_type, parent_id, rtree) AS (
      SELECT p1.id, p1.slug, p1.pod_type, p2.parent_id, ARRAY[p1.id] -- Select the fields you need
      FROM pods p1 INNER JOIN pod_pods p2 ON p1.id = p2.parent_id 
      WHERE pod_id = 1
      AND p1.pod_state = 'active' AND p1.pod_type IN ('standard', 'readonly')
      UNION ALL
      SELECT p1.id, p1.slug, p1.pod_type, p2.parent_id, rtree || p1.id
      FROM pods p1 INNER JOIN pod_pods p2 ON p1.id = p2.parent_id
      INNER JOIN pod_tree ptree ON p2.pod_id = ptree.id
      WHERE p1.pod_state = 'active' AND p1.pod_type IN ('standard', 'readonly')
      AND NOT (p1.id = ANY(ptree.rtree))  
    )
    
    SELECT * FROM pod_tree;
    
    

Regarding sql - Rewriting ActiveRecord queries as recursive SQL, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60841892/
