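I have to take as input an array of hashes, where each hash describes an HTML tag: its start and end positions in the text, and the tag itself. I need to produce another array in which the tags are arranged in order.
For example: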
input = [
{start_p: 0, end_p: 100, start_t: '<p>', end_t: '</p>'},
{start_p: 10, end_p: 50, start_t: '<p>', end_t: '</p>'},
{start_p: 0, end_p: 100, start_t: '<span>', end_t: '</span>'},
{start_p: 20, end_p: 30, start_t: '<em>', end_t: '</em>'},
{start_p: 40, end_p: 50, start_t: '<em>', end_t: '</em>'},
{start_p: 50, end_p: 60, start_t: '<em>', end_t: '</em>'},
{start_p: 70, end_p: 80, start_t: '<em>', end_t: '</em>'},
{start_p: 8, end_p: 99, start_t: '<strong>', end_t: '</strong>'}
]
expected_output: [<p><span><strong><p><em></em><em></em></p><em></em><em></em></strong></span></p>]
The output should not contain bare tags: each element should be a hash holding the position and the tag, for example:
{position: 0, tag: '<p>'}
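Most importantly, the tags must come out in the correct order, obeying the rule that HTML tags must not overlap: if several tags close at the same position, the one opened last must come first; if one tag closes and another opens at the same position, the closing tag comes first; and so on.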
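The non-overlap rule is the usual balanced-brackets condition: reading the tags from left to right, every closing tag must match the most recently opened tag that is still open. A minimal stack-based check is sketched below; well_nested? is a hypothetical helper name, and it assumes plain tags without attributes (such as "<p>" and "</p>").

```ruby
# Hypothetical helper (not from the original post): checks that a sequence of
# tag strings such as "<p>" and "</p>" is properly nested.
# Assumes plain tags without attributes.
def well_nested?(tags)
  stack = []
  tags.each do |tag|
    if tag.start_with?('</')
      # Closing tag: it must close the most recently opened tag ("</p>" -> "p").
      return false unless stack.pop == tag[2..-2]
    else
      # Opening tag: remember its name ("<p>" -> "p").
      stack.push(tag[1..-2])
    end
  end
  # Every opened tag must also have been closed.
  stack.empty?
end

well_nested?(%w[<p> <em> </em> </p>]) # => true
well_nested?(%w[<p> <em> </p> </em>]) # => false
```

Splitting the expected output above with scan(/<[^>]+>/) and passing the result to this check returns true, so it can be used to verify any candidate ordering.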
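This is part of a legacy system, so the input and output formats cannot be changed for now. In addition, the input can be very large (hundreds of thousands of elements).
Is there a better solution than brute-force recursion?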
Best answer
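input.group_by { |h| h[:start_p] }.
  values.
  flat_map do |a|
    x = 1.0
    a.flat_map do |h|
      # x is 0.5, 0.25, 0.125, ... for successive tags sharing a start position
      x /= 2.0
      # shift the opening tag right and the closing tag left, without mutating h
      [[h[:start_p] + x, h[:start_t]], [h[:end_p] - x, h[:end_t]]]
    end
  end.sort_by(&:first).map(&:last).join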
#=> "<span><p><strong><p><em></em><em></p></em><em></em><em></em></strong></p></span>"
The steps are as follows.
b = input.group_by { |h| h[:start_p] }
#=> { 0=>[{:start_p=>0, :end_p=>100, :start_t=>"<p>", :end_t=>"</p>"},
# {:start_p=>0, :end_p=>100, :start_t=>"<span>", :end_t=>"</span>"}],
# 10=>[{:start_p=>10, :end_p=>50, :start_t=>"<p>", :end_t=>"</p>"}],
# 20=>[{:start_p=>20, :end_p=>30, :start_t=>"<em>", :end_t=>"</em>"}],
# 40=>[{:start_p=>40, :end_p=>50, :start_t=>"<em>", :end_t=>"</em>"}],
# 50=>[{:start_p=>50, :end_p=>60, :start_t=>"<em>", :end_t=>"</em>"}],
# 70=>[{:start_p=>70, :end_p=>80, :start_t=>"<em>", :end_t=>"</em>"}],
# 8=>[{:start_p=> 8, :end_p=>99, :start_t=>"<strong>", :end_t=>"</strong>"}]}
c = b.values
#=> [[{:start_p=>0, :end_p=>100, :start_t=>"<p>", :end_t=>"</p>"},
# {:start_p=>0, :end_p=>100, :start_t=>"<span>", :end_t=>"</span>"}],
# [{:start_p=>10, :end_p=>50, :start_t=>"<p>", :end_t=>"</p>"}],
# ...
# [{:start_p=>8, :end_p=>99, :start_t=>"<strong>", :end_t=>"</strong>"}]]
d = c.flat_map do |a|
  x = 1.0
  a.flat_map do |h|
    x /= 2.0
    # h[:start_p] + x rather than += x, so the input hashes are left unchanged
    [[h[:start_p] + x, h[:start_t]], [h[:end_p] - x, h[:end_t]]]
  end
end
#=> [[0.5, "<p>"], [99.5, "</p>"], [0.25, "<span>"], [99.75, "</span>"],
# [10.5, "<p>"], [49.5, "</p>"], [20.5, "<em>"], [29.5, "</em>"],
# [40.5, "<em>"], [49.5, "</em>"], [50.5, "<em>"], [59.5, "</em>"],
# [70.5, "<em>"], [79.5, "</em>"], [8.5, "<strong>"], [98.5, "</strong>"]]
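The first four elements (pairs) of d are the key to understanding my approach: the two tags that start at 0 receive offsets 0.5 and 0.25, so <span> (offset 0.25) opens before and closes after <p> (offset 0.5), which keeps the two properly nested.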
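The offsets only separate tags that share a start position. Tags that start at different positions but end at the same one are not separated: in the example, the inner <p> (10 to 50) and the <em> (40 to 50) each form a group of one, so both get x = 0.5 and both closing tags land on 49.5. A minimal sketch reproducing just those two tags with the same per-group halving:

```ruby
# Two tags from the example that close at the same position (50)
# but open at different positions, so each is alone in its group.
tags = [
  { start_p: 10, end_p: 50, start_t: '<p>',  end_t: '</p>' },
  { start_p: 40, end_p: 50, start_t: '<em>', end_t: '</em>' }
]

pairs = tags.group_by { |h| h[:start_p] }.values.flat_map do |a|
  x = 1.0
  a.flat_map do |h|
    x /= 2.0
    [[h[:start_p] + x, h[:start_t]], [h[:end_p] - x, h[:end_t]]]
  end
end

# Both closing tags get the same sort key, 49.5, so sorting on the key
# alone cannot decide which of them comes first.
closing = pairs.select { |_pos, tag| tag.start_with?('</') }
p closing # prints [[49.5, "</p>"], [49.5, "</em>"]]
```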
e = d.sort_by(&:first)
#=> [[0.25, "<span>"], [0.5, "<p>"], [8.5, "<strong>"], [10.5, "<p>"],
# [20.5, "<em>"], [29.5, "</em>"], [40.5, "<em>"], [49.5, "</p>"],
# [49.5, "</em>"], [50.5, "<em>"], [59.5, "</em>"], [70.5, "<em>"],
# [79.5, "</em>"], [98.5, "</strong>"], [99.5, "</p>"], [99.75, "</span>"]]
f = e.map(&:last)
#=> ["<span>", "<p>", "<strong>", "<p>", "<em>", "</em>", "<em>", "</p>",
# "</em>", "<em>", "</em>", "<em>", "</em>", "</strong>", "</p>", "</span>"]
f.join
#=> "<span><p><strong><p><em></em><em></p></em><em></em><em></em></strong></p></span>"
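In the result above, </p> and </em> both sit at 49.5, and Ruby's sort_by does not guarantee a stable order, so their relative order is arbitrary; here it yields <em></p></em>, which is not properly nested. The answer also puts <span> outside <p>, whereas the question's expected output keeps input order (<p> outside <span>). A different technique, sketched below, avoids fractional offsets and sorts open/close events on an integer composite key: position, then closing before opening, then nesting rank. It returns {position:, tag:} hashes as the question asks and costs two sorts, O(n log n). The helper name ordered_tags is hypothetical, and the sketch assumes intervals never partially overlap and that start_p < end_p.

```ruby
# Hypothetical helper (not from the original answer). Assumes the intervals
# never partially overlap and that start_p < end_p for every hash.
def ordered_tags(input)
  # Opening order: by start position, wider element first, then input order.
  opened = input.each_with_index.sort_by { |h, i| [h[:start_p], -h[:end_p], i] }

  events = []
  opened.each_with_index do |(h, _i), rank|
    # kind 1 = opening tag; among openings at one position, lower rank first.
    events << [[h[:start_p], 1, rank], { position: h[:start_p], tag: h[:start_t] }]
    # kind 0 = closing tag, so closings sort before openings at the same
    # position; -rank makes the most recently opened tag close first.
    events << [[h[:end_p], 0, -rank], { position: h[:end_p], tag: h[:end_t] }]
  end

  # Every key is unique, so the instability of sort_by does not matter here.
  events.sort_by(&:first).map(&:last)
end

input = [
  { start_p: 0,  end_p: 100, start_t: '<p>',      end_t: '</p>' },
  { start_p: 10, end_p: 50,  start_t: '<p>',      end_t: '</p>' },
  { start_p: 0,  end_p: 100, start_t: '<span>',   end_t: '</span>' },
  { start_p: 20, end_p: 30,  start_t: '<em>',     end_t: '</em>' },
  { start_p: 40, end_p: 50,  start_t: '<em>',     end_t: '</em>' },
  { start_p: 50, end_p: 60,  start_t: '<em>',     end_t: '</em>' },
  { start_p: 70, end_p: 80,  start_t: '<em>',     end_t: '</em>' },
  { start_p: 8,  end_p: 99,  start_t: '<strong>', end_t: '</strong>' }
]

result = ordered_tags(input)
puts result.map { |e| e[:tag] }.join
# prints <p><span><strong><p><em></em><em></em></p><em></em><em></em></strong></span></p>
```

The joined tags match the question's expected output, and the first element of result is the hash with position 0 and tag "<p>".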
I can explain the calculation of d above in more detail if needed.
Regarding a Ruby algorithm for determining a valid HTML structure, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/45309534/