python - Wrap all next_elements in BeautifulSoup

I have a piece of HTML like this:

<figure>
    <img src=".." alt=".." />
    Some text that I have to wrap in <code>figcaption</code>
</figure>

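I'm trying to wrap everything after the <img> in a <figcaption>. Is this possible?

next_elements does a good job of getting the elements I want, but it returns a generator, which doesn't work well with the wrap method.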

Best answer

Here is one way to do it:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
... <figure>
...     <img src=".." alt=".." />
...     Some text that I have to wrap in <code>figcaption</code>
... </figure>
... """)
>>> for figure in soup.find_all("figure"):
...     img = figure.find("img")
...     if img is not None:
...         figcaption = soup.new_tag("figcaption")
...         for el in list(img.next_siblings):
...             figcaption.append(el)
...         img.insert_after(figcaption)
... 
>>> soup
<html><body><figure>
    <img alt=".." src=".."/><figcaption>
    Some text that I have to wrap in <code>figcaption</code>
</figcaption></figure></body></html>

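A few things to note:

We use next_siblings, which returns only the elements we actually need, rather than next_elements, which would continue past the end of the figure element.
We wrap next_siblings in list() to create a shallow copy to iterate over. Otherwise, because appending el to figcaption removes it from its previous position in the document tree, we would be modifying the sequence we are iterating over, which is a bad idea. We could have used find_next_siblings() (which also returns a list), but the version above is more explicit.
Since we have already removed all of img's next siblings from their original positions in the document tree, all we need to do is insert figcaption (which now contains them) immediately after the img element.
The placement of whitespace no longer looks intuitively "right" to a human reader, but fixing that would take a lot of extra work and is probably not worth it.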
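
The points above can be folded into a small reusable helper. The following is a minimal sketch, not part of the original answer: the function name wrap_following_siblings is hypothetical, and the "html.parser" argument is an assumption made so that the example does not depend on which parser happens to be installed (with that parser, no <html><body> wrapper is added). It also counts what next_siblings and next_elements yield for the same img, to show why the former is the better fit here.

```python
from bs4 import BeautifulSoup

HTML = """
<figure>
    <img src=".." alt=".." />
    Some text that I have to wrap in <code>figcaption</code>
</figure>
"""


def wrap_following_siblings(soup, anchor, tag_name):
    """Move every sibling after `anchor` into a new tag placed right after it.

    Hypothetical helper for illustration; it is not a BeautifulSoup API.
    """
    wrapper = soup.new_tag(tag_name)
    # Take a snapshot with list() first: append() detaches each node from its
    # old position, so we avoid mutating the tree while iterating over it.
    for node in list(anchor.next_siblings):
        wrapper.append(node)
    anchor.insert_after(wrapper)
    return wrapper


# Assumption: the built-in "html.parser" is used instead of the default parser.
soup = BeautifulSoup(HTML, "html.parser")
img = soup.find("img")

# next_siblings stays inside <figure>; next_elements also descends into <code>
# and keeps going after </figure>, so it yields more nodes.
sibling_count = len(list(img.next_siblings))
element_count = len(list(img.next_elements))

caption = wrap_following_siblings(soup, img, "figcaption")
print(soup.figure)
```

Keeping the helper generic over the anchor and the tag name means the same function can be called inside a loop over soup.find_all("figure"), as in the answer above.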

Regarding python - Wrap all next_elements in BeautifulSoup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17605801/
