python - Replacing strings in a BeautifulSoup iterator exits early?

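I'm using BeautifulSoup 4 to iterate over a document's strings and replace substrings, but I've run into a problem: calling replace_with while iterating over strings makes the generator exit the loop early.

For example, given this code: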

from bs4 import BeautifulSoup

s = BeautifulSoup("<p>a</p><p>b</p><p>c</p>", features="html.parser")
for st in s.strings:
  st.replace_with('replace')

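The final content of s will be replacebc, whereas the expected behavior is for a, b, and c each to be replaced. Stepping through with a debugger confirms that iteration over strings stops once the replacement has happened, so the loop effectively runs a single iteration and exits early.

In practice, I will be updating parts of each string and replacing them with newly created BeautifulSoup objects, so simpler replacement approaches may not work: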

updated = st.replace(keyword, f'<a href="url/{keyword}">{keyword}</a>')
st.replace_with(BeautifulSoup(updated, features="html.parser"))

Is there a workaround, or a more correct way to do this?

Best answer

You are getting this output because, as the documentation for replace_with() explains:

PageElement.replace_with() removes a tag or string from the tree, and replaces it with the tag or string of your choice

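Once the string has been removed from the tree, it no longer has a next_element, so the generator exits early. We can check this with the following code: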

from bs4 import BeautifulSoup
s = BeautifulSoup("<p>a</p><p>b</p><p>c</p>", features="html.parser")
for st in s.strings:
    print(st.next_element)
    st.replace_with('replace')
    print(st)
    print(st.next_element)

Output

<p>b</p>
a
None

After replace_with(), next_element is None.
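To see why a cleared link ends the loop, here is a toy model of a generator that follows next_element links. The Node class and the walk and build_chain functions are hypothetical stand-ins written for illustration; they are not BeautifulSoup's actual implementation, which may differ between versions.

```python
class Node:
    """Hypothetical stand-in for a tree element with a next_element link."""

    def __init__(self, value):
        self.value = value
        self.next_element = None


def walk(first):
    """Yield nodes by following next_element, reading the link lazily."""
    current = first
    while current is not None:
        yield current
        # The link is read only after the caller's loop body has run.
        current = current.next_element


def build_chain():
    """Return the head of a three-node chain a -> b -> c."""
    a, b, c = Node("a"), Node("b"), Node("c")
    a.next_element, b.next_element = b, c
    return a


# Clearing the current node's link inside the loop cuts the walk short.
visited = []
for node in walk(build_chain()):
    visited.append(node.value)
    node.next_element = None  # mimics what removal does to the link

# Taking a snapshot first visits every node.
snapshot = []
for node in list(walk(build_chain())):
    snapshot.append(node.value)
    node.next_element = None

print(visited)
print(snapshot)
```

In this model the first loop only sees a, while the loop over the snapshot sees a, b, and c, because the whole chain is collected before any link is cleared.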

One approach is the one @cody mentioned, i.e. use list() to get all of the values at once.
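A minimal sketch of the list() approach, using the same document as in the question. The variable name result is introduced here only for illustration.

```python
from bs4 import BeautifulSoup

s = BeautifulSoup("<p>a</p><p>b</p><p>c</p>", features="html.parser")

# list() consumes the generator before any node is modified,
# so later replacements cannot affect the traversal.
for st in list(s.strings):
    st.replace_with("replace")

result = str(s)
print(result)
```

Because the strings are collected up front, this should not depend on how the strings generator advances internally.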

Another approach is to store next_element and set it again after replace_with(), so that the generator can yield further elements.

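from bs4 import BeautifulSoup
s = BeautifulSoup("<p>a</p><p>b</p><p>c</p>", features="html.parser")
for st in s.strings:
    # Remember the successor before replace_with() detaches st from the tree.
    next_el = st.next_element
    st.replace_with('replace')
    # Restore the link on the detached string so the generator can move on.
    st.next_element = next_el
print(s)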

Output

<p>replace</p><p>replace</p><p>replace</p>
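For the use case described in the question (wrapping a keyword in a link), the list() idea can be combined with the question's own replacement code. This is only a sketch: the keyword value and the input markup are hypothetical, and the url/ prefix is taken as-is from the question.

```python
from bs4 import BeautifulSoup

keyword = "foo"  # hypothetical keyword
html = "<p>see foo here</p><p>foo again</p><p>nothing</p>"  # hypothetical input
s = BeautifulSoup(html, features="html.parser")

# Snapshot the strings first so replacements cannot disturb the iteration.
for st in list(s.strings):
    if keyword not in st:
        continue
    updated = st.replace(keyword, f'<a href="url/{keyword}">{keyword}</a>')
    # Re-parse the updated text and swap the fragment in for the old string.
    st.replace_with(BeautifulSoup(updated, features="html.parser"))

linked = str(s)
print(linked)
```

The snapshot also means the link text that was just inserted is never visited by the loop, so the keyword inside the new a tags is not wrapped again. One caveat: the string is re-parsed as HTML, so text that contains characters such as < or & may be interpreted as markup.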

Regarding python - Replacing strings in a BeautifulSoup iterator exits early?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54285243/
