python - Search and replace in HTML with BeautifulSoup

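I want to use BeautifulSoup to search for </a> and replace it with </a><br>. I know how to open a page with urllib2 and then parse it to extract all the <a> tags. What I want to do is find each closing tag and replace it with the closing tag plus a line break. Any help is much appreciated.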

EDIT

I assume it would be something similar to:

soup.findAll('a').

In the documentation, there is this:

find(text="ahh").replaceWith('Hooray')

So I thought it would be along the lines of:

soup.findAll(tag = '</a>').replaceWith(tag = '</a><br>')

But that doesn't work, and Python's help() doesn't give much to go on.

Best answer

This will insert a <br> element after the end of each <a>...</a> element:

from BeautifulSoup import BeautifulSoup, Tag

# ....

soup = BeautifulSoup(data)
for a in soup.findAll('a'):
    a.parent.insert(a.parent.index(a)+1, Tag(soup, 'br'))

You can't use soup.findAll(tag = '</a>') because BeautifulSoup doesn't operate on end tags separately; they are considered part of the same element.
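The loop above is written against the BeautifulSoup 3 API (`from BeautifulSoup import ...`). For readers on the newer `bs4` package, here is a minimal sketch of the same idea, assuming `bs4` is installed and the built-in `html.parser` is acceptable. The function name `add_breaks_after_links` and the sample markup are hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup


def add_breaks_after_links(html):
    """Return the markup with a <br> element inserted after every <a> element.

    Hypothetical helper for illustration; not part of BeautifulSoup itself.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list-like snapshot, so modifying the tree
    # while looping over it is safe.
    for a in soup.find_all("a"):
        # new_tag creates a fresh element; insert_after places it
        # immediately after the whole <a>...</a> element.
        a.insert_after(soup.new_tag("br"))
    return str(soup)


# Sample markup (an assumption, not taken from the original question).
sample = '<div><a href="/one">one</a><a href="/two">two</a></div>'
result = add_breaks_after_links(sample)
print(result)

# Searching for a closing tag by itself finds nothing, because the
# parser only exposes whole elements, never separate end tags.
closing_matches = BeautifulSoup(sample, "html.parser").find_all("</a>")
print(len(closing_matches))
```

The second lookup returns no matches, which is consistent with the point above: an element is a single node, so there is no separate closing tag to find or replace.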

If you want to put the <a> elements inside a <p> element, as you asked in a comment, you can use this:

for a in soup.findAll('a'):
    p = Tag(soup, 'p') #create a P element
    a.replaceWith(p)   #Put it where the A element is
    p.insert(0, a)     #put the A element inside the P (between <p> and </p>)

Again, you don't create the <p> and the </p> separately, because they are part of the same thing.
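With the newer `bs4` package, the same wrapping can be expressed with `Tag.wrap`. This is a sketch under the assumption that `bs4` is installed; the function name `wrap_links_in_paragraphs` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def wrap_links_in_paragraphs(html):
    """Return the markup with every <a> element wrapped in its own <p>.

    Hypothetical helper for illustration; not part of BeautifulSoup itself.
    """
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a"):
        # wrap() puts the new <p> where the <a> was and moves the <a>
        # inside it, so <p> and </p> are created together as one element.
        a.wrap(soup.new_tag("p"))
    return str(soup)


# Sample markup (an assumption, not taken from the original question).
wrapped = wrap_links_in_paragraphs('<div><a href="/one">one</a></div>')
print(wrapped)  # <div><p><a href="/one">one</a></p></div>
```

As with the BeautifulSoup 3 version, the opening and closing tags are never handled individually; the element is created once and the serializer emits both tags.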

This question, python - Search and replace in HTML with BeautifulSoup, corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/2073541/
