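python - Removing tags from HTML parsed with BeautifulSoup

I'm new to Python and I'm using BeautifulSoup to parse a website and then extract data. I have the following code, followed by the output it prints: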

for line in raw_data: #raw_data is the parsed html separated into smaller blocks
    d = {}
    d['name'] = line.find('div', {'class':'torrentname'}).find('a')
    print(d['name'])

<a href="/ubuntu-9-10-desktop-i386-t3144211.html">
<strong class="red">Ubuntu</strong> 9.10 desktop (i386)</a>

Normally I can extract "Ubuntu 9.10 desktop (i386)" like this:

d['name'] = line.find('div', {'class':'torrentname'}).find('a').string

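But it returns None because of the <strong> HTML tag. Is there a way to extract the <strong> tag and then use .string, or is there a better way? I tried using BeautifulSoup's extract() function, but I couldn't get it to work.

Edit: I just realized that my solution does not work if there are two sets of <strong> tags, because the space between the words is left out. How can I fix this?

Best answer

Use the ".text" attribute: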

d['name'] = line.find('div', {'class':'torrentname'}).find('a').text
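
For context, .string returns None here because the anchor has more than one child (the <strong> tag plus a trailing text node), whereas .text concatenates every string beneath the tag. A minimal sketch, assuming the modern bs4 package (the question may predate it) and the question's markup joined onto one line:

```python
from bs4 import BeautifulSoup

# Markup from the question, joined onto one line
# (assumption: no newline between <a ...> and <strong>).
html = ('<a href="/ubuntu-9-10-desktop-i386-t3144211.html">'
        '<strong class="red">Ubuntu</strong> 9.10 desktop (i386)</a>')
anchor = BeautifulSoup(html, "html.parser").find("a")

children = list(anchor.children)  # the <strong> tag and the trailing text node
single_string = anchor.string     # None, because the tag has more than one child
full_text = anchor.text           # all strings under the tag, concatenated

print(len(children))
print(single_string)
print(full_text)
```

If the real page has a newline after the opening <a> tag, the anchor gains an extra whitespace-only child, but .string is still None for the same reason.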

Or do a join on findAll(text=True):

anchor = line.find('div', {'class':'torrentname'}).find('a')
d['name'] = ''.join(anchor.findAll(text=True))
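
As for the missing spaces mentioned in the edit, one possible cause is markup where the tags sit next to each other with no whitespace in between, so a plain join glues the words together. A sketch under that assumption, using bs4's get_text() with a separator and strip=True; the markup below is hypothetical and not taken from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two adjacent <strong> tags with no whitespace between them.
html = '<a href="#"><strong>Ubuntu</strong><strong>Server</strong> 9.10</a>'
anchor = BeautifulSoup(html, "html.parser").find("a")

# Same idea as ''.join(anchor.findAll(text=True)) in the answer above.
joined = "".join(anchor.find_all(string=True))

# Strip each text piece, then join the pieces with a single space.
spaced = anchor.get_text(" ", strip=True)

print(repr(joined))
print(repr(spaced))
```

The trade-off is that the separator is inserted between every text node, so markup that splits a single word across tags would gain an unwanted space.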

A similar question about removing tags from HTML parsed with BeautifulSoup can be found on Stack Overflow: https://stackoverflow.com/questions/3585725/
