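Suppose I have some HTML with <p> and <br> tags inside. Afterwards, I'm going to strip the HTML to clean up the tags. How can I turn them into line breaks?
I'm using Python's BeautifulSoup library, if that helps.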
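Best answer
Without more details it's hard to be sure this is exactly what you want, but it should give you the idea. It assumes your br tags are contained within p elements.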
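# Python 2 code using BeautifulSoup 3 (note the import path and the print statement).
from BeautifulSoup import BeautifulSoup
import six


def replace_with_newlines(element):
    """Return the element's text, with each <br> turned into a newline."""
    text = ''
    # Walk every descendant node: text nodes and tags alike.
    for elem in element.recursiveChildGenerator():
        if isinstance(elem, six.string_types):
            # Text node: drop surrounding whitespace, including source line breaks.
            text += elem.strip()
        elif elem.name == 'br':
            text += '\n'
    return text


page = """<html>
<body>
<p>America,<br>
Now is the<br>time for all good men to come to the aid<br>of their country.</p>
<p>pile on taxpayer debt<br></p>
<p>Now is the<br>time for all good men to come to the aid<br>of their country.</p>
</body>
</html>
"""

soup = BeautifulSoup(page)
lines = soup.find("body")
# Each <p> is printed separately, so paragraphs also end up on their own lines.
for line in lines.findAll('p'):
    line = replace_with_newlines(line)
    print line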
Running this results in...
(py26_default)[mpenning@Bucksnort ~]$ python thing.py
America,
Now is the
time for all good men to come to the aid
of their country.
pile on taxpayer debt
Now is the
time for all good men to come to the aid
of their country.
(py26_default)[mpenning@Bucksnort ~]$
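The answer's code targets Python 2 and BeautifulSoup 3. As a rough sketch of the same idea on Python 3 using only the standard library's html.parser (a swapped-in technique, not the answer's own method), something like the following might work. The names NewlineTextExtractor and html_to_text are hypothetical, chosen only for illustration, and treating each closing </p> as a line break is an assumption about what the question wants.

```python
from html.parser import HTMLParser


class NewlineTextExtractor(HTMLParser):
    """Collect text, adding a newline for each <br> and each closing </p>."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br> (and <br/>, which is routed here too) becomes a line break.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        # Assumption: the end of a paragraph should also end the line.
        if tag == "p":
            self.parts.append("\n")

    def handle_data(self, data):
        # Like elem.strip() in the answer: drop whitespace around each text run.
        self.parts.append(data.strip())


def html_to_text(html):
    """Hypothetical helper: return the text of html with line breaks kept."""
    parser = NewlineTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Calling html_to_text(page) on the sample page from the answer would give one string for the whole document rather than one per paragraph, so a paragraph that ends in <br> is followed by an empty line.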
For "python - How do I turn <br> and <p> into line breaks?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10491223/