I have some HTML elements that I want to extract text from. The HTML looks like this:
<pre>
<span class="ansi-red-fg">ZeroDivisionError</span>Traceback (most recent call last)
<span class="ansi-green-fg"><ipython-input-2-0f9f90da76dc></span> in <span class="ansi-cyan-fg"><module></span><span class="ansi-blue-fg">()</span>
</pre>
I want to extract the text as:
ZeroDivisionErrorTraceback (most recent call last)
<ipython-input-2-0f9f90da76dc> in <module>()
I found an answer to this problem here, but it does not work for me. Complete example code:
from bs4 import BeautifulSoup as BSHTML
bs = BSHTML("""<pre>
<span class="ansi-red-fg">ZeroDivisionError</span>Traceback (most recent call last)
<span class="ansi-green-fg"><ipython-input-2-0f9f90da76dc></span> in <span class="ansi-cyan-fg"><module></span><span class="ansi-blue-fg">()</span>
</pre>""")
print bs.font.contents[0].strip()
I get the following error:
Traceback (most recent call last):
  File "invest.py", line 13, in <module>
    print bs.font.contents[0].strip()
AttributeError: 'NoneType' object has no attribute 'contents'
Am I missing something? BeautifulSoup version: 4.6.0
Best answer
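Do you want all of the text content of that pre block? Your markup has no font tag, so bs.font is None, and that is what raises the AttributeError. Read the text from the pre element instead: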
print bs.pre.text
Returns:
ZeroDivisionErrorTraceback (most recent call last)
<ipython-input-2-0f9f90da76dc> in <module>()
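For Python 3, the same idea might look roughly like the sketch below. It assumes the angle brackets inside the spans are written as HTML entities in the real page source, and extract_pre_text is a hypothetical helper name used only for illustration, not part of BeautifulSoup. It also checks for None, since looking up a tag that does not exist gives None rather than raising an error right away.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question. Assumption: the angle brackets
# inside the spans appear as entities (&lt; / &gt;) in the page source.
HTML = """<pre>
<span class="ansi-red-fg">ZeroDivisionError</span>Traceback (most recent call last)
<span class="ansi-green-fg">&lt;ipython-input-2-0f9f90da76dc&gt;</span> in <span class="ansi-cyan-fg">&lt;module&gt;</span><span class="ansi-blue-fg">()</span>
</pre>"""


def extract_pre_text(html):
    """Hypothetical helper: text of the first <pre> element, or None."""
    soup = BeautifulSoup(html, "html.parser")
    pre = soup.find("pre")  # find() returns None when no tag matches
    if pre is None:
        return None
    # get_text() joins the text of all nested tags; strip() trims the
    # leading and trailing newlines around the block.
    return pre.get_text().strip()


if __name__ == "__main__":
    print(extract_pre_text(HTML))
```

Passing "html.parser" explicitly keeps the result independent of which optional parsers happen to be installed.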
Regarding "python - How to extract text between html tags?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53429107/