I'm trying to strip out all div elements, including their contents.
Input:
<p>111</p>
<div class="1334">bla</div>
<p>333</p>
<p>333</p>
<div some unknown stuff>bla2</div>
Expected output:
<p>111</p>
<p>333</p>
<p>333</p>
I tried this, but it didn't work:
release_content = re.sub("/<div>.*<\/div>/s", "", release_content)
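The attempt above fails for a few separate reasons:

- The /.../s delimiters are PHP/Perl syntax. Python treats the slashes as literal characters.
- The pattern "<div>" cannot match an opening tag that has attributes.
- A greedy .* would run on to the last </div> in the text.

The sketch below is illustrative only. strip_divs_regex is a hypothetical helper name. It also shows that even a repaired pattern still breaks on nested divs, which is why the answer below recommends a parser.

```python
import re

html = """<p>111</p>
<div class="1334">bla</div>
<p>333</p>
<p>333</p>
<div some unknown stuff>bla2</div>
"""

# The original pattern: the /.../s delimiters are matched as literal slashes,
# and "<div>" cannot match "<div class=...>", so nothing is replaced.
unchanged = re.sub(r"/<div>.*<\/div>/s", "", html)

# Hypothetical repaired pattern: allow attributes in the opening tag, match
# lazily (.*?) so one match stops at the first </div>, drop trailing
# whitespace, and use re.DOTALL (Python's equivalent of the /s modifier).
def strip_divs_regex(text: str) -> str:
    return re.sub(r"<div\b[^>]*>.*?</div>\s*", "", text, flags=re.DOTALL)

flat = strip_divs_regex(html)  # "<p>111</p>\n<p>333</p>\n<p>333</p>\n"

# Nested divs still break it: the lazy match ends at the inner </div>,
# leaving the outer element's tail and closing tag behind.
nested = strip_divs_regex("<div><div>a</div>b</div><p>x</p>")  # "b</div><p>x</p>"
```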
Best answer
Do not use regex for this problem. Use an HTML parser. Here is a Python solution using BeautifulSoup:
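from bs4 import BeautifulSoup  # BeautifulSoup 4; the original used the legacy BeautifulSoup 3 import
# Read the raw HTML
with open('Path/to/file', 'r') as content_file:
    content = content_file.read()
# Parse with Python's built-in parser, then remove every <div> together with its contents
soup = BeautifulSoup(content, 'html.parser')
for div in soup.find_all('div'):
    div.extract()
# Write the modified document to a new file
with open('Path/to/file.modified', 'w') as output_file:
    output_file.write(str(soup))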
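If installing BeautifulSoup is not an option, the same parser-based idea can be sketched with Python's standard-library html.parser module. This is a swapped-in technique, not the answer's code. The names DivStripper and strip_divs are hypothetical.

The parser counts how many div elements are currently open and re-emits everything outside them. Entities are kept as separate events so they are written back unchanged.

```python
from html.parser import HTMLParser


class DivStripper(HTMLParser):
    """Hypothetical helper: re-serialises HTML, dropping every <div> and its contents."""

    def __init__(self) -> None:
        # convert_charrefs=False delivers &amp; etc. as events, so they can be re-emitted as-is
        super().__init__(convert_charrefs=False)
        self.depth = 0              # number of <div> elements currently open
        self.out: list[str] = []

    def _emit(self, text: str) -> None:
        if self.depth == 0:         # keep only text that is outside every <div>
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
        else:
            self._emit(self.get_starttag_text())  # original tag text, attributes intact

    def handle_startendtag(self, tag, attrs):
        if tag != "div":            # e.g. <br/>; a self-closed <div/> is simply dropped
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth = max(0, self.depth - 1)   # tolerate a stray </div>
        else:
            self._emit(f"</{tag}>")               # note: tag name comes back lower-cased

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def strip_divs(html: str) -> str:
    """Return html with all <div> elements (and everything inside them) removed."""
    parser = DivStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because of the depth counter, nested divs are removed completely, which the regex sketch cannot do. The newlines that surrounded each removed div are ordinary text outside it, so they stay. The output may therefore contain blank lines where the divs used to be.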
Regarding "python - Remove all div tags from an HTML string", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/15796994/