python - Regular expression to remove “” from a string in Python

I am using the following code to get results from an RSS feed:

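try:
    # Text of the first <description> child of the feed item
    desc = item.xpath('description')[0].text
    if date is not None:
        # Put the date above the description, separated by a blank line
        desc = date + "\n" + "\n" + desc
except:
    desc = None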

But sometimes the description contains a few escaped HTML characters (entities), like this:

The text from XML looks like " and with ' and other &...; stuff

I don't want them to show up when the content is displayed. Is there a regular expression that removes this HTML markup?

Best answer

I used something called "Unescaping XML"; I'm not sure whether it will help you.

>>> from xml.sax.saxutils import unescape
>>> unescape("&lt; &amp; &gt;")
'< & >'
>>> unescape("&apos; &quot;", {"&apos;": "'", "&quot;": '"'})
'\' "'

Edit

Just saw this, which might be interesting (untested): unescape with urllib
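
For anyone on Python 3.4 or later, here is a minimal sketch of another way to do it. It is not part of the original answer. The standard library's html.unescape decodes named and numeric entities without needing an extra mapping, and a small regex can then drop anything that looks like a tag. The helper name clean_description and the pattern TAG_RE are hypothetical, made up for this example. The regex is deliberately crude; a real HTML parser would be more robust on messy feeds.

```python
import html
import re

# Hypothetical pattern: matches things shaped like <b>, </p>, <a href="...">.
# It requires a letter right after "<" (or "</") so that plain text such as
# "1 < 2" is left alone.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def clean_description(text):
    """Decode HTML entities, then strip anything that looks like a tag."""
    if text is None:
        return None
    decoded = html.unescape(text)   # &quot; -> ", &apos; -> ', &amp; -> &
    return TAG_RE.sub("", decoded)


if __name__ == "__main__":
    print(clean_description("looks like &quot; and with &apos; and &amp; stuff"))
```

In the question's code this could be applied as desc = clean_description(desc) just before the text is displayed.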

Regarding python - Regular expression to remove “” from a string in Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7332502/
