python - Escaping special HTML characters in Python

Tags: python escaping

I have a string in which special characters such as ', " and & (...) can appear. For example, in the string:

string = """ Hello "XYZ" this 'is' a test & so on """

How can I automatically escape every special character so that I get this:

string = " Hello &quot;XYZ&quot; this &#39;is&#39; a test &amp; so on "

Best answer

In Python 3.2 and later, you can use the html.escape function, for example:

>>> string = """ Hello "XYZ" this 'is' a test & so on """
>>> import html
>>> html.escape(string)
' Hello &quot;XYZ&quot; this &#x27;is&#x27; a test &amp; so on '
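A short sketch of the optional quote parameter and of the reverse direction. It assumes Python 3.4 or later, because html.unescape was only added in 3.4; the variable names are illustrative.

```python
import html

text = """ Hello "XYZ" this 'is' a test & so on """

# quote=True (the default) also turns " and ' into entities,
# which matters when the result goes inside an attribute value.
escaped = html.escape(text)

# quote=False only converts &, < and >; that is enough for text
# between tags, but not for attribute values.
body_only = html.escape(text, quote=False)

# html.unescape() (Python 3.4+) converts the entities back.
restored = html.unescape(escaped)

print(escaped)
print(body_only)
print(restored == text)
```

Keeping the default quote=True is the safer choice when you do not know where the string will end up in the page.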

For earlier versions of Python, see http://wiki.python.org/moin/EscapingHtml:

The cgi module that comes with Python has an escape() function:

import cgi

s = cgi.escape( """& < >""" )   # s = "&amp; &lt; &gt;"

However, it doesn't escape characters beyond &, <, and >. If it is called as cgi.escape(string_to_escape, quote=True), it also escapes the double quote ("), but still not the apostrophe (').
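cgi.escape was deprecated in Python 3.2 and removed in Python 3.8, so the call above fails on current interpreters. The sketch below mimics the behavior just described with plain str.replace calls; legacy_cgi_escape is a hypothetical helper name, not a standard-library function.

```python
import html


def legacy_cgi_escape(s, quote=False):
    """Hypothetical stand-in for the removed cgi.escape()."""
    # '&' must be replaced first; otherwise the '&' inside the
    # entities created below would be escaped a second time.
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        # Like cgi.escape, only the double quote is handled, not '.
        s = s.replace('"', "&quot;")
    return s


print(legacy_cgi_escape("& < >"))
print(legacy_cgi_escape('"a" \'b\'', quote=True))

# Without quoting, html.escape(s, quote=False) gives the same result.
print(legacy_cgi_escape("& < >") == html.escape("& < >", quote=False))
```

In practice, html.escape(s, quote=False) is the drop-in replacement for cgi.escape(s). The helper only makes the replacement order visible.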


Here's a small snippet that will let you escape quotes and apostrophes as well:

html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}

def html_escape(text):
    """Produce entities within text."""
    return "".join(html_escape_table.get(c, c) for c in text)
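Because the table-based function looks up each character exactly once, the order of the dictionary entries does not matter and nothing is escaped twice. The sketch below contrasts it with a chained str.replace() that handles "&" last. naive_escape_wrong_order is a hypothetical counter-example, not something from the wiki page.

```python
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}


def html_escape(text):
    """Produce entities within text, one character at a time."""
    return "".join(html_escape_table.get(c, c) for c in text)


def naive_escape_wrong_order(text):
    # Hypothetical counter-example: replacing '&' LAST re-escapes
    # the '&' at the start of every entity produced earlier.
    for ch in ['"', "'", ">", "<", "&"]:
        text = text.replace(ch, html_escape_table[ch])
    return text


print(html_escape('<a title="x">'))               # single-escaped
print(naive_escape_wrong_order('<a title="x">'))  # double-escaped
```

If you do use chained replace() calls, replace "&" first.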

You can also use escape() from xml.sax.saxutils to escape HTML. This function should execute faster than the character-by-character version above. The unescape() function of the same module can be passed the same arguments to decode a string.

from xml.sax.saxutils import escape, unescape
# escape() and unescape() take care of &, < and >.
html_escape_table = {
    '"': "&quot;",
    "'": "&apos;"
}
html_unescape_table = {v:k for k, v in html_escape_table.items()}

def html_escape(text):
    return escape(text, html_escape_table)

def html_unescape(text):
    return unescape(text, html_unescape_table)
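A round trip through the two saxutils-based helpers, applied to the string from the question, shows that decoding restores the original text. saxutils.escape handles "&" before the extra entities, and saxutils.unescape handles "&amp;" last, so the two calls mirror each other. The variable names are illustrative.

```python
from xml.sax.saxutils import escape, unescape

# escape() and unescape() already take care of &, < and >.
html_escape_table = {
    '"': "&quot;",
    "'": "&apos;",
}
html_unescape_table = {v: k for k, v in html_escape_table.items()}


def html_escape(text):
    return escape(text, html_escape_table)


def html_unescape(text):
    return unescape(text, html_unescape_table)


original = """ Hello "XYZ" this 'is' a test & so on """
encoded = html_escape(original)
decoded = html_unescape(encoded)

print(encoded)
print(decoded == original)
```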

Regarding "python - Escaping special HTML characters in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2077283/
