Python SAX parser: resolveEntity

Tags: python xml sax entityresolver

I'm having a hard time figuring out how to bind my own ResolveEntityHandler to the SAX parser. I followed this answer, but unfortunately I can't reproduce its results.

When I run the following code (copied from that answer and only updated for Python 3),

import io
import xml.sax
from xml.sax.handler import ContentHandler

# Inheriting from EntityResolver and DTDHandler is not necessary
class TestHandler(ContentHandler):

    # This method is only called for external entities. Must return a value.
    def resolveEntity(self, publicID, systemID):
        print ("TestHandler.resolveEntity(): %s %s" % (publicID, systemID))
        return systemID

    def skippedEntity(self, name):
        print ("TestHandler.skippedEntity(): %s" % (name))

    def unparsedEntityDecl(self, name, publicID, systemID, ndata):
        print ("TestHandler.unparsedEntityDecl(): %s %s" % (publicID, systemID))

    def startElement(self, name, attrs):
        summary = attrs.get('summary', '')
        print ('TestHandler.startElement():', summary)

def main(xml_string):
    try:
        parser = xml.sax.make_parser()
        curHandler = TestHandler()
        parser.setContentHandler(curHandler)
        parser.setEntityResolver(curHandler)
        parser.setDTDHandler(curHandler)

        stream = io.StringIO(xml_string)
        parser.parse(stream)
        stream.close()
    except xml.sax.SAXParseException as e:
        print ("ERROR %s" % e)

XML = """<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""

main(XML)

with this external file test.dtd:

<!ENTITY num "FOO">
<!ENTITY pic SYSTEM 'bar.gif' NDATA gif>

the output I get is:

TestHandler.startElement(): step: 
TestHandler.skippedEntity(): not

Process finished with exit code 0

So my questions are:

  1. Why is resolveEntity never called?
  2. How do I bind a ResolveEntityHandler to the parser?

Best answer

Your handler is bound correctly with setEntityResolver(); what you are seeing is caused by a change in Python 3.7.1:

Changed in version 3.7.1: The SAX parser no longer processes general external entities by default to increase security. Before, the parser created network connections to fetch remote files or loaded local files from the file system for DTD and entities. The feature can be enabled again with method setFeature() on the parser object and argument feature_external_ges.
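One way to confirm that this change is the cause is to query the feature on a fresh parser and record whether the resolver is ever consulted. The sketch below is an illustration, not part of the original answer; RecordingHandler and check_default are hypothetical names. On 3.7.1 and later the flag should come back falsy and the list of resolver calls should stay empty, even though the document references test.dtd. In CPython's expat-based reader the external DTD subset is loaded through the same external-entity hook that this feature gates, which is why resolveEntity() is not called for test.dtd either.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class RecordingHandler(ContentHandler):
    """Hypothetical handler that only records resolveEntity() calls."""

    def __init__(self):
        super().__init__()
        self.resolve_calls = []

    def resolveEntity(self, publicID, systemID):
        self.resolve_calls.append((publicID, systemID))
        return systemID


def check_default():
    parser = xml.sax.make_parser()
    handler = RecordingHandler()
    parser.setContentHandler(handler)
    parser.setEntityResolver(handler)
    # Falsy on Python 3.7.1+: general external entities are disabled by default.
    enabled = parser.getFeature(feature_external_ges)
    # test.dtd does not have to exist: with the feature off, the parser
    # never tries to load it, so resolveEntity() is never reached.
    parser.parse(io.StringIO('<!DOCTYPE test SYSTEM "test.dtd"><test/>'))
    return enabled, handler.resolve_calls


if __name__ == "__main__":
    enabled, calls = check_default()
    print("feature_external_ges enabled:", enabled)
    print("resolveEntity calls:", calls)
```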

To get the same behavior as in earlier versions, add the following line:

from xml.sax.handler import feature_external_ges

and, in the main function:

parser.setFeature(feature_external_ges, True)
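Putting both additions into the question's program could look like the sketch below. Two details are assumptions of this sketch, not of the answer: the handler records events in a list instead of printing them, and, so the example runs anywhere, test.dtd is written to a temporary directory and the in-memory document is given a system ID (the hypothetical doc.xml) inside that directory. A relative system ID such as "test.dtd" is resolved against the document's system ID, or against the current working directory when the document has none.

```python
import io
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges
from xml.sax.xmlreader import InputSource

DTD = """<!ENTITY num "FOO">
<!ENTITY pic SYSTEM 'bar.gif' NDATA gif>
"""

XML = """<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""


class TestHandler(ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def resolveEntity(self, publicID, systemID):
        # Reached only because feature_external_ges is enabled below.
        self.events.append(("resolveEntity", publicID, systemID))
        return systemID  # resolved relative to the document's system ID

    def skippedEntity(self, name):
        self.events.append(("skippedEntity", name))

    def unparsedEntityDecl(self, name, publicID, systemID, ndata):
        self.events.append(("unparsedEntityDecl", name, systemID, ndata))

    def startElement(self, name, attrs):
        self.events.append(("startElement", attrs.get("summary", "")))


def main(xml_string, base_dir):
    parser = xml.sax.make_parser()
    handler = TestHandler()
    parser.setContentHandler(handler)
    parser.setEntityResolver(handler)
    parser.setDTDHandler(handler)
    # The fix from the answer; it must be set before parse() starts.
    parser.setFeature(feature_external_ges, True)

    source = InputSource()
    source.setCharacterStream(io.StringIO(xml_string))
    # Assumption: a system ID inside base_dir so "test.dtd" is found there.
    source.setSystemId(os.path.join(base_dir, "doc.xml"))
    parser.parse(source)
    return handler.events


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "test.dtd"), "w", encoding="utf-8") as f:
            f.write(DTD)
        for event in main(XML, tmp):
            print(event)
```

With the feature enabled, resolveEntity() should be called with (None, 'test.dtd') and the summary attribute should expand to 'step: FOO'. &not; should still be reported through skippedEntity(), because test.dtd does not declare it.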

A similar question about "Python SAX parser: resolveEntity" can be found on Stack Overflow: https://stackoverflow.com/questions/58530121/
