java - Getting DOCTYPE details using SAX (JDK 7)

Tags: java xml-parsing sax

I'm using the SAX parser that ships with JDK 7. I'm trying to get the DOCTYPE declaration, but none of the methods in DefaultHandler seem to fire for it. What am I missing?

import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Problem {

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE HTML><html><head></head><body></body></html>";
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        InputSource in = new InputSource(new StringReader(xml));
        saxParser.parse(in, new DefaultHandler() {

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) throws SAXException {
                System.out.println("Element: " + qName);
            }
        });
    }
}

This produces:

Element: html
Element: head
Element: body

I want it to produce:

DocType: HTML
Element: html
Element: head
Element: body

How do I get the DOCTYPE?


Update: It looks like there is a DefaultHandler2 class that I need to extend instead. Can I use it as a drop-in replacement?

Best answer

Instead of DefaultHandler, use org.xml.sax.ext.DefaultHandler2, which has a startDTD() method:

Report the start of DTD declarations, if any. This method is intended to report the beginning of the DOCTYPE declaration; if the document has no DOCTYPE declaration, this method will not be invoked.

All declarations reported through DTDHandler or DeclHandler events must appear between the startDTD and endDTD events. Declarations are assumed to belong to the internal DTD subset unless they appear between startEntity and endEntity events. Comments and processing instructions from the DTD should also be reported between the startDTD and endDTD events, in their original order of (logical) occurrence; they are not required to appear in their correct locations relative to DTDHandler or DeclHandler events, however.
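To see the "between startDTD and endDTD" rule in action, here is a minimal sketch (not from the original answer) that parses a document with an internal DTD subset and logs the callbacks in order. It assumes the JDK's built-in SAX parser, which accepts the standard `lexical-handler` and `declaration-handler` properties; the class name `DtdDeclarations` and the event strings are made up for illustration. Since the Javadoc allows comments and declarations to be interleaved in any order, only their position relative to startDTD/endDTD should be relied on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class DtdDeclarations {

    // Parses xml and returns a log of DTD-related callbacks in firing order.
    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<String>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                log.add("startDTD " + name);
            }

            @Override
            public void endDTD() {
                log.add("endDTD");
            }

            // DeclHandler callback: only fires when the handler is registered
            // as the declaration-handler property.
            @Override
            public void elementDecl(String name, String model) {
                log.add("elementDecl " + name);
            }

            // LexicalHandler callback: comments inside the DTD arrive here too.
            @Override
            public void comment(char[] ch, int start, int length) {
                log.add("comment " + new String(ch, start, length).trim());
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                log.add("startElement " + qName);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE note [<!ELEMENT note (#PCDATA)>"
                + "<!-- internal subset -->]><note>hi</note>";
        for (String e : events(xml)) {
            System.out.println(e);
        }
    }
}
```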

Note that the start/endDTD events will appear within the start/endDocument events from ContentHandler and before the first startElement event.
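The ordering guarantee in the note above can be checked with a small sketch that records ContentHandler and LexicalHandler events for the question's document. This is an illustrative assumption-based example using the JDK's default parser; `EventOrder` is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class EventOrder {

    // Parses xml and returns document, DTD and element events in firing order.
    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<String>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDocument() {
                log.add("startDocument");
            }

            @Override
            public void startDTD(String name, String publicId, String systemId) {
                log.add("startDTD " + name);
            }

            @Override
            public void endDTD() {
                log.add("endDTD");
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                log.add("startElement " + qName);
            }

            @Override
            public void endDocument() {
                log.add("endDocument");
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        for (String e : events("<!DOCTYPE HTML><html><head></head><body></body></html>")) {
            System.out.println(e);
        }
    }
}
```

If the JDK parser follows the documented ordering, startDTD and endDTD should appear after startDocument and before the first startElement.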

However, you must also register the handler as the LexicalHandler on the XML reader.
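One way to do that is to work with the underlying `XMLReader` directly instead of going through `SAXParser.parse(...)`. The sketch below should behave the same as setting the property on the `SAXParser`; it is an illustration, not the answer's own code, and `ReaderDoctype` is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class ReaderDoctype {

    // Parses xml through the XMLReader and returns the lines that would be printed.
    static List<String> lines(String xml) throws Exception {
        final List<String> out = new ArrayList<String>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                out.add("DocType: " + name);
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                out.add("Element: " + qName);
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Without this line the reader has no LexicalHandler, so startDTD() never fires.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return out;
    }

    public static void main(String[] args) throws Exception {
        for (String line : lines("<!DOCTYPE HTML><html><head></head><body></body></html>")) {
            System.out.println(line);
        }
    }
}
```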

import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

public class Problem {

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html><html><img/></html>";
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        InputSource in = new InputSource(new StringReader(xml));

        DefaultHandler2 myHandler = new DefaultHandler2(){
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) throws SAXException {
                System.out.println("Element: " + qName);
            }

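            @Override
            public void startDTD(String name, String publicId,
                    String systemId) throws SAXException {
                System.out.println("DocType: " + name);
            }
        };
        // startDTD() is a LexicalHandler callback; parse() does not register the
        // handler in that role, so it has to be set explicitly as a property.
        saxParser.setProperty("http://xml.org/sax/properties/lexical-handler",
                               myHandler);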
        saxParser.parse(in, myHandler);
    }
}
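The same callback also reports the public and system identifiers. The sketch below is a hedged extension, not part of the original answer: for a DOCTYPE with an external ID the parser will normally try to load the DTD, so it overrides DefaultHandler2's four-argument `resolveEntity` to return an empty source and avoid a network fetch. That workaround and the class name `DoctypeIds` are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class DoctypeIds {

    // Returns "name | publicId | systemId" as reported by startDTD().
    static String describe(String xml) throws Exception {
        final StringBuilder out = new StringBuilder();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                out.append(name).append(" | ").append(publicId).append(" | ").append(systemId);
            }

            // Called for the external DTD subset (and any other external entity).
            // An empty source keeps the parser from downloading the DTD URL.
            @Override
            public InputSource resolveEntity(String name, String publicId,
                    String baseURI, String systemId) {
                return new InputSource(new StringReader(""));
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html/>";
        System.out.println(describe(xml));
    }
}
```

Overriding the four-argument form should cover both resolver paths, because DefaultHandler2's two-argument `resolveEntity` delegates to it.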

Regarding "java - Getting DOCTYPE details using SAX (JDK 7)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22887764/
