Java HTML 5 validation using XSD / trang conversion of the HTML 5 Relax NG schema to XSD

Tags: java html validation xsd relaxng

My goal is standalone HTML 5 markup validation in Java using an XSD schema.

My approach is described below. Any help is appreciated, including alternative or better ways to do this.

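WHATTF publishes a Relax NG / Schematron schema for HTML 5.

Trang [3] is an open-source converter between different XML schema languages and should be able to convert from Relax NG to XSD. With the WHATTF schema, the trang converter can be invoked as follows: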

$ java -jar ./trang.jar ./whattf/syntax/relaxng/html5.rnc html5.xsd

However, trang emits many warnings like the following about datatypes it cannot convert:

whattf/syntax/relaxng/applications.rnc:265:51: warning: cannot convert datatype library "http://whattf.org/datatype-draft"; using datatype "string"
[...]

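I think that for trang to work, the pluggable datatypes [4] need to be passed to Jing. Jing [5] is a Relax NG validator that, I believe, trang uses.

The whattf/syntax/relaxng/datatype folder provides Java implementations of these pluggable datatypes. So I built an html5-datatypes.jar and added it to trang's classpath as follows: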

$ java -cp ./html5-datatypes.jar  -jar ./trang.jar ./whattf/syntax/relaxng/html5.rnc html5.xsd

However, this produces the same warnings.

Apart from that, using the generated XSD file with javax.xml.validation.Validator as follows:

SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = schemaFactory.newSchema( new File("html5.xsd") );
Validator validator = schema.newValidator();
validator.validate( new StreamSource( new File("example.html") ) );

throws an exception:

org.xml.sax.SAXParseException: cos-element-consistent: Error for type 'time.inner'. Multiple elements with name 'script', with different types, appear in the model group.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.error(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.xs.XSConstraints.reportSchemaError(Unknown Source)
at org.apache.xerces.impl.xs.XSConstraints.fullSchemaChecking(Unknown Source)
at org.apache.xerces.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
at org.apache.xerces.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
at org.apache.xerces.jaxp.validation.XMLSchemaFactory.newSchema(Unknown Source)
at javax.xml.validation.SchemaFactory.newSchema(SchemaFactory.java:594)
at javax.xml.validation.SchemaFactory.newSchema(SchemaFactory.java:610)
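For reference, the validation steps above can be written as a self-contained sketch that separates schema compilation from document validation. The stack trace shows the failure originating in SchemaFactory.newSchema, that is, while the generated XSD is being compiled and before any HTML document is read. The class name, the method names, and the toy schema below are hypothetical and only stand in for html5.xsd and example.html; this is an illustration under those assumptions, not the asker's actual code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class XsdValidationSketch {

    // Hypothetical toy schema standing in for the generated html5.xsd.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Step 1: compile the schema. Schema-level problems (such as the
    // cos-element-consistent error in the trace) surface here as a SAXException.
    static Schema compile(String xsd) throws SAXException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Step 2: validate one document, collecting warnings and errors in a list
    // instead of stopping at the first one. Fatal errors are still rethrown.
    static List<String> validate(Schema schema, String xml) throws Exception {
        List<String> problems = new ArrayList<>();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            @Override public void error(SAXParseException e) {
                problems.add("error: " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e;
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return problems;
    }
}
```

Calling compile first and validate second makes it easy to see which of the two steps rejects its input.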

[3] thaiopensource.com/relaxng/trang.html

[4] thaiopensource.com/relaxng/pluggable-datatypes.html

[5] thaiopensource.com/relaxng/jing.html

Best answer

From section 8.2 of the HTML5 specification:

http://www.w3.org/TR/2011/WD-html5-20110113/parsing.html

While the HTML syntax described in this specification bears a close resemblance to SGML and XML, it is a separate language with its own parsing rules.

Some earlier versions of HTML (in particular from HTML2 to HTML4) were based on SGML and used SGML parsing rules. However, few (if any) web browsers ever implemented true SGML parsing for HTML documents; the only user agents to strictly handle HTML as an SGML application have historically been validators. The resulting confusion — with validators claiming documents to have one representation while widely deployed Web browsers interoperably implemented a different representation — has wasted decades of productivity. This version of HTML thus returns to a non-SGML basis.

Authors interested in using SGML tools in their authoring pipeline are encouraged to use XML tools and the XML serialization of HTML.

So unless you write your HTML5 in its XML form, you cannot use XSD to validate it.
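To illustrate the point, here is a minimal sketch that checks whether a piece of markup is well-formed XML at all, which is the precondition for feeding it to any XML schema validator. It uses only the JDK's built-in parser. The class and method names are hypothetical, and the check says nothing about whether the document is valid HTML5, only whether XML tools can read it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XmlSerializationCheck {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormedXml(String markup) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Not well-formed, e.g. an unclosed <br> or <p> in HTML syntax.
            return false;
        }
    }
}
```

A document in HTML syntax such as `<p>a<br>b` with its end tags omitted is rejected by this check, while the same content written in the XML serialization (`<p>a<br/>b</p>` inside a properly closed tree) passes and can then be handed to an XML validator.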

Source: Stack Overflow question on Java HTML 5 validation using XSD / trang conversion of the HTML 5 Relax NG schema to XSD: https://stackoverflow.com/questions/14876285/
