我有一个大型 XML 文件需要在 VBA(excel 2003 和 2007)中进行解析。 xml 文件中可能有超过 11,000 行数据,每行有 10 到 20 列。仅解析和获取数据(5 - 7 分钟)就成了一项艰巨的任务。我尝试读取 xml 并将每个“行”放入字典(键 = 行号,值 = 行属性),但这需要同样长的时间。
遍历 DOM 需要很长时间。有没有更有效的方法?
Dim XMLDict
Sub ParseXML(ByRef RootNode As IXMLDOMNode)
Dim Counter As Long
Dim RowList As IXMLDOMNodeList
Dim ColumnList As IXMLDOMNodeList
Dim RowNode As IXMLDOMNode
Dim ColumnNode As IXMLDOMNode
Counter = 1
Set RowList = RootNode.SelectNodes("Row")
For Each RowNode In RowList
Set ColumnList = RowNode.SelectNodes("Col")
Dim NodeValues As String
For Each ColumnNode In ColumnList
NodeValues = NodeValues & "|" & ColumnNode.Attributes.getNamedItem("id").Text & ":" & ColumnNode.Text
Next ColumnNode
XMLDICT.Add Counter, NodeValues
Counter = Counter + 1
Next RowNode
End Sub
最佳答案
您可以尝试使用 SAX 而不是 DOM。当您所做的只是解析文档并且文档的大小非常大时,SAX 应该会更快。 MSXML 中 SAX2 实现的引用是 here
对于 Excel 中的大多数 XML 解析,我通常直接使用 DOM,但 SAX 在某些情况下似乎具有优势。空头比较here可能有助于解释它们之间的差异。
这是一个组合在一起的示例(部分基于 this ),仅使用 Debug.Print
进行输出:
通过“工具”>“引用”添加对“Microsoft XML,v6.0”的引用
在普通模块中添加这段代码
Option Explicit
Sub main()
Dim saxReader As SAXXMLReader60
Dim saxhandler As ContentHandlerImpl
Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl
Set saxReader.contentHandler = saxhandler
saxReader.parseURL "file://C:\Users\foo\Desktop\bar.xml"
Set saxReader = Nothing
End Sub
添加一个类模块,命名为ContentHandlerImpl
并添加如下代码
Option Explicit
Implements IVBSAXContentHandler
Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean
使用模块顶部的左侧下拉菜单选择“IVBSAXContentHandler”,然后使用右侧下拉菜单依次为每个事件添加 stub (来自 characters
到 startPrefixMapping
)
如下在部分 stub 中添加代码
显式设置计数器和标志来显示我们此时是否要读取文本数据
Private Sub IVBSAXContentHandler_startDocument()
lCounter = 0
bGetChars = False
End Sub
每次启动新元素时,检查元素的名称并采取适当的操作
Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
Select Case strLocalName
Case "Row"
sNodeValues = ""
Case "Col"
sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
bGetChars = True
Case Else
' do nothing
End Select
End Sub
检查我们是否对文本数据感兴趣,如果我们感兴趣,则切掉任何无关的空白并删除所有换行符(这可能需要也可能不需要,具体取决于您尝试解析的文档)
Private Sub IVBSAXContentHandler_characters(strChars As String)
If (bGetChars) Then
sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If
End Sub
如果我们已经到达 Col
的末尾,则停止读取文本值;如果我们已经到达 Row
的末尾,则打印出节点值的字符串
Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
Select Case strLocalName
Case "Col"
bGetChars = False
Case "Row"
lCounter = lCounter + 1
Debug.Print lCounter & " " & sNodeValues
Case Else
' do nothing
End Select
End Sub
为了使事情更清楚,这里是 ContentHandlerImpl
的完整版本,其中包含所有 stub 方法:
Option Explicit
Implements IVBSAXContentHandler
Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean
Private Sub IVBSAXContentHandler_characters(strChars As String)
If (bGetChars) Then
sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If
End Sub
Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)
End Property
Private Sub IVBSAXContentHandler_endDocument()
End Sub
Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
Select Case strLocalName
Case "Col"
bGetChars = False
Case "Row"
lCounter = lCounter + 1
Debug.Print lCounter & " " & sNodeValues
Case Else
' do nothing
End Select
End Sub
Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)
End Sub
Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)
End Sub
Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)
End Sub
Private Sub IVBSAXContentHandler_skippedEntity(strName As String)
End Sub
Private Sub IVBSAXContentHandler_startDocument()
lCounter = 0
bGetChars = False
End Sub
Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
Select Case strLocalName
Case "Row"
sNodeValues = ""
Case "Col"
sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
bGetChars = True
Case Else
' do nothing
End Select
End Sub
Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)
End Sub
关于xml - 如何提高 VBA 中 XML 解析的速度,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5626653/