我需要从以下网站提取 COCOA London Dec20 和 Mar21 收盘价 https://www.mrci.com/ohlc/ohlc-all.php Screenshot for the website
我为此编写了以下代码,但它抛出错误,请帮助:
Sub extract()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "https://www.mrci.com/ohlc/ohlc-all.php"
.Visible = False
End With
Do While appIE.Busy
DoEvents
Loop
Set allRowOfData = appIE.document.getElementsByClassName("strat").getElementsByTagName("tbody")(183)
Dim myValue As String: myValue = allRowOfData.Cells(5).innerHTML
appIE.Quit
Set appIE = Nothing
lastrow = Sheets("Sheet2").Cells(Rows.Count, "A").End(xlUp).Row + 1
Range("A" & lastrow).Value = myValue
End Sub
- 代码抛出以下错误:
Run-Time Error - 438 Object doesn't support the property or method.
最佳答案
您的错误是因为您正在集合上调用文档/节点方法。
Set allRowOfData = appIE.document.getElementsByClassName("strat").getElementsByTagName("tbody")(183)
getElementsByClassName("strat")
返回一个集合,您需要对其进行索引,然后使用方法 getElementsByTagName
.
例如
Set allRowOfData = appIE.document.getElementsByClassName("strat")(0).getElementsByTagName("tbody")(183)
总结方法:
由于不同的数据 futures
排列在一长串表行(tr
)节点中,需要确定trs
的正确 block 。由领先future
header ( th
),检查后续兄弟 trs
为右mmmyy
然后从该行中提取右列 ( td
) 值。一个人需要在下一个开始时停止future
block ,或同级 block trs
的末尾;以先到者为准。
tl;博士;
HTML 并不适合快速识别正确的 trs
;使用的索引越多,程序就越脆弱。以下确实有假设,但更稳健。
我使用 XMLHTTP request因为使用浏览器的费用是不必要的。您有一个感兴趣的 future 变量和所有标题 ths
收集者:className
使用 css 类选择器,进入 nodeList
循环直到目标 future
被发现。这会将您置于感兴趣的第一行的开头。各种mmmyy
然后在后续行中。我通过使用辅助函数检查表标题并与定义的目标标题名称进行比较来确定适当的列索引。该函数返回找到 header 的相应索引,如果未找到则返回 -1。
现在,我有一本字典,其中包含 mmmyy
兴趣时期。我循环所有tds
直到我进入下一部分(即下一个 tr
,其标题( th
)作为其 FirstChild
)。我检查每个 FirstChild
连续,如果 mmmyy
找到的值在字典中我用适当的列值更新字典。
在循环中,我的工作级别低于 HTMLDocument
所以,为了利用querySelectorAll
,我转储当前nextNode.NextSibling.OuterHTML
成为代理人 HTMLDocument
多变的;然后我可以访问querySelectorAll
再次,可以选择合适的td
按索引。我需要将传输的html包装在<TABLE><TD></TABLE>
中让 HTML 解析器不提示并获得正确的标签 # td
元素。非常肯定的是,如果我有时间重新审视这一点,我可能会加强这一点。
最后,我将字典值写到工作表中。
VBA:
Option Explicit
Public Sub GetCocoaClosePrices()
Dim html As MSHTML.HTMLDocument, targetPeriods As Object, targetFuture As String, targetColumnName As String
targetFuture = "London Cocoa(LCE)"
targetColumnName = "Close"
Set targetPeriods = CreateObject("Scripting.Dictionary")
Set html = New MSHTML.HTMLDocument
targetPeriods.Add "Dec20", "Not found"
targetPeriods.Add "Mar21", "Not found"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.mrci.com/ohlc/ohlc-all.php", False
.Send
html.body.innerHTML = .responseText
End With
Dim tableHeaders As Object, targetColumnNumber As Long
Set tableHeaders = html.querySelectorAll("tr ~ tr ~ tr .colhead")
targetColumnNumber = GetTargetColumnNumber(tableHeaders, targetColumnName)
If targetColumnNumber = -1 Then Exit Sub
Set targetPeriods = GetUpdatedDictionary(targetPeriods, html, targetFuture, targetColumnNumber)
With ThisWorkbook.Worksheets(1)
.Cells(1, 1).Resize(1, targetPeriods.Count) = targetPeriods.keys
.Cells(2, 1).Resize(1, targetPeriods.Count) = targetPeriods.items
End With
End Sub
Public Function GetUpdatedDictionary(ByRef targetPeriods As Object, ByVal html As HTMLDocument, ByVal targetFuture As String, ByVal targetColumnNumber As Long) As Object
Dim html2 As MSHTML.HTMLDocument, firstChild As Object, i As Long
Dim nextNode As Object, headerNodes As Object
Set headerNodes = html.querySelectorAll(".note1")
Set html2 = New MSHTML.HTMLDocument
For i = 0 To headerNodes.Length - 1
If headerNodes.Item(i).innerText = targetFuture Then 'find the right target future header
Set nextNode = headerNodes.Item(i).ParentNode 'move up to the parent tr node
Do 'walk the adjacent tr nodes
Set nextNode = nextNode.NextSibling
Set firstChild = nextNode.firstChild
If nextNode Is Nothing Then
Set GetUpdatedDictionary = targetPeriods
Exit Function 'exit if no next section
End If
html2.body.innerHTML = "<TABLE><TD>" & nextNode.outerHTML & "</TABLE>"
If targetPeriods.Exists(firstChild.innerText) Then
targetPeriods(firstChild.innerText) = html2.querySelectorAll("td").Item(targetColumnNumber).innerText
End If
Loop While firstChild.tagName <> "TH" 'stop at next section i present
End If
Next
Set GetUpdatedDictionary = targetPeriods
End Function
Public Function GetTargetColumnNumber(ByVal nodeList As Object, ByVal targetColumnName As String) As Long
Dim i As Long
For i = 0 To nodeList.Length - 1
If nodeList.Item(i).innerText = targetColumnName Then
GetTargetColumnNumber = i + 1 'to account for th
Exit Function
End If
Next
GetTargetColumnNumber = -1
End Function
阅读:
引用文献(VBE>工具>引用文献):
- Microsoft HTML 对象库
关于html - 使用 VBA 从网站提取数据时遇到问题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59109387/