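I am trying to declare an array of nodes (this part is not a problem) and then scrape the innerHTML of two child nodes inside each element of that array.
Taking SE as an example (using the IE object approach), suppose I want to scrape the titles and excerpts of the questions on the homepage. There is an array of nodes (class name: "question-summary").
Each of them has two child nodes (the title, class name: "question-hyperlink", and the excerpt, class name: "excerpt"). The code I am using is as follows: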
Sub Scraper()
    Dim ie As Object
    Dim doc As Object, oQuestionShells As Object, oQuestionTitle As Object, oQuestion As Object, oElement As Object
    Dim QuestionShell As String, QuestionTitle As String, Question As String, sURL As String
    Set ie = CreateObject("internetexplorer.application")
    sURL = "https://stackoverflow.com/questions/tagged/excel-formula"
    QuestionShell = "question-summary"
    QuestionTitle = "question-hyperlink"
    Question = "excerpt"
    With ie
        .Visible = False
        .Navigate sURL
    End With
    Set doc = ie.Document 'Stepping through so doc is getting assigned (READY_STATE = 4)
    Set oQuestionShells = doc.getElementsByClassName(QuestionShell)
    For Each oElement In oQuestionShells
        Set oQuestionTitle = oElement.getElementByClassName(QuestionTitle) 'Assigning this object causes an "Object doesn't support this property or method"
        Set oQuestion = oElement.getElementByClassName(Question) 'Assigning this object causes an "Object doesn't support this property or method"
        Debug.Print oQuestionTitle.innerHTML
        Debug.Print oQuestion.innerHTML
    Next
End Sub
Best answer
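getElementByClassName is not a method.
You can only use getElementsByClassName (note the plural in the method name), which returns an IHTMLElementCollection.
Declaring the variable as Object instead of IHTMLElementCollection is fine, but you still have to access a specific element of the collection by supplying an index.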
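Let's assume that each oElement contains exactly one instance of the question-hyperlink class and one instance of the excerpt class. Then you can simply call getElementsByClassName and append (0) to pull the first element out of the returned collection.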
So the correction to your code is:
Set oQuestionTitle = oElement.getElementsByClassName(QuestionTitle)(0)
Set oQuestion = oElement.getElementsByClassName(Question)(0)
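The (0) index relies on the assumption that every container has at least one matching child. As a minimal sketch of how that assumption could be guarded, the helper below checks the collection's Length before indexing. FirstInnerHtml, container and cssClass are hypothetical names introduced here for illustration; they are not part of the original answer.

```vba
' Hypothetical helper (not from the original answer).
' Returns the innerHTML of the first descendant of container that has
' the class cssClass, or an empty string when no such descendant exists.
Function FirstInnerHtml(ByVal container As Object, ByVal cssClass As String) As String
    Dim matches As Object
    Set matches = container.getElementsByClassName(cssClass)

    If matches.Length > 0 Then
        ' Same indexing as in the correction above, but only when it is safe
        FirstInnerHtml = matches(0).innerHTML
    Else
        FirstInnerHtml = vbNullString
    End If
End Function
```

Inside the loop it could be used as Debug.Print FirstInnerHtml(oElement, QuestionTitle), so that an element without a title or excerpt prints an empty line instead of raising a run-time error.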
Full working code (with a few updates, namely using Option Explicit and waiting for the page to load):
Option Explicit
Sub Scraper()
    Dim ie As Object
    Dim doc As Object, oQuestionShells As Object, oQuestionTitle As Object, oQuestion As Object, oElement As Object
    Dim QuestionShell As String, QuestionTitle As String, Question As String, sURL As String
    Set ie = CreateObject("internetexplorer.application")
    sURL = "https://stackoverflow.com/questions/tagged/excel-formula"
    QuestionShell = "question-summary"
    QuestionTitle = "question-hyperlink"
    Question = "excerpt"
    With ie
        .Visible = True
        .Navigate sURL
        Do
            DoEvents
        Loop While .ReadyState < 4 Or .Busy
    End With
    Set doc = ie.Document
    Set oQuestionShells = doc.getElementsByClassName(QuestionShell)
    For Each oElement In oQuestionShells
        'Debug.Print TypeName(oElement)
        Set oQuestionTitle = oElement.getElementsByClassName(QuestionTitle)(0)
        Set oQuestion = oElement.getElementsByClassName(Question)(0)
        Debug.Print oQuestionTitle.innerHTML
        Debug.Print oQuestion.innerHTML
    Next
    ie.Quit
End Sub
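The wait loop in the code above has no exit condition if the page never finishes loading. A rough sketch of a variant with a timeout is shown below. WaitForPage, browser and timeoutSeconds are hypothetical names, and the 30-second default is an arbitrary assumption rather than a recommended value.

```vba
' Hypothetical helper (not from the original answer).
' Waits until browser reports ReadyState 4 and is no longer busy.
' Returns True when the page loaded, False when timeoutSeconds elapsed first.
Function WaitForPage(ByVal browser As Object, Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim startTime As Double
    Dim elapsed As Double
    startTime = Timer

    Do While browser.Busy Or browser.ReadyState < 4
        DoEvents
        elapsed = Timer - startTime
        ' Timer restarts at midnight, so correct a negative difference
        If elapsed < 0 Then elapsed = elapsed + 86400
        If elapsed > timeoutSeconds Then Exit Function ' result stays False
    Loop

    WaitForPage = True
End Function
```

After .Navigate sURL, the Do loop could then be replaced by a check such as If Not WaitForPage(ie) Then ie.Quit: Exit Sub.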
About html - scraping innerHTML from a website using VBA: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44238865/