html - VBA: Web scraping with <ul and <li, and with <div and <span

Tags: html excel vba web-scraping

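I am using VBA to extract data from HTML. The data sits in a <span inside a <div, under an <li within a <ul.

I am trying to extract the "Date" and "Matter" values from the HTML. In Excel, "Date" should go in column A and "Matter" in column B.

The drawback of my code is that it pulls all the dates and matters into a single cell.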

Sub GetDat()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim elem As Object, data As String

    With IE
        .Visible = True
        .navigate "https://www.MyURL/sc/wo/Worders/index?id=76888564"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .document
    End With

    data = ""

    For Each elem In html.getElementsByClassName("simple-list")(0).getElementsByTagName("li")
        data = data & " " & elem.innerText
    Next elem

    Range("A1").Value = data

    IE.Quit
End Sub

The output I need is shown in the image:

HTML:

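Best answer

You can grab two node lists, one for the dates and one for the matters, then loop over them and write the values out to the sheet. The dates are matched on their data-bind attribute value, and the matters on their class name: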

Dim dates As Object, matters As Object, i As Long, ws As Worksheet

Set ws = ThisWorkbook.Worksheets("Sheet1")
Set dates = ie.document.querySelectorAll("[data-bind^='text:createdDate']") '.wo-notes-col-1 [data-bind^='text:createdDate']
Set matters = ie.document.querySelectorAll(".wo-notes")

With ws

    For i = 0 To dates.Length - 1
        .Cells(i + 1, 1) = dates.Item(i).innertext
        .Cells(i + 1, 2) = matters.Item(i).innertext
    Next

End With
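
The loop above indexes both node lists by the length of dates, so it assumes every date has a matching matter. A minimal sketch of a more defensive version follows, assuming the two lists may differ in length; WritePairs is a hypothetical helper name that is not part of the original answer.

```vba
' Hypothetical helper (not from the original answer): writes date/matter
' pairs to columns A and B, stopping at the shorter of the two node lists.
Public Sub WritePairs(ByVal ws As Worksheet, ByVal dates As Object, ByVal matters As Object)
    Dim n As Long, i As Long

    ' Use the smaller count so Item(i) is never requested past the end of a list
    n = dates.Length
    If matters.Length < n Then n = matters.Length

    For i = 0 To n - 1
        ws.Cells(i + 1, 1).Value = dates.Item(i).innerText
        ws.Cells(i + 1, 2).Value = matters.Item(i).innerText
    Next i
End Sub
```

It could be called as WritePairs ws, dates, matters once both node lists have been set.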

Example that reads the values from column C:

Option Explicit

Public Sub GetMatters()
    Dim ws As Worksheet, lastRow As Long, urls(), results(), ie As SHDocVw.InternetExplorer
    Dim dates As Object, matters As Object
    Dim u As Long, i As Long, r As Long 'u indexes urls, i indexes nodes, r is the output row

    Set ie = New SHDocVw.InternetExplorer
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    lastRow = ws.Cells(ws.Rows.Count, "C").End(xlUp).Row
    urls = Application.Transpose(ws.Range("C2:C" & lastRow).Value) 'values appended to "id=" below
    ReDim results(1 To 1000, 1 To 2) 'fixed size: holds at most 1000 date/matter pairs in total

    With ie
        .Visible = True

        For u = LBound(urls) To UBound(urls)
            .navigate2 "https://www.MyURL/sc/wo/Worders/index?id=" & urls(u)
            While .Busy Or .readyState <> 4: DoEvents: Wend

            Set dates = .document.querySelectorAll("[data-bind^='text:createdDate']") '.wo-notes-col-1 [data-bind^='text:createdDate']
            Set matters = .document.querySelectorAll(".wo-notes")

            For i = 0 To dates.Length - 1
                r = r + 1
                results(r, 1) = dates.Item(i).innerText
                results(r, 2) = matters.Item(i).innerText
            Next i
            Set dates = Nothing: Set matters = Nothing
        Next u
        .Quit
    End With

    'Write everything to the sheet in one go, starting at A2
    ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
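
The While .Busy Or .readyState <> 4 loop never exits if a page fails to finish loading. A minimal sketch of a wait with a timeout follows, assuming a 30 second default is acceptable; WaitForPage and timeoutSeconds are hypothetical names that are not part of the original answer.

```vba
' Hypothetical helper (not from the original answer): waits until the
' browser reports the page as loaded, or gives up after timeoutSeconds.
' Returns True if the page loaded, False on timeout.
Public Function WaitForPage(ByVal ie As Object, Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim started As Double, elapsed As Double

    started = Timer 'seconds since midnight

    Do While ie.Busy Or ie.readyState <> 4 '4 = READYSTATE_COMPLETE
        DoEvents 'keep Excel responsive while waiting

        elapsed = Timer - started
        If elapsed < 0 Then elapsed = elapsed + 86400 'Timer wrapped at midnight

        If elapsed > timeoutSeconds Then Exit Function 'WaitForPage stays False
    Loop

    WaitForPage = True
End Function
```

Inside the loop over the URLs, the wait line could then become something like If Not WaitForPage(ie) Then Exit For, so that one stalled page does not hang the whole run.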

References:

  1. document.querySelectorAll
  2. css selectors

Regarding html - VBA: Web scraping with <ul and <li, and with <div and <span, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59104000/
