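Excel VBA: Extracting Web Data from a List of Hyperlinks

Tags: excel vba web-scraping excel-web-query

I have a list of hyperlinks in column C of Sheet 1. I want to pull data from each link and put each link's data on its own worksheet, which I have already created. All of the hyperlinks point to the same website, Pro Football Reference, but each link is for a different NFL player. I want to pull the same data table for each player. I have been able to pull the data from the first link and place it on Sheet 2 as desired, but I am very new to VBA and cannot figure out how to write a loop that does this for every link in the list and puts the results on the other sheets. Below is the code I currently use to get the data from the first link: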

Sub passingStats()
Dim x As Long, y As Long
Dim htm As Object

Set htm = CreateObject("htmlFile")

With CreateObject("msxml2.xmlhttp")
    .Open "GET", Range("C2"), False
    .send
    htm.body.innerhtml = .responsetext
End With

With htm.getelementbyid("passing")
    For x = 0 To .Rows.Length - 1
        For y = 0 To .Rows(x).Cells.Length - 1
            Sheets(2).Cells(x + 4, y + 1).Value = .Rows(x).Cells(y).innertext
        Next y
    Next x
End With

End Sub

Any help would be greatly appreciated.

Accepted answer

The following shows how to do this with a loop.

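Notes:

  • There is a logic flaw in your table write-out, for which I have written a patch (see FixTable).
  • Some strings are converted incorrectly by your script. I prefix them with ' to prevent this.

  • Code: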
    Option Explicit
    ' Requires a reference to Microsoft HTML Object Library (VBE > Tools > References).
    Public Sub GetInfo()
        Dim html As New HTMLDocument, links(), link As Long, wsSourceSheet As Worksheet
        Dim hTable As HTMLTable, ws As Worksheet, playerName As String
        Set wsSourceSheet = ThisWorkbook.Worksheets("Sheet1") '<change to sheet containing links
        Application.ScreenUpdating = False
        With wsSourceSheet
            ' Note: .Value returns a 2D array only when the range spans more than one cell.
            links = .Range("C2:C" & .Cells(.Rows.Count, "C").End(xlUp).Row).Value
        End With
        For link = LBound(links, 1) To UBound(links, 1)
            If InStr(links(link, 1), "https://") > 0 Then
                Set html = GetHTMLDoc(links(link, 1))
                Set hTable = html.getElementById("passing")
                If Not hTable Is Nothing Then
                    playerName = GetNameAbbr(links(link, 1))
                    Set ws = AddPlayerSheet(playerName)
                    WriteTableToSheet hTable, ws
                    FixTable ws
                End If
            End If
        Next
        Application.ScreenUpdating = True
    End Sub
    
    Public Function GetHTMLDoc(ByVal url As String) As HTMLDocument
        Dim sResponse As String, html As New HTMLDocument
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", url, False
            .send
            sResponse = StrConv(.responseBody, vbUnicode)
        End With
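        ' Trim anything before the doctype; skip the trim if no doctype is found,
        ' because Mid$ raises an error when the start position is 0.
        Dim startPos As Long
        startPos = InStr(1, sResponse, "<!DOCTYPE ")
        If startPos > 0 Then sResponse = Mid$(sResponse, startPos)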
        html.body.innerHTML = sResponse
        Set GetHTMLDoc = html
    End Function
    
    Public Sub WriteTableToSheet(ByVal hTable As HTMLTable, ByVal ws As Worksheet)
        Dim x As Long, y As Long
        With hTable
            For x = 0 To .Rows.Length - 1
                For y = 0 To .Rows(x).Cells.Length - 1
                    If y = 6 Or y = 7 Then
                        ' Prefix with an apostrophe (Chr$(39)) so Excel keeps the value as text.
                        ws.Cells(x + 4, y + 1).Value = Chr$(39) & .Rows(x).Cells(y).innerText
                    Else
                        ws.Cells(x + 4, y + 1).Value = .Rows(x).Cells(y).innerText
                    End If
                Next y
            Next x
        End With
    End Sub
    
    Public Function GetNameAbbr(ByVal url As String) As String
        Dim tempArr() As String
        tempArr = Split(url, "/")
        GetNameAbbr = Left$(tempArr(UBound(tempArr)), 6)
    End Function
    
    Public Function AddPlayerSheet(ByVal playerName As String) As Worksheet
        Dim ws As Worksheet
        If SheetExists(playerName) Then
            Application.DisplayAlerts = False
            ThisWorkbook.Worksheets(playerName).Delete
            Application.DisplayAlerts = True
        End If
        Set ws = ThisWorkbook.Worksheets.Add
        ws.Name = playerName
        Set AddPlayerSheet = ws
    End Function
    
    Public Function SheetExists(ByVal playerName As String) As Boolean
        SheetExists = Evaluate("ISREF('" & playerName & "'!A1)")
    End Function
    
    Public Sub FixTable(ByVal ws As Worksheet)
        Dim found As Range, numSummaryRows As Long
        With ws
            Set found = .Columns("A").Find("Career")
            If found Is Nothing Then Exit Sub
            numSummaryRows = .Cells(.Rows.Count, "A").End(xlUp).Row - found.Row
            numSummaryRows = IIf(numSummaryRows = 0, 1, numSummaryRows + 1)
            Debug.Print found.Offset(, 1).Resize(numSummaryRows, 30).Address, ws.Name
            found.Offset(, 1).Resize(numSummaryRows, 30).Copy found.Offset(, 2)
            found.Offset(, 1).Resize(numSummaryRows, 1).ClearContents
        End With
    End Sub
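
The code above deletes and re-adds a sheet for each player, whereas the question mentions that the sheets already exist. A minimal sketch of a variant that reuses an existing sheet is shown here. `GetOrAddSheet` is a hypothetical helper, not part of the original answer, and it assumes the existing sheets are named with the same six-character abbreviation that `GetNameAbbr` returns.

```vba
' Hypothetical helper (not in the original answer): return the worksheet
' named sheetName if it exists, otherwise add one with that name.
Public Function GetOrAddSheet(ByVal sheetName As String) As Worksheet
    Dim ws As Worksheet

    ' Looking up a missing sheet raises an error, so suppress it briefly.
    On Error Resume Next
    Set ws = ThisWorkbook.Worksheets(sheetName)
    On Error GoTo 0

    If ws Is Nothing Then
        ' No such sheet yet: add one at the end of the workbook.
        Set ws = ThisWorkbook.Worksheets.Add( _
            After:=ThisWorkbook.Worksheets(ThisWorkbook.Worksheets.Count))
        ws.Name = sheetName
    Else
        ' Assumption: the table is written from row 4 down, so only
        ' stale data in that area is cleared; rows 1 to 3 are left alone.
        ws.Rows("4:" & ws.Rows.Count).ClearContents
    End If

    Set GetOrAddSheet = ws
End Function
```

To try it, the call `Set ws = AddPlayerSheet(playerName)` in `GetInfo` could be swapped for `Set ws = GetOrAddSheet(playerName)`. The rest of the loop would stay the same.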
    

    Test links in Sheet1:

    [Image: Sheet1]

    Sample web page:

    [Image: sample results]

    Corresponding output written by the code:

    [Image: sheet write-out]
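
The write-out relies on an apostrophe prefix to stop Excel from converting some strings. Another way to get a similar effect, offered only as a sketch, is to format the target cell as Text before assigning the value. `WriteCellAsText` is a hypothetical helper that does not appear in the original answer.

```vba
' Hypothetical helper (not in the original answer): write a string to a
' cell without letting Excel reinterpret it as a date or a number.
Public Sub WriteCellAsText(ByVal target As Range, ByVal cellText As String)
    target.NumberFormat = "@"   ' "@" is Excel's Text number format
    target.Value = cellText     ' stored as text because of the format above
End Sub
```

Inside `WriteTableToSheet`, the branch for columns 6 and 7 could then call `WriteCellAsText ws.Cells(x + 4, y + 1), .Rows(x).Cells(y).innerText` instead of concatenating `Chr$(39)`. One trade-off is that the cell's format changes, while the apostrophe approach leaves the format untouched.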

    Source: the original question on Stack Overflow: https://stackoverflow.com/questions/51577153/
