html - 使用宏将数据从网站抓取到 Excel...丢失

标签 html excel vba web-scraping

我对此完全陌生,但这是我的范围。 我正在运行一个宏来从业务系统中提取数据。 提取此信息后,我想要一个宏来获取某些字段,将它们放入网站表单中,单击“提交”,然后将某些数据结果抓取并粘贴回 Excel 中。 除了抓取并粘贴回 Excel 之外,一切正常。

请帮忙!

我已经搜索了整个堆栈溢出并观看了视频,试图弄清楚我需要做什么,但我一定是误解了一些东西。

Sub Track()
Range("B2").Select

'This should call to PT and deliver tracking info

Dim IE As Object
Dim tbl As Object, td As Object



 Set IE = CreateObject("InternetExplorer.Application") 'Set IEapp = 
 InternetExplorer
 IE.Visible = True

      IE.Navigate "https://www.partstown.com/track-my-order"
      With IEapp
          Do
          DoEvents
          Loop Until IE.readyState = 4



'Input PO and zip
 Call IE.Document.getElementById("orderNo").SetAttribute("value", 
 "4500969111")
'ActiveCell.Offset(0, 2).Select
 Call IE.Document.getElementById("postalCode").SetAttribute("value", 
 "37040")
 IE.Document.forms(7).Submit

 Application.Wait Now + TimeValue("00:00:09")

'this is where i am stuck. I know this isnt right but tried to piece it 
 together
 Set elemCollection = IE.Document.getelElementsByTagname("table.account- 
 table details _tc_table_highlighted")

 For t = 0 To (elemCollection.Length - 1)
 For r = 0 To (elemCollection(t).Rows.Length - 1)
    For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
 ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = 
 elemCollection(t).Rows.Cells(c).innertext
 Next c
 Next r
 Next t

 End With


 End Sub

这是我想要它拉的东西: 出货栏目 订购数量 发货数量 并以线性方式显示: 运输、订购数量、发货数量、产品

最佳答案

Internet Explorer:

我已经使它比平常更详细一些,以便您可以看到每个步骤。

关键事项:

1) 正确的页面加载等待 While .Busy Or .readyState < 4: DoEvents: Wend

2) 尽可能通过 id 选择元素。 #是一个CSS id selectorcss selectors querySelector 申请.document 的方法并检索页面中与指定模式匹配的第一个元素

3)需要一个定时循环来等待结果出现

4) 订单数量等信息是换行符分隔的字符串。似乎最容易分割这些换行符,然后通过索引访问结果数组中的各个项目

5) 我根据您的规范,将结果排序到一个数组中,并将该数组一次写入到工作表中

6) “.”是 class selector.order-history__item-descript--min即返回第一个元素 classorder-history__item-descript--min

7) [x=y] 是 attribute = value selector[data-label=Shipping]即返回第一个元素 data-label具有值 Shipping 的属性

8) .details-table a的组合正在使用 descendant combinator ,“”,指定我想要 a具有类 .details-table 的父级的标记元素

VBA:

Option Explicit

'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub RetrieveInfo()
    Dim ie As InternetExplorer, ele As Object, t As Date
    Const MAX_WAIT_SEC As Long = 5

    Set ie = New InternetExplorer

    With ie
        .Visible = True
        .Navigate2 "https://www.partstown.com/track-my-order"

        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document
            .querySelector("#orderNo").Value = "4500969111"
            .querySelector("#postalCode").Value = "37040"
            .querySelector("#orderLookUpForm").submit  
        End With

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim shipping As String, order As String, items() As String
        With .document
            t = Timer
            Do
                On Error Resume Next
                Set ele = .querySelector("[data-label=Shipping]")
                On Error GoTo 0
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop While ele Is Nothing

            If ele Is Nothing Then Exit Sub

            shipping = ele.innerText
            order = .querySelector(".order-history__item-descript--min").innerText
            items = Split(order, vbNewLine)

            Dim qtyOrdered As Long, qtyShipped As String, product As String

            qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
            qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
            product = .querySelector(".details-table a").Title

            Dim results()
            results = Array(shipping, qtyOrdered, qtyShipped, product)
            ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(1, UBound(results) + 1) = results

        End With
        .Quit
    End With
End Sub

如果您不熟悉 HTML,请查看:

https://developer.mozilla.org/en-US/docs/Web/HTML

如果对 CSS 选择器不熟悉,请查看:

https://flukeout.github.io/


XMLHTTP:

整个事情也可以用 XHR 来完成。这比打开浏览器要快得多。

XHR:

Use XMLHttpRequest (XHR) objects to interact with servers. You can retrieve data from a URL without having to do a full page [render]

在这种情况下,我做了一个初始 GET请求登陆页面检索 CSRFToken在我重新制定 POST 时使用当您手动输入数据并按提交时,页面向服务器发出请求。您可以在服务器响应中获取所需的数据。我在 POST 发送行的正文中传递一个查询字符串 .send "orderNo=4500969111&postalCode=37040&CSRFToken=" & csrft ;您可以在那里看到您的参数。

Option Explicit
Public Sub GetInfo()
    Dim html As HTMLDocument, csrft As String  '<  VBE > Tools > References > Microsoft HTML Object Library
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.partstown.com", False
        .send

        html.body.innerHTML = .responseText

        csrft = html.querySelector("[name=CSRFToken]").Value

        .Open "POST", "https://www.partstown.com/track-my-order", False
        .setRequestHeader "Referer", "https://www.partstown.com/track-my-order"
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
        .setRequestHeader "Accept-Encoding", "gzip, deflate"
        .setRequestHeader "Accept-Language", "en-US,en;q=0.9"
        .send "orderNo=4500969111&postalCode=37040&CSRFToken=" & csrft

        html.body.innerHTML = .responseText
    End With

    Dim shipping As String, order As String, items() As String

    shipping = html.querySelector("[data-label=Shipping]").innerText
    order = html.querySelector(".order-history__item-descript--min").innerText
    items = Split(order, vbNewLine)

    Dim qtyOrdered As Long, qtyShipped As String, product As String

    qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
    qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
    product = html.querySelector(".details-table a").Title

    Dim results()
    results = Array(shipping, qtyOrdered, qtyShipped, product)
    ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(1, UBound(results) + 1) = results
End Sub

循环示例:

Option Explicit

Public Sub GetInfo()
    Dim html As HTMLDocument, csrft As String, lastRow As Long, sourceValues() '<  VBE > Tools > References > Microsoft HTML Object Library
    Set html = New HTMLDocument
    Dim ws As Worksheet, i As Long
    Set ws = ThisWorkbook.Worksheets("Sheet4")
    lastRow = ws.Cells(ws.rows.Count, "B").End(xlUp).Row
    sourceValues = ws.Range("B2:D" & lastRow).Value
    Dim results()
    ReDim results(1 To UBound(sourceValues, 1), 1 To 4)
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.partstown.com", False
        .send
        html.body.innerHTML = .responseText

        csrft = html.querySelector("[name=CSRFToken]").Value
        Stop
        For i = LBound(sourceValues, 1) To UBound(sourceValues, 1)
            If sourceValues(i, 1) <> vbNullString And sourceValues(i, 3) <> vbNullString Then
                DoEvents
                .Open "POST", "https://www.partstown.com/track-my-order", False
                .setRequestHeader "Referer", "https://www.partstown.com/track-my-order"
                .setRequestHeader "User-Agent", "Mozilla/5.0"
                .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
                .setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
                .setRequestHeader "Accept-Encoding", "gzip, deflate"
                .setRequestHeader "Accept-Language", "en-US,en;q=0.9"
                .send "orderNo=" & sourceValues(i, 1) & "&postalCode=" & sourceValues(i, 3) & "&CSRFToken=" & csrft

                html.body.innerHTML = .responseText

                Dim shipping As String, order As String, items() As String

                shipping = html.querySelector("[data-label=Shipping]").innerText
                order = html.querySelector(".order-history__item-descript--min").innerText
                items = Split(order, vbNewLine)

                Dim qtyOrdered As Long, qtyShipped As String, product As String

                qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
                qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
                product = html.querySelector(".details-table a").Title

                results(i, 1) = shipping
                results(i, 2) = qtyOrdered
                results(i, 3) = qtyShipped
                results(i, 4) = product
            End If
            'Application.Wait Now + TimeSerial(0, 0, 1)
        Next
    End With
    'results written out from row 2 column E
    ws.Cells(2, 5).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub

关于html - 使用宏将数据从网站抓取到 Excel...丢失,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55638615/

相关文章:

javascript - 水平格式化 JavaScript 输出而不是垂直列表

javascript - 将对象直接传递给值转换器,而不是手动将其复制到新对象

excel - 如何获取Excel单元格的 "Name Box"名称?

vba - 如何获取Excel中某个范围的大小

html - 在高度为 100% 的 div 中垂直对齐

html - 需要一些 CSS 字体声明的解释

Windows 更新后 Excel 2010 ActiveX 控件不再工作

如果不为空、无效或为 0,则计算单元格的 Excel 公式

excel - 使用 COUNTIF/VLOOKUP 进行库存管理

excel - VBA - Worksheet_Calculate 事件一遍又一遍地触发