json - 在需要对象交互的站点上使用 XML HTTP 请求

标签 json xml vba http web-scraping

我正在从事一个从多个网站抓取信息的项目。我有许多网站都没有问题,主要是通过修改 URL 以通过相关标准或通过发布 AJAX 请求来处理它们。我对此很陌生,所以我正在寻求一些帮助。

我遇到了一个网站,我需要与页面上的对象进行交互才能获得更多信息。以下网站就是一个例子:

Example Site

如果您访问该网站并转到底部有更多品牌,点击“查看”将显示更多产品。这些的 HTML 仅在单击后返回。

我使用以下方法从其他网站获取信息。 有没有办法在执行页面对象操作后通过 XML HTTP 方法处理页面?

如有任何帮助,我们将不胜感激。目前我假设我将不得不坚持使用 Internet Explorer 对象来抓取此类网站。

Option Explicit
Public Sub sbKF()

Dim conn As ADODB.Connection
Dim rsIn As ADODB.Recordset
Dim HTMLDoc As HTMLDocument

Dim strUrl As String
Dim strPost As String

Set conn = CurrentProject.Connection

Set rsIn = New ADODB.Recordset

Set HTMLDoc = New MSHTML.HTMLDocument

rsIn.Open pcstrInput, conn, adOpenStatic, adLockReadOnly

rsIn.MoveLast: rsIn.MoveFirst

Do While Not rsIn.EOF

    ' Create the URL and Post submission for input size.
    strUrl = "http://www.[Site].com"
    strPost = "Stage=2&sop=TyreSize&ssq=1&vnp=&vmk=&vch=&vmo=&drd="

    ' Return the Document body results
    HTMLDoc.body.innerHTML = fnPostXmlHttp(strUrl, strPost)

    rsIn.MoveNext

Loop
End Sub

Public Function fnPostXmlHttp(ByVal strUrl As String, ByVal strScript As String)

Dim XMLHttpRequest As Object
Dim strOut As String

Set XMLHttpRequest = CreateObject("MSXML2.XMLHTTP")

XMLHttpRequest.Open "POST", strUrl, False
XMLHttpRequest.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
XMLHttpRequest.send (strScript)

While XMLHttpRequest.ReadyState <> 4
    DoEvents
Wend

fnPostXmlHttp = XMLHttpRequest.responseText
End Function

最佳答案

如果您查看 www.blackcircles.com 的 HTML 响应,您将看到 javascript 片段:

...
var newTyresActionUrl;
var lookupAddress;

$(document).ready(function () {

    newTyresActionUrl = new BC.classes.productV6SearchPage('https://www.blackcircles.com/order/tyres',
        {"Error":false,"VariantFitments":[{"Name":"All Season","VariantType":11,"SeasonalType":true,"TruckType":false,"FriendlyName":"allseason","Count":17,
        ...
        "TakeoverCss":"\u003clink id=\"brandtakeover-css\" rel=\u0027stylesheet\u0027 type=\u0027text/css\u0027 href=\u0027/templates/bcstyles/css/goodyear-effgrip-perf.css\u0027\u003e"},
        "Width", 
        "Profile",
        "Rim",
        "Speed",
        "Method",
        true,
        ""
    );

    addToBasket = new BC.classes.addtobasket('https://www.blackcircles.com/order/tyres', "order", '/truck/garages');
    ...

其实花括号里面的部分代表一个JSON对象,里面包含了所有显示在网页上的数据。因此,您可以通过 Instr() 从 HTML 内容中提取该 JSON 字符串,解析它,转换为数组并输出到工作表,如下面的示例代码所示。 进口JSON.bas模块到 VBA 项目中进行 JSON 处理。

Option Explicit

Sub Test_blackcircles()

    Dim sResp As String
    Dim vJSON As Variant
    Dim sState As String
    Dim i As Long
    Dim vItem
    Dim aData()
    Dim aHeader()

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.blackcircles.com/order/tyres/search?width=205&profile=55&rim=R16&speed=V&vehicle-make=&postcode=&delivery=1&findTyre=", False
        .send
        sResp = .responseText
    End With
    sResp = getFragment(sResp, "new BC.classes.productV6SearchPage", "new BC.classes.addtobasket")
    sResp = getFragment(sResp, "{", "}")
    sResp = "{" & sResp & "}"
    JSON.Parse sResp, vJSON, sState
    i = 1
    With Sheets(1)
        .Cells.Delete
        .Cells.WrapText = False
        For Each vItem In Array( _
                "Manufacturers", _
                "CarManufacturers", _
                "All", _
                "Deals", _
                "Best", _
                "Rest", _
                "SearchParams" _
                )
            .Cells(i, 1).Value = vItem
            JSON.ToArray vJSON(vItem), aData, aHeader
            OutputArray .Cells(i + 2, 1), aHeader
            Output2DArray .Cells(i + 3, 1), aData
            .Columns.AutoFit
            i = i + UBound(aData, 1) + 5

        Next
    End With

End Sub

Sub OutputArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

Function getFragment( _
    sourceText As String, _
    startPattern As String, _
    endPattern As String _
) As String

    Dim startPos
    startPos = InStr(sourceText, startPattern)
    If startPos = 0 Then Exit Function
    Dim partText
    partText = Mid(sourceText, startPos + Len(startPattern))
    Dim endPos
    endPos = InStrRev(partText, endPattern)
    If endPos = 0 Then Exit Function
    getFragment = Left(partText, endPos - 1)

End Function

顺便说一句,应用了类似的方法in other answers .

关于json - 在需要对象交互的站点上使用 XML HTTP 请求,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31327424/

相关文章:

java - 以编程方式从网站 URL 获取 Logo 图标的任何方法?

excel - 函数更改工作表中的值

wcf - JSONP 对您的 API 来说是必须的吗?

jquery - 是否有更有效的方法将表单元素映射到表单提交中的 jSON 字符串?

.net - 如何快速将大量 XML 数据放入我的数据库中?

json - 如何在 Excel VBA 中从这个简单的 JSON 字符串中提取值?

vba - 如何显示重复系列实例的开始日期?

php - Swift4 - 错误域=NSCocoaErrorDomain 代码=4865

javascript - 使用换行符解析 json 并显示为 HTML

java - 无法使用 XStream Java lib 将属性绑定(bind)到对象