我正在从事一个从多个网站抓取信息的项目。我有许多网站都没有问题,主要是通过修改 URL 以通过相关标准或通过发布 AJAX
请求来处理它们。我对此很陌生,所以我正在寻求一些帮助。
我遇到了一个网站,我需要与页面上的对象进行交互才能获得更多信息。以下网站就是一个例子:
如果您访问该网站并转到底部有更多品牌,点击“查看”将显示更多产品。这些的 HTML
仅在单击后返回。
我使用以下方法从其他网站获取信息。 有没有办法在执行页面对象操作后通过 XML HTTP 方法处理页面?
如有任何帮助,我们将不胜感激。目前我假设我将不得不坚持使用 Internet Explorer 对象来抓取此类网站。
Option Explicit
Public Sub sbKF()
Dim conn As ADODB.Connection
Dim rsIn As ADODB.Recordset
Dim HTMLDoc As HTMLDocument
Dim strUrl As String
Dim strPost As String
Set conn = CurrentProject.Connection
Set rsIn = New ADODB.Recordset
Set HTMLDoc = New MSHTML.HTMLDocument
rsIn.Open pcstrInput, conn, adOpenStatic, adLockReadOnly
rsIn.MoveLast: rsIn.MoveFirst
Do While Not rsIn.EOF
' Create the URL and Post submission for input size.
strUrl = "http://www.[Site].com"
strPost = "Stage=2&sop=TyreSize&ssq=1&vnp=&vmk=&vch=&vmo=&drd="
' Return the Document body results
HTMLDoc.body.innerHTML = fnPostXmlHttp(strUrl, strPost)
rsIn.MoveNext
Loop
End Sub
Public Function fnPostXmlHttp(ByVal strUrl As String, ByVal strScript As String)
Dim XMLHttpRequest As Object
Dim strOut As String
Set XMLHttpRequest = CreateObject("MSXML2.XMLHTTP")
XMLHttpRequest.Open "POST", strUrl, False
XMLHttpRequest.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
XMLHttpRequest.send (strScript)
While XMLHttpRequest.ReadyState <> 4
DoEvents
Wend
fnPostXmlHttp = XMLHttpRequest.responseText
End Function
最佳答案
如果您查看 www.blackcircles.com 的 HTML 响应,您将看到 javascript 片段:
...
var newTyresActionUrl;
var lookupAddress;
$(document).ready(function () {
newTyresActionUrl = new BC.classes.productV6SearchPage('https://www.blackcircles.com/order/tyres',
{"Error":false,"VariantFitments":[{"Name":"All Season","VariantType":11,"SeasonalType":true,"TruckType":false,"FriendlyName":"allseason","Count":17,
...
"TakeoverCss":"\u003clink id=\"brandtakeover-css\" rel=\u0027stylesheet\u0027 type=\u0027text/css\u0027 href=\u0027/templates/bcstyles/css/goodyear-effgrip-perf.css\u0027\u003e"},
"Width",
"Profile",
"Rim",
"Speed",
"Method",
true,
""
);
addToBasket = new BC.classes.addtobasket('https://www.blackcircles.com/order/tyres', "order", '/truck/garages');
...
其实花括号里面的部分代表一个JSON对象,里面包含了所有显示在网页上的数据。因此,您可以通过 Instr()
从 HTML 内容中提取该 JSON 字符串,解析它,转换为数组并输出到工作表,如下面的示例代码所示。 进口JSON.bas模块到 VBA 项目中进行 JSON 处理。
Option Explicit
Sub Test_blackcircles()
Dim sResp As String
Dim vJSON As Variant
Dim sState As String
Dim i As Long
Dim vItem
Dim aData()
Dim aHeader()
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.blackcircles.com/order/tyres/search?width=205&profile=55&rim=R16&speed=V&vehicle-make=&postcode=&delivery=1&findTyre=", False
.send
sResp = .responseText
End With
sResp = getFragment(sResp, "new BC.classes.productV6SearchPage", "new BC.classes.addtobasket")
sResp = getFragment(sResp, "{", "}")
sResp = "{" & sResp & "}"
JSON.Parse sResp, vJSON, sState
i = 1
With Sheets(1)
.Cells.Delete
.Cells.WrapText = False
For Each vItem In Array( _
"Manufacturers", _
"CarManufacturers", _
"All", _
"Deals", _
"Best", _
"Rest", _
"SearchParams" _
)
.Cells(i, 1).Value = vItem
JSON.ToArray vJSON(vItem), aData, aHeader
OutputArray .Cells(i + 2, 1), aHeader
Output2DArray .Cells(i + 3, 1), aData
.Columns.AutoFit
i = i + UBound(aData, 1) + 5
Next
End With
End Sub
Sub OutputArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
.NumberFormat = "@"
.Value = aCells
End With
End With
End Sub
Sub Output2DArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize( _
UBound(aCells, 1) - LBound(aCells, 1) + 1, _
UBound(aCells, 2) - LBound(aCells, 2) + 1)
.NumberFormat = "@"
.Value = aCells
End With
End With
End Sub
Function getFragment( _
sourceText As String, _
startPattern As String, _
endPattern As String _
) As String
Dim startPos
startPos = InStr(sourceText, startPattern)
If startPos = 0 Then Exit Function
Dim partText
partText = Mid(sourceText, startPos + Len(startPattern))
Dim endPos
endPos = InStrRev(partText, endPattern)
If endPos = 0 Then Exit Function
getFragment = Left(partText, endPos - 1)
End Function
顺便说一句,应用了类似的方法in other answers .
关于json - 在需要对象交互的站点上使用 XML HTTP 请求,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31327424/