excel - How do I scrape product names and pricing data from an online retailer's web page using getElementsByClassName in VBA?

Tags: excel web-scraping html vba

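I wrote a macro to scrape product information from a retailer's web page. It runs without errors, but it doesn't write any results to my worksheet, and I'm struggling to understand why. I enter "sale" in the search box, which leads to the following URL: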

http://www.shopjustice.com/search/?q=sale&originPageName=home
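Because the results page follows this URL pattern, one alternative to typing into the search box and clicking is to navigate straight to the results URL. This is a sketch, not the asker's approach: `BuildSearchUrl` is a hypothetical name, it assumes the site still accepts this query-string form (it may have changed since this was posted), and `WorksheetFunction.EncodeURL` requires Excel 2013 or later.

```vba
' Hypothetical helper: build the search-results URL directly from the search term,
' following the URL pattern shown in the question.
' Assumes Excel 2013+ (WorksheetFunction.EncodeURL) and that the site still uses this URL form.
Function BuildSearchUrl(ByVal searchTerm As String) As String
    ' EncodeURL escapes characters such as spaces so the query string stays valid
    BuildSearchUrl = "http://www.shopjustice.com/search/?q=" & _
        Application.WorksheetFunction.EncodeURL(searchTerm) & _
        "&originPageName=home"
End Function
```

With this, `.navigate BuildSearchUrl(searchterm)` replaces the navigate-type-click sequence, although you still need to wait for the page to finish loading before reading elements.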

I want the product name, the former price and the current price in the worksheet. The HTML for these elements is as follows:

<div class="subCatName">
            <a href="/girls-clothing/colored-jeggings/6611358/651?pageSort=W3sidHlwZSI6InJlbGV2YW5jZSIsInZhbCI6IiJ9XQ==&amp;productOrigin=search%20page&amp;productGridPlacement=1-1" id="anchor2_6611358" class="auxSubmit">Colored Jeggings</a>
        </div>
<div class="cat-list-price subCatPrice">
            <div class="priceContainer">
                <span class="mobile-was-price">
                            was 
                            $26.90</span>
                       <span class="mobile-now-price">
                           now 
                           $10.49</span>
                    </div>

            <div class="price_description">
                        <span class="mobile-extra">
                            Extra 30% off clearance!</span>
                    </div>              
                </div>
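The `innerText` of the price spans includes the "was"/"now" label and line breaks, so copying it as-is fills the Former Price and Sale Price columns with text. If you want numbers instead, a small helper can keep only the digits and the decimal point. A minimal sketch: `ParsePrice` is a hypothetical name, and it assumes each span contains one US-formatted price such as `$26.90`.

```vba
' Hypothetical helper: turn text such as "was  $26.90" into the number 26.9.
' Assumes one US-formatted price per string (period as the decimal separator).
Function ParsePrice(ByVal priceText As String) As Variant
    Dim i As Long
    Dim ch As String
    Dim digits As String

    ' Keep only digits and "."; this drops "was"/"now", "$", commas and line breaks
    For i = 1 To Len(priceText)
        ch = Mid$(priceText, i, 1)
        If ch Like "[0-9.]" Then digits = digits & ch
    Next i

    If Len(digits) = 0 Then
        ParsePrice = Empty          ' no price found; the cell is left blank
    Else
        ParsePrice = Val(digits)    ' Val always treats "." as the decimal separator
    End If
End Function
```

For example, `sht.Cells(erow, 3) = ParsePrice(doc.getElementsByClassName("mobile-was-price")(0).innerText)` would write `26.9` for the product above.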

Here is the code:

Sub test2()

    Dim RowCount, erow As Long
    Dim sht As Object
    Dim ele As IHTMLElement
    Dim eles As IHTMLElementCollection
    Dim doc As HTMLDocument

    Set sht = Sheets("JUSTICESALE")
    RowCount = 1
    sht.Range("A" & RowCount) = "Clothing Item"
    sht.Range("B" & RowCount) = "SKU"
    sht.Range("C" & RowCount) = "Former Price"
    sht.Range("D" & RowCount) = "Sale Price"

    Set ie = CreateObject("InternetExplorer.application")
    searchterm = InputBox("ENTER SEARCH TERM")

    Application.StatusBar = "LOADING JUSTICE SEARCH"
    With ie
        .Visible = True
        .navigate "http://www.shopjustice.com/"

        Do While .busy Or _
                .readystate <> 4
            DoEvents
        Loop

        Set doc = ie.document

        doc.getelementsbyname("q").Item.innertext = searchterm
        doc.getElementsByClassName("searchbtn").Item.Click

        Application.StatusBar = "EXTRACTING PRODUCT DATA"

        Set eles = doc.getElementsByClassName("subCatName")
        For Each ele In eles
            If ele.className = "subCatName" Then
                erow = sht.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
                Cells(erow, 1) = doc.getElementsByClassName("auxSubmit")(RowCount).innertext
                Cells(erow, 2) = doc.getElementsByClassName("mobile-was-price")(RowCount).innertext
                RowCount = RowCount + 1

            End If

        Next ele

    End With

    Set ie = Nothing

    Application.StatusBar = ""

End Sub

Any help would be greatly appreciated.

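Edit: Hi Peter, I appreciate your insight. It has certainly headed off a few problems. However, after adding the following code before the loop (which I edited to account for missing class names), it still doesn't write anything to Excel.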

Do While ie.readyState <> READYSTATE_COMPLETE
    DoEvents
Loop

What am I missing?

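I've also written another approach for a different retailer's web page, using the same concept, shown below. What do you think of it? My only problem is a Permission Denied error (run-time error 70) on the Select Case line.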

Sub test5()

    Dim erow As Long
    Dim ele As Object

    Set sht = Sheets("CARTERS")
    RowCount = 1
    sht.Range("A" & RowCount) = "Clothing Item"
    sht.Range("B" & RowCount) = "SKU"
    sht.Range("C" & RowCount) = "Former Price"
    sht.Range("D" & RowCount) = "Sale Price"

    erow = Sheet1.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row

    Set objIE = CreateObject("Internetexplorer.application")

    searchterm = InputBox("ENTER CARTER'S SEARCH TERM")

    With objIE
        .Visible = True
        .navigate "http://www.carters.com/"

        Do While .Busy Or _
                .readyState <> 4
            DoEvents
        Loop

        .document.getElementsByName("q").Item.innerText = searchterm
        .document.getElementsByClassName("btn_search").Item.Click

        Do While .readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop

        For Each ele In .document.all
            Select Case ele.className

                Case "product-name"
                    RowCount = RowCount + 1
                    sht.Range("A" & RowCount) = ele.innerText

                Case "product-standard-price"
                    sht.Range("B" & RowCount) = ele.innerText

                Case "product-sales-price"
                    sht.Range("C" & RowCount) = ele.innerText

            End Select
        Next ele
    End With

    Set objIE = Nothing

End Sub

Thanks again for your help.

Best answer

Your code works, but with two caveats...

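First, after you "click" the search button on the home page, your code doesn't wait for the results page to load. So your loop that looks for each item fails, because nothing is there (yet).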
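One way to do that waiting, sketched with the hypothetical helpers `WaitForClass` and `ElapsedSeconds` (not part of the original answer): instead of relying on `Busy`/`readyState` alone, which can still report the home page as complete right after `.Click` before the new navigation starts, poll until an element with a class you expect on the results page exists, and give up after a timeout. This assumes `subCatName` appears only on the results page. The loop compares `readyState` with the literal `4`, the value of `READYSTATE_COMPLETE`. That constant is only defined when the project references Microsoft Internet Controls; without that reference and without `Option Explicit`, `READYSTATE_COMPLETE` is silently treated as an empty variable.

```vba
' Hypothetical helper: seconds between two Timer readings,
' allowing for Timer resetting to 0 at midnight.
Function ElapsedSeconds(ByVal startTime As Double, ByVal nowTime As Double) As Double
    ElapsedSeconds = nowTime - startTime
    If ElapsedSeconds < 0 Then ElapsedSeconds = ElapsedSeconds + 86400
End Function

' Hypothetical helper: poll an InternetExplorer object until its current document
' contains at least one element with the given class, or until timeoutSeconds pass.
' Returns True if the element appeared in time.
Function WaitForClass(ByVal ie As Object, ByVal className As String, _
                      ByVal timeoutSeconds As Double) As Boolean
    Dim startTime As Double
    startTime = Timer
    Do
        DoEvents
        ' 4 = READYSTATE_COMPLETE; the literal works without a Microsoft Internet Controls reference
        If Not ie.Busy And ie.readyState = 4 Then
            If ie.document.getElementsByClassName(className).Length > 0 Then
                WaitForClass = True
                Exit Function
            End If
        End If
    Loop While ElapsedSeconds(startTime, Timer) < timeoutSeconds
    WaitForClass = False
End Function
```

In `test2`, for example, you might call `If Not WaitForClass(ie, "subCatName", 20) Then Exit Sub` right after the click, then run `Set doc = ie.document` again before collecting elements: the `doc` variable assigned before the click most likely still refers to the home page's document rather than the results page.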

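Second, when you parse the HTML for specific fields, you need some error handling for cases where those fields are missing. For example, look at the code below and adapt it to your situation: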

For Each ele In eles
    If ele.className = "subCatName" Then
        erow = sht.Cells(sht.Rows.Count, 1).End(xlUp).Offset(1, 0).Row
        On Error Resume Next
        ' Collections are zero-based, but RowCount starts at 1 (the header row)
        sht.Cells(erow, 1) = doc.getElementsByClassName("auxSubmit")(RowCount - 1).innerText
        If Err.Number <> 0 Then
            sht.Cells(erow, 1) = "ERR: 'auxSubmit' Class Name Not Found!"
            Err.Clear
        End If
        ' Column C is "Former Price" in the header row
        sht.Cells(erow, 3) = doc.getElementsByClassName("mobile-was-price")(RowCount - 1).innerText
        If Err.Number <> 0 Then
            sht.Cells(erow, 3) = "ERR: 'mobile-was-price' Class Name Not Found!"
            Err.Clear
        End If
        On Error GoTo 0
        RowCount = RowCount + 1
    End If
Next ele
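An alternative to toggling `On Error Resume Next` is to check the collection's `Length` before indexing it. A minimal sketch with the hypothetical name `TextAt`. Note that this and the code above both look fields up by position, which assumes every product has every field: if one product has no `mobile-was-price` span, every later price shifts up a row.

```vba
' Hypothetical helper: return the trimmed innerText of elems(index), or fallback
' when index lies outside the collection (i.e. the field is missing).
' elems is a collection such as doc.getElementsByClassName("auxSubmit").
Function TextAt(ByVal elems As Object, ByVal index As Long, ByVal fallback As String) As String
    If index >= 0 And index < elems.Length Then
        TextAt = Trim$(elems(index).innerText)
    Else
        TextAt = fallback
    End If
End Function
```

Inside the loop, this could be used as `sht.Cells(erow, 1) = TextAt(doc.getElementsByClassName("auxSubmit"), RowCount - 1, "N/A")`.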

Original question on Stack Overflow: https://stackoverflow.com/questions/36559520/
