VBA: extracting text from page source

Tags: vba excel web-scraping

I want to extract the value $871.63 from the following HTML and write it to my spreadsheet:

<tr>
   <th>Insurer</th>
   <th>12-month Price</th>
   <th>6-month Price</th>
   <th class="print-hide-th">Price Breakdown</th>
   <th>Phone</th>
   <th>Web Site</th>
</tr>
</thead>
<tbody>
   <tr>
      <td>AAMI</td>
      <td><span id="MainPlaceHolder_lblAAMIFull">$871.63</span></td>
      <td><span id="MainPlaceHolder_lblAAMIHalf">$447.12</span></td>
      <td class="print-hide-td"><a href="PriceBreakDown.aspx?companyName=AAMI&policyCost=$871.63&region=Metro&class=1&startDate=2/01/2016" class="info">more information</a></td>
      <td class="no-wrap">132 244</td>

I tried:
Cells(1, 1) = IE.Document.getElementByID("MainPlaceHolder_lblAAMIFull").outerText

but this does not work. Can you suggest a solution?

Here is the entire code I tried, up to the last line:
Sub GetQuotes()

On Error Resume Next

Dim IE As Object

Set IE = CreateObject("InternetExplorer.Application")

IE.navigate ("http://prices.maa.nsw.gov.au/")

IE.Visible = True

Do
DoEvents
Loop Until IE.readystate = 4

'STEP 1

IE.Document.getElementByID("Q1a_Day").SelectedIndex = 1
IE.Document.getElementByID("Q1b_Month").SelectedIndex = 1
IE.Document.getElementByID("Q1c_Year").SelectedIndex = 1

IE.Document.getElementByID("btnNext").Click


'STEP 2

IE.Document.getElementByID("Q2a_VehicleClass").SelectedIndex = 1

'YOU OFTEN NEED THE WEBPAGE TO UPDATE BEFORE YOU CAN MANIPULATE OTHER FIELDS - SEE dispatchEvent lines

Set evt = IE.Document.createEvent("HTMLEvents")
evt.initEvent "change", True, False
Set lst = IE.Document.getElementByID("Q3_VehicleYear")
lst.Value = 2015
lst.dispatchEvent evt


IE.Document.getElementByID("Q3b_VehicleMake").SelectedIndex = 2
IE.Document.getElementByID("Q3b_VehicleMake").dispatchEvent evt


IE.Document.getElementByID("Q3c_VehicleModel").SelectedIndex = 1
IE.Document.getElementByID("Q3c_VehicleModel").dispatchEvent evt


IE.Document.getElementByID("Q4_Postcode").Value = 2000
IE.Document.getElementByID("Q4_Postcode").dispatchEvent evt

IE.Document.getElementByID("Q5_CompanyOwned").SelectedIndex = 2

IE.Document.getElementByID("Q6_Usage").SelectedIndex = 2

IE.Document.getElementByID("Q7_CurrentCTP").Value = "N"

IE.Document.getElementByID("btnNext").Click


'STEP 3

IE.Document.getElementByID("Q7_CurrentCTP").SelectedIndex = 1
IE.Document.getElementByID("Q7_CurrentCTP").dispatchEvent evt

IE.Document.getElementByID("Q8_CurrentCTPCompany").SelectedIndex = 1

IE.Document.getElementByID("Q10_OtherInsurance").SelectedIndex = 1
IE.Document.getElementByID("Q10_OtherInsurance").dispatchEvent evt

IE.Document.getElementByID("Q11_OtherInsuranceCompany").SelectedIndex = 1
IE.Document.getElementByID("Q11_OtherInsuranceCompany").dispatchEvent evt

IE.Document.getElementByID("Q12_OtherInsuranceYears").SelectedIndex = 1

IE.Document.getElementByID("Q13a_NoClaimDiscount").SelectedIndex = 1

IE.Document.getElementByID("btnNext").Click


'STEP 4

IE.Document.getElementByID("Q14_OwnerAge").Value = 25
IE.Document.getElementByID("Q14_OwnerAge").dispatchEvent evt

IE.Document.getElementByID("Q15_Demerits").SelectedIndex = 1

IE.Document.getElementByID("Q16_DriverAge").Value = 23
IE.Document.getElementByID("Q16_DriverAge").dispatchEvent evt


IE.Document.getElementByID("Q17_Accidents2yr").SelectedIndex = 1
IE.Document.getElementByID("Q17_Accidents2yr").dispatchEvent evt

IE.Document.getElementByID("Q19_Convictions").SelectedIndex = 1
IE.Document.getElementByID("Q19_Convictions").dispatchEvent evt

IE.Document.getElementByID("Q17b_YearsDriverWLic").SelectedIndex = 1
IE.Document.getElementByID("Q17b_YearsDriverWLic").dispatchEvent evt

IE.Document.getElementByID("NRMARadioYes").Click

IE.Document.getElementByID("Q18_Roadside").SelectedIndex = 3
IE.Document.getElementByID("Q18_Roadside").dispatchEvent evt

IE.Document.getElementByID("btnNext").Click


'STEP 5

IE.Document.getElementByID("btnSubmit").Click

'GET PRICES

Cells(1, 1) = IE.Document.getElementByID("MainPlaceHolder_lblAAMIFull").innerText 



End Sub

I would be very grateful for a solution!

Best answer

outerText is not the property to use here. There are innerHTML and outerHTML, but I think you are looking for the innerText property.

Cells(1, 1) = IE.Document.getElementByID("MainPlaceHolder_lblAAMIFull").innerText
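To see how these properties differ without driving the real site, here is a minimal sketch that parses a one-line fragment in memory. It assumes the late-bound "htmlfile" (MSHTML) object is available, as it normally is on Windows; the procedure name CompareTextProperties and the wrapping div are made up for illustration.

```vba
Sub CompareTextProperties()
    ' Assumption: the "htmlfile" (MSHTML) COM object is available on this machine.
    Dim doc As Object
    Dim el As Object

    Set doc = CreateObject("htmlfile")

    ' Load a small fragment modelled on the span from the question.
    doc.body.innerHTML = "<div>" & _
        "<span id=""MainPlaceHolder_lblAAMIFull"">$871.63</span>" & _
        "</div>"

    Set el = doc.getElementById("MainPlaceHolder_lblAAMIFull")

    If el Is Nothing Then
        Debug.Print "Element not found"
    Else
        Debug.Print el.innerText   ' text content only
        Debug.Print el.innerHTML   ' markup inside the element
        Debug.Print el.outerHTML   ' markup including the span tag itself
    End If
End Sub
```

Run it from the VBA editor and read the results in the Immediate window (Ctrl+G).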

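Update: J Ried is right that the page has not finished loading. On Error Resume Next prevents you from seeing the error. The problem is that the page takes some time to update after the submit button is clicked, so you are trying to find an element that has not loaded yet. I added a 4-second delay and the code ran correctly.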

At the top of the module:
#If VBA7 Then
    ' VBA7 (Office 2010 and later, 32-bit or 64-bit): PtrSafe is required
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    ' Earlier VBA versions (32-bit only)
    Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

Add a 4-second delay after clicking the submit button:
'STEP 5

IE.document.getElementById("btnSubmit").Click

Sleep 4000
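A fixed delay can be too short on a slow connection and wasteful on a fast one. As an alternative, here is a rough sketch that polls for the element until it appears or a timeout expires. WaitForElement and WritePrice are hypothetical helper names, not part of the original code, and the 15-second timeout is an arbitrary assumption.

```vba
' Hypothetical helper: polls until an element with the given id exists
' in the IE document, or until timeoutSeconds have elapsed.
' Returns Nothing if the element never appeared.
Function WaitForElement(ByVal IE As Object, ByVal elementId As String, _
                        Optional ByVal timeoutSeconds As Long = 10) As Object
    Dim deadline As Date
    Dim el As Object

    deadline = DateAdd("s", timeoutSeconds, Now)

    Do
        DoEvents
        Set el = Nothing
        ' The document can be briefly unavailable while the page reloads,
        ' so errors are suppressed for this one call only.
        On Error Resume Next
        Set el = IE.Document.getElementById(elementId)
        On Error GoTo 0
        If Not el Is Nothing Then Exit Do
    Loop While Now < deadline

    Set WaitForElement = el
End Function

' Hypothetical usage: call this after clicking btnSubmit.
Sub WritePrice(ByVal IE As Object)
    Dim priceEl As Object

    Set priceEl = WaitForElement(IE, "MainPlaceHolder_lblAAMIFull", 15)

    If priceEl Is Nothing Then
        MsgBox "Price element did not appear within the timeout."
    Else
        Cells(1, 1).Value = priceEl.innerText
    End If
End Sub
```

Error suppression is limited to the single lookup, so other failures in the macro still surface instead of being hidden by a procedure-wide On Error Resume Next.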

Regarding "VBA: extracting text from page source", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40252415/
