html - Scraping a web page from div, class and span elements

Tags: html vba excel web-scraping

I want to extract data from the S&P Dow Jones Indices web site. The relevant data is embedded in this code:

<div class="indices-detail-container">
  <div id="all-indices-slider" class="slides" style="float: none; position: absolute; top: 0px; left: -5px; margin: 0px; width: 6318px; height: 113px;">

   <div class="index-detail">
     <h5><a href="/indices/equity/dow-jones-sustainability-chile-index-clp" title="DJSI Chile" contentidentifier="2e9cb165-0cbf-4070-a5ef-dc20bf6219ba" contenttype="web-page" contenttitle="Dow Jones Sustainability™ Chile Index (CLP)">DJSI Chile</a></h5>
     <span class="return-value">943.76 </span>
     <span class="daily-change  up ">0.07% ▲</span>
   </div>

   <div class="index-detail">
     <h5><a href="/indices/equity/sp-bvl-peru-general-index-pen" title="S&amp;P/BVL Peru General Index (PEN)" contentidentifier="cec2fa99-13f9-4bf5-9770-4832d86dc017" contenttype="web-page" contenttitle="S&amp;P/BVL Peru General Index (PEN)">S&amp;P/BVL Peru General Index ...</a></h5>
     <span class="return-value">9,922.82 </span>
     <span class="daily-change  down ">-0.04% ▼ </span>
   </div>

   <div class="index-detail">
     <h5><a href="/indices/equity/sp-bvl-peru-select-index" title="S&amp;P/BVL Peru Select Index" contentidentifier="162ea564-b038-493c-a3bc-5f56bda60bb4" contenttype="web-page" contenttitle="S&amp;P/BVL Peru Select Index">S&amp;P/BVL Peru Select Index</a></h5>
     <span class="return-value">188.02 </span>
     <span class="daily-change  up "> 0.18% ▲ </span>
   </div>

   <div class="index-detail last">
     <h5><a href="/indices/equity/sp-bvl-lima-25-index-pen" title="S&amp;P/BVL LIMA 25 Index (PEN)" contentidentifier="12f6a899-f5f6-4c6f-9a82-9db3da8d2821" contenttype="web-page" contenttitle="S&amp;P/BVL LIMA 25 Index (PEN)">S&amp;P/BVL LIMA 25 Index (PEN)</a></h5>
     <span class="return-value">13,153.1 </span>
     <span class="daily-change  down "> -0.3% ▼ </span>
   </div>

   <div class="index-detail no-bottom-border">
     <h5><a href="/indices/equity/sp-bvl-mining-index-pen" title="S&amp;P/BVL Mining Index (PEN)" contentidentifier="2bef26d1-5720-457f-838a-761a176b06a6" contenttype="web-page" contenttitle="S&amp;P/BVL Mining Index (PEN)">S&amp;P/BVL Mining Index (PEN)</a></h5>
     <span class="return-value">117.81 </span>
     <span class="daily-change  up "> 1.15% ▲ </span>
   </div>

   <div class="index-detail no-bottom-border">
     <h5><a href="/indices/equity/sp-lac-40-us" title="S&amp;P Latin America 40" contentidentifier="41ac7d89-a7d8-49d7-8d15-ff9bbc22a17a" contenttype="web-page" contenttitle="S&amp;P Latin America 40">S&amp;P Latin America 40</a></h5>
     <span class="return-value">2,213.49 </span>
     <span class="daily-change  down "> -0.49% ▼ </span>
   </div>

   <div class="index-detail no-bottom-border">
     <h5><a href="/indices/fixed-income/sp-valmer-mexico-government-cetes-index" title="S&amp;P/Valmer Mexico Government CETES Index" contentidentifier="d1973dbe-ce5e-4757-b5d5-face93abbb7c" contenttype="web-page" contenttitle="S&amp;P/Valmer Mexico Government CETES Index">S&amp;P/Valmer Mexico ...</a></h5>
     <span class="return-value">201.36 </span>
     <span class="daily-change  up "> 0.01% ▲ </span>
   </div>

   <div class="index-detail last no-bottom-border">
     <h5><a href="/indices/equity/sp-mila-andean-40-index" title="S&amp;P MILA Andean 40" contentidentifier="b5374c9e-85b3-44c1-a37e-dd1f8d3abb1b" contenttype="web-page" contenttitle="S&amp;P MILA Andean 40">S&amp;P MILA Andean 40</a></h5>
     <span class="return-value">439.28 </span>
     <span class="daily-change  up "> 0.41% ▲ </span>
   </div>
  </div>

  <div class="index-slide" style="margin-right: 5px;">

   <div class="index-detail">
     <h5><a href="/indices/commodities/dow-jones-commodity-index" title="DJCI" contentidentifier="338b4dbf-d7eb-470b-9b17-8c713c4612ab" contenttype="web-page" contenttitle="Dow Jones Commodity Index">DJCI</a></h5>
     <span class="return-value">234.06 </span>
     <span class="daily-change  down "> -1.05% ▼ </span>
   </div>

   <div class="index-detail">
     <h5><a href="/indices/equity/sp-500" title="S&amp;P 500" contentidentifier="725e00f8-85c7-4fef-87f6-1c11be7f6517" contenttype="web-page" contenttitle="S&amp;P 500®">S&amp;P 500</a></h5>
     <span class="return-value">2,051.35 </span>
     <span class="daily-change  down "> -1.05% ▼ </span>
   </div>

   <div class="index-detail">
     <h5><a href="/indices/equity/sp-mila-pacific-alliance-composite" title="S&amp;P MILA Pacific Alliance Composite" contentidentifier="3baf0ead-3784-4daf-9333-2f32470ddb4e" contenttype="web-page" contenttitle="S&amp;P MILA Pacific Alliance Composite">S&amp;P MILA Pacific Alliance ...</a></h5>
     <span class="return-value">349.36 </span>
     <span class="daily-change  up "> 1.54% ▲ </span>
   </div>

   <div class="index-detail last">
     <h5><a href="/indices/commodities/sp-gsci" title="S&amp;P GSCI" contentidentifier="dd11d7c8-0c9b-492c-8242-1017e4d41c29" contenttype="web-page" contenttitle="S&amp;P GSCI">S&amp;P GSCI</a></h5>
     <span class="return-value">2,121.09 </span>
     <span class="daily-change  down ">-1.03% ▼ </span>
   </div>

   <div class="index-detail no-bottom-border">
     <h5><a href="/indices/equity/sp-latin-america-bmi-us-dollar" title="S&amp;P Latin America BMI" contentidentifier="c9ba7da8-4dcb-4a7d-9a81-ae8497a9f1db" contenttype="web-page" contenttitle="S&amp;P Latin America BMI">S&amp;P Latin America BMI</a></h5>
     <span class="return-value">189.48 </span>
     <span class="daily-change  up "> 1.08% ▲ </span>
   </div>

   <div class="index-detail no-bottom-border">
     <h5><a href="/indices/equity/sp-ifci-latin-america-price-index-in-us-dollar" title="S&amp;P/IFCI Latin America" contentidentifier="b22825f7-d873-4e96-818d-28036f7dba27" contenttype="web-page" contenttitle="S&amp;P/IFCI Latin America">S&amp;P/IFCI Latin America</a></h5>
     <span class="return-value">1,228.91 </span>
     <span class="daily-change  up "> 0.35% ▲ </span>
   </div>

   <div class="index-detail no-bottom-border">
     <h5><a href="/indices/equity/sp-latin-america-infrastructure-index" title="S&amp;P Latin America Infrastructure" contentidentifier="b3751332-cf1c-4fb2-8e46-733932ed6989" contenttype="web-page" contenttitle="S&amp;P Latin America Infrastructure Index">S&amp;P Latin America ...</a></h5>
     <span class="return-value">1,055.92 </span>
     <span class="daily-change  up "> 2.79% ▲ </span>
   </div>

   <div class="index-detail last no-bottom-border">
     <h5><a href="/indices/equity/sp-latin-america-adr-index" title="S&amp;P Latin America ADR" contentidentifier="91c85053-fa63-448b-9b5f-7f34f0afa964" contenttype="web-page" contenttitle="S&amp;P Latin America ADR Index">S&amp;P Latin America ADR</a></h5>
     <span class="return-value">205.43 </span>
     <span class="daily-change  up "> 2.17% ▲ </span>
   </div>
  </div>

  <div class="index-slide" style="margin-right: 5px;">

   <div class="index-detail">
     <h5><a href="/indices/equity/sp-mila-pacific-alliance-select" title="S&amp;P MILA Pacific Alliance Select" contentidentifier="5da0480d-e00f-4dd6-a99f-8c9d01dfe859" contenttype="web-page" contenttitle="S&amp;P MILA Pacific Alliance Select">S&amp;P MILA Pacific Alliance ...</a></h5>
     <span class="return-value">3,842.7 </span>
     <span class="daily-change  up "> 1.63% ▲ </span>
   </div>

   <div class="index-detail">
     <h5><a href="/indices/equity/sp-mila-pacific-alliance-completion-index" title="S&amp;P MILA Pacific Alliance Completion" contentidentifier="cb45f262-959e-4eab-b9e2-abddd4efc6e6" contenttype="web-page" contenttitle="S&amp;P MILA Pacific Alliance Completion">S&amp;P MILA Pacific Alliance ...</a></h5>
     <span class="return-value">477.39 </span>
     <span class="daily-change  up "> 1.45% ▲ </span>
   </div>

   <div class="index-detail">
     <h5><a href="/indices/fixed-income/sp-valmer-mexico-government-international-1-year-ums-index" title="S&amp;P/Valmer Mexico Government International 1+ Year UMS Index" contentidentifier="6b29ea9c-3a43-4c09-94b5-a5fe93fac9b4" contenttype="web-page" contenttitle="S&amp;P/Valmer Mexico Government International 1+ Year UMS Index">S&amp;P/Valmer Mexico ...</a></h5>
     <span class="return-value">327.07 </span>
     <span class="daily-change  up "> 0.12% ▲ </span>
   </div>

   <div class="index-detail last">
     <h5><a href="/indices/fixed-income/sp-valmer-mexico-government-1-5-year-mbonos-index" title="S&amp;P/Valmer Mexico Government 1-5 Year MBONOS Index" contentidentifier="16d4060c-3a31-4efa-8c57-239c679bb779" contenttype="web-page" contenttitle="S&amp;P/Valmer Mexico Government 1-5 Year MBONOS Index">S&amp;P/Valmer Mexico ...</a></h5>
     <span class="return-value">244.56 </span>
     <span class="daily-change  up "> 0.05% ▲ </span>
   </div>
  </div>
</div>

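A large section, defined by the class indices-detail-container, wraps the index data. Inside it there are three subsections: the first is identified by the id all-indices-slider, and the last two have the class index-slide. The data I want to extract sits in these three subsections, inside: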

<div class="index-detail">
    ...
</div>

Specifically, I want the content title and the return-value inside the index-detail classes. For example, for the first item I want:

Title = "Dow Jones Sustainability™ Chile Index" or "DJSI Chile"

Value= 943.76

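I was thinking I could use the contentidentifier attribute of the link inside the <h5> heading element, but I don't know how to reference it to tell the indices apart.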

So far I have:

Sub Dow_HistoricalData()

    Dim xmlHttp As Object
    Dim TR_col As Object, TR As Object
    Dim TD_col As Object, TD As Object
    Dim row As Long, col As Long

    ' Download the page synchronously (third argument False = not asynchronous)
    Set xmlHttp = CreateObject("MSXML2.XMLHTTP.6.0")
    xmlHttp.Open "GET", "http://www.espanol.spindices.com/", False
    xmlHttp.setRequestHeader "Content-Type", "text/xml"
    xmlHttp.send

    ' Load the response text into an HTML document object so it can be queried
    Dim html As Object
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = xmlHttp.responseText

    ' First of the three subsections (note: all-indices-slider is an id, not a class)
    Dim tbl As Object
    Set tbl = html.getElementById("all-indices-slider")

End Sub

Best answer

This is easy to do with CSS selectors.

You have explained clearly what you are after:

I want the content title and the return-value inside the index-detail classes


return-value is a class, so you can use:

.index-detail .return-value

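"." stands for className, and the space between the two class selectors (the descendant combinator) means "className elements nested within the preceding one", i.e. get all elements with className return-value that are contained within elements with className index-detail.

For the HTML shown, you can shorten this to .return-value, because every return-value span sits inside an index-detail div.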
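To make the descendant combinator concrete, here is a rough Python sketch using only the standard-library html.parser (an analogue, not the MSHTML selector engine VBA calls). The class name DescendantCounter and the sample markup are hypothetical; the sample deliberately adds one return-value span outside any index-detail block, which is the only situation in which the full selector and the shortened one give different counts.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class DescendantCounter(HTMLParser):
    """Count matches for '.ancestor .target' and for plain '.target'."""

    def __init__(self, ancestor, target):
        super().__init__()
        self.ancestor = ancestor
        self.target = target
        self.stack = []            # (tag, classes) for every open element
        self.descendant_hits = 0   # matches for ".ancestor .target"
        self.plain_hits = 0        # matches for ".target"

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.target in classes:
            self.plain_hits += 1
            # Descendant combinator: ANY open ancestor may carry the class,
            # not only the direct parent.
            if any(self.ancestor in cls for _, cls in self.stack):
                self.descendant_hits += 1
        if tag not in VOID:
            self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates unclosed children).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


SAMPLE = """
<div class="indices-detail-container">
  <div class="index-detail">
    <h5><a contenttitle="S&amp;P 500">S&amp;P 500</a></h5>
    <span class="return-value">2,051.35 </span>
  </div>
  <span class="return-value">hypothetical span outside index-detail</span>
</div>
"""

counter = DescendantCounter("index-detail", "return-value")
counter.feed(SAMPLE)
counter.close()
print(counter.descendant_hits, counter.plain_hits)  # 1 2
```

On the question's own HTML the two counts would be equal, which is why the shorter selector is safe there.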


contenttitle is an attribute, which needs a slightly different syntax to select:

.index-detail [contenttitle]

which can be shortened to: [contenttitle]
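An attribute selector such as [contenttitle] matches on the presence of the attribute, whatever its value. As a rough Python analogue (stdlib html.parser; the names AttributeCollector and collect_attribute are hypothetical), this collects every contenttitle value in document order. Note that the parser decodes &amp; to &, and that the attribute carries the full index name even where the visible link text is truncated with "...".

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Mimic the [attr] selector: record every element that has `attr`."""

    def __init__(self, attr):
        super().__init__()
        self.attr = attr.lower()   # html.parser lower-cases attribute names
        self.matches = []          # (tag, attribute value) in document order

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == self.attr:
                self.matches.append((tag, value))


def collect_attribute(html, attr):
    parser = AttributeCollector(attr)
    parser.feed(html)
    parser.close()
    return parser.matches


# Two anchors taken from the question's HTML plus one hypothetical link without the attribute.
SAMPLE = """
<h5><a title="S&amp;P/BVL Peru General Index (PEN)" contenttitle="S&amp;P/BVL Peru General Index (PEN)">S&amp;P/BVL Peru General Index ...</a></h5>
<h5><a title="DJSI Chile" contenttitle="Dow Jones Sustainability™ Chile Index (CLP)">DJSI Chile</a></h5>
<a href="/no-title">this link has no contenttitle</a>
"""

for tag, value in collect_attribute(SAMPLE, "contenttitle"):
    print(tag, value)
```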


Here is a sample view of what each of the two selectors, .return-value and [contenttitle], matches (screenshots not reproduced).


VBA:

So how does this translate to VBA? Well, the document has a querySelectorAll() method. You created an instance of the document with your html variable and populated it with:

html.body.innerHTML = xmlHttp.responseText

Assuming this returns the HTML you need, you then simply use:

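Dim contentTitles As Object, returns As Object
' Every element carrying a contenttitle attribute (the <a> inside each <h5>)
Set contentTitles = html.querySelectorAll("[contenttitle]")
' Every element with class "return-value" (the value <span> in each index-detail block)
Set returns = html.querySelectorAll(".return-value")

Dim currentNode As Long
' NodeLists are zero-based, so loop from 0 to Length - 1
For currentNode = 0 To contentTitles.Length - 1
    Debug.Print contentTitles(currentNode).innerText  ' link text, e.g. "DJSI Chile" (may be truncated with "...")
    'Debug.Print contentTitles.item(currentNode).innerText '<==Or potentially this syntax
    'Debug.Print contentTitles.item(currentNode).getAttribute("contenttitle") '<==Full title from the attribute itself
    Debug.Print returns(currentNode).innerText
    'Debug.Print returns.item(currentNode).innerText '<==Or potentially this syntax
Next currentNode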
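The loop pairs item i of one NodeList with item i of the other, and the value it prints is still text such as "9,922.82 ". A small Python sketch of that pairing step, with the hypothetical helpers to_number and pair_by_index, converts those strings to numbers. It assumes "," is a thousands separator and "." the decimal point, as in the sample values, and it raises an error when the two lists differ in length rather than silently pairing the wrong title with a value.

```python
def to_number(text):
    """'9,922.82 ' -> 9922.82. Assumes ',' = thousands separator, '.' = decimal point."""
    return float(text.strip().replace(",", ""))


def pair_by_index(titles, values):
    """Zip two lists taken from parallel NodeLists into (title, number) pairs."""
    if len(titles) != len(values):
        # A block missing its title or value would shift every later pair.
        raise ValueError(f"{len(titles)} titles but {len(values)} values")
    return [(title.strip(), to_number(value)) for title, value in zip(titles, values)]


# innerText-style strings as they appear in the question's HTML
titles = ["DJSI Chile", "S&P/BVL Peru General Index ...", "S&P 500"]
values = ["943.76 ", "9,922.82 ", "2,051.35 "]

for title, number in pair_by_index(titles, values):
    print(f"{title}: {number}")
```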

Note:

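The returned objects are static NodeLists, i.e. collections of the matched items. You loop over the length of these matches (positions 0 to 19, i.e. 20 matches for the HTML shown) and read the text through the .innerText property.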
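An alternative to pairing two parallel lists by position is to walk each index-detail block and take the title and the value from inside that same block, so one malformed block cannot shift the pairs after it. The sketch below shows that idea in Python with the standard-library html.parser; the names IndexDetailParser and extract_index_details are hypothetical, and it reads the full contenttitle attribute rather than the possibly truncated link text. The sample is trimmed from the question's HTML, and its second block has its return-value span removed on purpose to show the effect.

```python
from html.parser import HTMLParser


class IndexDetailParser(HTMLParser):
    """Collect one (contenttitle, return-value) pair per div.index-detail block."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished (title, value) pairs
        self._divs = []         # True for each open <div> that is an index-detail
        self._title = None      # contenttitle seen in the current block
        self._value = None      # text of span.return-value in the current block
        self._in_value = False  # currently inside span.return-value?

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            is_detail = "index-detail" in classes
            self._divs.append(is_detail)
            if is_detail:                      # start a fresh block
                self._title, self._value = None, None
        elif any(self._divs):                  # only look inside index-detail
            if tag == "a" and "contenttitle" in attrs:
                self._title = attrs["contenttitle"]
            elif tag == "span" and "return-value" in classes:
                self._in_value, self._value = True, ""

    def handle_data(self, data):
        if self._in_value:
            self._value += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_value = False
        elif tag == "div" and self._divs:
            if self._divs.pop():               # closing an index-detail block
                self.rows.append((self._title, (self._value or "").strip()))


def extract_index_details(html):
    parser = IndexDetailParser()
    parser.feed(html)
    parser.close()
    return parser.rows


SAMPLE = """
<div class="indices-detail-container">
  <div id="all-indices-slider" class="slides">
    <div class="index-detail">
      <h5><a title="DJSI Chile" contenttitle="Dow Jones Sustainability™ Chile Index (CLP)">DJSI Chile</a></h5>
      <span class="return-value">943.76 </span>
      <span class="daily-change  up ">0.07% ▲</span>
    </div>
    <!-- hypothetical: return-value span removed from this block on purpose -->
    <div class="index-detail last">
      <h5><a title="S&amp;P/Valmer Mexico Government CETES Index" contenttitle="S&amp;P/Valmer Mexico Government CETES Index">S&amp;P/Valmer Mexico ...</a></h5>
    </div>
  </div>
  <div class="index-slide">
    <div class="index-detail">
      <h5><a title="S&amp;P 500" contenttitle="S&amp;P 500®">S&amp;P 500</a></h5>
      <span class="return-value">2,051.35 </span>
    </div>
  </div>
</div>
"""

for title, value in extract_index_details(SAMPLE):
    print(repr(title), repr(value))
```

A block that lacks a value comes back with an empty string instead of borrowing the next block's number.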

A similar question, "html - Scraping a web page from div, class and span elements", can be found on Stack Overflow: https://stackoverflow.com/questions/34345353/
