html - Python 3: How to web scrape text from a div that contains multiple class values

Tags: html python-3.x selenium web-scraping beautifulsoup

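I am trying to scrape a website (Here is the link to website), but the div I need has multiple values in its class attribute, which makes it hard to target. I searched earlier questions on Stack Overflow but could not find an answer that fits. Here is part of the markup I pulled from the site: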

<div data-reactid="118">
  <div class="ue-ga base_ ue-jk" style="margin-left:-24px;margin-bottom:;" data-reactid="119">
    <div style="display: flex; flex-direction: column; width: 100%; padding-left: 24px;" data-reactid="120">
      <div class="ue-a3 ue-ap ue-a6 ue-gb ue-ah ue-n ue-f5 ue-ec ue-gc ue-gd ue-ge ue-gf base_ ue-jv ue-gz ue-h0 ue-h1" data-reactid="121">
        <div class="ue-a6 ue-bz ue-gb ue-ah ue-gg ue-gh ue-gi" data-reactid="122">
          <div class="ue-bn ue-bo ue-cc ue-bq ue-g9 ue-bs" title="Want to extract this part" data-reactid="123">
            Want to extract this part
          </div>
        </div>
      </div>
    </div>
  </div>
</div>

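What I want to extract is the text "Want to extract this part". I did consider selecting by data-reactid, but different pages assign different data-reactid numbers, so that is not a good idea. I should also point out that the individual class names are not unique.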

Can anyone guide me through this? Much appreciated.

Best answer

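If the classes on that particular element stay the same on every page, you can target it with a selector that chains all of them together, which matches only elements carrying every listed class: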
.ue-bn.ue-bo.ue-cc.ue-bq.ue-g9.ue-bs
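There are many other selectors you could use instead, but whether any of them works depends entirely on it being unique and consistent across pages.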
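
As a minimal sketch of how the chained class selector from the answer could be applied, the snippet below runs it through BeautifulSoup's `select_one` against a trimmed copy of the markup from the question. The `html` string and the variable names are illustrative assumptions, not the asker's actual code; on the real site the HTML would come from whatever fetches the page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question (assumed sample input).
html = """
<div data-reactid="118">
  <div class="ue-a6 ue-bz ue-gb ue-ah ue-gg ue-gh ue-gi" data-reactid="122">
    <div class="ue-bn ue-bo ue-cc ue-bq ue-g9 ue-bs" title="Want to extract this part" data-reactid="123">
      Want to extract this part
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A chained class selector matches an element that carries ALL of these
# classes, in any order, even if the element has further classes as well.
selector = ".ue-bn.ue-bo.ue-cc.ue-bq.ue-g9.ue-bs"
element = soup.select_one(selector)

# select_one returns None when nothing matches, so guard before reading text.
text = element.get_text(strip=True) if element is not None else None
print(text)
```

The `data-reactid` attributes suggest the page is rendered by React, so the element may only exist after JavaScript has run. In that case one option is to load the page with Selenium and pass `driver.page_source` to `BeautifulSoup` in place of the sample string.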

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/52195176/
