Python BeautifulSoup: finding an element after a specific string

Tags: python beautifulsoup find

I have the following HTML code:

<div class="xyOfqd">
<div class="aAAD">
   <div class="Bgbcca">Updated</div>
   <span class="hthtb">
      <div>
         <span class="hthtb">September 30, 2018</span>
      </div>
   </span>
</div>
<div class="aAAD">
   <div class="Bgbcca">Text1</div>
   <span class="hthtb">
      <div><span class="hthtb">Text2</span></div>
   </span>
</div>
<div 
   class="aAAD">
   <div class="Bgbcca">MyText</div>
   <span class="hthtb">
      <div> 
         <span class="hthtb">Text3</span>
      </div>
   </span>
</div>
<div class="aAAD">
   <div class="Bgbcca">Text4</div>
   <span class="hthtb">
      <div><span 
         class="hthtb">Text5</span></div>
   </span>
</div>
<div class="aAAD">
   <div 
      class="Bgbcca">Text6</div>
   <span class="hthtb">
      <div><span 
         class="hthtb">Text7</span></div>
   </span>
</div>
<div class="aAAD">
<div 
   class="Bgbcca">
   Text8</div>
   <span class="hthtb">
      <div>
         <span class="hthtb">
            <div>Text9</div>
            <div><a href="https://google.com">Text10</a></div>
         </span>
      </div>
   </span>
</div>
<div class="aAAD">
   <div 
      class="Bgbcca">Text11</div>
   <span class="hthtb">
      <div><span class="hthtb">Text12</span></div>
   </span>
</div>

How can I find the element Text3, which comes after the div element containing the string MyText?

Best answer

You can use an lxml.html solution:

from lxml import html

source = """
<div class="xyOfqd">
<div class="aAAD">
   <div class="Bgbcca">Updated</div>
   ...
   <span class="hthtb">
      <div><span class="hthtb">Text12</span></div>
   </span>
</div>"""

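# Parse the HTML string into an element tree.
tree = html.fromstring(source)
# Find the div whose text is exactly "MyText", move to its following <span>
# sibling, then return the text of the span nested inside that sibling's div.
print(tree.xpath('//div[.="MyText"]/following-sibling::span/div/span/text()'))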
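Since the question asks about BeautifulSoup, the same lookup can probably be done with bs4 as well. The following is only a minimal sketch, assuming the bs4 package is installed and using a shortened copy of the HTML from the question. The variable names label and value are illustrative and do not come from the original answer.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's HTML (an assumption made for brevity).
source = """
<div class="xyOfqd">
<div class="aAAD">
   <div class="Bgbcca">Text1</div>
   <span class="hthtb">
      <div><span class="hthtb">Text2</span></div>
   </span>
</div>
<div class="aAAD">
   <div class="Bgbcca">MyText</div>
   <span class="hthtb">
      <div>
         <span class="hthtb">Text3</span>
      </div>
   </span>
</div>
</div>
"""

soup = BeautifulSoup(source, "html.parser")

# Locate the div whose only content is the string "MyText".
label = soup.find("div", string="MyText")

# Move to the <span> sibling that follows it and collect its text,
# stripping the surrounding whitespace.
value = label.find_next_sibling("span").get_text(strip=True)
print(value)
```

Note that string="MyText" only matches a tag whose sole child is exactly that string, so a label with surrounding whitespace or nested tags would need a different match, for example a function or a regular expression.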

A similar question about "Python BeautifulSoup: finding an element after a specific string" can be found on Stack Overflow: https://stackoverflow.com/questions/53066960/
