python - Getting the text in an li using BeautifulSoup

Tags: python html python-3.x beautifulsoup

I am trying to scrape this HTML using bs4:

<td style="vertical-align:top;" class="vi-VR-brumblnkLst vi-VR-brumb-hasNoPrdlnks" id="vi-VR-brumb-lnkLst">
   <table width="100%" role="presentation">
      <tbody>
         <tr>
            <td style="">
               <ul role="list" aria-label="Listed in category:" itemscope="" itemtype="https://schema.org/BreadcrumbList">
                  <li itemprop="itemListElement" itemscope="" itemtype="https://schema.org/ListItem" class="bc-w">
                     <a itemprop="item" _sp="p2047675.l2706" href="https://www.ebay.com/b/Jewelry-Watches-/281" class="thrd"><span itemprop="name">Jewelry &amp; Watches</span></a>
                     <meta itemprop="position" content="1">
                  </li>
                  <li aria-hidden="true">&gt;</li>
                  <li itemprop="itemListElement" itemscope="" itemtype="https://schema.org/ListItem" class="bc-w">
                     <a itemprop="item" _sp="p2047675.l2706" href="https://www.ebay.com/b/Watches-Parts-Accessories-/14324" class="thrd"><span itemprop="name">Watches, Parts &amp; Accessories</span></a>
                     <meta itemprop="position" content="2">
                  </li>
                  <li aria-hidden="true">&gt;</li>
                  <li itemprop="itemListElement" itemscope="" itemtype="https://schema.org/ListItem" class="bc-w">
                     <a itemprop="item" _sp="p2047675.l2706" href="https://www.ebay.com/b/Wristwatches-/31387" class="scnd"><span itemprop="name">Wristwatches</span></a>
                     <meta itemprop="position" content="3">
                  </li>
                  <li>&gt;</li>
                  <li itemprop="itemListElement" itemscope="" itemtype="https://schema.org/ListItem" class="bc-w">
                     <a itemprop="item" _sp="p2047675.l2644" href="https://www.ebay.com/p/18032713872" title="See more 17j Seiko 5 Automatic Black Dial Analog Golden Color Watch Working Properly">
                     <span itemprop="name">See more 17j Seiko 5 Automatic Black Dial Analog Golden...</span>
                     </a>
                     <meta itemprop="position" content="1">
                  </li>
               </ul>
            </td>
         </tr>
      </tbody>
   </table>
</td>

Specifically, I want to get the "Wristwatches" text from this element:

<li itemprop="itemListElement" itemscope="" itemtype="https://schema.org/ListItem" class="bc-w"><a itemprop="item" _sp="p2047675.l2706" href="https://www.ebay.com/b/Wristwatches-/31387" class="scnd"><span itemprop="name">Wristwatches</span></a><meta itemprop="position" content="3"></li>

My current code looks like this:

import requests
from bs4 import BeautifulSoup

data = requests.get('https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=mens+watch&_sacat=31387&LH_TitleDesc=0&_osacat=0&_odkw=mens+wath').text
soup = BeautifulSoup(data, 'lxml')

cat = soup.find('li', itemProp = 'itemListElement').text.strip()

print(cat)

But it returns an error. How can I achieve this? Thanks.

Best answer

Never mind, I figured it out. Thanks to everyone who took the time to read my post.

import requests
from bs4 import BeautifulSoup

# Item page URL; the long tracking query string is dropped so the string literal fits on one line.
data = requests.get('https://www.ebay.com/itm/SEIKO-5-AUTOMATIC-MENS-STEEL-VINTAGE-JAPAN-MADE-BLACK-DIAL-WATCH-RUN-ORDER-K/143420840058?epid=18032713872').text

soup = BeautifulSoup(data, 'lxml')

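# Alternative left commented out: the first <li> with class "bc-w".
# cat = soup.find('li', class_='bc-w')
# The "Wristwatches" link is the <a> with class "scnd", so target it directly.
cat = soup.find('a', class_='scnd').text.strip()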

print(cat)
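
The answer does not say why the first attempt failed. One likely cause is the keyword `itemProp`: attribute names in the parsed tree are lower-case, so that filter matches nothing, `find` returns `None`, and `.text` raises `AttributeError`. The switch from a search-results URL to an item URL also suggests the breadcrumb only appears on item pages. The sketch below checks the lookups offline against a trimmed copy of the markup from the question. It is an illustration, not the author's code: the `HTML` constant is a shortened stand-in for the real page, and `html.parser` is used only to avoid the `lxml` dependency.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the breadcrumb from the question (assumption: the live
# item page still has this shape; eBay may change its markup at any time).
HTML = """
<ul role="list" aria-label="Listed in category:">
  <li itemprop="itemListElement" class="bc-w">
    <a itemprop="item" href="https://www.ebay.com/b/Jewelry-Watches-/281" class="thrd"><span itemprop="name">Jewelry &amp; Watches</span></a>
    <meta itemprop="position" content="1">
  </li>
  <li aria-hidden="true">&gt;</li>
  <li itemprop="itemListElement" class="bc-w">
    <a itemprop="item" href="https://www.ebay.com/b/Wristwatches-/31387" class="scnd"><span itemprop="name">Wristwatches</span></a>
    <meta itemprop="position" content="3">
  </li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The mixed-case keyword never matches the lower-case attribute name.
missing = soup.find("li", itemProp="itemListElement")

# Using the real attribute name matches the first breadcrumb entry.
first = soup.find("li", attrs={"itemprop": "itemListElement"})

# Every breadcrumb name, in document order, via a CSS selector.
names = [
    span.get_text(strip=True)
    for span in soup.select('li[itemprop="itemListElement"] span[itemprop="name"]')
]

# The link targeted in the answer; check for None before reading its text.
link = soup.find("a", class_="scnd")
category = link.get_text(strip=True) if link is not None else None

print(missing, names, category)
```

Checking for `None` before reading the text gives a clear signal when the page has no breadcrumb, for example when a search-results page is fetched instead of an item page.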

For more on python - getting the text in an li using BeautifulSoup, see the similar question on Stack Overflow: https://stackoverflow.com/questions/58947528/
