python - How to select table elements under a specific element with Python's BeautifulSoup

Tags: python web-scraping beautifulsoup

I want to select the table elements that appear under <i>Member</i>.

HTML code:


<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>:
 <a href="/shdb-team/20-739/" class="chip team">SHDB Team</a><a href="/the-spider-society/20-490/" class="chip team">The Spider Society</a><a href="/new-warriors/20-79/" class="chip team">New Warriors</a><a href="/the-six/20-474/" class="chip team">The Six</a>
 <i>Member</i>: 
 <a href="/the-mighty-avengers/20-384/" class="chip team">The Mighty Avengers</a><a href="/new-avengers/20-101/" class="chip team">New Avengers</a><a href="/shield/20-467/" class="chip team">S.H.I.E.L.D.</a><a href="/avengers-resistance/20-154/" class="chip team">Avengers Resistance</a><a href="/marvel-knights/20-377/" class="chip team">Marvel Knights</a><a href="/avengers/20-4/" class="chip team">Avengers</a><a href="/secret-defenders/20-96/" class="chip team">Secret Defenders</a><a href="/daily-bugle/20-216/" class="chip team">Daily Bugle</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
 <i>Formerly</i>: 
 <a href="/future-foundation/20-290/" class="chip team">Future Foundation</a><a href="/heroes-for-hire/20-5/" class="chip team">Heroes For Hire</a><a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a> </td>

How can I select only the text that belongs to Member?

I tried:

member = []  # collect the link texts here
li = bs.find('i', text = "Member")
children = li.findNextSiblings()
for child in children:
    member.append(child.text)
print(member)

But it outputs all of the results:

SHDB Team
The Spider Society
New Warriors
The Six
Member
The Mighty Avengers
New Avengers
S.H.I.E.L.D.
Avengers Resistance
Marvel Knights
Avengers
Secret Defenders
Daily Bugle
Defenders
Formerly
Future Foundation
Heroes For Hire
Fantastic Four

I only want the Member section. The following code selects everything after Member and before Formerly, but it is an inefficient solution:

     teams[teams.index("Member")+1:teams.index("Formerly")]

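Best answer

You can iterate over the element's next_siblings, printing the text of each sibling whose tag name is a and breaking out of the loop as soon as a sibling's tag name is i: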

for tag in soup.select_one('i:-soup-contains("Member")').next_siblings:
    if tag.name == 'i':
        break
    if tag.name == 'a':
        print(tag.text) 
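
The two `tag.name` checks are enough because `next_siblings` yields text nodes as well as tags, and a text node's `name` is `None`, so it matches neither check. A minimal sketch of this, using a shortened, hypothetical fragment with the same shape as the question's HTML (the link targets `/x/`, `/y/`, `/z/` are placeholders):

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical fragment shaped like the question's <td> cell.
html = (
    '<td><i>Member</i>: <a href="/x/">X</a><a href="/y/">Y</a> '
    '<i>Formerly</i>: <a href="/z/">Z</a></td>'
)
soup = BeautifulSoup(html, "html.parser")

start = soup.find("i", string="Member")

# Tags report their tag name; the text nodes between them report None.
sibling_names = [sibling.name for sibling in start.next_siblings]
print(sibling_names)
```

The list contains `None` for the ": " and whitespace text nodes, `'a'` for each link, and `'i'` where the next section begins, which is where the loop above stops.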
Example
from bs4 import BeautifulSoup

html = '''
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>:
 <a href="/shdb-team/20-739/" class="chip team">SHDB Team</a><a href="/the-spider-society/20-490/" class="chip team">The Spider Society</a><a href="/new-warriors/20-79/" class="chip team">New Warriors</a><a href="/the-six/20-474/" class="chip team">The Six</a>
 <i>Member</i>: 
 <a href="/the-mighty-avengers/20-384/" class="chip team">The Mighty Avengers</a><a href="/new-avengers/20-101/" class="chip team">New Avengers</a><a href="/shield/20-467/" class="chip team">S.H.I.E.L.D.</a><a href="/avengers-resistance/20-154/" class="chip team">Avengers Resistance</a><a href="/marvel-knights/20-377/" class="chip team">Marvel Knights</a><a href="/avengers/20-4/" class="chip team">Avengers</a><a href="/secret-defenders/20-96/" class="chip team">Secret Defenders</a><a href="/daily-bugle/20-216/" class="chip team">Daily Bugle</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
 <i>Formerly</i>: 
 <a href="/future-foundation/20-290/" class="chip team">Future Foundation</a><a href="/heroes-for-hire/20-5/" class="chip team">Heroes For Hire</a><a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a> </td>

'''
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning

for tag in soup.select_one('i:-soup-contains("Member")').next_siblings:
    if tag.name == 'i':
        break
    if tag.name == 'a':
        print(tag.text)
Output
The Mighty Avengers
New Avengers
S.H.I.E.L.D.
Avengers Resistance
Marvel Knights
Avengers
Secret Defenders
Daily Bugle
Defenders
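
If the names are needed as a list rather than printed, the same loop can be wrapped in a small helper. This is only a sketch: the function name `links_after_label` is hypothetical, the HTML is a shortened version of the question's markup, and it uses `find("i", string=label)` for an exact match instead of the substring match done by `:-soup-contains`.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def links_after_label(root, label):
    """Return the text of each <a> that follows <i>label</i>, up to the next <i>."""
    start = root.find("i", string=label)  # exact match on the <i> text
    if start is None:
        return []  # label not present in this fragment
    names = []
    for sibling in start.next_siblings:
        if not isinstance(sibling, Tag):
            continue  # skip the ":" and whitespace text nodes
        if sibling.name == "i":
            break  # the next section starts here
        if sibling.name == "a":
            names.append(sibling.get_text(strip=True))
    return names


# Shortened version of the question's HTML, used only for illustration.
html = """
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>:
 <a href="/new-warriors/20-79/" class="chip team">New Warriors</a>
 <i>Member</i>:
 <a href="/avengers/20-4/" class="chip team">Avengers</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
 <i>Formerly</i>:
 <a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a> </td>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

print(links_after_label(soup, "Member"))
print(links_after_label(soup, "Formerly"))
```

Returning an empty list for a missing label is a design choice made for this sketch; raising an error may suit stricter scrapers better.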

Regarding "python - How to select table elements under a specific element with Python's BeautifulSoup", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/71844974/
