I want to select the table elements listed under <i>Member</i>.
HTML code:
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>:
<a href="/shdb-team/20-739/" class="chip team">SHDB Team</a><a href="/the-spider-society/20-490/" class="chip team">The Spider Society</a><a href="/new-warriors/20-79/" class="chip team">New Warriors</a><a href="/the-six/20-474/" class="chip team">The Six</a>
<i>Member</i>:
<a href="/the-mighty-avengers/20-384/" class="chip team">The Mighty Avengers</a><a href="/new-avengers/20-101/" class="chip team">New Avengers</a><a href="/shield/20-467/" class="chip team">S.H.I.E.L.D.</a><a href="/avengers-resistance/20-154/" class="chip team">Avengers Resistance</a><a href="/marvel-knights/20-377/" class="chip team">Marvel Knights</a><a href="/avengers/20-4/" class="chip team">Avengers</a><a href="/secret-defenders/20-96/" class="chip team">Secret Defenders</a><a href="/daily-bugle/20-216/" class="chip team">Daily Bugle</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
<i>Formerly</i>:
<a href="/future-foundation/20-290/" class="chip team">Future Foundation</a><a href="/heroes-for-hire/20-5/" class="chip team">Heroes For Hire</a><a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a> </td>
How can I select only the text under Member?
I tried:
member = []
li = bs.find('i', text="Member")
children = li.findNextSiblings()
for child in children:
    member.append(child.text)
print(member)
But it returns all of the results as output:
SHDB Team
The Spider Society
New Warriors
The Six
Member
The Mighty Avengers
New Avengers
S.H.I.E.L.D.
Avengers Resistance
Marvel Knights
Avengers
Secret Defenders
Daily Bugle
Defenders
Formerly
Future Foundation
Heroes For Hire
Fantastic Four
I only want to select the Member section. The following code lets me pick everything after Member and before Formerly, but it is an inefficient solution:
teams[teams.index("Member")+1:teams.index("Formerly")]
Best answer
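You can iterate over the element's next_siblings, print each sibling whose tag name is a, and break out of the loop as soon as you reach a sibling whose tag name is i: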
for tag in soup.select_one('i:-soup-contains("Member")').next_siblings:
    if tag.name == 'i':
        break
    if tag.name == 'a':
        print(tag.text)
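The same idea can be wrapped in a small function that returns a list instead of printing. This is only a sketch: `teams_under` is a hypothetical helper name, the HTML is a shortened sample modelled on the question, and it swaps the `:-soup-contains` selector (which matches any `<i>` whose text contains the substring) for an exact match with `find('i', string=...)`.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample mirroring the structure in the question.
html = """
<table><tr><td>
<i>Leader</i>:
<a href="/new-warriors/20-79/" class="chip team">New Warriors</a>
<i>Member</i>:
<a href="/avengers/20-4/" class="chip team">Avengers</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
<i>Formerly</i>:
<a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a>
</td></tr></table>
"""


def teams_under(soup, label):
    """Return the link texts between <i>label</i> and the next <i> tag."""
    start = soup.find("i", string=label)  # exact text match on the <i> tag
    if start is None:
        return []
    names = []
    for sibling in start.next_siblings:
        # Treat text nodes (the ":" and whitespace) as having no tag name.
        name = getattr(sibling, "name", None)
        if name == "i":  # the next label begins a new group
            break
        if name == "a":
            names.append(sibling.get_text(strip=True))
    return names


soup = BeautifulSoup(html, "html.parser")
members = teams_under(soup, "Member")
print(members)
```

Returning a list keeps the lookup reusable for the other labels, for example `teams_under(soup, "Formerly")`.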
Example
from bs4 import BeautifulSoup

html = '''
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>:
<a href="/shdb-team/20-739/" class="chip team">SHDB Team</a><a href="/the-spider-society/20-490/" class="chip team">The Spider Society</a><a href="/new-warriors/20-79/" class="chip team">New Warriors</a><a href="/the-six/20-474/" class="chip team">The Six</a>
<i>Member</i>:
<a href="/the-mighty-avengers/20-384/" class="chip team">The Mighty Avengers</a><a href="/new-avengers/20-101/" class="chip team">New Avengers</a><a href="/shield/20-467/" class="chip team">S.H.I.E.L.D.</a><a href="/avengers-resistance/20-154/" class="chip team">Avengers Resistance</a><a href="/marvel-knights/20-377/" class="chip team">Marvel Knights</a><a href="/avengers/20-4/" class="chip team">Avengers</a><a href="/secret-defenders/20-96/" class="chip team">Secret Defenders</a><a href="/daily-bugle/20-216/" class="chip team">Daily Bugle</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
<i>Formerly</i>:
<a href="/future-foundation/20-290/" class="chip team">Future Foundation</a><a href="/heroes-for-hire/20-5/" class="chip team">Heroes For Hire</a><a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a> </td>
'''
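soup = BeautifulSoup(html, 'html.parser')

# Walk the siblings that follow <i>Member</i>
for tag in soup.select_one('i:-soup-contains("Member")').next_siblings:
    if tag.name == 'i':  # the next label starts a new group, so stop
        break
    if tag.name == 'a':  # a team link that belongs to Member
        print(tag.text)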
Output
The Mighty Avengers
New Avengers
S.H.I.E.L.D.
Avengers Resistance
Marvel Knights
Avengers
Secret Defenders
Daily Bugle
Defenders
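If all three groups are needed, one pass over the cell's children can build a dictionary keyed by label, which also avoids the index-based slicing from the question. The following is a rough sketch rather than part of the original answer: `group_teams` is a hypothetical name, the HTML is a shortened sample, and it assumes every `<i>` label and its `<a>` links share the same parent element.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample mirroring the structure in the question.
html = """
<table><tr><td>
<i>Leader</i>:
<a href="/the-six/20-474/" class="chip team">The Six</a>
<i>Member</i>:
<a href="/avengers/20-4/" class="chip team">Avengers</a><a href="/shield/20-467/" class="chip team">S.H.I.E.L.D.</a>
<i>Formerly</i>:
<a href="/heroes-for-hire/20-5/" class="chip team">Heroes For Hire</a>
</td></tr></table>
"""


def group_teams(cell):
    """Map each <i> label in the cell to the <a> texts that follow it."""
    groups = {}
    current = None
    for child in cell.children:
        # Text nodes are skipped because they carry no tag name.
        name = getattr(child, "name", None)
        if name == "i":  # a new label opens a new group
            current = child.get_text(strip=True)
            groups[current] = []
        elif name == "a" and current is not None:
            groups[current].append(child.get_text(strip=True))
    return groups


soup = BeautifulSoup(html, "html.parser")
cell = soup.find("i").parent  # assumption: all labels live in this one cell
groups = group_teams(cell)
print(groups["Member"])
```

Any single section can then be read with `groups["Member"]`, `groups["Leader"]`, or `groups["Formerly"]`.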
Regarding python - how to select table elements under a specific element using Python's BeautifulSoup, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/71844974/