Suppose I have an HTML table like this:
<table>
<tr>
<tr>...
</tr>
<tr>...
</tr>
</tr>
<tr>
<tr>...
</tr>
<tr>...
</tr>
</tr>
...
</table>
I can find the table tag. How can I find the first-level table rows, i.e. the rows that are children of the table, not its grandchildren?
print(table.findAll('tr'))  # returns all the trs under the table, which is not what I want
Best answer
Try the following:
from bs4 import BeautifulSoup
soup = BeautifulSoup('''
<body>
<table>
<tr id="tr_1">
<tr id="tr_1_1">..</tr>
<tr id="tr_1_2">...</tr>
</tr>
<tr id="tr_2">
<tr id="tr_2_1">...</tr>
<tr id="tr_2_2">...</tr>
</tr>
</table>
</body>''', ['lxml','xml'])
# 'table > tr' is a CSS child selector: it matches only <tr> elements whose parent is <table>
for tr in soup.select('table > tr'):
    print(tr)
    print('---')
This prints:
<tr id="tr_1">
<tr id="tr_1_1">..</tr>
<tr id="tr_1_2">...</tr>
</tr>
---
<tr id="tr_2">
<tr id="tr_2_1">...</tr>
<tr id="tr_2_2">...</tr>
</tr>
---
Note: this requires lxml.
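If lxml is not installed, a possible alternative (not part of the original answer) is BeautifulSoup's bundled 'html.parser'. The sketch below assumes that this parser keeps the nested <tr> tags where the markup puts them rather than repairing the invalid nesting. It also shows find_all() with recursive=False, a documented BeautifulSoup option that restricts the search to direct children instead of all descendants.

```python
from bs4 import BeautifulSoup

# Same nested-row markup as in the answer (invalid HTML, kept as written).
html = '''
<table>
<tr id="tr_1">
<tr id="tr_1_1">..</tr>
<tr id="tr_1_2">...</tr>
</tr>
<tr id="tr_2">
<tr id="tr_2_1">...</tr>
<tr id="tr_2_2">...</tr>
</tr>
</table>'''

# Assumption: the built-in html.parser preserves this nesting, so lxml is not needed.
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')

# By default find_all() searches every descendant, so all six rows come back.
all_rows = table.find_all('tr')

# recursive=False limits the search to the table's direct children.
top_rows = table.find_all('tr', recursive=False)

# The CSS child combinator from the answer selects the same two rows.
selected = soup.select('table > tr')

for tr in top_rows:
    print(tr['id'])
```

Both top_rows and selected should contain only the rows with ids tr_1 and tr_2. recursive=False is arguably the most direct answer to "children, not grandchildren", because it needs no CSS selector support.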
For a similar question about finding siblings with Python BeautifulSoup, see Stack Overflow: https://stackoverflow.com/questions/18703578/