My web page looks like this:
<p>
<strong class="offender">YOB:</strong> 1987<br/>
<strong class="offender">RACE:</strong> WHITE<br/>
<strong class="offender">GENDER:</strong> FEMALE<br/>
<strong class="offender">HEIGHT:</strong> 5'05''<br/>
<strong class="offender">WEIGHT:</strong> 118<br/>
<strong class="offender">EYE COLOR:</strong> GREEN<br/>
<strong class="offender">HAIR COLOR:</strong> BROWN<br/>
</p>
I want to extract each person's information and get YOB:1987, RACE:WHITE, and so on.
What I tried:
subc = soup.find_all('p')
subc1 = subc[1]
subc2 = subc1.find_all('strong')
But this only gives me the labels YOB:, RACE:, and so on.
Is there a way to get the data in the YOB:1987, RACE:WHITE format?
Best answer
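Just iterate over all the <strong> tags and use next_sibling to get what you want. The values are not wrapped in any tag; each one is the text node that directly follows its <strong> label, and that node is what next_sibling returns. Like this: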
for strong_tag in soup.find_all('strong'):
    print(strong_tag.text, strong_tag.next_sibling)
Demo:
from bs4 import BeautifulSoup
html = '''
<p>
<strong class="offender">YOB:</strong> 1987<br />
<strong class="offender">RACE:</strong> WHITE<br />
<strong class="offender">GENDER:</strong> FEMALE<br />
<strong class="offender">HEIGHT:</strong> 5'05''<br />
<strong class="offender">WEIGHT:</strong> 118<br />
<strong class="offender">EYE COLOR:</strong> GREEN<br />
<strong class="offender">HAIR COLOR:</strong> BROWN<br />
</p>
'''
# Name the parser explicitly; otherwise bs4 guesses one and emits a warning.
soup = BeautifulSoup(html, 'html.parser')
for strong_tag in soup.find_all('strong'):
    print(strong_tag.text, strong_tag.next_sibling)
This gives you:
YOB: 1987
RACE: WHITE
GENDER: FEMALE
HEIGHT: 5'05''
WEIGHT: 118
EYE COLOR: GREEN
HAIR COLOR: BROWN
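If the exact YOB:1987, RACE:WHITE format from the question is needed, the same next_sibling idea can be wrapped in a small helper. The following is only a sketch: the function name extract_fields and the choice of a dict are assumptions made for illustration, not part of the original answer. It strips the whitespace around each text node and drops the trailing colon from each label.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<p>
<strong class="offender">YOB:</strong> 1987<br />
<strong class="offender">RACE:</strong> WHITE<br />
<strong class="offender">GENDER:</strong> FEMALE<br />
<strong class="offender">HEIGHT:</strong> 5'05''<br />
<strong class="offender">WEIGHT:</strong> 118<br />
<strong class="offender">EYE COLOR:</strong> GREEN<br />
<strong class="offender">HAIR COLOR:</strong> BROWN<br />
</p>
"""


def extract_fields(p_tag):
    """Hypothetical helper: map each <strong> label in one <p> to its value."""
    fields = {}
    for strong_tag in p_tag.find_all('strong'):
        # "YOB:" -> "YOB"
        label = strong_tag.get_text(strip=True).rstrip(':')
        sibling = strong_tag.next_sibling
        # next_sibling can be None or another tag; only a text node holds a value
        value = sibling.strip() if isinstance(sibling, NavigableString) else ''
        fields[label] = value
    return fields


soup = BeautifulSoup(html, 'html.parser')
record = extract_fields(soup.find('p'))
print(', '.join(f'{label}:{value}' for label, value in record.items()))
```

For a page with several people, calling extract_fields on each element of soup.find_all('p') should give one dict per person, assuming every person sits in their own <p> block as in the question.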
For more on "python - Extracting untagged text with BeautifulSoup", see the similar question on Stack Overflow: https://stackoverflow.com/questions/23380171/