Here is the HTML, saved as sample.html.
<html>
<body>
<div class ="ecopyramid">
<ul id ="producers">
<li class ="producerlist">
<div class ="name">A1</div>
<div class ="number">100000</div>
</li>
<li class ="producerlist">
<div class ="name">B1</div>
<div class ="number">100000</div>
</li>
</ul>
<ul id ="primaryconsumers">
<li class ="primaryconsumerlist">
<div class ="name">A2</div>
<div class ="number">1000</div>
</li>
<li class ="primaryconsumerlist">
<div class ="name">B2</div>
<div class ="number">2000</div>
</li>
</ul>
<ul id ="secondaryconsumers">
<li class ="secondaryconsumerlist">
<div class ="name">A3</div>
<div class ="number">100</div>
</li>
<li class ="secondaryconsumerlist">
<div class ="name">B3</div>
<div class ="number">98</div>
</li>
</ul>
<ul id ="tertiaryconsumers">
<li class ="tertiaryconsumerlist">
<div class ="name">A4</div>
<div class ="number">80</div>
</li>
<li class ="tertiaryconsumerlist">
<div class ="name">B4</div>
<div class ="number">50</div>
</li>
</ul>
</div>
</body>
</html>
Below is the code that navigates the sample.html above:
from bs4 import BeautifulSoup
with open("sample.html", "r") as sample_pyramid:
    soup = BeautifulSoup(sample_pyramid, "lxml")
soup_object = soup.find("ul", id="secondaryconsumers")
print(soup_object.li.div.string)
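In this code, I first locate the parent of the text "A3" by the tag "ul" and the id "secondaryconsumers". Then, in the print call, I drill down with ".li.div" and append ".string", which outputs the desired text "A3". My questions are:
1) How do I code this so that it prints the text "B3" in this example?
2) How do I code this so that it prints the text "98" (below "B3") in this example?
I have tried many approaches without success. I can reach the first text object through navigation, but not the second text object inside a tag of the same name.
Any ideas?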
Best answer
You can use CSS selectors to get the names and numbers:
names = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.name')
numbers = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.number')
print([name.text for name in names])
print([number.text for number in numbers])
This prints:
['A3', 'B3']
['100', '98']
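To pull out only "B3" and "98" rather than the full lists, one option is to index the second li directly. The following is a minimal sketch, not the answerer's code: it assumes a trimmed inline copy of the secondaryconsumers list and uses the built-in "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the secondaryconsumers list from sample.html (assumption:
# inlined here so the example is self-contained)
html = """
<ul id="secondaryconsumers">
<li class="secondaryconsumerlist">
<div class="name">A3</div>
<div class="number">100</div>
</li>
<li class="secondaryconsumerlist">
<div class="name">B3</div>
<div class="number">98</div>
</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul", id="secondaryconsumers")

# find_all returns every matching <li>; index 1 is the second one
second_li = ul.find_all("li", class_="secondaryconsumerlist")[1]

# Equivalent navigation: start at the first <li> and step to its next <li> sibling
second_li_alt = ul.li.find_next_sibling("li")

name = second_li.find("div", class_="name").get_text(strip=True)
number = second_li.find("div", class_="number").get_text(strip=True)

print(name)    # B3
print(number)  # 98
```

Attribute navigation such as ".li.div" always returns the first matching child, which is why it stops at "A3"; indexing the result of find_all, or stepping with find_next_sibling, reaches the later items.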
Sample code for the follow-up question in the comments:
from bs4 import BeautifulSoup
data = """
<div class="span9">
<table class="result-data table" border="0">
<tbody>
<tr class="result-item highlighting">
<td class="result-category" scope="row">Name:</td>
<td class="result-value-bold" colspan="4" itemprop="item">
Robin Hood
</td>
</tr>
</tbody>
</table>
</div>
"""
soup = BeautifulSoup(data, "html.parser")
print(soup.find('td', class_="result-value-bold").get_text(strip=True))
This prints Robin Hood.
Alternatively, find the parent table and tr first:
table = soup.find('table', class_='result-data')
tr = table.find('tr', class_='result-item')
print(tr.find('td', class_="result-value-bold").get_text(strip=True))
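If the value cell had no distinctive class of its own, another possibility is to locate the label cell and step to the td that follows it. This is a hedged sketch under that assumption, reusing a trimmed copy of the table above; it is not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the follow-up question
data = """
<table class="result-data table" border="0">
<tbody>
<tr class="result-item highlighting">
<td class="result-category" scope="row">Name:</td>
<td class="result-value-bold" colspan="4" itemprop="item">
Robin Hood
</td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(data, "html.parser")

# Find the label cell, then move to the next <td> in the same row;
# find_next_sibling("td") skips the whitespace text node between the cells
label = soup.find("td", class_="result-category")
value = label.find_next_sibling("td").get_text(strip=True)

print(value)
```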
Regarding python - navigating to the second string text with BeautifulSoup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24923826/