I used find_all() to extract some data, which gave me a list with many strings like the ones below.
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
All I need is the text inside <a class="y">. How can I do that? Maybe with a loop?
Best answer
Here is how to do it with BeautifulSoup:
>>> from bs4 import BeautifulSoup
>>> html = '''\
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>'''
>>> soup = BeautifulSoup(html, "html.parser")
>>> list_of_y = soup.find_all("a", {'class': 'y'})
This returns a list of elements, which you can print:
>>> print(list_of_y)
[<a class="y" href="x">to make</a>, <a class="y" href="x">to make</a>, <a class="y" href="x">to make</a>]
or iterate over:
>>> for y in list_of_y:
...     print(y.text)
...
to make
to make
to make
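If you already hold the list of <div> elements from an earlier find_all() call, as the question describes, a loop over that list also works. The following is a minimal sketch, assuming the list items are BeautifulSoup Tag objects rather than plain strings; the names divs and texts are made up for illustration.

```python
from bs4 import BeautifulSoup

html = '''\
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>'''

soup = BeautifulSoup(html, "html.parser")

# Stand-in for the list the question already has (hypothetical name).
divs = soup.find_all("div", class_="x")

texts = []
for div in divs:
    # Search only inside this <div> for the first <a class="y">.
    link = div.find("a", class_="y")
    if link is not None:  # skip items that have no matching link
        texts.append(link.get_text(strip=True))

print(texts)  # ['to make', 'to make', 'to make']
```

The None check matters because find() returns None when an item has no matching link, and calling get_text() on None would raise an AttributeError.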
That said, I have a slight preference for lxml:
>>> from lxml import etree
>>> h = etree.HTML(html)
>>> list_of_y = h.xpath('//a[@class="y"]/text()')
>>> print(list_of_y)
['to make', 'to make', 'to make']
>>> for y in list_of_y:
...     print(y)
...
to make
to make
to make
Or using CSS selectors:
>>> from lxml import etree
>>> from lxml.cssselect import CSSSelector
>>> h = etree.HTML(html)
>>> sel = CSSSelector('a.y')
>>> list_of_y = sel(h)
>>> for y in list_of_y:
...     print(y.text)
...
to make
to make
to make
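One difference between the two lxml approaches is worth keeping in mind: the XPath test @class="y" compares the whole attribute value, while the CSS selector a.y matches y as one class among several. The sketch below uses made-up sample markup (not from the question) to show the exact-match behaviour, alongside a common XPath idiom for matching a single class token.

```python
from lxml import etree

# Hypothetical sample markup with multi-class and look-alike class values.
html = '''\
<div><a class="y big" href="x">to make</a></div>
<div><a class="y" href="x">to bake</a></div>
<div><a class="xy" href="x">skip me</a></div>'''

tree = etree.HTML(html)

# Exact comparison: only class="y" matches; class="y big" does not.
exact = tree.xpath('//a[@class="y"]/text()')

# Token comparison: pad the class list with spaces and look for " y ".
token = tree.xpath(
    '//a[contains(concat(" ", normalize-space(@class), " "), " y ")]/text()'
)

print(exact)  # ['to bake']
print(token)  # ['to make', 'to bake']
```

For the markup in the question, where every link has exactly class="y", both forms return the same result.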
Regarding "python - How to extract part of the items in a list using BeautifulSoup?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23789019/