python - How do I extract part of each item in a list using BeautifulSoup?

I extracted some data using find_all().

That gave me a list of many elements like the ones below.

<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>

All I need is the text inside <a class="y">.

How can I do that? Maybe with a loop?

Best answer

Here is how to do it with BeautifulSoup:

>>> from bs4 import BeautifulSoup
>>> html = '''\
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>'''
>>> soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
>>> list_of_y = soup.find_all("a", class_="y")  # find_all is the current spelling of findAll

This returns a list of matching elements, which you can print:

>>> print(list_of_y)
[<a class="y" href="x">to make</a>, <a class="y" href="x">to make</a>, <a class="y" href="x">to make</a>]

Or iterate over:

>>> for y in list_of_y:
...   print(y.text)
to make
to make
to make
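
If you already hold the list of outer <div class="x"> elements (which is what the question describes), a loop over that list works too. The following is a minimal sketch, assuming the divs came from find_all("div", class_="x"); the names divs and texts are only illustrative.

```python
from bs4 import BeautifulSoup

html = """\
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>"""

soup = BeautifulSoup(html, "html.parser")

# Assumed starting point: the list the question already has.
divs = soup.find_all("div", class_="x")

texts = []
for div in divs:
    # Search only inside this div for the first <a class="y">.
    link = div.find("a", class_="y")
    if link is not None:  # skip divs that have no such link
        texts.append(link.get_text(strip=True))

print(texts)
```

The None check matters because find() returns None when nothing matches, and calling get_text() on None would raise an AttributeError.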

That said, I have a slight preference for lxml:

>>> from lxml import etree
>>> h = etree.HTML(html)
>>> list_of_y = h.xpath('//a[@class="y"]/text()')
>>> print(list_of_y)
['to make', 'to make', 'to make']
>>> for y in list_of_y:
...   print(y)
... 
to make
to make
to make
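
One caveat with the XPath above: @class="y" compares the whole attribute string, so it would miss an element written as class="y big". A common workaround is to match "y" as a whitespace-separated token. The sketch below uses a small made-up HTML snippet (not from the question) to show the difference; exact and token are illustrative names.

```python
from lxml import etree

# Hypothetical sample: one link with two classes, one with a look-alike class.
html = (
    '<div>'
    '<a class="y big" href="x">to make</a>'
    '<a class="xy" href="x">skip</a>'
    '</div>'
)
tree = etree.HTML(html)

# Exact string comparison: "y big" is not equal to "y", so nothing matches.
exact = tree.xpath('//a[@class="y"]/text()')

# Token match: pad the class list with spaces and look for " y ".
token = tree.xpath(
    '//a[contains(concat(" ", normalize-space(@class), " "), " y ")]/text()'
)

print(exact)
print(token)
```

For the HTML in the question every link has exactly class="y", so the simpler expression is enough there.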

Or use a CSS selector:

>>> from lxml import etree
>>> from lxml.cssselect import CSSSelector  # needs the separate cssselect package installed
>>> h = etree.HTML(html)
>>> sel = CSSSelector('a.y')
>>> list_of_y = sel(h)
>>> for y in list_of_y:
...     print(y.text)
...
to make
to make
to make
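
BeautifulSoup itself also accepts CSS selectors through select(), which the answer above does not mention. A minimal sketch, assuming a reasonably recent bs4 release (select() relies on the soupsieve package that is installed alongside it):

```python
from bs4 import BeautifulSoup

html = """\
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>"""

soup = BeautifulSoup(html, "html.parser")

# "a.y" selects every <a> element whose class list contains "y".
texts = [a.get_text() for a in soup.select("a.y")]

print(texts)
```

This keeps everything in one library if you would rather not add lxml as a dependency.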

This question and answer originally appeared on Stack Overflow: https://stackoverflow.com/questions/23789019/
