python - How do I find all <li> elements inside a <ul> with a specific class?

Environment:

BeautifulSoup 4

Python 2.7.5

Logic:

'find_all' the <li> instances inside a <ul> of class my_class, for example:

<ul class='my_class'>
<li>thing one</li>
<li>thing two</li>
</ul>

Clarification: I only need the text between the <li> tags.

Python code:

(The find_all call below is incorrect; I am only including it for context.)

from bs4 import BeautifulSoup, Comment
import re

# read the original file into a string; the file is closed automatically
with open('file.php', 'r') as fo:
    fo_string = fo.read()
# create beautiful soup object from fo_string
bs_fo_string = BeautifulSoup(fo_string, "lxml")
# get rid of html comments
my_comments = bs_fo_string.find_all(text=lambda text: isinstance(text, Comment))
for my_comment in my_comments:
    my_comment.extract()

my_li_list = bs_fo_string.find_all('ul', 'my_class')

print my_li_list

Best answer

Like this?

>>> html = """<ul class='my_class'>
... <li>thing one</li>
... <li>thing two</li>
... </ul>"""
>>> from bs4 import BeautifulSoup as BS
>>> soup = BS(html)
>>> for ultag in soup.find_all('ul', {'class': 'my_class'}):
...     for litag in ultag.find_all('li'):
...             print litag.text
... 
thing one
thing two
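
As an alternative to the nested loops, the same text can be collected with a CSS selector. This is only a sketch: it assumes bs4's select() method is available, uses the standard-library "html.parser" instead of the "lxml" parser from the question, and the name li_texts is made up for illustration.

```python
from bs4 import BeautifulSoup

html = """<ul class='my_class'>
<li>thing one</li>
<li>thing two</li>
</ul>"""

# "html.parser" ships with Python, so no extra parser package is needed
# (assumption: the question uses "lxml", which should behave the same here).
soup = BeautifulSoup(html, "html.parser")

# "ul.my_class li" matches every <li> that is a descendant of a <ul>
# whose class attribute contains my_class.
li_texts = [li.get_text() for li in soup.select("ul.my_class li")]

print(li_texts)
```

Building a list rather than printing inside the loop makes the result easier to reuse later in the script.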

Explanation:

soup.find_all('ul', {'class': 'my_class'}) finds all ul tags whose class is my_class.
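
To illustrate the class filter, here is a small sketch comparing three spellings that, as far as I know, bs4 treats the same way: the attrs dictionary, the class_ keyword, and a bare string as the second positional argument (the form used in the question). The sample markup and variable names are invented for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <ul> carries my_class (plus another class),
# the second <ul> does not.
html = """<ul class='my_class other'><li>a</li></ul>
<ul class='different'><li>b</li></ul>"""
soup = BeautifulSoup(html, "html.parser")

by_dict = soup.find_all('ul', {'class': 'my_class'})   # attrs dictionary
by_keyword = soup.find_all('ul', class_='my_class')    # class_ keyword
by_positional = soup.find_all('ul', 'my_class')        # string = CSS class

# Each call returns the matching <ul> tags themselves, not their <li> text.
print(len(by_dict), len(by_keyword), len(by_positional))
```

So the call in the question does locate the right ul tags; what it lacks is the second step of searching inside each of them for li.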

We then find all the li tags inside each of those ul tags and print the text of each one.
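
The two steps can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name li_texts_in_class, the sample markup, and the choice of "html.parser" are all assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def li_texts_in_class(markup, class_name):
    """Return the text of every <li> inside a <ul> that has class_name."""
    soup = BeautifulSoup(markup, "html.parser")
    texts = []
    # Step 1: every <ul> whose class attribute contains class_name.
    for ul_tag in soup.find_all('ul', class_=class_name):
        # Step 2: every <li> nested anywhere inside that <ul>.
        for li_tag in ul_tag.find_all('li'):
            # strip=True trims surrounding whitespace from the text.
            texts.append(li_tag.get_text(strip=True))
    return texts


sample = """
<ul class='my_class'>
<li> thing one </li>
<li>thing two</li>
</ul>
<ul class='other_class'>
<li>ignored</li>
</ul>
"""

result = li_texts_in_class(sample, 'my_class')
print(result)
```

One caveat: if a my_class list is nested inside another my_class list, the inner li tags would be visited twice, so this sketch assumes the lists are not nested.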

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/17246963/
