python - BeautifulSoup: get the contents of a specific table

Tags: python web-scraping beautifulsoup tabular

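My local airport shamefully blocks users who aren't running IE, and its site looks awful. I want to write a Python script that fetches the contents of the arrivals and departures pages every few minutes and displays them in a more readable way.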
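One way to get the "more readable" display is to print the extracted rows as aligned columns and redraw them every few minutes. This is a minimal sketch that assumes the rows have already been extracted as lists of cell strings. `format_table`, `poll` and the 300-second interval are hypothetical choices, not taken from the question.

```python
import time
from typing import Callable, List


def format_table(rows: List[List[str]]) -> str:
    """Render rows of cell strings as left-aligned columns separated by two spaces."""
    if not rows:
        return ""
    ncols = max(len(r) for r in rows)
    # Pad short rows so every row has the same number of cells
    padded = [r + [""] * (ncols - len(r)) for r in rows]
    # Each column is as wide as its longest cell
    widths = [max(len(r[i]) for r in padded) for i in range(ncols)]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in padded
    ]
    return "\n".join(lines)


def poll(fetch_rows: Callable[[], List[List[str]]], interval_s: float = 300) -> None:
    """Hypothetical loop: fetch rows, print them, sleep, repeat until interrupted."""
    while True:
        print(format_table(fetch_rows()))
        time.sleep(interval_s)
```

`fetch_rows` would wrap whatever download-and-parse code is used. Keeping the formatting in a separate pure function makes it easy to test without touching the network.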

My tools of choice are mechanize, to trick the site into believing I'm using IE, and BeautifulSoup, to parse the page and extract the flight data table.
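mechanize is one way to present an IE-like browser. As a substitute technique (not mechanize's API), the standard library's `urllib.request` can send the same kind of User-Agent header explicitly. This is a minimal sketch. The User-Agent string is an illustrative IE 11 value, `build_request` and `fetch_html` are hypothetical helpers, and it is an assumption that the site checks only this header.

```python
from urllib.request import Request, urlopen

# Illustrative Internet Explorer 11 User-Agent (assumption: the site only sniffs this header)
IE_USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"


def build_request(url: str) -> Request:
    """Create a GET request that identifies itself as Internet Explorer."""
    return Request(url, headers={"User-Agent": IE_USER_AGENT})


def fetch_html(url: str) -> bytes:
    """Download the page body using the IE-like request."""
    with urlopen(build_request(url), timeout=30) as resp:
        return resp.read()
```

`urllib.request` stores header names capitalized (`"User-agent"`), which is the key `Request.get_header` expects.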

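Honestly, I'm lost in the BeautifulSoup documentation. I can't work out how to get the table (whose title I know) out of the whole document, or how to get a list of rows from that table.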
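If the known "title" is the table's `<caption>`, one approach is to search the captions for that text, climb to the enclosing `<table>`, and then read each `<tr>` as a list of cell strings. This sketch assumes the title really is in a `<caption>` element. If it sits in a heading before the table, calling `find_next("table")` on that heading would be the analogous step. `table_by_caption` and `table_rows` are hypothetical names, and the HTML is made up for illustration.

```python
from typing import List, Optional

from bs4 import BeautifulSoup
from bs4.element import Tag


def table_by_caption(soup: BeautifulSoup, title: str) -> Optional[Tag]:
    """Return the <table> whose <caption> text equals title, or None."""
    for caption in soup.find_all("caption"):
        if caption.get_text(strip=True) == title:
            return caption.find_parent("table")
    return None


def table_rows(table: Tag) -> List[List[str]]:
    """Return each <tr> as a list of its <th>/<td> texts, whitespace-stripped."""
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]


html = """
<table><caption>Departures</caption><tr><td>XY9</td></tr></table>
<table>
  <caption> Arrivals </caption>
  <tr><th>Flight</th><th>From</th></tr>
  <tr><td>AB123</td><td>Paris</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
arrivals = table_by_caption(soup, "Arrivals")
print(table_rows(arrivals))  # [['Flight', 'From'], ['AB123', 'Paris']]
```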

Any ideas?

Best answer

This isn't the exact code you need, just a demonstration of how to use BeautifulSoup. It finds the table whose id is "Table1" and gets all of its tr elements.

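from urllib.request import urlopen  # Python 3 equivalent of urllib2.urlopen
from bs4 import BeautifulSoup
# url: address of the arrivals or departures page (not given here)
html = urlopen(url).read()
bs = BeautifulSoup(html, "html.parser")
# Find the <table> whose id attribute is "Table1", then collect all of its <tr> rows
table = bs.find("table", id="Table1")
rows = table.find_all("tr")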
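The same lookup can also be written with BeautifulSoup's CSS selector support, `select()`, which is backed by the soupsieve package in bs4 4.7 and later. This sketch uses a small inline HTML string as a hypothetical stand-in for the downloaded page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page
html = """
<table id="Other"><tr><td>ignore me</td></tr></table>
<table id="Table1">
  <tr><th>Flight</th><th>Status</th></tr>
  <tr><td>AB123</td><td>Landed</td></tr>
</table>
"""
bs = BeautifulSoup(html, "html.parser")

# Keyword-argument form: first <table> with id="Table1", then its <tr> rows
rows_find = bs.find("table", id="Table1").find_all("tr")

# CSS-selector form: every <tr> inside the element matching table#Table1
rows_select = bs.select("table#Table1 tr")

print([tr.get_text(" ", strip=True) for tr in rows_select])  # ['Flight Status', 'AB123 Landed']
```

Both forms return the same `<tr>` elements. The selector form is more compact when the path to the rows gets longer, for example when a nested table has to be skipped.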

A similar question, "python - BeautifulSoup: get the contents of a specific table", can be found on Stack Overflow: https://stackoverflow.com/questions/2935658/
