python - Scraping a table from a web page with Python

Tags: python html csv beautifulsoup html-table

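from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 (Python 2: from urllib import urlopen)

# One player slug per line; splitlines() copes with \r, \n and \r\n line endings
with open("/Users/brandondennis/Desktop/money/CF_Name.txt") as f:
    player_code = [name.strip() for name in f.read().splitlines() if name.strip()]

for player in player_code:
    html = urlopen("https://www.capfriendly.com/players/" + player)
    soup = BeautifulSoup(html, 'html.parser')

    # Each contract/salary table sits inside a <div class="table_c"> wrapper
    for section in soup.find_all('div', {"class": "table_c"}):
        table = section.findChildren()[10].text
        print(player, table)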
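The loop above picks section.findChildren()[10]. In BeautifulSoup 4, findChildren is an older alias for find_all, so with no arguments it returns every descendant tag in document order, not just the direct children; the meaning of index 10 therefore depends on how deeply the page nests its markup. A small check, assuming bs4 is installed and using a trimmed-down copy of the markup shown in the answer below (the td content is made up):

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the wrapper markup; the real player page is much larger
html = """
<div class="table_c">
  <div class="rel navc column_head3 cntrct">
    <div class="ofh"><div>HISTORICAL SALARY </div><div class="l cont_t mt4">SOURCE: The Hockey News, USA Today</div></div>
  </div>
  <table class="cntrct" id="contractinsert"><tr><td>2015-16</td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
section = soup.find("div", class_="table_c")

# findChildren() == find_all(): walks *all* descendant tags, depth-first
all_tags = [tag.name for tag in section.findChildren()]
print(all_tags)  # ['div', 'div', 'div', 'div', 'table', 'tr', 'td']

# recursive=False limits the search to direct children of the wrapper div
direct = [tag.name for tag in section.findChildren(recursive=False)]
print(direct)  # ['div', 'table']
```

Here there are only seven descendant tags, so [10] would raise IndexError; selecting the table by class or id, as the answer suggests, does not depend on that count.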

Here is a link to a sample player page: https://www.capfriendly.com/players/patrik-elias

The player names I append to the base URL come from a text file, one name per line (for example, patrik-elias).

Ultimately I want to do this for every player in the text file, which contains more than 1,000 names.
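For a run over 1,000+ players, a sketch of the surrounding loop using only the standard library (urllib, csv). Assumptions: the names file holds one slug per line; parse_rows is a hypothetical hook you supply (for example, BeautifulSoup code that turns the contract table into lists of cell strings); failed downloads are collected instead of aborting the run; the one-second delay is an arbitrary politeness choice, not a documented limit of the site.

```python
import csv
import time
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://www.capfriendly.com/players/"


def read_player_slugs(text):
    """Split the names file into slugs; splitlines() accepts CR, LF and CRLF endings."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_html(url, timeout=10):
    """Download one player page (performs a real network request)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def scrape_all(slugs, parse_rows, out, fetch=fetch_html, delay=1.0):
    """Fetch each player page, parse it into rows and write them to a CSV stream.

    parse_rows: hypothetical hook mapping page HTML to a list of row lists.
    Returns the slugs whose page could not be fetched, so they can be retried.
    """
    writer = csv.writer(out)
    failed = []
    for slug in slugs:
        try:
            html = fetch(BASE_URL + quote(slug))
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses
            failed.append(slug)
            html = None
        if html is not None:
            for row in parse_rows(html):
                writer.writerow([slug] + list(row))  # prefix each row with the player slug
        time.sleep(delay)  # pause between requests instead of hammering the site
    return failed
```

In use, read the slugs with read_player_slugs(f.read()), open the output file with open("contracts.csv", "w", newline="") as the csv module recommends, and pass it as out; the returned list holds the players to retry later.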

Best answer

In addition to what others have mentioned, take a look at this line:

table = soup.findAll('table_c')[2]

Here, BeautifulSoup will look for elements whose tag name is table_c, i.e. <table_c> tags. However, table_c is the value of a class attribute:

<div class="table_c"><div class="rel navc column_head3 cntrct"><div class="ofh"><div>HISTORICAL SALARY </div><div class="l cont_t mt4">SOURCE: The Hockey News, USA Today</div></div></div>
    <table class="cntrct" id="contractinsert" cellpadding="0" border="0" cellspacing="0">
    ...
    </table>
</div>
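A quick check of that behaviour, assuming bs4 is installed and using a one-line copy of the wrapper markup: a plain string argument is treated as a tag name, so the search comes back empty and the [2] in the quoted line raises IndexError (findAll is the older alias of find_all).

```python
from bs4 import BeautifulSoup

# One-line copy of the wrapper markup shown above
html = '<div class="table_c"><table class="cntrct" id="contractinsert"></table></div>'
soup = BeautifulSoup(html, "html.parser")

# A plain string is matched against tag names, so this looks for <table_c> tags
by_name = soup.find_all("table_c")
# class_ matches against the class attribute instead
by_class = soup.find_all("div", class_="table_c")

print(len(by_name), len(by_class))  # 0 1

try:
    soup.find_all("table_c")[2]  # same lookup as the quoted line
except IndexError:
    print("no <table_c> tags, so indexing with [2] fails")
```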

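Use the class_ argument instead (the trailing underscore is needed because class is a reserved word in Python):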

table = soup.find_all(class_='table_c')[2]
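The class_ keyword is one of several equivalent spellings in bs4; the attrs dictionary (as in the question's own code) and a CSS selector via select() behave the same way. Because class is multi-valued, an element with class="table_c wide" also matches. The sketch below uses made-up markup with three wrappers and guards the positional [2] rather than assuming the page always has at least three matches.

```python
from bs4 import BeautifulSoup

# Made-up markup with three table_c wrappers, one carrying an extra class
html = """
<div class="table_c"><table id="a"></table></div>
<div class="table_c wide"><table id="b"></table></div>
<div class="table_c"><table id="c"></table></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Three equivalent ways to match elements whose class list contains "table_c"
via_kwarg = soup.find_all(class_="table_c")
via_attrs = soup.find_all(attrs={"class": "table_c"})
via_css = soup.select("div.table_c")

ids = [div.table["id"] for div in via_kwarg]
print(ids)  # ['a', 'b', 'c']

# Only index when the match actually exists
third = via_kwarg[2] if len(via_kwarg) > 2 else None
```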

Alternatively, you can go straight to the table by its id:

table = soup.find("table", id="contractinsert")
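Once the table is located by id, its rows can be turned into plain lists of cell text. A minimal sketch, assuming bs4 is installed; the table below is a hypothetical stand-in whose headers and numbers are made up, and table_to_rows is an illustrative helper, not part of bs4. Note that find() returns None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the contract table; headers and numbers are made up
html = """
<table class="cntrct" id="contractinsert">
  <tr><th>SEASON</th><th>SALARY</th></tr>
  <tr><td>2014-15</td><td>1,000,000</td></tr>
  <tr><td>2015-16</td><td>2,000,000</td></tr>
</table>
"""


def table_to_rows(table):
    """Return every <tr> as a list of stripped cell texts, treating <th> and <td> alike."""
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]


soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="contractinsert")  # None if the page has no such table
rows = table_to_rows(table) if table is not None else []
print(rows)
```

The resulting rows can be written straight to a file with csv.writer(f).writerows(rows).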

For a similar question about python - Scraping a table from a web page with Python, see Stack Overflow: https://stackoverflow.com/questions/37884990/

