python - Scraping a table in Python

Tags: python web-scraping beautifulsoup scrapy

Can someone help me scrape the data from the large table at https://www.statsinsider.com.au/prediction-results?fbclid=IwAR18wxeCq_ygxLG1v2JEe3YqBNNS6krzNnOQULYp4IZihQY6JMgHwzpIl6o?

Here is what I have so far:

from bs4 import BeautifulSoup
from requests_html import HTMLSession
session = HTMLSession()
url = 'https://www.statsinsider.com.au/prediction-results?fbclid=IwAR18wxeCq_ygxLG1v2JEe3YqBNNS6krzNnOQULYp4IZihQY6JMgHwzpIl6o'
r = session.get(url)
soup=BeautifulSoup(r.html.html,'html.parser')
stat_table = soup.find('table')

Printing stat_table gives the following, which does not appear to be the whole table. Any help is appreciated, thanks!

<table>
<tbody>
<tr>
<th>Date</th>
<th class="to-hide">Sport</th>
<th>Team</th>
<th class="to-hide">Bet Type</th>
<th>Odds</th>
<th class="to-hide">Bet</th>
<th>Result</th>
<th>Profit/Loss</th>
</tr>
<tr ng-repeat="match in recentResults">
<td>{{match.Date}}</td>
<td class="to-hide">{{match.Sport}}</td>
<td>{{match.Team}}</td>
<td class="to-hide">{{match.Type}}</td>
<td>${{match.Odds}}</td>
<td class="to-hide">${{match.Bet}}</td>
<td>{{match.Result}}</td>
<td class="green" ng-if="match.Return &gt; 0">${{match.Return}}</td>
<td class="red" ng-if="match.Return &lt; 0">${{match.Return}}</td>
<td ng-if="match.Return == 0"></td>
</tr>
</tbody>
</table>

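Best answer

Since you are already using Requests, you may want to consider Requests-HTML. Although it is not as advanced as Selenium, it is very useful in cases like this one, where all you need is to render the page. The rows of this table are filled in by AngularJS in the browser (note the ng-repeat attribute and the {{match.Date}} placeholders in your output), so the page's JavaScript has to run before the HTML is parsed.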
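One way to tell whether a response still needs rendering is to look for leftover {{...}} placeholders like the ones in the question's output. The following is only a minimal sketch using the standard library; the function name has_unrendered_placeholders and the two sample rows are hypothetical and were shortened from the HTML shown in this thread.

```python
import re

# Matches AngularJS-style placeholders such as {{match.Date}}.
PLACEHOLDER = re.compile(r"\{\{\s*[\w.]+\s*\}\}")


def has_unrendered_placeholders(html):
    """Return True if the HTML still contains {{...}} template markers."""
    return PLACEHOLDER.search(html) is not None


# Shortened samples: a row before and after the page's JavaScript has run.
raw_row = '<tr ng-repeat="match in recentResults"><td>{{match.Date}}</td></tr>'
rendered_row = '<tr class="ng-scope"><td class="ng-binding">17/09</td></tr>'

print(has_unrendered_placeholders(raw_row))
print(has_unrendered_placeholders(rendered_row))
```

If the check returns True for r.html.html, the page has most likely not been rendered yet.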

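Installation

pip install requests-html

The table at the link you provided can be scraped easily with Requests-HTML. The only step missing from your code is the call to r.html.render().

Code: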

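from bs4 import BeautifulSoup
from requests_html import HTMLSession

session = HTMLSession()
url = 'https://www.statsinsider.com.au/prediction-results?fbclid=IwAR18wxeCq_ygxLG1v2JEe3YqBNNS6krzNnOQULYp4IZihQY6JMgHwzpIl6o'
r = session.get(url)
# Run the page's JavaScript in headless Chromium (downloaded on first use)
# so that AngularJS fills in the table rows.
r.html.render()
# r.html.html now holds the rendered markup rather than the raw template.
soup = BeautifulSoup(r.html.html, 'html.parser')
stat_table = soup.find('table')
print(stat_table)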

Output

<table>
<tbody>
<tr>
<th>Date</th>
<th class="to-hide">Sport</th>
<th>Team</th>
<th class="to-hide">Bet Type</th>
<th>Odds</th>
<th class="to-hide">Bet</th>
<th>Result</th>
<th>Profit/Loss</th>
</tr>

...

<tr class="ng-scope" ng-repeat="match in recentResults">
<td class="ng-binding">17/09</td>
<td class="to-hide ng-binding">NFL</td>
<td class="ng-binding">NO</td>
<td class="to-hide ng-binding">Line</td>
<td class="ng-binding">$1.91</td>
<td class="to-hide ng-binding">$25</td>
<td class="ng-binding">LOSE</td>
<!-- ngIf: match.Return > 0 -->
<!-- ngIf: match.Return < 0 --><td class="red ng-binding ng-scope" ng-if="match.Return &lt; 0">$-25.00</td><!-- end ngIf: match.Return < 0 -->
<!-- ngIf: match.Return == 0 -->
</tr><!-- end ngRepeat: match in recentResults -->
</tbody>
</table>
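Once the table is rendered, the rows can be turned into records keyed by the header cells. The sketch below is an illustration, not part of the original answer: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, the names TableRows and table_to_records are hypothetical, and SAMPLE is a shortened copy of the output above.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableRows()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Shortened copy of the rendered output shown above.
SAMPLE = """
<table><tbody>
<tr><th>Date</th><th class="to-hide">Sport</th><th>Team</th>
<th class="to-hide">Bet Type</th><th>Odds</th><th class="to-hide">Bet</th>
<th>Result</th><th>Profit/Loss</th></tr>
<tr class="ng-scope" ng-repeat="match in recentResults">
<td class="ng-binding">17/09</td><td class="to-hide ng-binding">NFL</td>
<td class="ng-binding">NO</td><td class="to-hide ng-binding">Line</td>
<td class="ng-binding">$1.91</td><td class="to-hide ng-binding">$25</td>
<td class="ng-binding">LOSE</td>
<!-- ngIf: match.Return > 0 -->
<td class="red ng-binding ng-scope" ng-if="match.Return &lt; 0">$-25.00</td>
<!-- ngIf: match.Return == 0 -->
</tr>
</tbody></table>
"""

records = table_to_records(SAMPLE)
print(records[0])
```

To try it on the live page, pass str(stat_table) from the code above to table_to_records.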

Regarding "python - Scraping a table in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55321733/
