python - How do I change the value of a div tag's style attribute during a Scrapy crawl?

Tags: python html web-scraping scrapy

My problem comes from a specific tag on a web page that hides the content I want to scrape.

Here is the link to the page.

Let me show you exactly what I want to scrape.

Since I can't embed images (I don't have enough reputation yet), I'm giving you links to the images on an image host.

In this first image, you can see that the content of the "COTES" tab is actually hidden in the rendered HTML, as shown in the red box.

But I noticed that if I change the style attribute of <div id="pariCotesTab" class="tab" style="display: none;"> to style="display: block;", the hidden HTML part appears: you can see it in the second image.

When I run scrapy shell https://www.zeturf.fr/fr/course/2018-10-19/R1C1-vincennes-prix-klymene/turf and try to get elements of the "COTES" tab, for example In [1]: response.xpath("//td[@class='cote-simplegagnant cote-reference']/text()").extract(), it returns nothing: Out[1]: []. That's logical, but it blocks me.

So how can I change the style attribute to style="display: block;" during a Scrapy run, and get the content I want to scrape from the "COTES" tab?

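I'd like to point out that I don't want to use Splash, because for me it's hell: installing Docker and so on... And I won't use Selenium either, because I want to scrape about 1000 pages, maybe more. I used Selenium in a previous project, and driving any browser through a webdriver is just a waste of time when scraping. Selenium is first and foremost a web-testing tool, not a web crawler or a scraping module.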

Scrapy version: 1.5.0, Python version: 2.7.9

Best answer

Your problem isn't what you think it is.

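Scrapy doesn't care which CSS styles are applied (unless you explicitly use them in your selectors); it only cares about what is present in the page source.
Your page's source contains something like this: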

<th class="cote-simplegagnant cote-reference"></th>

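As you can see, the th elements you're after are empty; they are filled in later by JavaScript.
Looking more closely at the source, you can find the script tag that contains the information you need: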
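The answer's point can be checked without a browser. The sketch below uses Python's standard-library ElementTree as a stand-in for Scrapy's lxml-based selectors (an assumption made so it runs without Scrapy installed), on a hypothetical, simplified fragment modelled on the page: a static parser sees the "hidden" div and everything inside it, because CSS is never applied, but the odds cell is still empty because only JavaScript would fill it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the tab is hidden with inline CSS and
# the odds cell is empty in the raw source, as the answer describes.
SOURCE = """
<html><body>
  <div id="pariCotesTab" class="tab" style="display: none;">
    <table>
      <tr><th class="cote-simplegagnant cote-reference"></th></tr>
    </table>
    <p class="note">Static text inside the hidden tab</p>
  </div>
</body></html>
"""

root = ET.fromstring(SOURCE.strip())

# A static parser ignores CSS: the hidden div and its children are all there.
tab = root.find(".//div[@id='pariCotesTab']")
print(tab.get("style"))                      # display: none;
print(tab.find(".//p[@class='note']").text)  # Static text inside the hidden tab

# The odds cell exists but has no text: it is only filled in by JavaScript.
cell = tab.find(".//th[@class='cote-simplegagnant cote-reference']")
print(repr(cell.text))                       # None
```

So changing display: none to display: block would not help: the numbers are simply not in the HTML Scrapy downloads.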

cotesInfos: {"referenceDateTime":{"date":"2018-10-19 19:30:00.000000","timezone_type":3,"timezone":"Europe\/Paris"},"meilleureCoteSG":{"reference":3.3,"live":3.6},"displayedColumns":{"hasSG":1,"hasSP":1,"hasZC":1,"hasZS":1},"1":{"odds_single":27.2,"odds_couillon":11.7,"odds_zeshow":29,"odds":{"reference":36.4,"SG":27.2,"SPMin":5,"SPMax":9.1,"ZC":11.7,"ZS":29},"oddsprogress_single":"moins"},"2":{"odds_single":13.3,"odds_couillon":13.6,"odds_zeshow":11.4,"odds":{"reference":14.5,"SG":13.3,"SPMin":2.3,"SPMax":4.1,"ZC":13.6,"ZS":11.4}},"3":{"odds_single":3.7,"odds_couillon":7.2,"odds_zeshow":8,"odds":{"reference":6.8,"SG":3.7,"SPMin":1.2,"SPMax":1.8,"ZC":7.2,"ZS":8},"oddsprogress_single":"moins"},"4":{"odds_single":274.1,"odds_couillon":19.6,"odds_zeshow":40.9,"odds":{"reference":168.9,"SG":274.1,"SPMin":13.5,"SPMax":41.7,"ZC":19.6,"ZS":40.9},"oddsprogress_single":"plus"},"5":{"odds_single":20.2,"odds_couillon":9,"odds_zeshow":13.1,"odds":{"reference":16,"SG":20.2,"SPMin":2.9,"SPMax":5.2,"ZC":9,"ZS":13.1},"oddsprogress_single":"plus"},"6":{"odds_single":9.4,"odds_couillon":11.7,"odds_zeshow":12.6,"odds":{"reference":4.8,"SG":9.4,"SPMin":3.2,"SPMax":5.8,"ZC":11.7,"ZS":12.6},"oddsprogress_single":"plus"},"7":{"odds_single":32.3,"odds_couillon":9.8,"odds_zeshow":11.4,"odds":{"reference":27.9,"SG":32.3,"SPMin":5.1,"SPMax":9.2,"ZC":9.8,"ZS":11.4},"oddsprogress_single":"plus"},"8":{"odds_single":78.2,"odds_couillon":16.3,"odds_zeshow":34.8,"odds":{"reference":109.3,"SG":78.2,"SPMin":8,"SPMax":14.7,"ZC":16.3,"ZS":34.8},"oddsprogress_single":"moins"},"9":{"odds_single":7.1,"odds_couillon":9.9,"odds_zeshow":9.5,"odds":{"reference":11.2,"SG":7.1,"SPMin":1.5,"SPMax":2.5,"ZC":9.9,"ZS":9.5},"oddsprogress_single":"moins"},"10":{"odds_single":3.6,"odds_couillon":18.9,"odds_zeshow":2.9,"odds":{"reference":3.3,"SG":3.6,"SPMin":1.6,"SPMax":2.7,"ZC":18.9,"ZS":2.9}},"11":{"odds_single":16.4,"odds_couillon":9.6,"odds_zeshow":13.1,"odds":{"reference":14.4,"SG":16.4,"SPMin":3.4,"SPMax":6,"ZC":9.6,"ZS":13.1},"oddsprogress_single":"plus"},"12":{"odds_single":21.3,"odds_couillon":6.7,"odds_zeshow":10,"odds":{"reference":23.3,"SG":21.3,"SPMin":3.8,"SPMax":6.8,"ZC":6.7,"ZS":10}},"13":{"odds_single":40.9,"odds_couillon":21,"odds_zeshow":27.8,"odds":{"reference":20.1,"SG":40.9,"SPMin":5.8,"SPMax":10.6,"ZC":21,"ZS":27.8},"oddsprogress_single":"plus"},"14":{"odds_single":34.8,"odds_couillon":10.8,"odds_zeshow":20.4,"odds":{"reference":22.2,"SG":34.8,"SPMin":5.2,"SPMax":9.5,"ZC":10.8,"ZS":20.4},"oddsprogress_single":"plus"}}
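Rather than changing the CSS, you can read the data straight from that script. Below is a minimal sketch that assumes the page source contains a literal cotesInfos: key followed by valid JSON, as in the snippet above. It finds the key with a regular expression, then lets json.JSONDecoder.raw_decode parse exactly one JSON value from that position and ignore the JavaScript that follows. The helper name extract_cotes_infos and the var race = {...} wrapper in the sample are hypothetical. The odds values are copied from the snippet above. The helper should also run on Python 2.7, while the demo's print call assumes Python 3.

```python
import json
import re


def extract_cotes_infos(page_source):
    """Return the `cotesInfos` object embedded in the page source, or None.

    Assumes the source contains `cotesInfos: {...}` where {...} is valid JSON.
    """
    match = re.search(r"cotesInfos\s*:\s*", page_source)
    if match is None:
        return None
    # raw_decode parses a single JSON value starting at the given index and
    # returns (value, end_index), ignoring whatever JavaScript comes after it.
    value, _end = json.JSONDecoder().raw_decode(page_source, match.end())
    return value


# Hypothetical script wrapper around a shortened copy of the quoted data.
sample = """<script>
var race = {
    cotesInfos: {"meilleureCoteSG":{"reference":3.3,"live":3.6},"1":{"odds_single":27.2,"odds":{"reference":36.4,"SG":27.2}},"2":{"odds_single":13.3,"odds":{"reference":14.5,"SG":13.3}}},
    other: 1
};
</script>"""

infos = extract_cotes_infos(sample)
# Numeric keys hold per-entry odds; other keys hold metadata.
for key in sorted((k for k in infos if k.isdigit()), key=int):
    print(key, infos[key]["odds"]["reference"], infos[key]["odds"]["SG"])
# 1 36.4 27.2
# 2 14.5 13.3
```

Inside a spider callback you would call extract_cotes_infos(response.text) instead of querying the empty cells. raw_decode is used because the object is nested: a non-greedy pattern such as \{.*?\} would stop at the first closing brace.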

Regarding "python - How do I change the value of a div tag's style attribute during a Scrapy crawl?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53069244/
