css - Scrapy can't find a table with a CSS selector

I recently started using Scrapy and had been lucky so far, until this problem. I can't seem to "find" the standings table on this page:

http://www.baseball-reference.com/leagues/MLB/2016-standings.shtml#all_expanded_standings_overall

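The table has id = 'expanded_standings_overall' (CSS selector #expanded_standings_overall), but I can't find it from either my spider or the shell. I do get a result for #all_expanded_standings_overall, because there is a div with that ID. Extracting that div in the shell shows the table I want, but even inside it I can't find the table using "tbody", "tr", or anything else I've tried.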

Best answer

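If you look at the page source, you will see that the ID in question (expanded_standings_overall) sits inside an HTML comment, so the parser never sees the table as an element: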

<div class="placeholder"></div>
<!--
    <div class="table_outer_container">
        <div class="overthrow table_container" id="div_expanded_standings_overall">
            <table class="sortable stats_table" id="expanded_standings_overall" data-cols-to-freeze=2>
                <caption>MLB Detailed Standings</caption>
                    ... sweet data here ..
                </table>
        </div>
    </div>
-->
</div>

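Wrapping the table in an HTML comment looks like a trick to hide content from us innocent scrapers ;)

Interestingly, Firebug does not show these comments...?

One way around this is to extract the comments, strip the comment markers, and then process the data inside them as ordinary HTML. For example: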

$ scrapy shell www.baseball-reference.com/leagues/MLB/2016-standings.shtml
>>> view(response)
>>> from scrapy.selector import Selector
>>> sel = Selector(response)
>>> sel.xpath('//table[@id="expanded_standings_overall"]')
[]
>>> import re
>>> regex = re.compile(r'<!--(.*)-->', re.DOTALL)
>>> for comment in sel.xpath('//comment()').re(regex):
...     table = Selector(text=comment).xpath('//table[@id="expanded_standings_overall"]')
...     print(table)
...
[]
[]
[<Selector xpath='//table[@id="expanded_standings_overall"]' data='<table class="sortable stats_table" id="'>]
[]
[]
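
The same "strip the comment markers" step can also be applied to the whole page at once, before any selector is run. The following is a minimal sketch using only the standard library; the helper name uncomment_html and the sample markup are hypothetical stand-ins for illustration, not the real page or part of Scrapy.

```python
import re

# Matches one HTML comment and captures its body. Non-greedy, so several
# comments in the same document are unwrapped independently.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def uncomment_html(html: str) -> str:
    """Return html with every <!-- ... --> wrapper removed and its body kept."""
    return COMMENT_RE.sub(lambda match: match.group(1), html)


# Hypothetical, heavily shortened stand-in for the real page source.
sample = (
    '<div id="all_expanded_standings_overall">'
    '<!-- <table id="expanded_standings_overall"><tr><td>x</td></tr></table> -->'
    "</div>"
)

cleaned = uncomment_html(sample)
print(cleaned)
```

In the shell session above, the cleaned string could then be passed to Selector(text=cleaned), after which the original //table[@id="expanded_standings_overall"] query should have a real element to match. Note that this unwraps every comment on the page, including ones that were never meant to be markup.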

As you can see, I prefer XPath selectors over CSS, but in principle they work the same way; see https://doc.scrapy.org/en/latest/topics/selectors.html.

Source: this question and answer come from Stack Overflow: https://stackoverflow.com/questions/42731288/
