Scrapy shell does not recognize the 'sel' object

Tags: scrapy

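I'm new to Python and am trying to use Scrapy for a project. Scrapy 0.19 is installed on my CentOS machine (Linux 2.6.32). I followed the instructions on the Scrapy documentation page, but the Scrapy shell cannot find the 'sel' object. Here are my steps: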

[root@localhost rpm]# scrapy shell http://doc.scrapy.org/en/latest/_static/selectors-sample1.html
2014-03-02 06:33:23+0800 [scrapy] INFO: Scrapy 0.19.0 started (bot: scrapybot)
2014-03-02 06:33:23+0800 [scrapy] DEBUG: Optional features available: ssl, http11, libxml2
2014-03-02 06:33:23+0800 [scrapy] DEBUG: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-03-02 06:33:23+0800 [scrapy] DEBUG: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-03-02 06:33:23+0800 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-03-02 06:33:23+0800 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-03-02 06:33:23+0800 [scrapy] DEBUG: Enabled item pipelines: 
2014-03-02 06:33:23+0800 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2014-03-02 06:33:23+0800 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2014-03-02 06:33:23+0800 [default] INFO: Spider opened
2014-03-02 06:33:24+0800 [default] DEBUG: Crawled (200) <GET http://doc.scrapy.org/en/latest/_static/selectors-sample1.html> (referer: None)
[s] Available Scrapy objects:
[s]   hxs        <HtmlXPathSelector xpath=None data=u'<html><head><base   href="http://example.c'>
[s]   item       {}
[s]   request    <GET http://doc.scrapy.org/en/latest/_static/selectors-sample1.html>
[s]   response   <200 http://doc.scrapy.org/en/latest/_static/selectors-sample1.html>
[s]   settings   <CrawlerSettings module=None>
[s]   spider     <BaseSpider 'default' at 0x3668ed0>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser

>>> sel.xpath('//title/text()')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'sel' is not defined
>>> 

Can anyone tell me how to fix this? Thanks in advance.

Best answer

The sel object was added in version 0.20. When you start the shell, it lists the objects that are available; in your case (0.19), hxs behaves similarly:

>>> hxs.select('//title/text()')

You should try reading the documentation first. The selectors section explains very clearly how to use them for the version you are running.
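
If the same snippet has to work on both sides of the 0.20 change, one option is to call whichever query method the selector object exposes. The sketch below is only an illustration: run_xpath is a hypothetical helper name, and the two small classes are stand-ins (not Scrapy classes) so that it runs without Scrapy installed. It assumes, as described above, that the older hxs object offers select() while the newer sel object offers xpath().

```python
def run_xpath(selector, query):
    """Run an XPath query on either selector flavour (hypothetical helper)."""
    # Assumption: newer selectors (0.20+) expose .xpath(), while the
    # older `hxs` object listed by the 0.19 shell exposes .select().
    method = getattr(selector, "xpath", None)
    if method is None:
        method = selector.select
    return method(query)


# Stand-in classes for illustration only; they are NOT part of Scrapy.
class OldStyleSelector:
    def select(self, query):
        return "select:" + query


class NewStyleSelector:
    def xpath(self, query):
        return "xpath:" + query


print(run_xpath(OldStyleSelector(), "//title/text()"))
print(run_xpath(NewStyleSelector(), "//title/text()"))
```

Inside a real Scrapy shell you would pass hxs (0.19) or sel (0.20 and later) as the first argument instead of the stand-ins.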

Regarding the Scrapy shell not recognizing the 'sel' object, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22115982/
