web-scraping - I get a Scrapy timeout error when crawling this page

Tags: web-scraping web-crawler scrapy

I can't crawl this page: https://www.adidas.pe/. Running scrapy crawl my_spider returns:

2018-12-17 15:36:39 [scrapy.core.engine] INFO: Spider opened
2018-12-17 15:36:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-17 15:36:39 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2018-12-17 15:36:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://www.adidas.pe/> from <GET http://adidas.pe/>
2018-12-17 15:37:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-17 15:38:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

I tried changing settings.py:

COOKIES_ENABLED = True
ROBOTSTXT_OBEY = False

but it doesn't work.

Best Answer

You can try changing the USER_AGENT in settings.py; that worked for me. My settings.py:

# -*- coding: utf-8 -*-

# Scrapy settings for adidas project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://doc.scrapy.org/en/latest/topics/settings.html
#     https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://doc.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'adidas'

SPIDER_MODULES = ['adidas.spiders']
NEWSPIDER_MODULE = 'adidas.spiders'


# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
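The fix works because Scrapy's default User-Agent identifies the client as a bot ("Scrapy/VERSION (+https://scrapy.org)"), and the site appears to silently drop such requests, which Scrapy then reports as a timeout. As a quick sanity check outside Scrapy, you can build the same request with a browser-like User-Agent using only the standard library (this is an illustrative sketch, not part of the original answer; actually performing the fetch requires network access and is left commented out):

```python
from urllib.request import Request

# Browser-like User-Agent, copied from the settings.py above.
BROWSER_UA = ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36')

# Build the request with the spoofed header; urlopen(req) would send it.
req = Request('https://www.adidas.pe/', headers={'User-Agent': BROWSER_UA})
print(req.get_header('User-agent'))
# from urllib.request import urlopen
# html = urlopen(req, timeout=30).read()  # fetch with the browser UA
```

Setting USER_AGENT in settings.py applies the header project-wide; to override it for a single spider instead, you can set the same key in that spider's custom_settings dict.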

This question ("web-scraping - I get a Scrapy timeout error when crawling this page") comes from Stack Overflow: https://stackoverflow.com/questions/53822827/
