python - How to resolve Splash <405 https://www.controller.com/listings/aircraft/for-sale/list>: HTTP status code is not handled or not allowed

Tags: python scrapy scrapy-splash

I'm trying to access a website with Scrapy-Splash, but I get a 405 error: Ignoring response <405 https://www.controller.com/>: HTTP status code is not handled or not allowed

The code I'm using:

import scrapy
from scrapy_splash import SplashRequest

class ProxySpider(scrapy.Spider):
    name = "proxyss"

    def start_requests(self):
        urls = [
            'https://www.controller.com/listings/aircraft/for-sale/list',
        ]
        for url in urls:
            # Render each URL through Splash: plain GET, wait 5 s, route through the proxy
            yield SplashRequest(url, self.parse, args={'http_method': 'GET', 'wait': 5, 'proxy': 'http://xxxxxxxxxx'})

    def parse(self, response):
        # Save the HTML returned by Splash so it can be inspected
        filename = 'proxy.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)
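One way to tell whether the 405 comes from the site itself rather than from Scrapy is to send the same arguments straight to Splash. scrapy-splash's SplashRequest targets Splash's render.html endpoint by default and passes `args` (plus the target `url`) to it. The following is a minimal standard-library sketch; it assumes Splash listens on http://localhost:8050 (8050 is Splash's default port, the host is an assumption), and `splash_render_url` and `fetch_status` are hypothetical helper names, not part of scrapy-splash.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def splash_render_url(splash_base: str, target_url: str, wait: float = 5.0,
                      proxy: Optional[str] = None) -> str:
    """Build a render.html URL carrying the same args the spider sends (hypothetical helper)."""
    params = {"url": target_url, "wait": wait, "http_method": "GET"}
    if proxy:
        # Splash accepts a proxy URL of the form [protocol://][user:password@]host[:port]
        params["proxy"] = proxy
    return f"{splash_base.rstrip('/')}/render.html?{urlencode(params)}"


def fetch_status(url: str, timeout: float = 90.0) -> int:
    """Return the HTTP status Splash answers with, including error statuses (hypothetical helper)."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still available on the exception
        return err.code
```

For example, `fetch_status(splash_render_url("http://localhost:8050", "https://www.controller.com/listings/aircraft/for-sale/list", proxy="http://xxxxxxxxxx"))` reports what Splash returns without Scrapy's retry and HttpError middlewares in between; opening the same URL in a browser also shows the response body.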

Log:

2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.controller.com> (failed 1 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.controller.com/listings/aircraft/for-sale/list> (failed 1 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.controller.com> (failed 2 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.controller.com/listings/aircraft/for-sale/list> (failed 2 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.controller.com> (failed 3 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.controller.com/listings/aircraft/for-sale/list> (failed 3 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.controller.com> (failed 4 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://www.controller.com> (referer: https://www.controller.com/listings/aircraft/for-sale/list)
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.controller.com/listings/aircraft/for-sale/list> (failed 4 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://www.controller.com/listings/aircraft/for-sale/list> (referer: https://www.controller.com/listings/aircraft/for-sale/list)
2020-08-17 21:30:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://www.controller.com>: HTTP status code is not handled or not allowed
2020-08-17 21:30:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://www.controller.com/listings/aircraft/for-sale/list>: HTTP status code is not handled or not allowed
2020-08-17 21:30:56 [scrapy.core.engine] INFO: Closing spider (finished)
2020-08-17 21:30:56 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

Best answer

It might just be a retry issue. Add this to your settings.py file and see if it helps:

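RETRY_ENABLED = True       # on by default in Scrapy; shown here for clarity
RETRY_TIMES = 3            # retries per request, on top of the first attempt (Scrapy's default is 2)
RETRY_HTTP_CODES = [405]   # replaces, rather than extends, Scrapy's default list of retried codes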

Source: the Stack Overflow question "How to resolve Splash <405 https://www.controller.com/listings/aircraft/for-sale/list>: HTTP status code is not handled or not allowed": https://stackoverflow.com/questions/63421568/
