I'm getting a ValueError:
raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: h
My items.py code is:
class Brand(scrapy.Item):
    name = scrapy.Field()
    url = scrapy.Field()
    brand_image = scrapy.Field()
    image_urls = scrapy.Field()
    images = scrapy.Field()
My settings.py is:
BOT_NAME = 'scraper'
SPIDER_MODULES = ['scraper.spiders']
NEWSPIDER_MODULE = 'scraper.spiders'
ITEM_PIPELINES = {'scrapy.contrib.pipeline.images.ImagesPipeline': 1}
IMAGES_STORE = 'images'
My spider code:
import scrapy
import json
from scraper.items import Brand

class QuotesSpider(scrapy.Spider):
    name = "brandDetails"
    allowed_domains = ["ozhat-turkiye.com"]
    with open('brands.json') as data_file:
        data_item = json.load(data_file)
    start_urls = list()
    for item in data_item:
        start_urls.append(item["url"])

    def parse(self, response):
        item = Brand()
        name = response.css("div.th::text").extract_first()
        name = name.replace('Products of ', '')
        item['name'] = name
        item['url'] = response.url
        urls = response.css("div.productimage img::attr(src)").extract_first()
        urls = response.urljoin(urls)
        item['image_urls'] = urls
        yield item
Best answer
"Missing scheme in request url" means the URL handed to Scrapy is not valid because it lacks a leading http:// or https://. So prepend https:// or http:// to the image URL you have:
'https://' + response.css("div.productimage img::attr(src)").extract_first()
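One detail in the traceback is worth a second look: the offending URL is reported as just `h`. Scrapy's images pipeline expects `image_urls` to be a list of URLs and iterates over it, so if the field holds a single string, the first "URL" it sees is the first character of that string. The following is a minimal sketch of that effect using only the standard library, not the pipeline itself; `build_image_urls`, `has_scheme`, and the `example.com` URLs are hypothetical names chosen for illustration.

```python
from urllib.parse import urljoin, urlparse


def build_image_urls(page_url, src):
    """Return a list of absolute image URLs (hypothetical helper).

    An empty list is returned when no src was extracted, so the
    caller never ends up with None or a bare string in the field.
    """
    if not src:
        return []
    return [urljoin(page_url, src)]


def has_scheme(url):
    """True when the URL starts with an http or https scheme."""
    return urlparse(url).scheme in ("http", "https")


# A single absolute URL stored as a plain string (assumed example values).
as_string = urljoin("https://example.com/brand/1", "/img/logo.png")

# Iterating over a string yields characters, so the first "URL" is 'h'.
first_from_string = next(iter(as_string))

# Wrapping the URL in a list makes iteration yield the whole URL.
as_list = build_image_urls("https://example.com/brand/1", "/img/logo.png")
first_from_list = next(iter(as_list))

print(has_scheme(first_from_string))
print(has_scheme(first_from_list))
```

If this is indeed the cause, the corresponding change in the spider above would be to assign a list, i.e. `item['image_urls'] = [urls]`, since `response.urljoin` already produces an absolute URL with a scheme.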
Regarding "python - I get a ValueError when downloading/scraping images from a Scrapy spider using the images pipeline", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52568025/