I want to collect idol photos with Scrapy.
The gallery page is https://news.mynavi.jp/article/20191229-947707/ .
I wrote the spider...
(save_gradol.py)
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from gradol.items import GradolItem

class SaveGradolSpider(CrawlSpider):
    name = 'save_gradol'
    allowed_domains = ['news.mynavi.jp']  # no trailing slash, or every request is filtered out
    start_urls = ['https://news.mynavi.jp/article/20191229-947707/']
    rules = (
        Rule(LinkExtractor(allow=(), unique=True), callback="parse_page", follow=True),
    )

    def parse_page(self, response):
        #print("\n>>> Parse " + response.url + " <<<")
        item = GradolItem()
        # Resolve each extracted href against the page URL; image_urls must be a list of strings
        item["image_urls"] = [
            response.urljoin(href)
            for href in response.xpath("//a/@href").extract()
            if href.endswith(".jpg")
        ]
        yield item
I also wrote the item...
(items.py)
import scrapy
from scrapy.item import Item, Field

class GradolItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    #image_directory_name = scrapy.Field()
    image_urls = scrapy.Field()
    images = scrapy.Field()
I also wrote the pipeline...
(pipelines.py)
import scrapy
from scrapy.pipelines.images import ImagesPipeline

class MyImagesPipeline(ImagesPipeline):
    # Subclass ImagesPipeline (not object) so images are actually downloaded;
    # the base class already implements process_item.
    pass
I also wrote the settings...
(settings.py)
ITEM_PIPELINES = {'gradol.pipelines.MyImagesPipeline': 1}
IMAGES_STORE = './savedImages'
MEDIA_ALLOW_REDIRECTS = True
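If no custom pipeline behaviour is needed at all, an alternative is to point ITEM_PIPELINES directly at Scrapy's stock ImagesPipeline (a sketch; the other settings are unchanged):

```python
# settings.py -- use Scrapy's built-in image pipeline as-is
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = './savedImages'
MEDIA_ALLOW_REDIRECTS = True
```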
Then I ran the spider with [sudo scrapy crawl save_gradol], but it neither crawls nor collects any photos.
Please help me fix this.
Best answer
You can do this the simplest way:
import requests
from tqdm import tqdm

number_of_photos = 26

for i in tqdm(range(1, number_of_photos + 1)):
    # Photos on this article follow a zero-padded naming scheme: 001l.jpg, 002l.jpg, ...
    image_url = 'https://news.mynavi.jp/article/20191229-947707/images/{:03}l.jpg'.format(i)
    try:
        response = requests.get(image_url)
    except requests.RequestException:
        pass
    else:
        if response.status_code == 200:
            with open('{:02}.jpg'.format(i), 'wb') as f:
                f.write(response.content)
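The numbered-URL idea above can also be factored into small helpers, so the photo count and destination are easy to change (the function names here are illustrative, not from the original answer):

```python
import requests

def build_photo_urls(base_url, count):
    # The article's photos are numbered 001l.jpg, 002l.jpg, ... up to count
    return ['{}images/{:03}l.jpg'.format(base_url, i)
            for i in range(1, count + 1)]

def download_photos(base_url, count):
    saved = []
    for i, url in enumerate(build_photo_urls(base_url, count), start=1):
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue  # skip URLs that fail to connect
        if response.status_code == 200:
            filename = '{:02}.jpg'.format(i)
            with open(filename, 'wb') as f:
                f.write(response.content)
            saved.append(filename)
    return saved
```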
Enjoy.
Regarding python - how to collect jpegs with Scrapy, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59690938/