I created a spider using Selenium plus Scrapy. Up until yesterday it scraped fine with the same script, and I was able to write the output to a CSV file, but this afternoon scrapy stopped being recognized as a command, along with python and pip.
So I reinstalled everything from scratch, including Python. Now the spider runs smoothly, but it no longer writes the output the way it did before.
I have been racking my brain over this for 4 hours and can't find a fix. I'd really appreciate it if someone could help. Below is everything you need.
I have already tried changing the pipeline several times.
settings.py
BOT_NAME = 'mcmastersds'
SPIDER_MODULES = ['grainger.spiders']
NEWSPIDER_MODULE = 'grainger.spiders'
LOG_LEVEL = 'INFO'
ROBOTSTXT_OBEY = False
ITEM_PIPELINES = {'grainger.pipelines.GraingerPipeline': 300,}
DOWNLOAD_DELAY = 1
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36 OPR/43.0.2442.806'
PROXY_MODE = 0
RETRY_TIMES = 0
SPLASH_URL = 'http://localhost:8050'
SPIDER_MIDDLEWARES = {
'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
pipelines.py
import csv
import os.path
from scrapy.loader import ItemLoader
from scrapy.loader.processors import TakeFirst, MapCompose, Join
class GraingerPipeline(object):

    def __init__(self):
        if not os.path.isfile('CONTENT_psysci.csv'):
            self.csvwriter = csv.writer(open('safale.csv', 'a', newline="", encoding='utf8'))
            self.csvwriter.writerow(['url','Title','sellername','travlink','travlink1','rating','Crreview','feature','Description','proddescription','Additonalinfo','details','detailsextended','producttable','stockstatus','newseller','condition','deliverystatus','price','bestsellersrank','mainimage','subimage'])

    def process_item(self, item, spider):
        self.csvwriter.writerow([item['url'],item['title'],item['sellername'],item['travlink'],item['travlink1'],item['rating'],item['Crreview'],item['feature'],item['Description'],item['proddescription'],item['Additonalinfo'],item['details'],item['detailsextended'],item['producttable'],item['stockstatus'],item['newseller'],item['condition'],item['deliverystatus'],item['price'],item['bestsellersrank'],item['mainimage'],item['subimage']])
        return item
Can you help me?
Best Answer
If you only want to write out items without doing any data-specific processing, I recommend using the feed exports feature. Scrapy ships with a built-in CSV feed exporter.
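With feed exports you don't need a pipeline at all. As a sketch: either pass the output file on the command line (works on any recent Scrapy version, replace "myspider" with your spider's actual name), or configure the feed in settings.py. Note the FEEDS setting shown below requires Scrapy 2.1+; the file name 'safale.csv' is just carried over from the question.

```python
# Option 1: no code changes at all -- export from the command line:
#
#   scrapy crawl myspider -o safale.csv
#
# Option 2 (Scrapy 2.1+): declare the feed in settings.py; Scrapy then
# opens, writes, and closes the file for you.
FEEDS = {
    'safale.csv': {
        'format': 'csv',
        'encoding': 'utf8',
    },
}
```

Either way the CSV columns are taken from the item fields, so there is no header row to maintain by hand.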
The reason the code above doesn't work is that you never close the CSV file you open for self.csvwriter in the initializer. You should use the open_spider and close_spider methods to open the file and close it once the items have been processed. Take a look at the JSON pipeline example in the Scrapy docs, which does something similar.
So your pipeline above should be adapted into the following code:
class GraingerPipeline(object):

    csv_file = None

    def open_spider(self, spider):
        # Open the file unconditionally so process_item always has a writer;
        # only write the header row when the guard file is missing (the check
        # against 'CONTENT_psysci.csv' is kept from the question's code).
        write_header = not os.path.isfile('CONTENT_psysci.csv')
        self.csv_file = open('safale.csv', 'a', newline="", encoding='utf8')
        self.csvwriter = csv.writer(self.csv_file)
        if write_header:
            self.csvwriter.writerow(['url','Title','sellername','travlink','travlink1','rating','Crreview','feature','Description','proddescription','Additonalinfo','details','detailsextended','producttable','stockstatus','newseller','condition','deliverystatus','price','bestsellersrank','mainimage','subimage'])

    def process_item(self, item, spider):
        self.csvwriter.writerow([item['url'],item['title'],item['sellername'],item['travlink'],item['travlink1'],item['rating'],item['Crreview'],item['feature'],item['Description'],item['proddescription'],item['Additonalinfo'],item['details'],item['detailsextended'],item['producttable'],item['stockstatus'],item['newseller'],item['condition'],item['deliverystatus'],item['price'],item['bestsellersrank'],item['mainimage'],item['subimage']])
        return item

    def close_spider(self, spider):
        # Closing flushes any buffered rows to disk -- this is the step the
        # original pipeline was missing.
        if self.csv_file:
            self.csv_file.close()
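The open/close lifecycle above can be exercised outside Scrapy as well, which is a quick way to verify the file handling. The sketch below uses a trimmed two-column item and an illustrative file name ('demo_items.csv'); the calls mimic what Scrapy does during a crawl (open_spider at startup, process_item per item, close_spider at shutdown):

```python
import csv
import os

class CsvWriterPipeline:
    """Minimal stand-in for the pipeline above: open once, append per item,
    close at the end so buffered rows actually reach the disk."""

    def open_spider(self, spider):
        # Write the header only on the first run, since we open in append mode.
        write_header = not os.path.isfile('demo_items.csv')
        self.csv_file = open('demo_items.csv', 'a', newline='', encoding='utf8')
        self.csvwriter = csv.writer(self.csv_file)
        if write_header:
            self.csvwriter.writerow(['url', 'title'])

    def process_item(self, item, spider):
        self.csvwriter.writerow([item['url'], item['title']])
        return item

    def close_spider(self, spider):
        self.csv_file.close()

# Simulate one crawl: Scrapy would pass the real spider instead of None.
pipeline = CsvWriterPipeline()
pipeline.open_spider(None)
pipeline.process_item({'url': 'http://example.com', 'title': 'Example'}, None)
pipeline.close_spider(None)
```

After this runs, demo_items.csv contains the header row followed by the one item row, which is exactly the behavior the question's pipeline was missing because the file was never closed.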
Regarding "python - Scrapy not generating the output CSV file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55398735/