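I have a question about quoting in Scrapy's CSV output. I am scraping data that contains commas, which causes some columns to be wrapped in double quotes, like this: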
TEST,TEST,TEST,ON,TEST,TEST,"$2,449,000, 4,735 Sq Ft, 6 Bed, 5.1 Bath, Listed 03/01/2016"
TEST,TEST,TEST,ON,TEST,TEST,"$2,895,000, 4,975 Sq Ft, 5 Bed, 4.1 Bath, Listed 01/03/2016"
Only the columns that contain commas get double-quoted. How can I double-quote all of the data columns?
I want Scrapy to output:
"TEST","TEST","TEST","ON","TEST","TEST","$2,449,000, 4,735 Sq Ft, 6 Bed, 5.1 Bath, Listed 03/01/2016"
"TEST","TEST","TEST","ON","TEST","TEST","$2,895,000, 4,975 Sq Ft, 5 Bed, 4.1 Bath, Listed 01/03/2016"
Is there a setting I can change to do this?
Best answer
By default, for CSV output, Scrapy uses csv.writer() with the defaults.
For field quoting, the default is csv.QUOTE_MINIMAL:
Instructs writer objects to only quote those fields which contain special characters such as delimiter, quotechar or any of the characters in lineterminator.
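To make the difference concrete, here is a small sketch that uses only the standard library csv module, without Scrapy. The render helper and the sample row are made up for illustration; they are not part of Scrapy or of the original question.

```python
import csv
import io


def render(rows, quoting):
    """Write rows to an in-memory CSV string using the given quoting mode.

    Hypothetical helper for illustration only.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=quoting)
    writer.writerows(rows)
    return buf.getvalue()


# Sample row: only the last field contains the delimiter (a comma).
rows = [["TEST", "ON", "$2,449,000, 4,735 Sq Ft"]]

minimal = render(rows, csv.QUOTE_MINIMAL)  # quotes only the field with commas
quote_all = render(rows, csv.QUOTE_ALL)    # quotes every field

print(minimal, end="")
print(quote_all, end="")
```

With QUOTE_MINIMAL only the third field is quoted; with QUOTE_ALL every field is, which is the behaviour the question asks for.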
But you can build your own CSV item exporter and set a new dialect, based on the default 'excel' dialect.
For example, in an exporters.py module, define the following:
import csv

from scrapy.exporters import CsvItemExporter


class QuoteAllDialect(csv.excel):
    # Same as the default 'excel' dialect, but quote every field.
    quoting = csv.QUOTE_ALL


class QuoteAllCsvItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        # Force the quote-all dialect; it is passed on to csv.writer().
        kwargs.update({'dialect': QuoteAllDialect})
        super(QuoteAllCsvItemExporter, self).__init__(*args, **kwargs)
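As a quick sanity check that does not need a Scrapy project, the same kind of dialect can be handed straight to the standard library csv.writer(). This is only a sketch: the in-memory buffer and the two sample rows are assumptions chosen to mirror the spider example, and the exporter class itself is not exercised here.

```python
import csv
import io


class QuoteAllDialect(csv.excel):
    # Inherit delimiter, quotechar, line terminator, etc. from 'excel';
    # override only the quoting mode.
    quoting = csv.QUOTE_ALL


buf = io.StringIO()
writer = csv.writer(buf, dialect=QuoteAllDialect)

# Header row followed by one data row containing commas and a double quote.
writer.writerow(["description", "name", "title"])
writer.writerow([
    'Some description, with commas, quotes (") and all',
    "Some name",
    "Some title, baby!",
])

output = buf.getvalue()
print(output, end="")
```

Every field comes out quoted, and the embedded double quote is escaped by doubling it, because the 'excel' dialect has doublequote enabled.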
Then you just need to reference this item exporter in your settings for CSV output, something like:
FEED_EXPORTERS = {
    'csv': 'myproject.exporters.QuoteAllCsvItemExporter',
}
And a simple spider like this:
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ['http://example.com/']

    def parse(self, response):
        # Yield one hard-coded item whose values contain commas and a quote.
        yield {
            "name": "Some name",
            "title": "Some title, baby!",
            "description": "Some description, with commas, quotes (\") and all"
        }
will output this:
"description","name","title"
"Some description, with commas, quotes ("") and all","Some name","Some title, baby!"
Source: a similar question on Stack Overflow about getting double quotes in Scrapy .csv results: https://stackoverflow.com/questions/42658875/