python - How to get double quotes in Scrapy .csv results

Tags: python csv web-scraping scrapy

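I have a question about the quotes in my Scrapy output. I'm scraping data that contains commas, and this causes some columns to be wrapped in double quotes, like this: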

TEST,TEST,TEST,ON,TEST,TEST,"$2,449,000, 4,735 Sq Ft, 6 Bed, 5.1 Bath, Listed 03/01/2016"
TEST,TEST,TEST,ON,TEST,TEST,"$2,895,000, 4,975 Sq Ft, 5 Bed, 4.1 Bath, Listed 01/03/2016"

Only the columns that contain commas are wrapped in double quotes. How can I double-quote every column?

I want Scrapy to output:

"TEST","TEST","TEST","ON","TEST","TEST","$2,449,000, 4,735 Sq Ft, 6 Bed, 5.1 Bath, Listed 03/01/2016"
"TEST","TEST","TEST","ON","TEST","TEST","$2,895,000, 4,975 Sq Ft, 5 Bed, 4.1 Bath, Listed 01/03/2016"

Is there a setting I can change to do this?

Best answer

By default, Scrapy writes CSV output with csv.writer() using its default arguments.

For field quoting, the default is csv.QUOTE_MINIMAL:

Instructs writer objects to only quote those fields which contain special characters such as delimiter, quotechar or any of the characters in lineterminator.
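To see the difference outside Scrapy, here is a small stdlib-only sketch that writes the first row from the question once with the default QUOTE_MINIMAL and once with QUOTE_ALL. The to_csv_line helper is hypothetical, written only for this illustration, and lineterminator="\n" is set only so the resulting strings are easy to compare.

```python
import csv
import io

# First data row from the question: only the last field contains commas.
row = ["TEST", "TEST", "TEST", "ON", "TEST", "TEST",
       "$2,449,000, 4,735 Sq Ft, 6 Bed, 5.1 Bath, Listed 03/01/2016"]


def to_csv_line(fields, quoting):
    """Hypothetical helper: write one row with the given quoting mode, return it."""
    buf = io.StringIO()
    # The 'excel' default line ending is "\r\n"; "\n" just keeps the output tidy here.
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(fields)
    return buf.getvalue()


minimal = to_csv_line(row, csv.QUOTE_MINIMAL)  # what csv.writer() does by default
quote_all = to_csv_line(row, csv.QUOTE_ALL)    # every field quoted

print(minimal, end="")
print(quote_all, end="")
```

The first line matches what the question shows; the second matches the desired output.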

However, you can write your own CSV item exporter and give it a new dialect that is based on the default 'excel' dialect.

For example, define the following in an exporters.py module:

import csv

from scrapy.exporters import CsvItemExporter


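class QuoteAllDialect(csv.excel):
    # Same settings as the default 'excel' dialect, except every field is quoted.
    quoting = csv.QUOTE_ALL


class QuoteAllCsvItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        # CsvItemExporter passes leftover keyword arguments on to csv.writer(),
        # so this makes the exporter write every row with the quote-all dialect.
        kwargs.update({'dialect': QuoteAllDialect})
        super().__init__(*args, **kwargs)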
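If you want to check the dialect on its own before wiring it into Scrapy, the standard csv module accepts the class directly. This sketch only exercises QuoteAllDialect, not the Scrapy exporter; the sample row is made up for illustration.

```python
import csv
import io


class QuoteAllDialect(csv.excel):
    # Mirrors the class in exporters.py: inherit every 'excel' setting
    # (comma delimiter, '"' quotechar, doubled quotes, "\r\n" line endings)
    # and override only the quoting mode.
    quoting = csv.QUOTE_ALL


buf = io.StringIO()
writer = csv.writer(buf, dialect=QuoteAllDialect)
# None of these fields would be quoted under QUOTE_MINIMAL.
writer.writerow(["TEST", "ON", "no commas here"])
line = buf.getvalue()
print(repr(line))  # '"TEST","ON","no commas here"\r\n'
```

Subclassing csv.excel rather than building a dialect from scratch keeps the delimiter, quote character and escaping rules identical to Scrapy's default output.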

Then you just need to reference this item exporter for CSV output in your settings (myproject stands for your project's package name), something like:

FEED_EXPORTERS = {
    'csv': 'myproject.exporters.QuoteAllCsvItemExporter',
}
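On Scrapy 2.1 and later, the output file itself can also be declared in settings with the FEEDS setting; a feed's "format" value is looked up in FEED_EXPORTERS, so "csv" picks up the custom exporter. A minimal settings.py sketch, where myproject and items.csv are placeholders:

```python
# settings.py (sketch; "myproject" and "items.csv" are placeholders)
FEED_EXPORTERS = {
    # Any feed whose format is "csv" now uses the quote-all exporter.
    "csv": "myproject.exporters.QuoteAllCsvItemExporter",
}

# Scrapy 2.1+: declare the output file here instead of passing
# `-o items.csv` to `scrapy crawl` on the command line.
FEEDS = {
    "items.csv": {"format": "csv"},
}
```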

With a simple spider like this one:

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ['http://example.com/']

    def parse(self, response):
        yield {
            "name": "Some name",
            "title": "Some title, baby!",
            "description": "Some description, with commas, quotes (\") and all"
        }

the output will be:

"description","name","title"
"Some description, with commas, quotes ("") and all","Some name","Some title, baby!"
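Because QUOTE_ALL only adds quotes, the file still parses as ordinary CSV. A quick stdlib check that reads the two output lines above back with csv.DictReader; the exported string is copied from that output, and the 'excel' dialect's "\r\n" line endings are assumed.

```python
import csv
import io

# The two output lines shown above, as the quote-all exporter would write them.
exported = (
    '"description","name","title"\r\n'
    '"Some description, with commas, quotes ("") and all","Some name","Some title, baby!"\r\n'
)

# The default 'excel' dialect turns the doubled "" back into a single quote.
rows = list(csv.DictReader(io.StringIO(exported)))
print(rows[0]["description"])  # Some description, with commas, quotes (") and all
```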

Source: this question (python - How to get double quotes in Scrapy .csv results) on Stack Overflow: https://stackoverflow.com/questions/42658875/
