python - How to remove the unicode string "[u'string']" when writing a CSV file

Tags: python scrapy

How can I remove the unicode list wrapper "[u'string']" when writing to a CSV file?

**This is my spider:**
import pdb
import FileManager
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from centerfireguns.items import CenterfiregunsItem
from urlparse import urljoin
from scrapy.http import Request

new_filemanager = FileManager.File_Manager()

class FiregunsSpider(CrawlSpider):
    name = 'centerfireguns'
    allowed_domains = ['centerfireguns.com']
    start_urls = ['http://www.centerfireguns.com/firearms.html']

    rules = (
        Rule(SgmlLinkExtractor(allow=(), restrict_xpaths=('//a[contains(@class, "i-next")][1]')), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        urls = hxs.select('//a[contains(@class,"product-image")]/@href').extract()
        for url in urls:
            new_url = urljoin("http://www.centerfireguns.com/", url)
            item = CenterfiregunsItem()
            item['ad_url'] = new_url
            request = Request(new_url, callback=self.parse_detail)
            request.meta['item'] = item
            yield request

    def parse_detail(self, response):
        hxs = HtmlXPathSelector(response)
        item = response.meta['item']

        # <div class="product-name"><h1 itemprop="name">Adcor Defense BEAR 223 16 OPT RDY</h1>
        item['title'] = hxs.select('//div[contains(@class, "product-name")]//h1/text()').extract()

        # <div class="product-shop"><span class="regular-price" id="product-price-21339"> <span class="price" itemprop="price">$1,389.00</span> </span>
        item['price'] = hxs.select('//div[contains(@class, "product-shop")]//span[contains(@itemprop,"price")][1]/text()').extract()

        # <div class="sku"><span>Model #: </span>2013040</div>
        item['model'] = hxs.select('//div[contains(@class, "sku")]/text()').extract()

        # <img id="image" itemprop="image" src="http://www.centerfireguns.com/media/catalog/product/cache/1/image/292x320/9df78eab33525d08d6e5fb8d27136e95/a/d/adcor-defense-2013040-tactical-rifles.jpg">
        item['img_url'] = hxs.select('//img[contains(@id, "image")]/@src').extract()

        # <table class="data-table" id="product-attribute-specs-table">
        item['specification'] = hxs.select('//table[contains(@id, "product-attribute-specs-table")]/text()').extract()

        # <div id="product_tabs_description_tabbed_contents"><h6>Full Description</h6><ol><h2>Details</h2><div class="std">
        item['description'] = hxs.select('//div[contains(@id, "product_tabs_description_tabbed_contents")]//div[contains(@class, "std")]/text()').extract()

        #new_filemanager.writeFile("/home/user1/Public/www/GajenderData/SCRIPTS/pythonprog/ganesh/centerfireguns_detail.csv",str(title) + "\n")
        yield item

This is my pipeline.py:

# -*- coding: utf-8 -*-
import csv

class CenterfiregunsPipeline(object):

    def __init__(self):
        self.myCSV = csv.writer(open('/home/user1/Public/www/GajenderData/SCRIPTS/pythonprog/ganesh/centerfireguns_detail.csv', 'wb'))
        self.myCSV.writerow(['ad_url','title', 'model','price','img_url','specification','description'])

    def process_item(self, item, spider):
        self.myCSV.writerow([item['ad_url'].encode('utf-8'),item['title'].encode('utf-8'),item['model'].encode('utf-8'),item['price'].encode('utf-8'),item['img_url'].encode('utf-8'),item['specification'].encode('utf-8'),item['description'].encode('utf-8')])
        return item

When I use .encode('utf-8') I get this error. Please see below:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scrapy/middleware.py", line 54, in _process_chain
    return process_chain(self.methods[methodname], obj, *args)
  File "/usr/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 65, in process_chain
    d.callback(input)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/user1/Public/www/GajenderData/SCRIPTS/pythonprog/ganesh/centerfireguns/centerfireguns/pipelines.py", line 14, in process_item
    self.myCSV.writerow([item['ad_url'].encode('utf-8'),item['title'].encode('utf-8'),item['model'].encode('utf-8'),item['price'].encode('utf-8'),item['img_url'].encode('utf-8'),item['specification'].encode('utf-8'),item['description'].encode('utf-8')])
**exceptions.AttributeError: 'list' object has no attribute 'encode'**

I am a Python beginner.

Best Answer

The following snippet is part of a function from a pet project of mine. It removes exactly what you are looking for by declaring a list of the substrings you want deleted and then calling the replace method on each line of the target file for every item in that list. I use it on plain text files, so you would have to adapt it for the CSV writer and reader, but the idea is there...

    name = "file.csv"    
    infile = name
    outfile = name + "_clean.csv"

    delete_list = ["[u'", "']"]
    fin = open(infile)
    fout = open(outfile, "w+")
    for line in fin:
        for word in delete_list:
            line = line.replace(word, "")
        fout.write(line)
    fin.close()
    fout.close()
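If you do want to adapt the same idea to the csv module rather than raw text, it could look like the sketch below (Python 3 here; the sample row and the in-memory buffers stand in for your real input and output files, which are my own illustration):

```python
import csv
import io

# Hypothetical stand-ins for the input and output files: one CSV row whose
# cells still carry the "[u'...']" wrappers that str() of a list leaves behind.
raw = io.StringIO("[u'Adcor Defense BEAR'],[u'2013040']\n")
cleaned = io.StringIO()

delete_list = ["[u'", "']"]

reader = csv.reader(raw)
writer = csv.writer(cleaned)
for row in reader:
    # Strip every unwanted fragment from each cell before rewriting the row.
    for fragment in delete_list:
        row = [cell.replace(fragment, "") for cell in row]
    writer.writerow(row)
```

Going cell by cell instead of line by line keeps the CSV quoting intact, which plain string replacement on whole lines can break.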

Also, a perhaps more pythonic-hackish approach would be to strip and then join the items before you assign them to the item fields... example:

#...
tit = hxs.select('//div[contains(@class, "product-name")]//h1/text()').extract()
tit = [x.strip() for x in tit]
tit = ''.join(tit)
prc = hxs.select('//div[contains(@class, "product-shop")]//span[contains(@itemprop,"price")][1]/text()').extract()
prc = [x.strip() for x in prc]
prc = ''.join(prc)

item = response.meta['item']

item['title'] = tit

item['price'] = prc
#...

That way you could even avoid the pipeline entirely (if encoding is the only reason you have it)... Otherwise you could drop the encoding from the pipeline, if that serves the purpose you need... May I ask why you need the pipeline?
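If you keep the pipeline, the root cause of your AttributeError is that .extract() returns a list of strings, so you can join each field there instead of calling .encode() on a list. A rough sketch (Python 3 csv conventions; the field names come from your item, but the flatten helper is my own invention):

```python
import csv

def flatten(value):
    """Join a list of extracted strings into one cell; pass scalars through."""
    if isinstance(value, list):
        return ' '.join(s.strip() for s in value)
    return value

class CenterfiregunsPipeline(object):
    FIELDS = ['ad_url', 'title', 'model', 'price',
              'img_url', 'specification', 'description']

    def __init__(self, path='centerfireguns_detail.csv'):
        # newline='' is the csv-module recommendation on Python 3;
        # on Python 2 you would open the file in 'wb' mode instead.
        self.csvfile = open(path, 'w', newline='')
        self.writer = csv.writer(self.csvfile)
        self.writer.writerow(self.FIELDS)

    def process_item(self, item, spider):
        self.writer.writerow([flatten(item[f]) for f in self.FIELDS])
        return item
```

Because each cell is a plain string by the time writerow sees it, the "[u'...']" wrapper never reaches the file in the first place.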

Regarding python - How to remove the unicode string "[u'string']" when writing a CSV file, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34460579/
