python - DictReader and UnicodeError

Tags: python python-2.7 csv unicode python-unicode

import csv
import io

def openFile(fileName):
    try:
        trainFile = io.open(fileName, "r", encoding="utf-8")
    except IOError as e:
        print("File could not be opened: {}".format(e))
    else:
        trainData = csv.DictReader(trainFile)
        print trainData
        return trainData

def computeTFIDF(trainData):
    bodyList = []
    print "Inside computeTFIDF"
    for row in trainData:
        for key, value in row.iteritems():
            print key, unicode(value, "utf-8", "ignore")
    print "Done"
    return

if __name__ == "__main__":
    print "Main"
    trainData = openFile("../Data/TrainSample.csv")
    print "File Opened"
    computeTFIDF(trainData)

Error:

Traceback (most recent call last):
  File "C:\DebSeal\IUB MS Program\IUB Sem III\Facebook Kaggle Comp\Src\facebookChallenge.py", line 62, in <module>
    computeTFIDF(trainData)
  File "C:\DebSeal\IUB MS Program\IUB Sem III\Facebook Kaggle Comp\Src\facebookChallenge.py", line 42, in computeTFIDF
    for row in trainData:
  File "C:\Python27\lib\csv.py", line 104, in next
    row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 215: ordinal not in range(128)

TrainSample.csv: a CSV file with 4 columns (with a header row).
OS: Windows 7 64-bit.
Using Python 2.x

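I don't know what is going wrong here. I told it to ignore encoding errors, but it still throws the same error.

I think the error is thrown before control even reaches the decoding step.

Can anyone tell me where I went wrong?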

Best Answer

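The Python 2 csv module cannot handle Unicode input. Iterating over a file opened with io.open yields unicode lines, and the reader implicitly encodes each line to a byte string using the default ASCII codec. That is where the UnicodeEncodeError in your traceback comes from: it is raised inside csv.py (the self.reader.next() call) before your unicode(value, "utf-8", "ignore") call ever runs, so the "ignore" never gets a chance to apply.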
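The traceback reports an *encode* error, not a decode error, which is why the "ignore" argument to unicode() (a decoding step) has no effect. As a small illustration, the same exception can be reproduced without csv at all by encoding a string containing U+201C (the curly quote from the traceback) with the ASCII codec; the sample text is made up, and the snippet runs on both Python 2 and Python 3:

```python
# Encoding unicode text containing U+201C (left double quotation mark)
# with the ASCII codec fails with the same exception type the Python 2
# csv reader raised in the traceback.
text = u"\u201cquoted\u201d"

error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as e:
    error = e
    print(e)
```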

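Open the file in binary mode and decode only after it has been parsed as CSV. This is safe for the UTF-8 codec because newlines, delimiters and quotes are all encoded as a single byte.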
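To see why decoding after parsing is safe for UTF-8, here is a small sketch (runs on Python 2 and 3; the sample field is made up). The characters the csv parser looks for each encode to one byte, and every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so splitting the raw bytes and decoding afterwards gives the same result as splitting the decoded text:

```python
# The delimiter, quote and line-ending characters each encode to one byte.
special = [u",", u'"', u"\r", u"\n"]
for ch in special:
    assert len(ch.encode("utf-8")) == 1

# A field with curly quotes around an ASCII payload (made-up example).
field = u"\u201chello, world\u201d"
raw = field.encode("utf-8")

# Every byte belonging to the curly quotes is non-ASCII (>= 0x80),
# so none of them can be mistaken for ',' or '"' or a newline.
quote_bytes = bytearray(u"\u201c".encode("utf-8")) + bytearray(u"\u201d".encode("utf-8"))
assert all(b >= 0x80 for b in quote_bytes)

# Splitting the raw bytes on b"," and decoding each piece gives the same
# result as decoding first and then splitting on u",".
pieces = [p.decode("utf-8") for p in raw.split(b",")]
assert pieces == field.split(u",")
```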

The csv module documentation includes a UnicodeReader wrapper class in its examples section that does the decoding for you; it is easily adapted to the DictReader class:

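import csv

class UnicodeDictReader(object):
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        self.encoding = encoding
        self.reader = csv.DictReader(f, dialect=dialect, **kwds)

    def _decode(self, s):
        # Decode byte strings; leave None (missing fields) untouched
        return s.decode(self.encoding) if isinstance(s, str) else s

    def next(self):
        # DictReader yields byte strings; decode keys and values after parsing
        row = self.reader.next()
        return {self._decode(k): self._decode(v) for k, v in row.iteritems()}

    def __iter__(self):
        return self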

Use it with a file opened in binary mode:

def openFile(fileName):
    try:
        # Binary mode: the csv module gets raw bytes, decoding happens later
        trainFile = open(fileName, "rb")
    except IOError as e:
        print "File could not be opened: {}".format(e)
    else:
        return UnicodeDictReader(trainFile)
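For comparison, a minimal sketch assuming Python 3, which the question does not use: there the csv module works on text natively, so no wrapper is needed. Open the file in text mode with an explicit encoding and newline="", as the csv documentation recommends. The function name read_rows is hypothetical:

```python
import csv


def read_rows(file_name, encoding="utf-8"):
    """Return all rows of a CSV file as a list of dicts (Python 3 only)."""
    # Python 3's csv module reads str, so decode while reading the file;
    # newline="" lets the reader handle newlines inside quoted fields.
    with open(file_name, "r", encoding=encoding, newline="") as f:
        return list(csv.DictReader(f))
```

Reading everything into a list keeps the file handle scoped to the with block. For very large files, a generator that yields rows inside the with block avoids holding the whole file in memory.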

Original question on Stack Overflow: https://stackoverflow.com/questions/19740385/
