python - Using numpy.genfromtxt to read a csv file with strings containing commas

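I am trying to read a csv file with numpy.genfromtxt, but some of the fields are strings that contain commas. The strings are in quotes, but numpy does not recognize the quotes as delimiting a single string. For example, with this data in 't.csv':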

2012, "Louisville KY", 3.5
2011, "Lexington, KY", 4.0

the code

np.genfromtxt('t.csv', delimiter=',')

produces the error:

ValueError: Some errors were detected ! Line #2 (got 4 columns instead of 3)

The data structure I am looking for is:

array([['2012', 'Louisville KY', '3.5'],
       ['2011', 'Lexington, KY', '4.0']], 
      dtype='|S13')

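Looking through the documentation, I don't see any option that deals with this. Is there a way to do it with numpy, or do I just need to read the data in with the csv module and then convert it to a numpy array?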

Best answer

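You can use pandas for this; it has become the default library for working with dataframes (heterogeneous data) in scientific Python. Its read_csv function can handle quoted fields. From the documentation: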

quotechar : string

The character used to denote the start and end of a quoted item. Quoted items 
can include the delimiter and it will be ignored.

The default value is ". Example:

In [1]: import pandas as pd

In [2]: from io import StringIO

In [3]: s="""year, city, value
   ...: 2012, "Louisville KY", 3.5
   ...: 2011, "Lexington, KY", 4.0"""

In [4]: pd.read_csv(StringIO(s), quotechar='"', skipinitialspace=True)
Out[4]:
   year           city  value
0  2012  Louisville KY    3.5
1  2011  Lexington, KY    4.0

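The trick here is that you also have to use skipinitialspace=True to deal with the spaces after the comma delimiter. Otherwise the quote is not the first character of the field, so it is not treated as a quote character.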
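The same option exists in Python's built-in csv module, which is the fallback route mentioned in the question. The following is only a minimal sketch, assuming Python 3 and with the sample data inlined as a string instead of being read from 't.csv'; the variable names are illustrative.

```python
import csv
from io import StringIO

import numpy as np

# Sample data from the question, inlined so the sketch is self-contained.
data = '2012, "Louisville KY", 3.5\n2011, "Lexington, KY", 4.0\n'

# Without skipinitialspace the quote is not the first character of the
# field, so it is kept as literal text and the comma inside splits the field.
naive_rows = list(csv.reader(StringIO(data)))

# With skipinitialspace=True the space after each delimiter is dropped and
# the quoted field is parsed as a single item.
rows = list(csv.reader(StringIO(data), skipinitialspace=True))

# Every value is still a string here, as in the array the question asks for.
arr = np.array(rows)

print(len(naive_rows[1]))  # field count of the second row without the option
print(arr)
```

Comparing the field count of the second row with and without the option shows why the flag matters for this particular file.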

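Apart from its powerful csv reader, I would also strongly recommend using pandas for the heterogeneous data you have (the numpy example output you gave is all strings, although you could use structured arrays).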
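If a numpy array is still needed at the end, the DataFrame can be converted back. This is a rough sketch rather than the only way to do it, assuming Python 3 and a reasonably recent pandas (DataFrame.to_numpy needs 0.24 or later); the column names come from the header row of the sample string used earlier, and the variable names are illustrative.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the read_csv example, including the header row.
s = '''year, city, value
2012, "Louisville KY", 3.5
2011, "Lexington, KY", 4.0'''

df = pd.read_csv(StringIO(s), quotechar='"', skipinitialspace=True)

# Structured (record) array: each column keeps its own dtype.
records = df.to_records(index=False)

# 2-D array holding every value as a string, like the one in the question.
strings = df.astype(str).to_numpy()

print(records.dtype.names)
print(strings.shape)
```

The record array keeps the year and value columns numeric, while the second conversion deliberately gives that up to match the all-string layout the question asked for.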

Regarding python - Using numpy.genfromtxt to read a csv file with strings containing commas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17933282/
