python - numpy: read csv into numpy with proper value type

Here is my test_data.csv:

A,1,2,3,4,5
B,6,7,8,9,10
C,11,12,13,14,15
A,16,17,18,19,20

I use the following code to read it into a numpy array:

    import csv
    import numpy

    def readCSVToNumpyArray(dataset):
        with open(dataset) as f:
            # csv.reader yields every field as a string
            values = [i for i in csv.reader(f)]

        data = numpy.array(values)

        return data

In the main code, I have:

    numpyArray = readCSVToNumpyArray('test_data.csv')
    print(numpyArray)

This gives me the output:

(array([['A', '1', '2', '3', '4', '5'],
       ['B', '6', '7', '8', '9', '10'],
       ['C', '11', '12', '13', '14', '15'],
       ['A', '16', '17', '18', '19', '20']], 
      dtype='|S2'))

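But all the numbers in the array are treated as strings. Is there a good way to store them as float without iterating over each element and assigning the type?

Thanks!

Best answer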

Because the first field of each row is a string, you have to use a more flexible numpy type called "object". Try this function and see if it is what you are looking for:

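    import csv
    import numpy

    def readCSVToNumpyArray(dataset):
        values = []
        with open(dataset) as f:
            for row in csv.reader(f):
                converted = []
                for field in row:
                    try:
                        # Numeric fields become floats.
                        converted.append(float(field))
                    except ValueError:
                        # Non-numeric fields (the labels) stay as strings.
                        converted.append(field)
                values.append(converted)

        # dtype='object' lets one array hold both str and float elements.
        data = numpy.array(values, dtype='object')

        return data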

    numpyArray = readCSVToNumpyArray('test_data.csv')
    print(numpyArray)

The result is:

    [['A' 1.0 2.0 3.0 4.0 5.0]
     ['B' 6.0 7.0 8.0 9.0 10.0]
     ['C' 11.0 12.0 13.0 14.0 15.0]
     ['A' 16.0 17.0 18.0 19.0 20.0]]
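
If a single mixed array is not strictly required, one alternative is to keep the labels and the numbers in two separate arrays, so that the numeric part gets a real float dtype and no per-element loop is needed. The following is only a minimal sketch, assuming the first column is always the label and every remaining column is numeric; `read_labels_and_values` and `CSV_TEXT` are hypothetical names introduced for illustration and are not part of the original answer.

```python
import csv
import io

import numpy

# Same rows as test_data.csv, inlined so the sketch runs on its own.
CSV_TEXT = "A,1,2,3,4,5\nB,6,7,8,9,10\nC,11,12,13,14,15\nA,16,17,18,19,20\n"


def read_labels_and_values(f):
    """Hypothetical helper: split a CSV file object into (labels, values)."""
    # Build a 2-D array of strings, exactly like the code in the question.
    rows = numpy.array(list(csv.reader(f)))
    # Column 0 holds the string labels.
    labels = rows[:, 0]
    # Convert the remaining columns to float in one vectorised call.
    values = rows[:, 1:].astype(float)
    return labels, values


labels, values = read_labels_and_values(io.StringIO(CSV_TEXT))
print(labels)
print(values.dtype, values.shape)
```

With a real file, `open('test_data.csv')` would take the place of the `io.StringIO` object. Unlike the object array, `values` here supports fast numeric operations, at the cost of tracking the labels separately.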

For more on "python - numpy: read csv into numpy with proper value type", see the similar question on Stack Overflow: https://stackoverflow.com/questions/36066726/
