python - Numpy structured arrays can't perform basic numpy operations

I'd like to operate on named numpy arrays (add, multiply, concatenate, ...).

I defined a structured array:

import numpy as np

types=[('name1', int), ('name2', float)]
a = np.array([2, 3.3], dtype=types)
b = np.array([4, 5.35], dtype=types)

a and b come out like this:

a
array([(2, 2. ), (3, 3.3)], dtype=[('name1', '<i8'), ('name2', '<f8')])
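
Why two records instead of one? A minimal sketch, assuming current NumPy behavior: in a list, each element becomes its own record, and a bare scalar is assigned to every field (so 3.3 becomes 3 in the int field). A tuple, by contrast, spells out the fields of a single record. The variable names below are illustrative only.

```python
import numpy as np

types = [('name1', int), ('name2', float)]

# A list of bare scalars: each scalar becomes its own record and is cast
# into every field, giving two records, (2, 2.0) and (3, 3.3).
two_records = np.array([2, 3.3], dtype=types)

# A tuple describes the fields of ONE record.
one_record = np.array((2, 3.3), dtype=types)   # 0-d structured array
one_row = np.array([(2, 3.3)], dtype=types)    # 1-d array holding one record

print(two_records.shape, one_record.shape, one_row.shape)
print(one_record['name1'], one_record['name2'])
```

With the tuple form, a['name1'] is the single value 2, but a + b still fails, because np.add has no loop for structured dtypes at all.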

But I really want a['name1'] to be just 2, not array([2, 3]).

Likewise, I want a['name2'] to be just 3.3.

That way I could sum c = a + b, which should be an array of length 2 where c['name1'] is 6 and c['name2'] is 8.65.

How can I do this?

Best answer

Define a structured array:

In [125]: dt = np.dtype([('f0','U10'),('f1',int),('f2',float)])
In [126]: a = np.array([('one',2,3),('two',4,5.5),('three',6,7)],dt)
In [127]: a
Out[127]: 
array([('one', 2, 3. ), ('two', 4, 5.5), ('three', 6, 7. )],
      dtype=[('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')])

and an object-dtype array holding the same data:

In [128]: A = np.array([('one',2,3),('two',4,5.5),('three',6,7)],object)
In [129]: A
Out[129]: 
array([['one', 2, 3],
       ['two', 4, 5.5],
       ['three', 6, 7]], dtype=object)

Addition works here because it (iteratively) delegates the operation to each element:

In [130]: A+A
Out[130]: 
array([['oneone', 4, 6],
       ['twotwo', 8, 11.0],
       ['threethree', 12, 14]], dtype=object)
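
To make the delegation concrete, here is a small sketch with illustrative values (not from the answer): an object array just stores references to Python objects, so A + A ends up calling each element's own __add__, much like an explicit comprehension would.

```python
import numpy as np
from fractions import Fraction

# Object dtype: each slot references an ordinary Python object.
A = np.array([Fraction(1, 3), 'ab', 2.5], dtype=object)

# NumPy loops over the elements and calls x + x on each one,
# so Fraction, str and float each use their own __add__.
result = A + A
by_hand = [x + x for x in A]

print(result)
print(list(result) == by_hand)
```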

Structured addition does not work:

In [131]: a+a
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-131-6ff992d1ddd5> in <module>()
----> 1 a+a

TypeError: ufunc 'add' did not contain a loop with signature matching types 
dtype([('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')]) dtype([('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')]) 
dtype([('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')])
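
The string field is not the only obstacle. A minimal check, using the question's all-numeric dtype: np.add has no loop for structured (void) dtypes, so even a record made only of an int and a float cannot be added directly.

```python
import numpy as np

# All-numeric structured dtype: no string field involved.
types = [('name1', int), ('name2', float)]
a = np.array([(2, 3.3)], dtype=types)

raised = False
try:
    a + a
except TypeError as exc:
    # The ufunc type error is a subclass of TypeError.
    raised = True
    print('TypeError:', exc)
```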

Let's try adding field by field:

In [132]: aa = np.zeros_like(a)
In [133]: for n in a.dtype.names: aa[n] = a[n] + a[n]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-133-68476e5d579e> in <module>()
----> 1 for n in a.dtype.names: aa[n] = a[n] + a[n]

TypeError: ufunc 'add' did not contain a loop with signature matching types 
dtype('<U10') dtype('<U10') dtype('<U10')

Oops, that doesn't quite work: the string dtype has no addition defined. But we can handle the string field separately:

In [134]: aa['f0'] = a['f0']
In [135]: for n in a.dtype.names[1:]: aa[n] = a[n] + a[n]
In [136]: aa
Out[136]: 
array([('one',  4,  6.), ('two',  8, 11.), ('three', 12, 14.)],
      dtype=[('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')])
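
The two manual steps above can be folded into one reusable function. This is a sketch, not part of the answer: add_fields is a hypothetical helper name, and it assumes that copying non-numeric fields from the first argument is the behavior you want.

```python
import numpy as np

def add_fields(x, y):
    """Hypothetical helper: add two structured arrays field by field.

    Numeric fields are summed; any other field (e.g. a fixed-width
    string) is copied unchanged from x.
    """
    if x.dtype != y.dtype:
        raise ValueError('both arrays must share one structured dtype')
    out = np.empty_like(x)
    for name in x.dtype.names:
        # x.dtype[name] is the dtype of that single field
        if np.issubdtype(x.dtype[name], np.number):
            out[name] = x[name] + y[name]
        else:
            out[name] = x[name]
    return out

dt = np.dtype([('f0', 'U10'), ('f1', int), ('f2', float)])
a = np.array([('one', 2, 3), ('two', 4, 5.5), ('three', 6, 7)], dt)
r = add_fields(a, a)
print(r)
```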

Or we can change the string field's dtype to object:

In [137]: dt1 = np.dtype([('f0',object),('f1',int),('f2',float)])
In [138]: b = np.array([('one',2,3),('two',4,5.5),('three',6,7)],dt1)
In [139]: b
Out[139]: 
array([('one', 2, 3. ), ('two', 4, 5.5), ('three', 6, 7. )],
      dtype=[('f0', 'O'), ('f1', '<i8'), ('f2', '<f8')])
In [140]: bb = np.zeros_like(b)
In [141]: for n in b.dtype.names: bb[n] = b[n] + b[n]
In [142]: bb
Out[142]: 
array([('oneone',  4,  6.), ('twotwo',  8, 11.), ('threethree', 12, 14.)],
      dtype=[('f0', 'O'), ('f1', '<i8'), ('f2', '<f8')])
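
For the question's own dtype, where every field is numeric, a different technique (swapped in here, not the one used in this answer) avoids the per-field loop: convert the records to a plain 2-D float array with numpy.lib.recfunctions.structured_to_unstructured, add, and convert back with unstructured_to_structured. This sketch assumes a float intermediate is acceptable for the int field.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

types = [('name1', int), ('name2', float)]
a = np.array([(2, 3.3)], dtype=types)
b = np.array([(4, 5.35)], dtype=types)

# View each record as one row of shape (n_records, n_fields), all float.
ua = rfn.structured_to_unstructured(a, dtype=float)
ub = rfn.structured_to_unstructured(b, dtype=float)

# Plain arrays support +; cast the sum back into the original fields.
c = rfn.unstructured_to_structured(ua + ub, dtype=a.dtype)
print(c['name1'], c['name2'])
```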

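Python strings do have an `__add__`, defined as concatenation. NumPy string dtypes don't have that definition. Python strings can also be multiplied by an integer; multiplying by anything else raises an error.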
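
A short sketch of those Python string rules, and of how they carry over to an object-dtype array (values are illustrative):

```python
import numpy as np

# Python's str defines + as concatenation, and * only with an int.
print('one' + 'one')   # prints oneone
print('one' * 2)       # prints oneone

raised = False
try:
    'one' * 2.5
except TypeError:
    raised = True      # str * float is not defined

# An object array holds Python str objects, so the same rules apply per element.
names = np.array(['one', 'two'], dtype=object)
print(names * 2)
```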

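My guess is that pandas does something similar to what I just did. I doubt it implements dataframe addition in compiled code (apart from some special cases). It probably works column by column, where the dtypes allow it. It also seems free to switch to object dtype (for example, for a column holding both np.nan and strings). Timings might confirm my guess (I don't have pandas installed on this OS).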

For python - Numpy structured arrays can't perform basic numpy operations, see the original question on Stack Overflow: https://stackoverflow.com/questions/50931421/
