python - Strange behavior of numpy vstack/c_ when stacking lists of identical arrays

I am trying to stack three separate mixed-type lists into a single matrix.

For example, something like this works perfectly:

import numpy as np


In [31]:

c1 = [0, [1], [1], [1], [1], [1], [1], [1], [1], [1]]
c2 = [[1], 0, 0, [1], [1], [1], [1], [1], 0, [1]]
c3 = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

In [32]:

np.c_[c1,c2,c3]

Out[32]:

array([[0, list([1]), 1.0],
       [list([1]), 0, 1.0],
       [list([1]), 0, 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), 0, 1.0],
       [list([1]), list([1]), 1.0]], dtype=object)

In [33]:

np.vstack((c1, c2, c3)).T

Out[33]:

array([[0, list([1]), 1.0],
       [list([1]), 0, 1.0],
       [list([1]), 0, 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), 0, 1.0],
       [list([1]), list([1]), 1.0]], dtype=object)

This preserves the data types stored in the lists. However, as soon as a list made up entirely of identical arrays is added to the mix, this happens:

In [28]:

c1 = [[1], [1], [1], [1], [1], [1], [1], [1], [1], [1]]
c2 = [[1], 0, 0, [1], [1], [1], [1], [1], 0, [1]]
c3 = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

In [29]:

np.c_[c1,c2,c3]

Out[29]:

array([[1, list([1]), 1.0],
       [1, 0, 1.0],
       [1, 0, 1.0],
       [1, list([1]), 1.0],
       [1, list([1]), 1.0],
       [1, list([1]), 1.0],
       [1, list([1]), 1.0],
       [1, list([1]), 1.0],
       [1, 0, 1.0],
       [1, list([1]), 1.0]], dtype=object)

In [30]:

np.vstack((c1, c2, c3)).T

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-b54eaf7a5522> in <module>
----> 1 np.vstack((c1, c2, c3)).T

<__array_function__ internals> in vstack(*args, **kwargs)

~/anaconda3/envs/idp/lib/python3.8/site-packages/numpy/core/shape_base.py in vstack(tup)
    280     if not isinstance(arrs, list):
    281         arrs = [arrs]
--> 282     return _nx.concatenate(arrs, 0)
    283 
    284 

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1 and the array at index 1 has size 10

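The only difference between the two code blocks is that the list c1 now consists entirely of the element [1]. np.c_ turns this list into a flat column of integers, and I believe the same conversion is what breaks np.vstack(). Is there any way to avoid this behavior?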

Edit: What I mean is, is there a way to get something equivalent to this:

array([[list([1]), list([1]), 1.0],
       [list([1]), 0, 1.0],
       [list([1]), 0, 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), list([1]), 1.0],
       [list([1]), 0, 1.0],
       [list([1]), list([1]), 1.0]], dtype=object)

when the first column consists entirely of lists with identical elements, whether those are strings, lists, or anything else.

Best answer

Both c_ and vstack first create an array from each input.

In [8]: c1 = [0, [1], [1], [1], [1], [1], [1], [1], [1], [1]] 
   ...: c2 = [[1], 0, 0, [1], [1], [1], [1], [1], 0, [1]] 
   ...: c3 = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]                                
In [9]: np.array(c1)                                                                           
Out[9]: 
array([0, list([1]), list([1]), list([1]), list([1]), list([1]),
       list([1]), list([1]), list([1]), list([1])], dtype=object)
In [10]: np.array(c2)                                                                          
Out[10]: 
array([list([1]), 0, 0, list([1]), list([1]), list([1]), list([1]),
       list([1]), 0, list([1])], dtype=object)
In [11]: np.array(c3)                                                                          
Out[11]: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

These are all 1-D arrays of shape (10,). The next c1 is not:

In [12]: c1 = [[1], [1], [1], [1], [1], [1], [1], [1], [1], [1]]                               
In [13]: np.array(c1)                                                                          
Out[13]: 
array([[1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1]])

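np.vstack converts every input to an array with at least two dimensions, so each 1-D array now has shape (1,10). They are then concatenated along the first axis to form a (3,10) array.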

In the last case, the (10,1) array cannot be concatenated with the (1,10) arrays.
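The shape mismatch can be checked directly. The following is only a small sketch; the names `uniform`, `flat`, and `stacked_ok` are made up for illustration and use plain stand-ins for c1 and c3:

```python
import numpy as np

# Shapes after the implicit conversion to arrays.
uniform = np.array([[1]] * 10)   # every item is a length-1 list -> 2-D array
flat = np.array([1.0] * 10)      # plain scalars -> 1-D array

# np.vstack promotes each input to at least 2-D before concatenating.
uniform_2d = np.atleast_2d(uniform)   # already 2-D, stays (10, 1)
flat_2d = np.atleast_2d(flat)         # 1-D becomes (1, 10)
print(uniform_2d.shape, flat_2d.shape)

# Concatenating (10, 1) with (1, 10) along axis 0 fails on the second axis.
try:
    np.vstack((uniform, flat))
    stacked_ok = True
except ValueError as exc:
    stacked_ok = False
    print("vstack failed:", exc)
```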

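np.c_ does some more elaborate shape adjustment, but in short, what it does here is convert all the arrays to shape (10,1) and concatenate them along the second axis. np.column_stack does the same thing.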
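As a rough illustration of the column-wise case, here is a sketch with np.column_stack. The names `a1`, `a2`, `a3`, and `out` are illustrative. The explicit `dtype=object` for c2 is an addition: NumPy 1.24 and later refuse to build an array from a ragged list without it.

```python
import numpy as np

c1 = [[1] for _ in range(10)]
c2 = [[1], 0, 0, [1], [1], [1], [1], [1], 0, [1]]
c3 = [1.0] * 10

a1 = np.array(c1)                 # (10, 1) integer array: the inner lists are unpacked
a2 = np.array(c2, dtype=object)   # (10,) object array: lists and ints kept as-is
a3 = np.array(c3)                 # (10,) float array

# 1-D inputs become columns; the 2-D a1 is used as it is.
out = np.column_stack((a1, a2, a3))
print(out.shape)        # (10, 3)
print(out.dtype)        # object
print(type(out[0, 0]))  # a plain int, no longer a list
```

The first column ends up holding integers because c1 was already unpacked when it was converted to an array, before any stacking took place.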

Or, using the underlying concatenate:

# dtype=object keeps the ragged c2 valid on NumPy >= 1.24 as well
np.concatenate([np.array(c1), np.array(c2, dtype=object)[:, None], np.array(c3)[:, None]], axis=1)
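None of the calls above keeps the [1] lists of the uniform c1 intact, because that list becomes a (10,1) integer array as soon as it is converted. One possible workaround, not part of the original answer, is to build each column as a 1-D object array by hand so that NumPy never inspects the inner lists. `as_object_column` below is a hypothetical helper written for this sketch, not a NumPy function:

```python
import numpy as np

def as_object_column(items):
    """Hypothetical helper: store each item unchanged in a 1-D object array."""
    col = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        col[i] = item   # assigning by integer index keeps a list as one element
    return col

c1 = [[1] for _ in range(10)]
c2 = [[1], 0, 0, [1], [1], [1], [1], [1], 0, [1]]
c3 = [1.0] * 10

# Each column has shape (10,), so column_stack yields a (10, 3) object array.
result = np.column_stack([as_object_column(c) for c in (c1, c2, c3)])
print(result.shape)
print(result[0])
```

Because every column is forced to shape (10,) up front, the result does not depend on whether a column happens to contain only identical lists.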

Regarding "python - Strange behavior of numpy vstack/c_ when stacking lists of identical arrays", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60210897/
