python - pandas: incompatible shape with a complex-type column

How can I add a complex type (i.e., a numpy array) as a column to a pandas DataFrame?

import numpy as np
import pandas as pd

df = pd.DataFrame({'foo':['bar', 'baz'], 'bar':[1,2]})
display(df)

my_array = np.array([[[0.61209572, 0.616934  , 0.94374808, 0.6818203 ],
        [0.4236548 , 0.64589411, 0.43758721, 0.891773  ]],

       [[0.52184832, 0.41466194, 0.26455561, 0.77423369],
        [0.5488135 , 0.71518937, 0.60276338, 0.54488318]]])

print(my_array)
print(df.shape)
print(my_array.shape)

df['complex_type'] = my_array

This fails with:

AssertionError: Shape of new values must be compatible with manager shape

My pandas version is 1.0.0.

Edit

A more complex example:

#%%timeit
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

rng = np.random.RandomState(0)
n_points = 50
d_dimensions = 4
k_neighbours = 3

X = rng.random_sample((n_points, d_dimensions))

df = pd.DataFrame(X)
df = df.reset_index(drop=False)
df.columns = ['id_str', 'lat_1', 'long_1', 'lat_2', 'long_2']
df.id_str = df.id_str.astype(object)

tree = cKDTree(df[['lat_1', 'long_1', 'lat_2', 'long_2']])
# Note: newer SciPy releases name this parameter `workers` instead of `n_jobs`.
dist, ind = tree.query(X, k=k_neighbours, n_jobs=-1)


df = df.join(pd.DataFrame({'complex_type' : [arr for arr in X[ind]]}))
#df['complex_type'] = list(X[ind])    
df.head()

Best answer

In [29]: df = pd.DataFrame({'foo':['bar', 'baz'], 'bar':[1,2]}) 
    ...: display(df) 
    ...:  
    ...: my_array = np.array([[[0.61209572, 0.616934  , 0.94374808, 0.6818203 ], 
    ...:         [0.4236548 , 0.64589411, 0.43758721, 0.891773  ]], 
    ...:  
    ...:        [[0.52184832, 0.41466194, 0.26455561, 0.77423369], 
    ...:         [0.5488135 , 0.71518937, 0.60276338, 0.54488318]]]) 
    ...:                                                                                       
   foo  bar
0  bar    1
1  baz    2
In [30]: my_array.shape                                                                        
Out[30]: (2, 2, 4)

Assigning a list of the two (2, 4) arrays works:

In [31]: df['new'] = list(my_array)                                                            
In [32]: df                                                                                    
Out[32]: 
   foo  bar                                                new
0  bar    1  [[0.61209572, 0.616934, 0.94374808, 0.6818203]...
1  baz    2  [[0.52184832, 0.41466194, 0.26455561, 0.774233...

In [33]: df.info()                                                                             
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
foo    2 non-null object
bar    2 non-null int64
new    2 non-null object
dtypes: int64(1), object(2)
memory usage: 176.0+ bytes
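
To make the stored layout concrete, here is a minimal sketch showing that each cell of the new column holds one (2, 4) array and that the column has dtype object. It assumes stand-in data built with np.arange in the same (2, 2, 4) shape as my_array; the names cell and row_means are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': ['bar', 'baz'], 'bar': [1, 2]})
# Stand-in data (assumption): same (2, 2, 4) shape as my_array above.
my_array = np.arange(16, dtype=float).reshape(2, 2, 4)

# list(my_array) splits along the first axis into two (2, 4) arrays,
# one per row of the frame.
df['new'] = list(my_array)

cell = df.loc[0, 'new']        # the numpy array stored in the first row
print(type(cell), cell.shape)
print(df['new'].dtype)         # object column, not a numeric one

# Numeric work has to visit each stored array, for example via apply.
row_means = df['new'].apply(lambda a: a.mean())
print(row_means)
```

Because the column is an object column, vectorized pandas arithmetic does not reach inside the arrays; each element has to be handled individually, as the apply call does here.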

But note that you do not get the (2, 2, 4) array back from pandas; you get a (2,) object array whose elements are arrays.

In [34]: df['new'].to_numpy()                                                                  
Out[34]: 
array([array([[0.61209572, 0.616934  , 0.94374808, 0.6818203 ],
       [0.4236548 , 0.64589411, 0.43758721, 0.891773  ]]),
       array([[0.52184832, 0.41466194, 0.26455561, 0.77423369],
       [0.5488135 , 0.71518937, 0.60276338, 0.54488318]])], dtype=object)
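
If the original 3-D array is needed again, one possible approach (a sketch, not the only way) is to stack the per-row arrays with np.stack. This assumes every cell holds an array of the same shape; the stand-in data and the names as_objects and restored are illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): same (2, 2, 4) shape as my_array above.
my_array = np.arange(16, dtype=float).reshape(2, 2, 4)
df = pd.DataFrame({'foo': ['bar', 'baz'], 'bar': [1, 2]})
df['new'] = list(my_array)

as_objects = df['new'].to_numpy()
print(as_objects.shape, as_objects.dtype)   # 1-D object array of arrays

# np.stack joins the per-row arrays along a new leading axis.
# It raises an error if the stored arrays differ in shape.
restored = np.stack(df['new'].tolist())
print(restored.shape)
```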

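Also be careful when saving a frame like this. A CSV file is hard to reload, because each array is written out as its string representation.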
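
The following sketch illustrates the saving issue and one possible workaround, under the assumption that a pickle file is acceptable for your use case. The file name frame.pkl and the stand-in data are hypothetical. The CSV round trip returns strings, while the pickle round trip returns the arrays themselves.

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': ['bar', 'baz'], 'bar': [1, 2]})
# Stand-in data (assumption): two (2, 4) arrays, one per row.
df['new'] = list(np.arange(16, dtype=float).reshape(2, 2, 4))

# CSV: each array is written as its printed text and read back as a string.
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))
print(type(from_csv.loc[0, 'new']))

# Pickle: the Python objects in the column are serialized as they are.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'frame.pkl')  # hypothetical file name
    df.to_pickle(path)
    from_pickle = pd.read_pickle(path)

print(type(from_pickle.loc[0, 'new']))
```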

This write-up of "python - pandas: incompatible shape with a complex-type column" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/60083394/
