I have a 4-D numpy.ndarray, e.g.
myarr = np.random.rand(10,4,3,2)
dims = {'time': range(1, 11), 'sub': range(1, 5), 'cond': ['A', 'B', 'C'], 'measure': ['meas1', 'meas2']}
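but it could have more dimensions. How can I create a pandas.DataFrame with a MultiIndex by just passing the dimensions as indexes, without further manual adjustment (reshaping the ndarray into a 2D shape)?
I can't wrap my head around the reshaping, not even in 3 dimensions yet, so I'm looking for an "automatic" method if possible.
What would be a function that takes the column/row indexes and creates the DataFrame? Something like: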
df=nd2df(myarr,dim2row=[0,1],dim2col=[2,3],rowlab=['time','sub'],collab=['cond','measure'])
and then get something like:
                 meas1          meas2
                 A    B    C    A    B    C
sub   time
1     1
      2
      3
      .
      .
2     1
      2
...
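If doing this automatically is not possible or feasible, an explanation that is less terse than the MultiIndexing manual would be appreciated.
I can't even get it right when I don't care about the order of the dimensions. For example, I would expect this to work: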
a=np.arange(24).reshape((3,2,2,2))
iterables=[[1,2,3],[1,2],['m1','m2'],['A','B']]
index = pd.MultiIndex.from_product(iterables, names=['time','sub','meas','cond'])
pd.DataFrame(a.reshape(2*3*1,2*2),index)
which gives:
ValueError: Shape of passed values is (4, 6), indices imply (4, 24)
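Best answer
You are getting the error because you reshaped the ndarray to 6x4 and then applied an index that is meant to capture all dimensions in a single series: the index has 24 entries, but the reshaped array has only 6 rows. The following setup makes your example work: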
a=np.arange(24).reshape((3,2,2,2))
iterables=[[1,2,3],[1,2],['m1','m2'],['A','B']]
index = pd.MultiIndex.from_product(iterables, names=['time','sub','meas','cond'])
pd.DataFrame(a.reshape(24, 1),index=index)
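If the goal is the two-dimensional layout from the question rather than a single column, the same example can be split into a row MultiIndex and a column MultiIndex. The following is a minimal sketch, assuming the first two axes (time, sub) go to the rows and the last two (meas, cond) to the columns; the names `row_index`, `col_index` and `df_2d` are illustrative only.

```python
import numpy as np
import pandas as pd

a = np.arange(24).reshape((3, 2, 2, 2))

# First two axes (time, sub) become the rows: 3 * 2 = 6 entries.
row_index = pd.MultiIndex.from_product([[1, 2, 3], [1, 2]], names=['time', 'sub'])
# Last two axes (meas, cond) become the columns: 2 * 2 = 4 entries.
col_index = pd.MultiIndex.from_product([['m1', 'm2'], ['A', 'B']], names=['meas', 'cond'])

# reshape(6, 4) flattens axes 0-1 into rows and axes 2-3 into columns,
# which is the same order from_product uses (last level varies fastest).
df_2d = pd.DataFrame(a.reshape(6, 4), index=row_index, columns=col_index)

# Each cell holds the array element with the matching labels.
assert df_2d.loc[(2, 1), ('m2', 'A')] == a[1, 0, 1, 0]
```

This works without any transposing only because the row axes come first and the column axes come last in the array; any other assignment of axes needs the axes reordered before reshaping.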
Solution
Here is a generic DataFrame creator that should get the job done:
def produce_df(rows, columns, row_names=None, column_names=None):
    """rows is a list of lists that will be used to build a MultiIndex
    columns is a list of lists that will be used to build a MultiIndex"""
    row_index = pd.MultiIndex.from_product(rows, names=row_names)
    col_index = pd.MultiIndex.from_product(columns, names=column_names)
    return pd.DataFrame(index=row_index, columns=col_index)
Demonstration
Without named index levels
produce_df([['a', 'b'], ['c', 'd']], [['1', '2'], ['3', '4']])
       1         2
       3    4    3    4
a c  NaN  NaN  NaN  NaN
  d  NaN  NaN  NaN  NaN
b c  NaN  NaN  NaN  NaN
  d  NaN  NaN  NaN  NaN
With named index levels
produce_df([['a', 'b'], ['c', 'd']], [['1', '2'], ['3', '4']],
row_names=['alpha1', 'alpha2'], column_names=['number1', 'number2'])
number1          1         2
number2          3    4    3    4
alpha1 alpha2
a      c       NaN  NaN  NaN  NaN
       d       NaN  NaN  NaN  NaN
b      c       NaN  NaN  NaN  NaN
       d       NaN  NaN  NaN  NaN
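As the demonstrations show, produce_df only builds the empty frame (every cell is NaN); it does not place the array values. One way to also fill in the data, in the spirit of the nd2df call sketched in the question, is to move the row axes to the front, move the column axes to the back, and then reshape. This is a sketch and not part of the original answer: the signature, including the `labels` and `names` parameters, is hypothetical, and it assumes every axis is assigned to exactly one of the two groups and that each group contains at least one axis.

```python
import numpy as np
import pandas as pd


def nd2df(arr, dim2row, dim2col, labels, names=None):
    """Hypothetical helper: put axes `dim2row` on the rows and `dim2col` on the columns.

    labels: one sequence of labels per axis of `arr`, in axis order.
    names: optional level name per axis, in axis order.
    """
    if sorted(list(dim2row) + list(dim2col)) != list(range(arr.ndim)):
        raise ValueError("dim2row and dim2col must cover every axis exactly once")
    if names is None:
        names = [None] * arr.ndim

    def make_index(axes):
        # One MultiIndex level per axis, in the requested order.
        return pd.MultiIndex.from_product(
            [labels[ax] for ax in axes], names=[names[ax] for ax in axes]
        )

    # Row axes first, column axes last, then flatten each group into one dimension.
    ordered = np.transpose(arr, list(dim2row) + list(dim2col))
    n_rows = int(np.prod([arr.shape[ax] for ax in dim2row]))
    data = ordered.reshape(n_rows, -1)
    return pd.DataFrame(data, index=make_index(dim2row), columns=make_index(dim2col))


# Usage with the array from the question: rows are (sub, time), columns are (measure, cond).
myarr = np.random.rand(10, 4, 3, 2)
labels = [list(range(1, 11)), list(range(1, 5)), ['A', 'B', 'C'], ['meas1', 'meas2']]
df = nd2df(myarr, dim2row=[1, 0], dim2col=[3, 2], labels=labels,
           names=['time', 'sub', 'cond', 'measure'])

# sub=2, time=3, measure='meas2', cond='B' maps back to myarr[2, 1, 1, 1].
assert df.loc[(2, 3), ('meas2', 'B')] == myarr[2, 1, 1, 1]
```

Transposing before reshaping is what keeps the values aligned with their labels: a plain reshape only lines up with MultiIndex.from_product when the axes are already in row-then-column order.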
Regarding numpy - an easy way to go from a multidimensional numpy ndarray to a pandas DataFrame?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36853594/