numpy - Easy multidimensional numpy ndarray to pandas DataFrame method?

Suppose I have a 4-D numpy.ndarray, e.g.

myarr = np.random.rand(10,4,3,2)
dims={'time':range(1,11),'sub':range(1,5),'cond':['A','B','C'],'measure':['meas1','meas2']}

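but possibly with more dimensions. How can I create a pandas.DataFrame with a MultiIndex just by passing the dimensions as indexes, without further manual adjustment (reshaping the ndarray into a 2D shape)?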

I can't wrap my head around reshaping yet, not even in 3 dimensions, so if possible I'm looking for an "automatic" method.

What would a function look like that takes the row/column indexes and creates the DataFrame? Something like:

df=nd2df(myarr,dim2row=[0,1],dim2col=[2,3],rowlab=['time','sub'],collab=['cond','measure'])

which would then produce something like:

              meas1             meas2
              A     B     C     A    B    C
sub   time
  1      1
         2
         3
         .
         .
  2      1
         2
 ...

If automating this is not possible or feasible, an explanation less terse than the MultiIndexing manual would be appreciated.

I can't even get it right when I don't care about the order of the dimensions. For example, I would expect this to work:

a=np.arange(24).reshape((3,2,2,2))
iterables=[[1,2,3],[1,2],['m1','m2'],['A','B']]
index = pd.MultiIndex.from_product(iterables, names=['time','sub','meas','cond'])

pd.DataFrame(a.reshape(2*3*1,2*2),index)

which gives:

ValueError: Shape of passed values is (4, 6), indices imply (4, 24)

Best answer

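You get the error because you reshaped the ndarray to 6x4 while applying an index intended to capture all dimensions in a single series: the index has 24 entries, but the reshaped data has only 6 rows. Here is a setup that gets the toy example working: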

a=np.arange(24).reshape((3,2,2,2))
iterables=[[1,2,3],[1,2],['m1','m2'],['A','B']]
index = pd.MultiIndex.from_product(iterables, names=['time','sub','meas','cond'])

pd.DataFrame(a.reshape(24, 1),index=index)
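As a possible follow-up (not part of the original answer), the long-format data above can be pivoted into the row/column layout the question asks for by unstacking some index levels into columns. This is a minimal sketch; the names `long_series` and `wide` are chosen here purely for illustration.

```python
import numpy as np
import pandas as pd

a = np.arange(24).reshape((3, 2, 2, 2))
iterables = [[1, 2, 3], [1, 2], ['m1', 'm2'], ['A', 'B']]
index = pd.MultiIndex.from_product(iterables, names=['time', 'sub', 'meas', 'cond'])

# One value per index entry: 24 values, 24 index tuples.
long_series = pd.Series(a.ravel(), index=index)

# Move the 'meas' and 'cond' levels from the row index to the columns.
# Rows stay indexed by (time, sub); columns become (meas, cond).
wide = long_series.unstack(['meas', 'cond'])
print(wide)
```

Each cell of `wide` can then be looked up by label, e.g. `wide.loc[(1, 1), ('m1', 'A')]` corresponds to `a[0, 0, 0, 0]`.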

Solution

Here is a generic DataFrame creator that should get the job done:

def produce_df(rows, columns, row_names=None, column_names=None):
    """rows is a list of lists that will be used to build a MultiIndex
    columns is a list of lists that will be used to build a MultiIndex"""
    row_index = pd.MultiIndex.from_product(rows, names=row_names)
    col_index = pd.MultiIndex.from_product(columns, names=column_names)
    return pd.DataFrame(index=row_index, columns=col_index)

Demo

Without named index levels

produce_df([['a', 'b'], ['c', 'd']], [['1', '2'], ['3', '4']])

       1         2     
       3    4    3    4
a c  NaN  NaN  NaN  NaN
  d  NaN  NaN  NaN  NaN
b c  NaN  NaN  NaN  NaN
  d  NaN  NaN  NaN  NaN

With named index levels

produce_df([['a', 'b'], ['c', 'd']], [['1', '2'], ['3', '4']],
           row_names=['alpha1', 'alpha2'], column_names=['number1', 'number2'])

number1          1         2     
number2          3    4    3    4
alpha1 alpha2                    
a      c       NaN  NaN  NaN  NaN
       d       NaN  NaN  NaN  NaN
b      c       NaN  NaN  NaN  NaN
       d       NaN  NaN  NaN  NaN
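The frames above are empty (all NaN). To also fill them with the ndarray's values, one option is to transpose the array so the row dimensions come first and the column dimensions last, then reshape to 2D. The following is only a sketch of the `nd2df` idea from the question; the function name, its parameters (`labels`, `names`, `row_dims`, `col_dims`), and the label lists are assumptions made for illustration, not an existing pandas API.

```python
import numpy as np
import pandas as pd

def nd2df(arr, labels, names, row_dims, col_dims):
    """Hypothetical helper: axes in row_dims go to the row MultiIndex,
    axes in col_dims go to the column MultiIndex (outermost level first)."""
    # Reorder axes: row axes first, column axes last, so that a C-order
    # reshape matches the ordering produced by MultiIndex.from_product.
    moved = np.transpose(arr, list(row_dims) + list(col_dims))
    n_rows = int(np.prod([arr.shape[d] for d in row_dims]))
    n_cols = int(np.prod([arr.shape[d] for d in col_dims]))
    row_index = pd.MultiIndex.from_product(
        [labels[d] for d in row_dims], names=[names[d] for d in row_dims])
    col_index = pd.MultiIndex.from_product(
        [labels[d] for d in col_dims], names=[names[d] for d in col_dims])
    return pd.DataFrame(moved.reshape(n_rows, n_cols),
                        index=row_index, columns=col_index)

myarr = np.random.rand(10, 4, 3, 2)
labels = [list(range(1, 11)), list(range(1, 5)), ['A', 'B', 'C'], ['meas1', 'meas2']]
names = ['time', 'sub', 'cond', 'measure']

# Rows: sub (outer), time (inner); columns: measure (outer), cond (inner).
df = nd2df(myarr, labels, names, row_dims=[1, 0], col_dims=[3, 2])
print(df.head())
```

With this layout, `df.loc[(2, 5), ('meas2', 'B')]` should be the same value as `myarr[4, 1, 1, 1]` (time 5, sub 2, cond B, meas2).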

Regarding "numpy - Easy multidimensional numpy ndarray to pandas DataFrame method?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36853594/
