python - Convert a rec.array to a DataFrame

Tags: python arrays pandas numpy

I have been trying to convert numpy rec.array data into a DataFrame. The data is currently a list of rec.arrays that looks like this:

[rec.array([([0.2], [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ]),
            ([0.2], [ 1.86755799, -0.97727788,  0.95008842, -0.15135721]),
            ([0.2], [-0.10321885,  0.4105985 ,  0.14404357,  1.45427351]),
            ([0.2], [ 0.76103773,  0.12167502,  0.44386323,  0.33367433]),
            ([0.2], [ 1.49407907, -0.20515826,  0.3130677 , -0.85409574])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
 rec.array([([0.1], [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ]),
            ([0.1], [ 1.86755799, -0.97727788,  0.95008842, -0.15135721]),
            ([0.1], [-0.10321885,  0.4105985 ,  0.14404357,  1.45427351]),
            ([0.1], [ 0.76103773,  0.12167502,  0.44386323,  0.33367433]),
            ([0.1], [ 1.49407907, -0.20515826,  0.3130677 , -0.85409574]),
            ([0.1], [-2.55298982,  0.6536186 ,  0.8644362 , -0.74216502]),
            ([0.1], [ 2.26975462, -1.45436567,  0.04575852, -0.18718385]),
            ([0.1], [ 1.53277921,  1.46935877,  0.15494743,  0.37816252]),
            ([0.1], [-0.88778575, -1.98079647, -0.34791215,  0.15634897]),
            ([0.1], [ 1.23029068,  1.20237985, -0.38732682, -0.30230275])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
 rec.array([([0.16666667], [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ]),
            ([0.16666667], [ 1.86755799, -0.97727788,  0.95008842, -0.15135721]),
            ([0.16666667], [-0.10321885,  0.4105985 ,  0.14404357,  1.45427351]),
            ([0.16666667], [ 0.76103773,  0.12167502,  0.44386323,  0.33367433]),
            ([0.16666667], [ 1.49407907, -0.20515826,  0.3130677 , -0.85409574]),
            ([0.16666667], [-2.55298982,  0.6536186 ,  0.8644362 , -0.74216502])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
 rec.array([([0.05882353], [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ]),
            ([0.05882353], [ 1.86755799, -0.97727788,  0.95008842, -0.15135721]),
            ([0.05882353], [-0.10321885,  0.4105985 ,  0.14404357,  1.45427351]),
            ([0.05882353], [ 0.76103773,  0.12167502,  0.44386323,  0.33367433]),
            ([0.05882353], [ 1.49407907, -0.20515826,  0.3130677 , -0.85409574]),
            ([0.05882353], [-2.55298982,  0.6536186 ,  0.8644362 , -0.74216502]),
            ([0.05882353], [ 2.26975462, -1.45436567,  0.04575852, -0.18718385]),
            ([0.05882353], [ 1.53277921,  1.46935877,  0.15494743,  0.37816252]),
            ([0.05882353], [-0.88778575, -1.98079647, -0.34791215,  0.15634897]),
            ([0.05882353], [ 1.23029068,  1.20237985, -0.38732682, -0.30230275]),
            ([0.05882353], [-1.04855297, -1.42001794, -1.70627019,  1.9507754 ]),
            ([0.05882353], [-0.50965218, -0.4380743 , -1.25279536,  0.77749036]),
            ([0.05882353], [-1.61389785, -0.21274028, -0.89546656,  0.3869025 ]),
            ([0.05882353], [-0.51080514, -1.18063218, -0.02818223,  0.42833187]),
            ([0.05882353], [ 0.06651722,  0.3024719 , -0.63432209, -0.36274117]),
            ([0.05882353], [-0.67246045, -0.35955316, -0.81314628, -1.7262826 ]),
            ([0.05882353], [ 0.17742614, -0.40178094, -1.63019835,  0.46278226])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))])]

The result should be a five-column DataFrame like this:

weights     v_1          v_2          v_3          v_4
0.2         1.76405235   0.40015721   0.97873798   2.2408932
0.2         1.86755799  -0.97727788   0.95008842  -0.15135721
...         ...          ...          ...          ...
0.05882353  0.17742614  -0.40178094  -1.63019835   0.46278226

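And so on. However, when I call pd.DataFrame(my_list), the resulting DataFrame has about 90 columns instead of the 5 above. Each column holds sub-lists of the arrays, of the form [a], [w, x, y, z]. The resulting DataFrame should have 5 columns and 38 rows (for the example above: 5 + 10 + 6 + 17).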

Best answer

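I assume your recarrays are stored in a list named data. You can convert each array to a DataFrame with pd.DataFrame and combine the pieces with pd.concat. Then use pandas.DataFrame.pop to remove the two list-valued columns, pandas.Series.explode to unpack the one-element weights lists into scalars, and a second pd.DataFrame call to spread the four-element lists across the columns v_1 to v_4.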
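To try the snippets in this answer without the original data, a comparable list can be built along these lines. This is only a sketch: the helper name make_records, the seed, and the idea that each weight is 1/n for an array of n rows are assumptions inferred from the values printed in the question (0.2 = 1/5, 0.1 = 1/10, and so on), not something the question states.

```python
import numpy as np

# Field layout copied from the dtype shown in the question.
DTYPE = [('weights', '<f8', (1,)), ('integration', '<f8', (4,))]


def make_records(n, seed=0):
    """Hypothetical helper: n rows, every weight set to 1/n, normal draws for integration."""
    rng = np.random.RandomState(seed)
    records = np.zeros(n, dtype=DTYPE)
    records['weights'] = 1.0 / n                       # scalar broadcasts over the (n, 1) field
    records['integration'] = rng.standard_normal((n, 4))
    return records.view(np.recarray)


# Same group sizes as the four arrays in the question: 5 + 10 + 6 + 17 = 38 rows.
data = [make_records(n) for n in (5, 10, 6, 17)]
print([len(records) for records in data])
```

Any list of recarrays sharing this dtype should work as data in the steps that follow.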

Reading the data

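import pandas as pd

# One DataFrame per recarray: column 0 holds the weights, column 1 the integration values.
frames = [pd.DataFrame(record.tolist()) for record in data]
# ignore_index=True numbers the rows 0..n-1 instead of restarting at 0 for every recarray.
df = pd.concat(frames, ignore_index=True)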

Preprocessing and parsing the data

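# Spread each four-element list in column 1 across four new columns.
df[['v_1', 'v_2', 'v_3', 'v_4']] = pd.DataFrame(df[1].tolist(), index=df.index)
# Unpack the one-element lists in column 0; explode returns object dtype, so cast to float.
df['weights'] = df.pop(0).explode().astype(float)
# Drop the original list column.
df.pop(1)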

Output:

This gives the expected output:

         v_1       v_2       v_3       v_4   weights
0   1.764052  0.400157  0.978738  2.240893       0.2
1   1.867558 -0.977278  0.950088 -0.151357       0.2
2  -0.103219  0.410598  0.144044  1.454274       0.2
3   0.761038  0.121675  0.443863  0.333674       0.2
4   1.494079 -0.205158  0.313068 -0.854096       0.2
5   1.764052  0.400157  0.978738  2.240893       0.1
6   1.867558 -0.977278  0.950088 -0.151357       0.1
7  -0.103219  0.410598  0.144044  1.454274       0.1
8   0.761038  0.121675  0.443863  0.333674       0.1
9   1.494079 -0.205158  0.313068 -0.854096       0.1
10 -2.552990  0.653619  0.864436 -0.742165       0.1
11  2.269755 -1.454366  0.045759 -0.187184       0.1
12  1.532779  1.469359  0.154947  0.378163       0.1
13 -0.887786 -1.980796 -0.347912  0.156349       0.1
14  1.230291  1.202380 -0.387327 -0.302303       0.1
15  1.764052  0.400157  0.978738  2.240893  0.166667
16  1.867558 -0.977278  0.950088 -0.151357  0.166667
17 -0.103219  0.410598  0.144044  1.454274  0.166667
18  0.761038  0.121675  0.443863  0.333674  0.166667
19  1.494079 -0.205158  0.313068 -0.854096  0.166667
20 -2.552990  0.653619  0.864436 -0.742165  0.166667
21  1.764052  0.400157  0.978738  2.240893  0.058824
22  1.867558 -0.977278  0.950088 -0.151357  0.058824
23 -0.103219  0.410598  0.144044  1.454274  0.058824
24  0.761038  0.121675  0.443863  0.333674  0.058824
25  1.494079 -0.205158  0.313068 -0.854096  0.058824
26 -2.552990  0.653619  0.864436 -0.742165  0.058824
27  2.269755 -1.454366  0.045759 -0.187184  0.058824
28  1.532779  1.469359  0.154947  0.378163  0.058824
29 -0.887786 -1.980796 -0.347912  0.156349  0.058824
30  1.230291  1.202380 -0.387327 -0.302303  0.058824
31 -1.048553 -1.420018 -1.706270  1.950775  0.058824
32 -0.509652 -0.438074 -1.252795  0.777490  0.058824
33 -1.613898 -0.212740 -0.895467  0.386902  0.058824
34 -0.510805 -1.180632 -0.028182  0.428332  0.058824
35  0.066517  0.302472 -0.634322 -0.362741  0.058824
36 -0.672460 -0.359553 -0.813146 -1.726283  0.058824
37  0.177426 -0.401781 -1.630198  0.462782  0.058824

Alternatively

The same thing can be done with np.hstack, where data is your list of recarrays.

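import numpy as np
import pandas as pd

# Stack all recarrays into one 1-D structured array, then build the DataFrame from it.
df = pd.DataFrame(np.hstack(data).tolist())
df['weights'] = df[0].explode().astype(float)
df[['v_1', 'v_2', 'v_3', 'v_4']] = pd.DataFrame(df[1].tolist())
df = df.drop(columns=[0, 1])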

Output

This gives the same data, with weights as the first column:

     weights       v_1       v_2       v_3       v_4
0        0.2  1.764052  0.400157  0.978738  2.240893
1        0.2  1.867558 -0.977278  0.950088 -0.151357
2        0.2 -0.103219  0.410598  0.144044  1.454274
3        0.2  0.761038  0.121675  0.443863  0.333674
4        0.2  1.494079 -0.205158  0.313068 -0.854096
5        0.1  1.764052  0.400157  0.978738  2.240893
6        0.1  1.867558 -0.977278  0.950088 -0.151357
7        0.1 -0.103219  0.410598  0.144044  1.454274
8        0.1  0.761038  0.121675  0.443863  0.333674
9        0.1  1.494079 -0.205158  0.313068 -0.854096
10       0.1 -2.552990  0.653619  0.864436 -0.742165
11       0.1  2.269755 -1.454366  0.045759 -0.187184
12       0.1  1.532779  1.469359  0.154947  0.378163
13       0.1 -0.887786 -1.980796 -0.347912  0.156349
14       0.1  1.230291  1.202380 -0.387327 -0.302303
15  0.166667  1.764052  0.400157  0.978738  2.240893
16  0.166667  1.867558 -0.977278  0.950088 -0.151357
17  0.166667 -0.103219  0.410598  0.144044  1.454274
18  0.166667  0.761038  0.121675  0.443863  0.333674
19  0.166667  1.494079 -0.205158  0.313068 -0.854096
20  0.166667 -2.552990  0.653619  0.864436 -0.742165
21  0.058824  1.764052  0.400157  0.978738  2.240893
22  0.058824  1.867558 -0.977278  0.950088 -0.151357
23  0.058824 -0.103219  0.410598  0.144044  1.454274
24  0.058824  0.761038  0.121675  0.443863  0.333674
25  0.058824  1.494079 -0.205158  0.313068 -0.854096
26  0.058824 -2.552990  0.653619  0.864436 -0.742165
27  0.058824  2.269755 -1.454366  0.045759 -0.187184
28  0.058824  1.532779  1.469359  0.154947  0.378163
29  0.058824 -0.887786 -1.980796 -0.347912  0.156349
30  0.058824  1.230291  1.202380 -0.387327 -0.302303
31  0.058824 -1.048553 -1.420018 -1.706270  1.950775
32  0.058824 -0.509652 -0.438074 -1.252795  0.777490
33  0.058824 -1.613898 -0.212740 -0.895467  0.386902
34  0.058824 -0.510805 -1.180632 -0.028182  0.428332
35  0.058824  0.066517  0.302472 -0.634322 -0.362741
36  0.058824 -0.672460 -0.359553 -0.813146 -1.726283
37  0.058824  0.177426 -0.401781 -1.630198  0.462782
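A further variation, not part of the original answer, skips the intermediate Python lists and reads the two fields straight from the stacked structured array. The sketch below assumes every recarray in the list shares the dtype from the question; the helper name records_to_frame and the tiny sample values are made up for illustration.

```python
import numpy as np
import pandas as pd


def records_to_frame(records):
    """Hypothetical helper: flatten a list of recarrays that share the question's dtype."""
    stacked = np.concatenate(records)                       # one 1-D structured array
    frame = pd.DataFrame(stacked['integration'],            # (rows, 4) float block
                         columns=['v_1', 'v_2', 'v_3', 'v_4'])
    frame.insert(0, 'weights', stacked['weights'].ravel())  # (rows, 1) -> (rows,)
    return frame


# Tiny made-up sample with the same field layout as the question.
dtype = [('weights', '<f8', (1,)), ('integration', '<f8', (4,))]
first = np.array([([0.5], [1.0, 2.0, 3.0, 4.0]),
                  ([0.5], [5.0, 6.0, 7.0, 8.0])], dtype=dtype).view(np.recarray)
second = np.array([([1.0], [9.0, 10.0, 11.0, 12.0])], dtype=dtype).view(np.recarray)

df = records_to_frame([first, second])
print(df)
```

Because the field values never pass through tolist or explode, all five columns stay float64 and no cast is needed afterwards.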

For more on python - Convert a rec.array to a DataFrame, see the similar question on Stack Overflow: https://stackoverflow.com/questions/73112100/
