python - pandas: melting multiple columns that share the same index

I have the following pandas data frame:

+---+-----+-----+------+------+------+------+
|   |  A  |  B  | C_10 | C_20 | D_10 | D_20 |
+---+-----+-----+------+------+------+------+
| 1 | 0.1 | 0.2 |    1 |    2 |    3 |    4 |
| 2 | 0.3 | 0.4 |    5 |    6 |    7 |    8 |
+---+-----+-----+------+------+------+------+

Now I want to melt the columns C_10, C_20, D_10 and D_20 to get a data frame like this:

+---+-----+-----+----+---+---+
|   |  A  |  B  | N  | C | D |
+---+-----+-----+----+---+---+
| 1 | 0.1 | 0.2 | 10 | 1 | 3 |
| 1 | 0.1 | 0.2 | 20 | 2 | 4 |
| 2 | 0.3 | 0.4 | 10 | 5 | 7 |
| 2 | 0.3 | 0.4 | 20 | 6 | 8 |
+---+-----+-----+----+---+---+

Is there a simple way to do this? Thanks!

Edit: I tried wide_to_long, but it does not work when the data frame contains duplicate rows:

import pandas as pd

df = pd.DataFrame({
    'combination': [1, 1, 2, 2],
    'A': [0.1, 0.1, 0.2, 0.2],
    'B': [0.3, 0.3, 0.4, 0.4],
    'C_10': [1, 5, 6, 7],
    'C_20': [2, 6, 7, 8],
    'D_10': [3, 7, 8, 9],
    'D_20': [4, 8, 9, 10],
})

+--------------------------------------------------+
|    combination    A    B  C_10  C_20  D_10  D_20 |
+--------------------------------------------------+
| 0            1  0.1  0.3     1     2     3     4 |
| 1            1  0.1  0.3     5     6     7     8 |
| 2            2  0.2  0.4     6     7     8     9 |
| 3            2  0.2  0.4     7     8     9    10 |
+--------------------------------------------------+

If I use wide_to_long, I get the following error:

> pd.wide_to_long(df, stubnames=['C','D'], i=['combination', 'A', 'B'], j='N', sep='_').reset_index()


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-cc5863fa7ecc> in <module>
----> 1 pd.wide_to_long(df, stubnames=['C','D'], i=['combination', 'A', 'B'], j='N', sep='_').reset_index()

pandas/core/reshape/melt.py in wide_to_long(df, stubnames, i, j, sep, suffix)
    456 
    457     if df[i].duplicated().any():
--> 458         raise ValueError("the id variables need to uniquely identify each row")
    459 
    460     value_vars = [get_var_names(df, stub, sep, suffix) for stub in stubnames]

ValueError: the id variables need to uniquely identify each row

The parameter i is described as "Column(s) to use as id variable(s).", but I do not understand what exactly that means.
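The check shown in the traceback can be reproduced by hand, which may make the meaning of "id variables" more concrete: the columns passed as i have to identify each row uniquely, and here they do not. The sketch below rebuilds the data frame from the question; the names id_cols, dup_mask, has_duplicates and raised are introduced only for illustration.

```python
import pandas as pd

# The data frame from the question
df = pd.DataFrame({
    'combination': [1, 1, 2, 2],
    'A': [0.1, 0.1, 0.2, 0.2],
    'B': [0.3, 0.3, 0.4, 0.4],
    'C_10': [1, 5, 6, 7],
    'C_20': [2, 6, 7, 8],
    'D_10': [3, 7, 8, 9],
    'D_20': [4, 8, 9, 10],
})

# Hypothetical name for the columns passed as `i`
id_cols = ['combination', 'A', 'B']

# Mark every row whose id columns also occur in another row
dup_mask = df[id_cols].duplicated(keep=False)
print(df.loc[dup_mask, id_cols])

# The same condition that wide_to_long tests in the traceback
has_duplicates = bool(df[id_cols].duplicated().any())
print(has_duplicates)

# With duplicated id columns, wide_to_long refuses to reshape
try:
    pd.wide_to_long(df, stubnames=['C', 'D'], i=id_cols, j='N', sep='_')
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

Every row here shares its (combination, A, B) values with one other row, so no choice among these three columns alone can serve as a unique id.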

Best answer

Use wide_to_long:

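import pandas as pd

# The first data frame from the question (row index 1 and 2)
df = pd.DataFrame({'A': [0.1, 0.3], 'B': [0.2, 0.4],
                   'C_10': [1, 5], 'C_20': [2, 6],
                   'D_10': [3, 7], 'D_20': [4, 8]}, index=[1, 2])
# stubnames are the column prefixes, sep is the separator, j names the column built from the suffixes
df = pd.wide_to_long(df, stubnames=['C','D'], i=['A','B'], j='N', sep='_').reset_index()
print(df)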
     A    B   N  C  D
0  0.1  0.2  10  1  3
1  0.1  0.2  20  2  4
2  0.3  0.4  10  5  7
3  0.3  0.4  20  6  8

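Edit: If the combinations of values in columns A and B are not unique, you can create a helper column by converting the index into a column named index, apply the same solution, and finally drop the index level again: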

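# df is the second data frame from the question (the one with the combination column).
# reset_index() adds the row index as an 'index' column, so the id columns are unique per row.
df = (pd.wide_to_long(df.reset_index(),
                      stubnames=['C','D'],
                      i=['index','A','B'],
                      j='N',
                      sep='_')
        .reset_index(level=0, drop=True)  # drop the helper 'index' level again
        .reset_index())
print(df)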

     A    B   N  combination  C   D
0  0.1  0.3  10            1  1   3
1  0.1  0.3  20            1  2   4
2  0.1  0.3  10            1  5   7
3  0.1  0.3  20            1  6   8
4  0.2  0.4  10            2  6   8
5  0.2  0.4  20            2  7   9
6  0.2  0.4  10            2  7   9
7  0.2  0.4  20            2  8  10
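
For comparison, the same reshaping can also be sketched with melt followed by pivot. This is an alternative, not the approach of the answer above, and it assumes pandas 1.1 or later (for a list-valued index in pivot); the names long, parts, stub and result are made up for this example.

```python
import pandas as pd

# The data frame from the question's edit
df = pd.DataFrame({
    'combination': [1, 1, 2, 2],
    'A': [0.1, 0.1, 0.2, 0.2],
    'B': [0.3, 0.3, 0.4, 0.4],
    'C_10': [1, 5, 6, 7],
    'C_20': [2, 6, 7, 8],
    'D_10': [3, 7, 8, 9],
    'D_20': [4, 8, 9, 10],
})

# Keep the row index as a helper id column so that every row stays unique
id_cols = ['index', 'combination', 'A', 'B']
long = df.reset_index().melt(id_vars=id_cols)

# Split names such as 'C_10' into the stub 'C' and the numeric suffix 10
parts = long['variable'].str.split('_', expand=True)
long['stub'] = parts[0]
long['N'] = parts[1].astype(int)

# Spread the stubs back into columns, one row per (original row, N)
result = (long.pivot(index=id_cols + ['N'], columns='stub', values='value')
              .reset_index()
              .drop(columns='index')
              .rename_axis(columns=None))
print(result)
```

The helper index column plays the same role as in the wide_to_long version: it keeps rows with identical combination, A and B values apart until the reshaping is done.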

This question, "python - pandas: melting multiple columns that share the same index", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/62213436/
