I have the following pandas DataFrame:
+---+-----+-----+------+------+------+------+
| | A | B | C_10 | C_20 | D_10 | D_20 |
+---+-----+-----+------+------+------+------+
| 1 | 0.1 | 0.2 | 1 | 2 | 3 | 4 |
| 2 | 0.3 | 0.4 | 5 | 6 | 7 | 8 |
+---+-----+-----+------+------+------+------+
Now I want to melt the columns C_10, C_20, D_10 and D_20 to get a DataFrame like this:
+---+-----+-----+----+---+---+
| | A | B | N | C | D |
+---+-----+-----+----+---+---+
| 1 | 0.1 | 0.2 | 10 | 1 | 3 |
| 1 | 0.1 | 0.2 | 20 | 2 | 4 |
| 2 | 0.3 | 0.4 | 10 | 5 | 7 |
| 2 | 0.3 | 0.4 | 20 | 6 | 8 |
+---+-----+-----+----+---+---+
Is there a simple way to do this? Thanks!
Edit: I tried wide_to_long, but it does not work when several rows of the DataFrame share the same id values:
import pandas as pd

df = pd.DataFrame({
    'combination': [1, 1, 2, 2],
    'A': [0.1, 0.1, 0.2, 0.2],
    'B': [0.3, 0.3, 0.4, 0.4],
    'C_10': [1, 5, 6, 7],
    'C_20': [2, 6, 7, 8],
    'D_10': [3, 7, 8, 9],
    'D_20': [4, 8, 9, 10],
})
+--------------------------------------------------+
| combination A B C_10 C_20 D_10 D_20 |
+--------------------------------------------------+
| 0 1 0.1 0.3 1 2 3 4 |
| 1 1 0.1 0.3 5 6 7 8 |
| 2 2 0.2 0.4 6 7 8 9 |
| 3 2 0.2 0.4 7 8 9 10 |
+--------------------------------------------------+
If I use wide_to_long, I get the following error:
> pd.wide_to_long(df, stubnames=['C','D'], i=['combination', 'A', 'B'], j='N', sep='_').reset_index()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-31-cc5863fa7ecc> in <module>
----> 1 pd.wide_to_long(df, stubnames=['C','D'], i=['combination', 'A', 'B'], j='N', sep='_').reset_index()
pandas/core/reshape/melt.py in wide_to_long(df, stubnames, i, j, sep, suffix)
456
457 if df[i].duplicated().any():
--> 458 raise ValueError("the id variables need to uniquely identify each row")
459
460 value_vars = [get_var_names(df, stub, sep, suffix) for stub in stubnames]
ValueError: the id variables need to uniquely identify each row
The parameter i is described as "Column(s) to use as id variable(s).", but I don't understand what exactly that means.
Best answer
Use wide_to_long:
df = pd.wide_to_long(df, stubnames=['C','D'], i=['A','B'], j='N', sep='_').reset_index()
print(df)
A B N C D
0 0.1 0.2 10 1 3
1 0.1 0.2 20 2 4
2 0.3 0.4 10 5 7
3 0.3 0.4 20 6 8
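For readers who want to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same call. The variable names `id_cols`, `has_duplicates` and `long_df` are illustrative, and the final sort is only added here so the row order is predictable. The `duplicated()` check mirrors the condition visible in the traceback above: the columns passed as `i` have to identify each row uniquely.

```python
import pandas as pd

# Sample frame from the question (index labels 1 and 2).
df = pd.DataFrame(
    {
        "A": [0.1, 0.3],
        "B": [0.2, 0.4],
        "C_10": [1, 5],
        "C_20": [2, 6],
        "D_10": [3, 7],
        "D_20": [4, 8],
    },
    index=[1, 2],
)

# Columns passed as `i`: together they must be unique per row.
id_cols = ["A", "B"]
has_duplicates = bool(df[id_cols].duplicated().any())

# Stub "C" collects C_10/C_20, stub "D" collects D_10/D_20;
# the numeric suffix after "_" becomes the new column N.
long_df = (
    pd.wide_to_long(df, stubnames=["C", "D"], i=id_cols, j="N", sep="_")
    .reset_index()
    .sort_values(id_cols + ["N"], ignore_index=True)
)
print(long_df)
```

Since `has_duplicates` is False for this frame, A and B alone are enough to serve as id variables.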
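Edit: If the combinations of values in columns A and B are not unique, you can create a helper column by converting the index into a column named index, apply the same solution, and finally drop that index level: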
df = (pd.wide_to_long(df.reset_index(),
                      stubnames=['C','D'],
                      i=['index','A','B'],
                      j='N',
                      sep='_')
        .reset_index(level=0, drop=True)
        .reset_index())
print(df)
A B N combination C D
0 0.1 0.3 10 1 1 3
1 0.1 0.3 20 1 2 4
2 0.1 0.3 10 1 5 7
3 0.1 0.3 20 1 6 8
4 0.2 0.4 10 2 6 8
5 0.2 0.4 20 2 7 9
6 0.2 0.4 10 2 7 9
7 0.2 0.4 20 2 8 10
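The same idea can be sketched as a self-contained script. It first confirms that the id columns repeat, which is the condition behind the ValueError in the question, and then adds a helper column so every row gets its own key. The helper name `row_id` is a hypothetical choice (the answer above uses the default name index), and the variable names and the explicit sort are illustrative assumptions rather than part of the original answer.

```python
import pandas as pd

# Second sample frame from the question: (combination, A, B) repeats.
df = pd.DataFrame({
    "combination": [1, 1, 2, 2],
    "A": [0.1, 0.1, 0.2, 0.2],
    "B": [0.3, 0.3, 0.4, 0.4],
    "C_10": [1, 5, 6, 7],
    "C_20": [2, 6, 7, 8],
    "D_10": [3, 7, 8, 9],
    "D_20": [4, 8, 9, 10],
})

id_cols = ["combination", "A", "B"]

# True here, which is why wide_to_long rejects these id columns alone.
needs_helper = bool(df[id_cols].duplicated().any())

# Hypothetical helper column: the original row position, unique per row.
helper = "row_id"
wide = df.rename_axis(helper).reset_index()

long_df = (
    pd.wide_to_long(wide, stubnames=["C", "D"],
                    i=[helper] + id_cols, j="N", sep="_")
    .reset_index()
    .sort_values([helper, "N"], ignore_index=True)
    .drop(columns=helper)  # helper only existed to make the ids unique
)
print(long_df)
```

If you need to trace each long row back to its source row, one option is to skip the final `drop` and keep `row_id` in the result.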
Regarding python - pandas: melt multiple columns with the same index, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62213436/