The input DataFrame looks like this:
import pandas as pd

df1 = pd.DataFrame(
    {
        "A": ["A0", "A0", "A0", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
    index=[0, 1, 2, 3])
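I want to reshape this table so that when several rows share the same value in column A, the extra rows become new columns. The expected table is: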
df2 = pd.DataFrame(
    {
        "A": ["A0", "A3"],
        "B": ["B0", "B3"],
        "C": ["C0", "C3"],
        "D": ["D0", "D3"],
        "new_B": ["B1", "NaN"],
        "new_C": ["C1", "NaN"],
        "new_D": ["D1", "NaN"],
        "new_B_2": ["B2", "NaN"],
        "new_C_2": ["C2", "NaN"],
        "new_D_2": ["D2", "NaN"],
    },
    index=[0, 1])
Best answer
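You can use .cumcount to create a sequential counter within each group of column A, set that counter together with column A as a MultiIndex, reshape with .stack + .unstack, and finally flatten the columns with a list comprehension: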
df2 = df1.set_index([df1.groupby('A').cumcount(), 'A']).stack().unstack([-1, 0])
df2.columns = [x if y == 0 else f'{x}_{y}' for x, y in df2.columns]
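The one-liner above packs several steps together, so a step-by-step sketch may make it easier to follow. The intermediate names `counter`, `indexed`, `long` and `wide` are introduced here only for illustration and are not part of the original answer; the logic is the same chain of calls.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "A": ["A0", "A0", "A0", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)

# Step 1: number the rows inside each group of column A (0, 1, 2, ... per group).
counter = df1.groupby("A").cumcount()

# Step 2: use the counter together with column A as a two-level index.
indexed = df1.set_index([counter, "A"])

# Step 3: stack the remaining columns (B, C, D) into a third index level.
long = indexed.stack()

# Step 4: move the column-name level (-1) and the counter level (0) into the
# columns, leaving one row per distinct value of A.
wide = long.unstack([-1, 0])

# Step 5: flatten the (name, counter) column tuples; counter 0 keeps the plain name.
wide.columns = [name if n == 0 else f"{name}_{n}" for name, n in wide.columns]

print(wide)
```

Groups with fewer rows than the largest group (here A3) end up with missing values in the extra columns.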
If the order and format of the columns do not matter, the code above can be simplified to:
df2 = df1.set_index([df1.groupby('A').cumcount().astype(str), 'A']).unstack(0)
df2.columns = df2.columns.map('_'.join)
Output of the first version:
     B   C   D  B_1  C_1  D_1  B_2  C_2  D_2
A
A0  B0  C0  D0   B1   C1   D1   B2   C2   D2
A3  B3  C3  D3  NaN  NaN  NaN  NaN  NaN  NaN
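The simplified version labels its columns differently from the first one: because the counter is converted to a string and joined with `'_'`, every column receives a numeric suffix, including `_0` for the first row of each group. A minimal sketch to inspect those labels, where the names `counter` and `wide` are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "A": ["A0", "A0", "A0", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)

# The per-group counter is cast to str so it can be joined with the column names.
counter = df1.groupby("A").cumcount().astype(str)

# Unstack the counter level (level 0) directly; no stack() step is needed.
wide = df1.set_index([counter, "A"]).unstack(0)

# Each column label is a (name, counter) tuple of strings; join them with "_".
wide.columns = wide.columns.map("_".join)

print(list(wide.columns))
```

If the unsuffixed names `B`, `C`, `D` are needed for the first row of each group, the list comprehension from the first version is the variant to use.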
Source: this question (python: how to treat certain rows as new columns with Pandas) was found on Stack Overflow: https://stackoverflow.com/questions/65741681/