python - Pandas melt on multiple levels

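I have a df containing part number, year, and monthly consumption quantity, as shown below. The values in columns 01, 02, and 03 are the quantities for January through March of each year.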

import pandas as pd

d = {'PN': [10506,10506,10507,10507],
 'Year': [2017, 2018, 2017, 2018],
 '01': [1,2,3,4],
 '02': [5,6,7,8],
 '03': [9,10,11,12]}
indata = pd.DataFrame(data = d)

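I want to restructure it into long format by combining year and month into a YYYYMM value, so that each row contains the part number, the year-month, and the quantity, as shown below.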

dd = {'PN': [10506,10506,10506,10506,10506,10506,10507,10507,10507,10507,10507,10507],
  'YearMonth': [201701,201702,201703,201801,201802,201803,201701,201702,201703,201801,201802,201803],
  'Qty': [1,5,9,2,6,10,3,7,11,4,8,12]}
outdata = pd.DataFrame(data = dd)

Since I could not get pd.melt to work, I tried a triple for loop, as shown below.

parts = pd.Series(indata['PN']).unique()
years = pd.Series(indata['Year']).unique()
months = ['01', '02', '03']

df = pd.DataFrame(columns = ['PN', 'YearMonth', 'Qty'])

for p in parts:
    for y in years:
        for m in months:
            yearmonth = str(y*100+int(m))
            qty = indata.loc[(indata['PN'] == p) & (indata['Year'] == y), m].iloc[0]
            row = [p, yearmonth, qty]
            df = df.append(row)
outdata = df

This looks very inefficient, and my append call does not add one row per iteration; instead it adds three rows in a new column.

Any suggestions?

Best answer

First reshape with melt, then create the new YearMonth column with assign, drop the columns that are no longer needed, and finally sort with sort_values:

df = (indata.melt(id_vars=['PN','Year'], var_name='v', value_name='Qty')
            .assign(YearMonth=lambda x: x['Year'].astype(str) + x['v'])
            .drop(['v', 'Year'], axis=1)
            .sort_values(['PN','YearMonth']))

print(df)
       PN  Qty YearMonth
0   10506    1    201701
4   10506    5    201702
8   10506    9    201703
1   10506    2    201801
5   10506    6    201802
9   10506   10    201803
2   10507    3    201701
6   10507    7    201702
10  10507   11    201703
3   10507    4    201801
7   10507    8    201802
11  10507   12    201803
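
Note that the answer above builds YearMonth by string concatenation, so the column holds strings, whereas the desired outdata in the question uses integers and the column order PN, YearMonth, Qty. A minimal sketch of a variant that produces integers, assuming the month columns are always zero-padded digit strings such as '01' (the names Month and long_df are chosen here for illustration only):

```python
import pandas as pd

# Sample input from the question.
indata = pd.DataFrame({
    'PN': [10506, 10506, 10507, 10507],
    'Year': [2017, 2018, 2017, 2018],
    '01': [1, 2, 3, 4],
    '02': [5, 6, 7, 8],
    '03': [9, 10, 11, 12],
})

long_df = (
    # Wide to long: one row per (PN, Year, month column).
    indata.melt(id_vars=['PN', 'Year'], var_name='Month', value_name='Qty')
          # Integer YYYYMM, e.g. 2017 * 100 + 1 == 201701.
          .assign(YearMonth=lambda x: x['Year'] * 100 + x['Month'].astype(int))
          .sort_values(['PN', 'YearMonth'])
          # Discard the shuffled index left over from melt and sort.
          .reset_index(drop=True)
          # Keep only the requested columns, in the requested order.
          .loc[:, ['PN', 'YearMonth', 'Qty']]
)

print(long_df)
```

Computing Year * 100 + month mirrors the arithmetic already used in the question's loop and keeps the sort numeric rather than lexicographic.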

For python - Pandas melt on multiple levels, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49983345/
