python - Create a Pandas DataFrame with a new date index and a new column of header substrings?

Tags: python pandas dataframe

I want to reshape the DataFrame df below into a different layout:

import pandas as pd
data = {
    'dates':['01/01/2018','02/01/2018','03/01/2018','04/01/2018','05/01/2018'],
    'A X':[1,1,2,1,1],
    'A Y':[1,1,3,1,1],
    'A Z':[1,1,4,1,1],
    'B X':[2,2,3,2,2],
    'B Y':[2,2,4,2,2],
    'C X':[3,3,4,3,3]
       }
df = pd.DataFrame(data, columns=['dates','A X','A Y','A Z','B X','B Y','C X'])

Desired DataFrame:

dates   fields  A   B   C
01/01/2018  X   1   2   3
02/01/2018  X   1   2   3
03/01/2018  X   2   3   4
04/01/2018  X   1   2   3
05/01/2018  X   1   2   3
01/01/2018  Y   1   2   nan
02/01/2018  Y   1   2   nan
03/01/2018  Y   3   4   nan
04/01/2018  Y   1   2   nan
05/01/2018  Y   1   2   nan
01/01/2018  Z   1   nan nan
02/01/2018  Z   1   nan nan
03/01/2018  Z   4   nan nan
04/01/2018  Z   1   nan nan
05/01/2018  Z   1   nan nan

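The dates become the new index values, and a new column named "fields" is inserted that holds the strings "X", "Y" and "Z" extracted from the column headers of df. How can I do this? (Pandas v0.22)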
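Each header such as 'A X' already encodes both labels the target layout needs: a group label (A, B, C) that should become a column, and a field label (X, Y, Z) that should become the new "fields" column. A minimal sketch of extracting both labels with pandas' vectorized string methods (the names headers and mi are illustrative only):

```python
import pandas as pd

# The question's column headers, without 'dates'
headers = pd.Index(['A X', 'A Y', 'A Z', 'B X', 'B Y', 'C X'])

# On an Index, str.split(expand=True) returns a MultiIndex,
# with one level per whitespace-separated token
mi = headers.str.split(expand=True)

print(mi.get_level_values(0).tolist())  # ['A', 'A', 'A', 'B', 'B', 'C']
print(mi.get_level_values(1).tolist())  # ['X', 'Y', 'Z', 'X', 'Y', 'X']
```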

Best answer

Use:


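df = df.set_index('dates')
# Split each header such as 'A X' into a two-level column MultiIndex ('A', 'X')
df.columns = df.columns.str.split(expand=True)
# Move the X/Y/Z level into the rows; a stable sort keeps dates in order within each field
df = df.stack().reset_index().rename(columns={'level_1':'fields'}).sort_values('fields', kind='mergesort')
print(df)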

         dates fields  A    B    C
0   01/01/2018      X  1  2.0  3.0
3   02/01/2018      X  1  2.0  3.0
6   03/01/2018      X  2  3.0  4.0
9   04/01/2018      X  1  2.0  3.0
12  05/01/2018      X  1  2.0  3.0
1   01/01/2018      Y  1  2.0  NaN
4   02/01/2018      Y  1  2.0  NaN
7   03/01/2018      Y  3  4.0  NaN
10  04/01/2018      Y  1  2.0  NaN
13  05/01/2018      Y  1  2.0  NaN
2   01/01/2018      Z  1  NaN  NaN
5   02/01/2018      Z  1  NaN  NaN
8   03/01/2018      Z  4  NaN  NaN
11  04/01/2018      Z  1  NaN  NaN
14  05/01/2018      Z  1  NaN  NaN
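To see why this works, it helps to inspect the intermediate result of stack(). A sketch that rebuilds the question's df and follows the same steps as the answer; it uses reset_index(drop=True) at the end only to give a clean 0..14 index, which the answer itself does not do:

```python
import pandas as pd

data = {
    'dates': ['01/01/2018', '02/01/2018', '03/01/2018', '04/01/2018', '05/01/2018'],
    'A X': [1, 1, 2, 1, 1],
    'A Y': [1, 1, 3, 1, 1],
    'A Z': [1, 1, 4, 1, 1],
    'B X': [2, 2, 3, 2, 2],
    'B Y': [2, 2, 4, 2, 2],
    'C X': [3, 3, 4, 3, 3],
}
df = pd.DataFrame(data).set_index('dates')
df.columns = df.columns.str.split(expand=True)

# stack() moves the inner column level (X/Y/Z) into the row index,
# leaving A, B, C as columns; missing group/field pairs become NaN
stacked = df.stack()
print(stacked.index.nlevels)  # 2
print(list(stacked.columns))  # ['A', 'B', 'C']

# The unnamed second index level comes back as 'level_1';
# mergesort is stable, so dates keep their order within each field
out = (stacked.reset_index()
              .rename(columns={'level_1': 'fields'})
              .sort_values('fields', kind='mergesort')
              .reset_index(drop=True))
print(out)
```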

Thanks to @Paul H for an improved version:

df = (df.set_index('dates')
       .rename(columns=lambda c: tuple(c.split()))
       .stack()
       .rename_axis(('dates','fields'))
       .sort_index(level='fields')
       .reset_index()
       )
print(df)


         dates fields  A    B    C
0   01/01/2018      X  1  2.0  3.0
1   02/01/2018      X  1  2.0  3.0
2   03/01/2018      X  2  3.0  4.0
3   04/01/2018      X  1  2.0  3.0
4   05/01/2018      X  1  2.0  3.0
5   01/01/2018      Y  1  2.0  NaN
6   02/01/2018      Y  1  2.0  NaN
7   03/01/2018      Y  3  4.0  NaN
8   04/01/2018      Y  1  2.0  NaN
9   05/01/2018      Y  1  2.0  NaN
10  01/01/2018      Z  1  NaN  NaN
11  02/01/2018      Z  1  NaN  NaN
12  03/01/2018      Z  4  NaN  NaN
13  04/01/2018      Z  1  NaN  NaN
14  05/01/2018      Z  1  NaN  NaN
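A different route, not used by either answer above, avoids MultiIndex columns entirely: melt the frame to long format, split each header into its group and field, then pivot back to wide. A sketch assuming pandas 1.1 or later (needed to pass a list to pivot's index argument); the intermediate names long, parts and group are illustrative:

```python
import pandas as pd

data = {
    'dates': ['01/01/2018', '02/01/2018', '03/01/2018', '04/01/2018', '05/01/2018'],
    'A X': [1, 1, 2, 1, 1],
    'A Y': [1, 1, 3, 1, 1],
    'A Z': [1, 1, 4, 1, 1],
    'B X': [2, 2, 3, 2, 2],
    'B Y': [2, 2, 4, 2, 2],
    'C X': [3, 3, 4, 3, 3],
}
df = pd.DataFrame(data)

# Long format: one row per (date, original header) pair
long = df.melt(id_vars='dates', var_name='header', value_name='value')

# Split 'A X' into group 'A' and field 'X'
parts = long['header'].str.split(' ', n=1, expand=True)
long['group'] = parts[0]
long['fields'] = parts[1]

# Back to wide: one row per (field, date), one column per group;
# pairs that never existed (e.g. C with Y) are filled with NaN
out = (long.pivot(index=['fields', 'dates'], columns='group', values='value')
           .reset_index()
           .rename_axis(columns=None))
out = out[['dates', 'fields', 'A', 'B', 'C']]
print(out)
```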

Regarding "python - Create a Pandas DataFrame with a new date index and a new column of header substrings?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48517404/
