I want to convert the DataFrame df below into another one:
import pandas as pd
data = {
'dates':['01/01/2018','02/01/2018','03/01/2018','04/01/2018','05/01/2018'],
'A X':[1,1,2,1,1],
'A Y':[1,1,3,1,1],
'A Z':[1,1,4,1,1],
'B X':[2,2,3,2,2],
'B Y':[2,2,4,2,2],
'C X':[3,3,4,3,3]
}
df = pd.DataFrame(data, columns=['dates','A X','A Y','A Z','B X','B Y','C X'])
Desired DataFrame:
dates fields A B C
01/01/2018 X 1 2 3
02/01/2018 X 1 2 3
03/01/2018 X 2 3 4
04/01/2018 X 1 2 3
05/01/2018 X 1 2 3
01/01/2018 Y 1 2 nan
02/01/2018 Y 1 2 nan
03/01/2018 Y 3 4 nan
04/01/2018 Y 1 2 nan
05/01/2018 Y 1 2 nan
01/01/2018 Z 1 nan nan
02/01/2018 Z 1 nan nan
03/01/2018 Z 4 nan nan
04/01/2018 Z 1 nan nan
05/01/2018 Z 1 nan nan
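The dates are set as the new index values, and a new column named "fields" is inserted, containing the strings "X", "Y", "Z" extracted from the column headers of df. How can I do this? (Pandas v0.22)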
Best answer
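Use:
- set_index to move the dates column into the index
- split on whitespace (with expand=True) to get a MultiIndex in the columns
- stack to reshape
- reset_index to turn the index levels back into columns
- rename to name the new column fields
- sort_values to sort the DataFrame by the fields column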
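# Move the dates into the index so only the value columns are reshaped
df = df.set_index('dates')
# 'A X' -> ('A', 'X'): split on whitespace into a two-level MultiIndex
df.columns = df.columns.str.split(expand=True)
# Stack the inner level (X/Y/Z) into rows, then name and sort the new column
df = df.stack().reset_index().rename(columns={'level_1':'fields'}).sort_values('fields')
print(df)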
dates fields A B C
0 01/01/2018 X 1 2.0 3.0
3 02/01/2018 X 1 2.0 3.0
6 03/01/2018 X 2 3.0 4.0
9 04/01/2018 X 1 2.0 3.0
12 05/01/2018 X 1 2.0 3.0
1 01/01/2018 Y 1 2.0 NaN
4 02/01/2018 Y 1 2.0 NaN
7 03/01/2018 Y 3 4.0 NaN
10 04/01/2018 Y 1 2.0 NaN
13 05/01/2018 Y 1 2.0 NaN
2 01/01/2018 Z 1 NaN NaN
5 02/01/2018 Z 1 NaN NaN
8 03/01/2018 Z 4 NaN NaN
11 04/01/2018 Z 1 NaN NaN
14 05/01/2018 Z 1 NaN NaN
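The step that makes stack work here is the column split, so it may help to look at it in isolation. The sketch below builds only the column headers from the question (the names cols and multi are just for illustration) and shows that splitting with expand=True yields a two-level MultiIndex, where each header becomes a pair such as ('A', 'X').

```python
import pandas as pd

# Column headers from the question, without the 'dates' column
cols = pd.Index(['A X', 'A Y', 'A Z', 'B X', 'B Y', 'C X'])

# Split each header on whitespace; expand=True returns a MultiIndex
# instead of an Index of lists
multi = cols.str.split(expand=True)

print(type(multi).__name__)
print(multi.tolist())
```

With the outer level holding A/B/C and the inner level holding X/Y/Z, stack() moves the inner level into the rows, which is where the fields column comes from.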
Thanks to @Paul H for an improved version of the answer:
df = (df.set_index('dates')
.rename(columns=lambda c: tuple(c.split()))
.stack()
.rename_axis(('dates','fields'))
.sort_index(level='fields')
.reset_index()
)
print(df)
dates fields A B C
0 01/01/2018 X 1 2.0 3.0
1 02/01/2018 X 1 2.0 3.0
2 03/01/2018 X 2 3.0 4.0
3 04/01/2018 X 1 2.0 3.0
4 05/01/2018 X 1 2.0 3.0
5 01/01/2018 Y 1 2.0 NaN
6 02/01/2018 Y 1 2.0 NaN
7 03/01/2018 Y 3 4.0 NaN
8 04/01/2018 Y 1 2.0 NaN
9 05/01/2018 Y 1 2.0 NaN
10 01/01/2018 Z 1 NaN NaN
11 02/01/2018 Z 1 NaN NaN
12 03/01/2018 Z 4 NaN NaN
13 04/01/2018 Z 1 NaN NaN
14 05/01/2018 Z 1 NaN NaN
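For reuse, the same steps can be wrapped in a small function. This is only a sketch: the name reshape_fields is hypothetical, and the explicit kind='mergesort' is an addition not in the answers above, chosen because a stable sort keeps the dates in their original order within each field. Newer pandas versions may emit a FutureWarning from stack(); that is not handled here.

```python
import pandas as pd

data = {
    'dates': ['01/01/2018', '02/01/2018', '03/01/2018', '04/01/2018', '05/01/2018'],
    'A X': [1, 1, 2, 1, 1],
    'A Y': [1, 1, 3, 1, 1],
    'A Z': [1, 1, 4, 1, 1],
    'B X': [2, 2, 3, 2, 2],
    'B Y': [2, 2, 4, 2, 2],
    'C X': [3, 3, 4, 3, 3],
}
df = pd.DataFrame(data)


def reshape_fields(frame):
    """Hypothetical helper: turn 'A X'-style columns into a 'fields' column."""
    out = frame.set_index('dates')
    # 'A X' -> ('A', 'X'), giving a two-level column MultiIndex
    out.columns = out.columns.str.split(expand=True)
    # Move the inner level (X/Y/Z) from the columns into the rows
    out = out.stack()
    # Name both index levels so reset_index produces readable columns
    out.index.names = ['dates', 'fields']
    out = out.reset_index()
    # Stable sort: rows stay in date order within each field
    return out.sort_values('fields', kind='mergesort').reset_index(drop=True)


result = reshape_fields(df)
print(result)
```

If the dates should end up as the index, as the question describes, calling result.set_index('dates') afterwards would be one way to get there.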
Source: the Stack Overflow question "python - Creating a Pandas DataFrame with a new date index and a new column of header substrings?": https://stackoverflow.com/questions/48517404/