My DataFrame looks like this:
Subject Score
1 15
2 0
3 18
2 30
3 17
1 5
4 9
2 7
1 20
1 8
2 9
1 12
For each record, I want to get the previous 3 scores of the same Subject as a list in a new column, like this:
Subject Score Previous
1 15 []
2 0 []
3 18 []
2 30 [0]
3 17 [18]
1 5 [15]
4 9 []
2 7 [30,0]
1 20 [5,15]
1 8 [20,5,15]
2 9 [7,30,0]
1 12 [8,20,5]
The code below rolls over all records without grouping them by Subject:
df['Previous'] = [x.values.tolist()[:-1] for x in df['Score'].rolling(4)]
How can I get the expected result shown above?
Best answer
Because rolling aggregations only support producing numeric values, the lists have to be built with a workaround.
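To illustrate the limitation, here is a minimal sketch on a made-up Series (the `scores` values are only for illustration). It assumes a pandas version in which rolling objects can be iterated (1.1 or later): an aggregation returns one number per window, while iterating yields each window as a Series that can be converted to a list by hand.

```python
import pandas as pd

scores = pd.Series([15, 5, 20, 8, 12])

# A rolling aggregation reduces every window to a single number.
sums = scores.rolling(2).sum()
print(sums.tolist())

# Iterating over the rolling object yields each window as a Series,
# so the raw values of every window can be collected as a list.
windows = [w.tolist() for w in scores.rolling(2)]
print(windows)  # [[15], [15, 5], [5, 20], [20, 8], [8, 12]]
```

Note that the first window is shorter than the requested size, which is why the first row of each subject ends up with an empty list once the current score is dropped.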
Try sort_values first, then groupby rolling with a window of window + 1, and drop the last element of each window (the current row's own score):
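window = 3
# A stable sort keeps each subject's rows in their original order.
df = df.sort_values('Subject', kind='mergesort')
# Each window holds up to window + 1 scores; the last one is the current row.
df['Previous'] = [
    x.agg(list)[:-1] for x in df.groupby('Subject')['Score'].rolling(window + 1)
]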
Subject Score Previous
0 1 15 []
5 1 5 [15]
8 1 20 [15, 5]
9 1 8 [15, 5, 20]
11 1 12 [5, 20, 8]
1 2 0 []
3 2 30 [0]
7 2 7 [0, 30]
10 2 9 [0, 30, 7]
2 3 18 []
4 3 17 [18]
6 4 9 []
Then use sort_index to restore the original order:
df = df.sort_index()
Subject Score Previous
0 1 15 []
1 2 0 []
2 3 18 []
3 2 30 [0]
4 3 17 [18]
5 1 5 [15]
6 4 9 []
7 2 7 [0, 30]
8 1 20 [15, 5]
9 1 8 [15, 5, 20]
10 2 9 [0, 30, 7]
11 1 12 [5, 20, 8]
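The sort-then-restore round trip can be checked in isolation. The sketch below uses a small made-up frame and assumes the default integer index; `kind='mergesort'` is chosen here because it is a stable sort, so rows with the same Subject keep their original relative order (the default sort kind does not guarantee that).

```python
import pandas as pd

df = pd.DataFrame({'Subject': [1, 2, 1, 2], 'Score': [15, 0, 5, 30]})

# Stable sort: ties on Subject stay in their original order.
by_subject = df.sort_values('Subject', kind='mergesort')
print(by_subject.index.tolist())  # [0, 2, 1, 3]

# Sorting by the untouched index brings the rows back to where they started.
restored = by_subject.sort_index()
print(restored.equals(df))  # True
```

This only works because sort_values keeps the original index labels; if the index were reset after sorting, sort_index could no longer restore the initial order.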
(Optionally, use extended slicing to reverse each list so the elements appear in the same order as in the expected output above):
window = 3
df = df.sort_values('Subject', kind='mergesort')
# [-2::-1] skips the current score and reverses the remaining ones.
df['Previous'] = [x.agg(list)[-2::-1]
                  for x in df.groupby('Subject')['Score'].rolling(window + 1)]
df = df.sort_index()
df:
Subject Score Previous
0 1 15 []
1 2 0 []
2 3 18 []
3 2 30 [0]
4 3 17 [18]
5 1 5 [15]
6 4 9 []
7 2 7 [30, 0]
8 1 20 [5, 15]
9 1 8 [20, 5, 15]
10 2 9 [7, 30, 0]
11 1 12 [8, 20, 5]
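The two slices differ only in direction. As a small illustration on a plain Python list (the values mirror one window for Subject 1, with the current row's score last):

```python
window_scores = [15, 5, 20, 8]  # the last element is the current row's score

# Drop the current score, keep the rest oldest first.
oldest_first = window_scores[:-1]
print(oldest_first)  # [15, 5, 20]

# Start at the second-to-last element and walk backwards: newest first.
newest_first = window_scores[-2::-1]
print(newest_first)  # [20, 5, 15]

# A window holding only the current score yields an empty list.
first_row = [15][-2::-1]
print(first_row)  # []
```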
Complete working example:
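import pandas as pd

df = pd.DataFrame({
    'Subject': [1, 2, 3, 2, 3, 1, 4, 2, 1, 1, 2, 1],
    'Score': [15, 0, 18, 30, 17, 5, 9, 7, 20, 8, 9, 12]
})

window = 3
# Stable sort so rows of the same subject stay in their original order.
df = df.sort_values('Subject', kind='mergesort')
# Roll over window + 1 scores per subject and drop the current row's score.
df['Previous'] = [
    x.agg(list)[:-1] for x in df.groupby('Subject')['Score'].rolling(window + 1)
]
# Restore the original row order.
df = df.sort_index()
print(df)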
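For comparison, the same result can be sketched without rolling or sorting at all. This is an alternative approach rather than the one from the answer: it walks the rows once and keeps a bounded history per subject. The helper name `previous_scores` and its parameters are hypothetical, introduced only for this illustration.

```python
from collections import defaultdict, deque

import pandas as pd


def previous_scores(df, group_col='Subject', value_col='Score', window=3):
    """Return, for each row, the last `window` values seen for its group (newest first)."""
    # One bounded history per group; a full deque discards its oldest entry.
    history = defaultdict(lambda: deque(maxlen=window))
    result = []
    for key, value in zip(df[group_col].tolist(), df[value_col].tolist()):
        seen = history[key]
        result.append(list(seen))  # snapshot before adding the current value
        seen.appendleft(value)     # newest value goes to the front
    return result


df = pd.DataFrame({
    'Subject': [1, 2, 3, 2, 3, 1, 4, 2, 1, 1, 2, 1],
    'Score': [15, 0, 18, 30, 17, 5, 9, 7, 20, 8, 9, 12]
})
df['Previous'] = previous_scores(df)
print(df)
```

Because the rows are processed in their existing order, no sort_values/sort_index round trip is needed, and the lists come out newest first, matching the expected output in the question.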
Source: the Stack Overflow question "python - How to conditionally get a list of the previous n values of a column in a DataFrame?": https://stackoverflow.com/questions/67870323/