I have a pandas DataFrame:
import pandas as pd

df = pd.DataFrame({'start': [50, 100, 50000, 50030, 100000],
                   'end': [51, 101, 50001, 50031, 100001],
                   'value': [1, 2, 3, 4, 5]},
                  index=['id1', 'id2', 'id3', 'id4', 'id5'])
>>> df
      start     end  value
id1      50      51      1
id2     100     101      2
id3   50000   50001      3
id4   50030   50031      4
id5  100000  100001      5
Now I want to extract all groups of rows whose "start" values fall within a range of size 150. The output should look like this:
group  group_start  group_end  min_val  max_value  id_count
1               50        101        1          2         2
2            50000      50031        3          4         2
3           100000     100001        5          5         1
How can I extract these groups?
Best answer
Use:
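# Walk through 'start' in order; open a new group whenever a value is
# 150 or more past the first 'start' of the current group.
start = df['start'].iloc[0]
g = 0
gs = []
for val in df['start']:
    if val - start < 150:
        gs.append(g)
    else:
        g += 1
        start = val
        gs.append(g)
df['g'] = gs

# Summarize each group with named aggregation.
df.groupby('g').agg(group_start=('start', 'first'),
                    group_end=('end', 'last'),
                    min_val=('value', 'min'),
                    max_value=('value', 'max'),
                    id_count=('value', 'count'))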
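Output:
   group_start  group_end  min_val  max_value  id_count
g
0           50        101        1          2         2
1        50000      50031        3          4         2
2       100000     100001        5          5         1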
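The loop above can also be wrapped in a small helper. The following is only a sketch, not part of the original answer: the function name `assign_range_groups` and its `width` parameter are made up for illustration, and it assumes the values are in ascending order, which is why the frame is sorted by `start` first.

```python
import pandas as pd


def assign_range_groups(starts, width=150):
    """Return one group label per value (hypothetical helper).

    A new group opens whenever a value lies `width` or more past the
    first value of the current group. Assumes ascending input.
    """
    labels = []
    group = 0
    anchor = None  # first 'start' of the current group
    for val in starts:
        if anchor is None:
            anchor = val
        elif val - anchor >= width:
            group += 1
            anchor = val
        labels.append(group)
    return labels


df = pd.DataFrame({'start': [50, 100, 50000, 50030, 100000],
                   'end': [51, 101, 50001, 50031, 100001],
                   'value': [1, 2, 3, 4, 5]},
                  index=['id1', 'id2', 'id3', 'id4', 'id5'])

# The greedy labelling only makes sense on sorted data.
df = df.sort_values('start')
df['g'] = assign_range_groups(df['start'])

summary = df.groupby('g').agg(group_start=('start', 'first'),
                              group_end=('end', 'last'),
                              min_val=('value', 'min'),
                              max_value=('value', 'max'),
                              id_count=('value', 'count'))
print(summary)
```

Note that the groups are anchored on the first `start` of each group, so a row is compared with that anchor rather than with the row directly before it.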
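Based on the comments (here `idxmax` returns the index label of the row with the largest `value` in each group, instead of a count):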
df.groupby('g').agg(group_start=('start', 'first'),
                    group_end=('end', 'last'),
                    min_val=('value', 'min'),
                    max_value=('value', 'max'),
                    id_count=('value', 'idxmax'))
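If both the row count and the id of the maximum are wanted, one option is to keep them in separate columns. This is a sketch under assumptions: the column name `max_value_id` is hypothetical, and the group labels are hard-coded to the values the loop above produces for this sample data.

```python
import pandas as pd

df = pd.DataFrame({'start': [50, 100, 50000, 50030, 100000],
                   'end': [51, 101, 50001, 50031, 100001],
                   'value': [1, 2, 3, 4, 5]},
                  index=['id1', 'id2', 'id3', 'id4', 'id5'])

# Group labels as computed by the loop for this sample data.
df['g'] = [0, 0, 1, 1, 2]

summary = df.groupby('g').agg(
    id_count=('value', 'count'),       # number of rows per group
    max_value_id=('value', 'idxmax'),  # index label of the row with the max value
)
print(summary)
```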
Regarding "python - pandas: get subsets of rows within a certain size range", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/72010673/