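I have a DataFrame in pandas containing information that I want to sort into classes by its id ('square'). I want to get the average brightness of each group and, based on that average brightness, split the DataFrame into 4 categories, producing 4 output DataFrames.
Sample DataFrame: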
squares = pd.DataFrame({'square': {0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0, 4: 5.0, 5: 6.0, 6: 7.0, 7: 8.0},
'time': {0: 1.0, 1: 2.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 4.0, 7: 5.0 },
'x': {0: 243, 1: 293, 2: 189, 3: 189, 4: 176, 5: 374, 6: 111, 7: 239},
'y': {0: 233, 1: 436, 2: 230, 3: 233, 4: 203, 5: 394, 6: 171, 7: 284},
'brightness': {0: 1000, 1: 1200, 2: 4000, 3: 5000, 4: 2000, 5: 8000, 6: 1300, 7: 4300 }})
squares = squares.set_index('time')
squares
      brightness  square    x    y
time
1.0         1000     1.0  243  233
2.0         1200     1.0  293  436
1.0         4000     2.0  189  230
2.0         5000     2.0  189  233
3.0         2000     5.0  176  203
3.0         6000     6.0  374  394
4.0         1300     7.0  111  171
5.0         4300     8.0  239  284
Desired final result:
squares_1
      brightness  square    x    y
time
1.0         1000     1.0  243  233
2.0         1200     1.0  293  436
3.0         2000     5.0  176  203
4.0         1300     7.0  111  171
squares_2
NaN
squares_3
      brightness  square    x    y
time
1.0         4000     2.0  189  230
2.0         5000     2.0  189  233
5.0         4300     8.0  239  284
squares_4
      brightness  square    x    y
time
3.0         6000     6.0  374  394
I started with the following:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
avg = squares.groupby('square')['brightness'].mean()
n, bins, patches = plt.hist(avg, bins = 4)
inds = np.digitize(avg, bins)
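I am not quite sure how to continue. Any help is appreciated!
Best answer
You can use GroupBy.transform to get a new Series of group means with the same size as the original DataFrame, then bin it with cut, and finally create a dictionary of DataFrames: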
squares = squares.set_index('time')
labs = [f'squares_{x+1}' for x in range(4)]
g = pd.cut(squares.groupby('square')['brightness'].transform('mean'), bins=4, labels=labs)
print (g)
time
1.0    squares_1
2.0    squares_1
1.0    squares_2
2.0    squares_2
3.0    squares_1
3.0    squares_4
4.0    squares_1
5.0    squares_2
Name: brightness, dtype: category
Categories (4, object): [squares_1 < squares_2 < squares_3 < squares_4]
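To see where the four bins come from, `pd.cut` can also return the edges it computed. The sketch below is a minimal illustration, assuming the per-square means that follow from the sample code in the question (with brightness 8000 for square 6). The names `avg`, `binned`, and `edges` are chosen here for illustration only.

```python
import pandas as pd

# Per-square mean brightness, derived from the sample data in the question's code
avg = pd.Series(
    [1100.0, 4500.0, 2000.0, 8000.0, 1300.0, 4300.0],
    index=pd.Index([1.0, 2.0, 5.0, 6.0, 7.0, 8.0], name="square"),
    name="brightness",
)

labs = [f"squares_{i + 1}" for i in range(4)]

# retbins=True additionally returns the bin edges that pd.cut computed
binned, edges = pd.cut(avg, bins=4, labels=labs, retbins=True)

print(edges)   # five edges; the lowest sits slightly below the minimum
print(binned)  # one label per square
```

With `bins=4`, `pd.cut` splits the range of the means into four equal-width intervals, so the interior edges here are 2825, 4550, and 6275, and no mean falls into the third interval. If square 6 had brightness 6000, as in the frame printed in the question, the interior edges would be 2325, 3550, and 4775, and squares 2 and 8 would land in `squares_3` instead, which matches the desired result in the question.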
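# observed=False keeps categories without rows (here squares_3) as empty groups;
# recent pandas versions warn that the default is changing to observed=True
dfs = dict(tuple(squares.groupby(g, observed=False)))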
print (dfs)
{'squares_1':       square    x    y  brightness
time
1.0      1.0  243  233        1000
2.0      1.0  293  436        1200
3.0      5.0  176  203        2000
4.0      7.0  111  171        1300, 'squares_2':       square    x    y  brightness
time
1.0      2.0  189  230        4000
2.0      2.0  189  233        5000
5.0      8.0  239  284        4300, 'squares_3': Empty DataFrame
Columns: [square, x, y, brightness]
Index: [], 'squares_4':       square    x    y  brightness
time
3.0      6.0  374  394        8000}
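The dictionary can also be built with one boolean mask per label. This is a sketch rather than the answer's own method: it yields an entry for every label, including empty bins, without relying on how `groupby` treats categories that have no rows. The sample data is rebuilt from the question so that the block runs on its own.

```python
import pandas as pd

# Sample data from the question, indexed by time
squares = pd.DataFrame({
    "square": [1.0, 1.0, 2.0, 2.0, 5.0, 6.0, 7.0, 8.0],
    "time": [1.0, 2.0, 1.0, 2.0, 3.0, 3.0, 4.0, 5.0],
    "x": [243, 293, 189, 189, 176, 374, 111, 239],
    "y": [233, 436, 230, 233, 203, 394, 171, 284],
    "brightness": [1000, 1200, 4000, 5000, 2000, 8000, 1300, 4300],
}).set_index("time")

labs = [f"squares_{i + 1}" for i in range(4)]

# Mean brightness of each square, broadcast back to every row
group_mean = squares.groupby("square")["brightness"].transform("mean")

# Bin the row-wise group means into four equal-width, labelled intervals
g = pd.cut(group_mean, bins=4, labels=labs)

# One boolean mask per label: every label gets an entry, even if its bin is empty
dfs = {label: squares[(g == label).to_numpy()] for label in labs}

for label, frame in dfs.items():
    print(label, len(frame))
```

Converting the mask with `to_numpy()` sidesteps index alignment, which is convenient here because the `time` index contains duplicate values.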
print (dfs['squares_1'])
      square    x    y  brightness
time
1.0      1.0  243  233        1000
2.0      1.0  293  436        1200
3.0      5.0  176  203        2000
4.0      7.0  111  171        1300
print (dfs['squares_2'])
      square    x    y  brightness
time
1.0      2.0  189  230        4000
2.0      2.0  189  233        5000
5.0      8.0  239  284        4300
print (dfs['squares_3'])
Empty DataFrame
Columns: [square, x, y, brightness]
Index: []
print (dfs['squares_4'])
      square    x    y  brightness
time
3.0      6.0  374  394        8000
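The attempt in the question, based on histogram bins and `np.digitize`, can be carried through as well. The following is a sketch under a few assumptions: `np.histogram_bin_edges` stands in for `plt.hist` so that no plot is drawn, `right=True` is chosen so that each bin is closed on the right as in `pd.cut`, and the names `bin_of_square` and `row_bin` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question, indexed by time
squares = pd.DataFrame({
    "square": [1.0, 1.0, 2.0, 2.0, 5.0, 6.0, 7.0, 8.0],
    "time": [1.0, 2.0, 1.0, 2.0, 3.0, 3.0, 4.0, 5.0],
    "x": [243, 293, 189, 189, 176, 374, 111, 239],
    "y": [233, 436, 230, 233, 203, 394, 171, 284],
    "brightness": [1000, 1200, 4000, 5000, 2000, 8000, 1300, 4300],
}).set_index("time")

# Mean brightness per square (one value per square id)
avg = squares.groupby("square")["brightness"].mean()

# Five edges of four equal-width bins spanning the range of the means
edges = np.histogram_bin_edges(avg.to_numpy(), bins=4)

# Digitizing against the three interior edges gives a bin number 0..3 per square
bin_numbers = np.digitize(avg.to_numpy(), edges[1:-1], right=True)
bin_of_square = pd.Series(bin_numbers, index=avg.index)

# Look up each row's bin number through its square id, then split row-wise
row_bin = squares["square"].map(bin_of_square).to_numpy()
dfs = {f"squares_{i + 1}": squares[row_bin == i] for i in range(4)}

for label, frame in dfs.items():
    print(label, len(frame))
```

Passing only the interior edges to `np.digitize` keeps the minimum and maximum means inside the first and last bin, so no value falls outside the four categories.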
Regarding python - splitting a DataFrame into multiple DataFrames based on groupby and binning, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52423654/