python - Splitting a DataFrame into multiple DataFrames based on groupby and binning

Tags: python pandas dataframe pandas-groupby binning

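I have a DataFrame in pandas whose rows I want to group by their id ('square'). I want to compute the mean brightness of each group and, based on that mean, split the DataFrame into 4 categories, giving 4 output DataFrames.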

Example DataFrame:

squares = pd.DataFrame({'square': {0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0, 4: 5.0, 5: 6.0, 6: 7.0, 7: 8.0},
                    'time': {0: 1.0, 1: 2.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 4.0, 7: 5.0 },
                    'x': {0: 243, 1: 293, 2: 189, 3: 189, 4: 176, 5: 374, 6: 111, 7: 239},
                    'y': {0: 233, 1: 436, 2: 230, 3: 233, 4: 203, 5: 394, 6: 171, 7: 284}, 
                    'brightness': {0: 1000, 1: 1200, 2: 4000, 3: 5000, 4: 2000, 5: 8000, 6: 1300, 7: 4300 }})

squares = squares.set_index('time')
squares


      brightness     square     x     y 
time
1.0     1000          1.0       243   233
2.0     1200          1.0       293   436
1.0     4000          2.0       189   230
2.0     5000          2.0       189   233
3.0     2000          5.0       176   203
3.0     6000          6.0       374   394 
4.0     1300          7.0       111   171
5.0     4300          8.0       239   284

Desired final result:

squares_1

      brightness     square     x     y 
time
1.0     1000          1.0       243   233
2.0     1200          1.0       293   436
3.0     2000          5.0       176   203
4.0     1300          7.0       111   171


squares_2

NaN


squares_3

      brightness     square     x     y 
time
1.0     4000          2.0       189   230
2.0     5000          2.0       189   233
5.0     4300          8.0       239   284


squares_4

      brightness     square     x     y 
time
3.0     6000          6.0       374   394 

I started with the following:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

avg = squares.groupby('square')['brightness'].mean()
n, bins, patches = plt.hist(avg, bins = 4)
inds = np.digitize(avg, bins)

I'm not sure how to proceed from here. Any help is appreciated!

Best answer

You can use GroupBy.transform to get a new Series of per-group means that has the same size as the original DataFrame, then bin it with cut, and finally create a dictionary of DataFrames:

squares = squares.set_index('time')

labs = [f'squares_{x+1}' for x in range(4)]
g = pd.cut(squares.groupby('square')['brightness'].transform('mean'), bins=4, labels=labs)
print (g)
time
1.0    squares_1
2.0    squares_1
1.0    squares_2
2.0    squares_2
3.0    squares_1
3.0    squares_4
4.0    squares_1
5.0    squares_2
Name: brightness, dtype: category
Categories (4, object): [squares_1 < squares_2 < squares_3 < squares_4]
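
To see where those category boundaries come from, it can help to ask cut for the bin edges it computed. The following is a minimal sketch, not part of the original answer: the per-square means are typed in by hand from the `squares` constructor in the question, and `retbins=True` is used to return the edges alongside the labels. With an integer `bins`, pandas splits the range between the smallest and largest value into equal-width intervals and lowers the first edge slightly so the minimum is included.

```python
import pandas as pd

# Per-square mean brightness, computed by hand from the dictionary in the question
# (square 1: (1000 + 1200) / 2, square 2: (4000 + 5000) / 2, and so on).
avg = pd.Series([1100.0, 4500.0, 2000.0, 8000.0, 1300.0, 4300.0],
                index=[1.0, 2.0, 5.0, 6.0, 7.0, 8.0], name='brightness')

labs = [f'squares_{x+1}' for x in range(4)]

# retbins=True returns the computed bin edges as a second value.
binned, edges = pd.cut(avg, bins=4, labels=labs, retbins=True)

print(edges)   # 5 edges delimiting 4 equal-width bins
print(binned)  # the label assigned to each square
```

Because the bins are equal-width over the range of the means, a single large mean stretches every bin, which is why one of the four categories can end up empty.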

# observed=False keeps categories with no rows (here squares_3) as empty groups
dfs = dict(tuple(squares.groupby(g, observed=False)))
print (dfs)
{'squares_1':       square    x    y  brightness
time                              
1.0      1.0  243  233        1000
2.0      1.0  293  436        1200
3.0      5.0  176  203        2000
4.0      7.0  111  171        1300, 'squares_2':       square    x    y  brightness
time                              
1.0      2.0  189  230        4000
2.0      2.0  189  233        5000
5.0      8.0  239  284        4300, 'squares_3': Empty DataFrame
Columns: [square, x, y, brightness]
Index: [], 'squares_4':       square    x    y  brightness
time                              
3.0      6.0  374  394        8000}
print (dfs['squares_1'])
      square    x    y  brightness
time                              
1.0      1.0  243  233        1000
2.0      1.0  293  436        1200
3.0      5.0  176  203        2000
4.0      7.0  111  171        1300

print (dfs['squares_2'])
      square    x    y  brightness
time                              
1.0      2.0  189  230        4000
2.0      2.0  189  233        5000
5.0      8.0  239  284        4300

print (dfs['squares_3'])
Empty DataFrame
Columns: [square, x, y, brightness]
Index: []

print (dfs['squares_4'])
      square    x    y  brightness
time                              
3.0      6.0  374  394        8000
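
If you would rather not depend on how groupby treats categories that contain no rows, one possible alternative is to build the dictionary with one boolean mask per label. This is only a sketch under the same assumptions as the answer (4 equal-width bins, labels `squares_1` to `squares_4`); the DataFrame is rebuilt here with plain lists so that the block runs on its own.

```python
import pandas as pd

# Same data as the dictionary in the question, written with plain lists.
squares = pd.DataFrame({
    'square': [1.0, 1.0, 2.0, 2.0, 5.0, 6.0, 7.0, 8.0],
    'time': [1.0, 2.0, 1.0, 2.0, 3.0, 3.0, 4.0, 5.0],
    'x': [243, 293, 189, 189, 176, 374, 111, 239],
    'y': [233, 436, 230, 233, 203, 394, 171, 284],
    'brightness': [1000, 1200, 4000, 5000, 2000, 8000, 1300, 4300],
}).set_index('time')

labs = [f'squares_{x+1}' for x in range(4)]

# Mean brightness per square, broadcast back to every row, then binned.
g = pd.cut(squares.groupby('square')['brightness'].transform('mean'),
           bins=4, labels=labs)

# One boolean mask per label. to_numpy() selects rows by position, which
# avoids any index alignment on the duplicated 'time' index values.
dfs = {lab: squares[(g == lab).to_numpy()] for lab in labs}

for lab, df in dfs.items():
    print(lab, len(df))
```

Every label gets an entry in `dfs`, and a label with no matching rows simply maps to an empty DataFrame.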

A similar question about "python - Splitting a DataFrame into multiple DataFrames based on groupby and binning" can be found on Stack Overflow: https://stackoverflow.com/questions/52423654/
