python - Variable bins per row in a pandas DataFrame

Given a DataFrame of coordinates, for example df1 = pd.DataFrame({'x': np.tile(np.arange(20),5), 'y': np.repeat(np.arange(5),20)})

I want to bin each x value, but the number of bins differs from row to row. More specifically, the number of bins depends on the y value.

For example, take the point x=6, y=2. The number of bins is y+1 = 3, so the bins for that row are (0, 6.33], (6.33, 12.67], (12.67, 19], and the resulting bin is (0, 6.33].
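To make the bin count concrete, here is a minimal sketch that builds the y + 1 equal-width bins for a single row and locates x = 6 in them with pd.cut. The helper name `row_bins` is hypothetical, and the range 0 to 19 is assumed from df1. Note that y + 1 bins need y + 2 edges.

```python
import numpy as np
import pandas as pd


def row_bins(y, xmin=0, xmax=19):
    """Hypothetical helper: edges of the y + 1 equal-width bins for a row."""
    n_bins = y + 1
    return np.linspace(xmin, xmax, n_bins + 1)  # n bins need n + 1 edges


edges = row_bins(2)          # edges for a row with y = 2
print(edges.round(2))

# Locate x = 6 among those bins (pd.cut uses right-closed intervals by default)
interval = pd.cut([6], bins=edges)[0]
print(interval)
```

pd.cut rounds its interval labels to 3 significant decimals by default, which is why the label shows 6.333 rather than 19/3.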

Part of the resulting DataFrame looks like this:

x    y    xbinned
18   2    (12.67, 19]
19   2    (12.67, 19]
0    3    (0, 4.75]
1    3    (0, 4.75]

The following code generates the desired bin edges:

xbins = []

# y + 1 bins per row need y + 2 edges
for y in df1.y:
    xbins.append(np.linspace(df1['x'].min(), df1['x'].max(), y + 2))

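but they can't be passed to cut:

df1['xbinned'] = pd.cut(df1.x, bins=xbins)

because cut expects bins to be a single 1-D array of edges, not a 2-D array (one set of edges per row).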

Where do I go from here? I suppose I could do this with a loop, but I'd prefer a more vectorized solution using pandas functions.
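A loop-free sketch, not taken from the original thread: because every row shares the same x range and uses equal-width bins, the bin index can be computed arithmetically instead of calling pd.cut per row. It assumes right-closed bins (a, b] and that x == xmin goes into the first bin, matching the (0, 4.75] label for x=0 in the expected output above.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'x': np.tile(np.arange(20), 5),
                    'y': np.repeat(np.arange(5), 20)})

xmin, xmax = df1['x'].min(), df1['x'].max()
span = xmax - xmin                      # assumed non-zero
n = df1['y'].to_numpy() + 1             # number of bins for each row: y + 1

# Position of x measured in bin widths (multiply before dividing to limit rounding)
pos = (df1['x'].to_numpy() - xmin) * n / span

# For right-closed bins, ceil(pos) - 1 is the bin index; clipping puts
# x == xmin into the first bin (similar to include_lowest=True)
k = np.clip(np.ceil(pos).astype(int) - 1, 0, n - 1)

# Edges of the chosen bin for every row, assembled into interval labels
left = xmin + k * span / n
right = xmin + (k + 1) * span / n
df1['xbinned'] = pd.IntervalIndex.from_arrays(left, right, closed='right')
```

The labels here are exact floats (for example 19/3) rather than pd.cut's rounded ones, and values lying exactly on a bin edge are subject to the usual floating-point caveats.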

Best answer

IIUC:

df1['xbinned'] = (df1.groupby('y')
                     # d.name is the group's y value, so each group gets y + 1 bins
                     .apply(lambda d: pd.cut(d['x'], bins=d.name + 1))
                     .reset_index(level=0, drop=True)
                 )

Output (partial):

     x  y         xbinned
18  18  0  (-0.019, 19.0]
19  19  0  (-0.019, 19.0]
38  18  1     (9.5, 19.0]
39  19  1     (9.5, 19.0]
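The same per-group idea can also be sketched without groupby.apply: bin each y group separately and concatenate the pieces, relying on index alignment (so df1's index is assumed to be unique). The -0.019 lower edge in the output above comes from pd.cut itself: with an integer bins argument it widens the range of x by 0.1% (here 19 × 0.001 = 0.019) so that the minimum is included.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'x': np.tile(np.arange(20), 5),
                    'y': np.repeat(np.arange(5), 20)})

# Bin each y group into y + 1 equal-width bins, then stitch the pieces back
# together; assignment aligns each label with its original row.
df1['xbinned'] = pd.concat(
    [pd.cut(group['x'], bins=y + 1) for y, group in df1.groupby('y')]
)
```

Because each group has its own categories, the concatenated column is typically of object dtype holding Interval values rather than a single categorical.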

Regarding python - Variable bins per row in a pandas DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59201654/
