python - Pandas DF - Measure frequencies, append them to the appropriate rows, and normalize by max(freq)

Suppose I create the following DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame({'A':np.random.random(20), 'B':np.random.random(20)})
df
Out[162]: 
           A         B
0   0.888651  0.380360
1   0.513343  0.605991
2   0.560978  0.076174
3   0.209426  0.498564
4   0.121748  0.771653
5   0.843299  0.279264
6   0.644060  0.725061
7   0.200187  0.349093
8   0.807808  0.657373
9   0.212760  0.384311
10  0.000725  0.023815
11  0.614540  0.534569
12  0.083690  0.228761
13  0.202334  0.266114
14  0.104520  0.757514
15  0.039944  0.014512
16  0.465300  0.164657
17  0.247370  0.894628
18  0.980589  0.833938
19  0.734673  0.745574

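Then I would like to:

1. Find how often the values in column 'B' fall into each of the bins given by np.arange(0, 1.05, 0.05).
2. Add that information as a column 'freq'. For instance, in row[0] 'B' is 0.38, which lies in [0.35, 0.40), and that bin has 2 occurrences in the DataFrame. So we would have df['freq'][0] = 2.
3. Then add a new column called 'weights' which, for each row, is max(freq)/freq.

I can solve 1 with something like: df.groupby(pd.cut(df['B'], np.arange(0, 1.05, 0.05))).count(). There may be a more elegant way to do it.

I have not been able to solve 2.

3 is straightforward.

In the end, all I need is the 'weights' column produced by steps 1, 2 and 3.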
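For reference, the attempt at step 1 can be sketched as below. This is only an illustration under stated assumptions: the fixed seed and the names rng, edges, binned and per_bin are not from the question, and right=False is passed because pd.cut closes intervals on the right by default, (a, b], while the question describes bins of the form [a, b). The result has one entry per bin rather than one per row, which is why step 2 still needs a way to map the counts back onto the rows.

```python
import numpy as np
import pandas as pd

# Fixed seed so the sketch is reproducible (an assumption, not in the question).
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.random(20), "B": rng.random(20)})

edges = np.arange(0, 1.05, 0.05)

# right=False gives half-open intervals [a, b), matching the question's wording.
binned = pd.cut(df["B"], edges, right=False)

# One count per bin; observed=False keeps the empty bins in the result.
per_bin = df.groupby(binned, observed=False)["B"].count()

print(per_bin)
```

The index of per_bin is the set of intervals, not the original row labels, so it cannot be assigned to df['freq'] directly.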

Best answer

You can do 1 with, for example, np.digitize, and 2 with transform().

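import pandas as pd
import numpy as np
df = pd.DataFrame({'A': np.random.random(20), 'B': np.random.random(20)})

bins = np.arange(0, 1.05, 0.05)
# Step 1: index of the bin each 'B' value falls into (bins[i-1] <= x < bins[i])
df["bins"] = np.digitize(df["B"], bins)
# Step 2: size of each row's bin, broadcast back to every row of that bin
df["count"] = df.groupby("bins")["bins"].transform("count")
# Step 3: max(freq)/freq
df["weight"] = df["count"].max()/df["count"]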

df
Out[32]: 
           A         B  bins  count  weight
0   0.032735  0.948836    19      1     3.0
1   0.728310  0.671117    14      2     1.5
2   0.307804  0.328636     7      1     3.0
3   0.794719  0.257233     6      3     1.0
4   0.137138  0.480473    10      1     3.0
5   0.145847  0.754164    16      2     1.5
6   0.929552  0.187502     4      1     3.0
7   0.700309  0.655163    14      2     1.5
8   0.590829  0.561370    12      1     3.0
9   0.236366  0.814549    17      2     1.5
10  0.409573  0.444851     9      1     3.0
11  0.611366  0.842374    17      2     1.5
12  0.184661  0.725729    15      1     3.0
13  0.643751  0.299513     6      3     1.0
14  0.421400  0.294158     6      3     1.0
15  0.293585  0.112387     3      1     3.0
16  0.790870  0.609906    13      1     3.0
17  0.980155  0.757171    16      2     1.5
18  0.733151  0.393027     8      2     1.5
19  0.512966  0.398919     8      2     1.5
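Because the frames above are random, the output cannot be checked against the example in the question. The following sketch applies the same np.digitize and transform() steps to the 'B' values printed in the question, copied by hand. The name b_values is introduced here for illustration, and the columns are named 'freq' and 'weights' to match the question rather than 'count' and 'weight' as in the answer.

```python
import numpy as np
import pandas as pd

# 'B' values as printed in the question (copied by hand for reproducibility).
b_values = [
    0.380360, 0.605991, 0.076174, 0.498564, 0.771653,
    0.279264, 0.725061, 0.349093, 0.657373, 0.384311,
    0.023815, 0.534569, 0.228761, 0.266114, 0.757514,
    0.014512, 0.164657, 0.894628, 0.833938, 0.745574,
]
df = pd.DataFrame({"B": b_values})

bins = np.arange(0, 1.05, 0.05)

# Default right=False: returns i such that bins[i-1] <= x < bins[i].
df["bins"] = np.digitize(df["B"], bins)

# transform("count") returns a Series aligned with the original rows.
df["freq"] = df.groupby("bins")["B"].transform("count")

# Rows in the most populated bin get weight 1.0; rarer bins get larger weights.
df["weights"] = df["freq"].max() / df["freq"]

print(df.loc[[0, 1, 9]])
```

Rows 0 and 9 (0.380360 and 0.384311) share the bin [0.35, 0.40), so both get freq 2, which agrees with the df['freq'][0] = 2 expected in the question.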

Source: a similar question on Stack Overflow, "python - Pandas DF - Measure frequencies, append them to the appropriate rows, and normalize by max(freq)": https://stackoverflow.com/questions/49998709/
