python - Create a new column in Pandas based on the length of unique values in another column

Tags: python pandas dataframe comparison pandas-groupby

I have a dataframe as follows:

df

    id   val
0    1   21
1    2   35
2    2   45
3    3   55
4    1   10
5    4   90
6    3   45
7    2   78
8    3   23

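I want to create a new column cat based on the length of each value in id, that is, the number of rows in which that id occurs.

If len(id) <= 1, the value in cat should be 'A'

If len(id) < 3, the value should be 'B'

If len(id) >= 3, the value should be 'C'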

Expected output:

    id   val   cat
0    1   21     B
1    2   35     C
2    2   45     C
3    3   55     C
4    1   10     B
5    4   90     A
6    3   45     C
7    2   78     C
8    3   23     C

What I tried:

def test(series):
    if len(series) <= 1:
        return 'A'
    elif len(series) < 3:
        return 'B'
    else:
        return 'C'


df.groupby('id').apply(test)

The code above raises an error:

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

Best answer

You can use map, value_counts and pd.cut:

import numpy as np
import pandas as pd

df['cat'] = df.id.map(pd.cut(df.id.value_counts(),
                             bins=[0, 1, 2, np.inf],
                             labels=['A', 'B', 'C']))

Output:

   id  val cat
0   1   21   B
1   2   35   C
2   2   45   C
3   3   55   C
4   1   10   B
5   4   90   A
6   3   45   C
7   2   78   C
8   3   23   C
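
To make the one-liner easier to follow, here is the same approach split into three steps on the sample data from the question. This is only an illustrative sketch; the intermediate names counts and labels are introduced here and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id":  [1, 2, 2, 3, 1, 4, 3, 2, 3],
    "val": [21, 35, 45, 55, 10, 90, 45, 78, 23],
})

# Step 1: count how many rows each id has (index = id, values = count).
counts = df["id"].value_counts()

# Step 2: bin the counts into the intervals (0, 1], (1, 2], (2, inf]
# and label them 'A', 'B', 'C'. The result is still indexed by id.
labels = pd.cut(counts, bins=[0, 1, 2, np.inf], labels=["A", "B", "C"])

# Step 3: look up each row's id in that id -> label mapping.
df["cat"] = df["id"].map(labels)

print(df)
```

Because pd.cut uses right-closed intervals by default, a count of exactly 1 falls in 'A' and a count of exactly 2 falls in 'B', which matches the conditions in the question.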

Regarding "python - Create a new column in Pandas based on the length of unique values in another column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49497627/
