python - pandas: combining multiple categories into one

Say I have categories 1 to 10, and I want to assign red to the values 3 to 5, green to 1, 6 and 7, and blue to 2, 8, 9 and 10.

How would I do this? If I try

df.cat.rename_categories(['red','green','blue'])

I get the error: ValueError: new categories need to have the same number of items than the old categories! But if I instead use

df.cat.rename_categories(['green', 'blue', 'red', 'red', 'red',
                        'green', 'green', 'blue', 'blue', 'blue'])

I get an error complaining about duplicate values.

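The only other approach I can think of is writing a for loop that walks through a dictionary of values and replaces them. Is there a more elegant solution?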

Best answer

Not sure about elegance, but if you make a dict of old-to-new categories, something like this (note the added "purple"):

>>> m = {"red": [3,4,5], "green": [1,6,7], "blue": [2,8,9,10], "purple": [11]}
>>> m2 = {v: k for k,vv in m.items() for v in vv}
>>> m2
{1: 'green', 2: 'blue', 3: 'red', 4: 'red', 5: 'red', 6: 'green', 
 7: 'green', 8: 'blue', 9: 'blue', 10: 'blue', 11: 'purple'}
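The one-line comprehension above silently keeps the last label if an old value is listed under two colours. A minimal sketch of the same inversion as a helper that rejects such overlaps instead; the name `invert_mapping` is hypothetical and not part of pandas.

```python
def invert_mapping(groups):
    """Turn {new_label: [old_values, ...]} into {old_value: new_label}."""
    inverted = {}
    for new_label, old_values in groups.items():
        for old in old_values:
            # Each old category may belong to only one new label.
            if old in inverted:
                raise ValueError(
                    f"{old!r} is assigned to both {inverted[old]!r} and {new_label!r}"
                )
            inverted[old] = new_label
    return inverted


m = {"red": [3, 4, 5], "green": [1, 6, 7], "blue": [2, 8, 9, 10], "purple": [11]}
m2 = invert_mapping(m)
print(m2[1], m2[3], m2[10])  # green red blue
```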

You can then use it to build a new categorical series:

>>> import pandas as pd
>>> df.cat.map(m2).astype(pd.CategoricalDtype(categories=sorted(set(m2.values()))))
0    green
1     blue
2      red
3      red
4      red
5    green
6    green
7     blue
8     blue
9     blue
Name: cat, dtype: category
Categories (4, object): ['blue', 'green', 'purple', 'red']
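For a current pandas version, here is a self-contained sketch of the same map-then-convert approach. It assumes a hypothetical DataFrame `df` whose column `cat` holds the categories 1 to 10. The target categories are declared with `pd.CategoricalDtype`, taken from the keys of `m` so that their order is red, green, blue, purple.

```python
import pandas as pd

# Hypothetical example data: a categorical column "cat" holding 1 to 10.
df = pd.DataFrame({"cat": pd.Categorical(range(1, 11))})

m = {"red": [3, 4, 5], "green": [1, 6, 7], "blue": [2, 8, 9, 10], "purple": [11]}
m2 = {v: k for k, vv in m.items() for v in vv}

# Map each old category to its new label, then declare the full set of labels
# (including "purple", which never occurs in the data).
new_dtype = pd.CategoricalDtype(categories=list(m))
combined = df["cat"].map(m2).astype(new_dtype)

print(combined.tolist())
print(list(combined.cat.categories))
```

Taking the categories from `list(m)` rather than from a set gives a deterministic category order, which matters if you later sort by the column or make the categorical ordered.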

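The categories= argument is not needed if you are sure that every categorical value will actually appear in the column. Here, though, if we left it out, purple would not show up in the resulting Categorical, because the categories would be built only from the values actually seen.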
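A small sketch of that difference, using a hypothetical sample in which 11 never occurs: letting pandas infer the categories drops purple, while declaring them explicitly keeps it, with a count of zero.

```python
import pandas as pd

# Hypothetical sample: the value 11 (which maps to "purple") never occurs.
values = pd.Series([1, 2, 3, 6, 8], name="cat")
m2 = {1: "green", 2: "blue", 3: "red", 6: "green", 8: "blue", 11: "purple"}

mapped = values.map(m2)

# Without explicit categories, pandas infers them from the observed values only.
inferred = mapped.astype("category")
# With explicit categories, unobserved labels such as "purple" are kept.
declared = mapped.astype(
    pd.CategoricalDtype(categories=["blue", "green", "purple", "red"])
)

print(list(inferred.cat.categories))  # ['blue', 'green', 'red']
print(list(declared.cat.categories))  # ['blue', 'green', 'purple', 'red']
print(declared.value_counts()["purple"])  # 0
```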

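Of course, if you have already built the list ['green','blue','red', etc.], it is just as easy to use it to create the new categorical column directly and bypass this mapping entirely.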
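A minimal sketch of that direct route, assuming the list holds one new label per row in the same order as the original column (true here, where the column is simply 1 to 10 in order). The `categories=` list is again passed explicitly so that purple is kept.

```python
import pandas as pd

# New label for each row, written out by hand (same order as the original column).
labels = ["green", "blue", "red", "red", "red",
          "green", "green", "blue", "blue", "blue"]

# Build the categorical column directly; no old-to-new dictionary is involved.
df = pd.DataFrame(
    {"cat": pd.Categorical(labels, categories=["red", "green", "blue", "purple"])}
)
print(df["cat"].value_counts().to_dict())
```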

Regarding python - pandas: combining multiple categories into one, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32262982/
