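Suppose I have a DataFrame with three columns of categorical data, and I want to map each distinct combination of the three columns to a single integer and attach it to the original DataFrame. I know this can be done for a single column, but what about multiple columns?
Example: from this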
>>> import pandas as pd
>>> df = pd.DataFrame({'id':['0', '1', '2', '3','4'],
... 'x':['tall', 'short', 'tall', 'short', 'tall'],
... 'y':['fat', 'thin', 'thin', 'fat', 'fat'],
... 'z':['male', 'female', 'female', 'male', 'male']},
... dtype='category')
>>> df
  id      x     y       z
0  0   tall   fat    male
1  1  short  thin  female
2  2   tall  thin  female
3  3  short   fat    male
4  4   tall   fat    male
to this, by mapping the x, y and z columns together:
>>> df
  id      x     y       z  map
0  0   tall   fat    male    0
1  1  short  thin  female    1
2  2   tall  thin  female    2
3  3  short   fat    male    3
4  4   tall   fat    male    0
Best answer
This is what groupby().ngroup() does:
df['map'] = df.groupby(['x','y','z'], sort=False).ngroup()
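For reference, here is a minimal self-contained sketch of the groupby().ngroup() line above. It assumes the sample data from the question but keeps the columns as plain strings rather than dtype='category', which is a simplification made here and not part of the original answer.

```python
import pandas as pd

# Sample data from the question, kept as plain strings (an assumption
# made for this sketch; the question uses dtype='category').
df = pd.DataFrame({
    'id': ['0', '1', '2', '3', '4'],
    'x': ['tall', 'short', 'tall', 'short', 'tall'],
    'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
    'z': ['male', 'female', 'female', 'male', 'male'],
})

# Each distinct (x, y, z) combination gets one integer label.
# sort=False numbers the groups in order of first appearance; with the
# default sort=True they would be numbered in sorted key order instead.
df['map'] = df.groupby(['x', 'y', 'z'], sort=False).ngroup()

print(df)
```

Rows 0 and 4 share the combination (tall, fat, male), so they should receive the same label.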
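Alternatively, if your data is of string type, you can concatenate the columns (possibly with a special separator character) and use the single-column approach:
# add('&') appends a separator so adjacent values cannot run together; it may not be needed
df['map'] = pd.factorize(df[['x','y','z']].add('&').sum(1))[0]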
Output:
  id      x     y       z  map
0  0   tall   fat    male    0
1  1  short  thin  female    1
2  2   tall  thin  female    2
3  3  short   fat    male    3
4  4   tall   fat    male    0
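The concatenation approach assumes string columns, while the question's DataFrame is built with dtype='category', which may not support string concatenation directly. A hedged sketch of one way around that is to cast each column with astype(str) before joining. The names cols and key are illustrative only and do not come from the original answer.

```python
import pandas as pd

# Sample data from the question, with categorical columns.
df = pd.DataFrame({
    'id': ['0', '1', '2', '3', '4'],
    'x': ['tall', 'short', 'tall', 'short', 'tall'],
    'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
    'z': ['male', 'female', 'female', 'male', 'male'],
}, dtype='category')

# Cast to str first so that "+" is plain string concatenation.
# The '&' separator keeps values such as 'ab' + 'c' and 'a' + 'bc'
# from collapsing into the same key.
cols = ['x', 'y', 'z']  # illustrative name
key = df[cols[0]].astype(str)
for col in cols[1:]:
    key = key + '&' + df[col].astype(str)

# factorize returns (codes, uniques); codes follow order of first appearance.
codes, uniques = pd.factorize(key)
df['map'] = codes

print(df)
```

The separator only needs to be a string that never occurs inside the values themselves; '&' is simply the choice made in the answer.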
Regarding python - mapping of multi-column categorical values in pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60313006/