I have a dataframe like this:
import pandas as pd
name = ['fred','fred','fred','james','james','rick','rick','jeff']
actionfigures = ['superman','batman','flash','greenlantern','flash','batman','joker','superman']
cars = ['lamborghini', 'ferrari','bugatti','ferrari','corvette','bugatti','bmw','bmw']
pets = ['cat','dog','bird','cat','dog','dog','fish','marmet']
test = pd.DataFrame({'name':name,'actfig':actionfigures,'car':cars,'pet':pets})
actfig car name pet
0 superman lamborghini fred cat
1 batman ferrari fred dog
2 flash bugatti fred bird
3 greenlantern ferrari james cat
4 flash corvette james dog
5 batman bugatti rick dog
6 joker bmw rick fish
7 superman bmw jeff marmet
Forgive me if my terminology is off, but I'd like to pivot the data so that, for each name, I get a count of every value in the ['actfig','car','pet'] columns:
batman flash greenlantern joker superman bmw bugatti corvette ferrari lamborghini bird cat dog fish marmet
name
fred 1 1 0 0 1 0 1 0 1 1 1 1 1 0 0
james 0 1 1 0 0 0 0 1 1 0 0 1 1 0 0
jeff 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1
rick 1 0 0 1 0 1 1 0 0 0 0 0 1 1 0
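I thought test.pivot_table(index='name', columns=['actfig','car','pet'], aggfunc='size') would do it, but it gives me some strange multi-level columns (each column is a combination of an actfig, a car and a pet value rather than a single value).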
I thought maybe I could concatenate get_dummies for each column, then group by name and sum, but I feel like pandas probably has a better way.
How would this be done?
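For reference, the concatenate-and-sum idea from the question does produce the desired table. A minimal sketch, assuming pandas >= 0.23 (for the dtype argument of get_dummies); dummies and counts are illustrative names. Because get_dummies on a single Series adds no prefix, this only works cleanly while no value appears in more than one column.

```python
import pandas as pd

# Sample data from the question
test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# One 0/1 indicator frame per categorical column, placed side by side.
# dtype=int avoids the boolean dtype get_dummies returns by default in pandas >= 2.0.
dummies = pd.concat(
    [pd.get_dummies(test[col], dtype=int) for col in ['actfig', 'car', 'pet']],
    axis=1,
)

# Group the indicator rows by the original name column and add them up
counts = dummies.groupby(test['name']).sum()
print(counts)
```

The column order follows the order of the columns passed to concat, which here matches the desired table in the question.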
Best answer
Using melt and pivot:
# pivot's arguments are keyword-only in pandas >= 2.0
test.melt('name').assign(new=1).pivot(index='name', columns='value', values='new').fillna(0)
Out[239]:
value batman bird bmw bugatti cat corvette dog ferrari fish flash \
name
fred 1.0 1.0 0.0 1.0 1.0 0.0 1.0 1.0 0.0 1.0
james 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0
jeff 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
rick 1.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0
value greenlantern joker lamborghini marmet superman
name
fred 0.0 0.0 1.0 0.0 1.0
james 1.0 0.0 0.0 0.0 0.0
jeff 0.0 0.0 0.0 1.0 1.0
rick 0.0 1.0 0.0 0.0 0.0
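The melt + pivot version marks presence with 1 rather than counting: if a name had the same value twice, pivot would raise a ValueError about duplicate entries. When real counts are wanted, pd.crosstab on the melted frame is one alternative. A sketch under that assumption; long, counts, dup and dup_counts are illustrative names, and the duplicated row is hypothetical test data.

```python
import pandas as pd

# Sample data from the question
test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Long format: one row per (name, original column, value)
long = test.melt(id_vars='name')

# crosstab counts how often each (name, value) pair occurs; result is integer
counts = pd.crosstab(long['name'], long['value'])
print(counts)

# Hypothetical duplicate: repeat fred's first row, so he now has two cats,
# two supermen and two lamborghinis
dup = pd.concat([test, test.iloc[[0]]], ignore_index=True)
dup_long = dup.melt(id_vars='name')
dup_counts = pd.crosstab(dup_long['name'], dup_long['value'])
print(dup_counts.loc['fred', 'cat'])  # 2
```

On the original data every entry is 0 or 1, so the result matches the pivot output apart from the integer dtype.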
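Or with get_dummies (sum(level=0) was removed in pandas 2.0, so group by the index level instead):
pd.get_dummies(test.set_index('name'), dtype=int).groupby(level=0).sum()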
Out[248]:
actfig_batman actfig_flash actfig_greenlantern actfig_joker \
name
fred 1 1 0 0
james 0 1 1 0
jeff 0 0 0 0
rick 1 0 0 1
actfig_superman car_bmw car_bugatti car_corvette car_ferrari \
name
fred 1 0 1 0 1
james 0 0 0 1 1
jeff 1 1 0 0 0
rick 0 1 1 0 0
car_lamborghini pet_bird pet_cat pet_dog pet_fish pet_marmet
name
fred 1 1 1 1 0 0
james 0 0 1 1 0 0
jeff 0 0 0 0 0 1
rick 0 0 0 1 1 0
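Edit: following PiR's suggestion, use '|' as the prefix separator and strip the column-name prefixes so the labels match the desired output:
pd.get_dummies(test.set_index('name'), prefix_sep='|', dtype=int).groupby(level=0).sum().rename(columns=lambda c: c.rsplit('|', 1)[1])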
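A quick way to confirm that the prefix-stripped get_dummies result and the melt + pivot result hold the same table. This is a sketch, not part of the original answer: via_dummies, via_pivot and aligned are illustrative names, the explicit int64 dtypes are an assumption so both frames hold integers, and the rsplit step assumes no value itself contains '|'.

```python
import pandas as pd

# Sample data from the question
test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# get_dummies answer: 'actfig|batman' -> 'batman' after rsplit
via_dummies = (
    pd.get_dummies(test.set_index('name'), prefix_sep='|', dtype='int64')
    .groupby(level=0)
    .sum()
    .rename(columns=lambda c: c.rsplit('|', 1)[1])
)

# melt + pivot answer, cast back to integers after fillna
via_pivot = (
    test.melt('name')
    .assign(new=1)
    .pivot(index='name', columns='value', values='new')
    .fillna(0)
    .astype('int64')
)

# Reorder the get_dummies columns to the (alphabetical) pivot order and compare
aligned = via_dummies[via_pivot.columns]
print(list(aligned.columns) == list(via_pivot.columns))
print((aligned.to_numpy() == via_pivot.to_numpy()).all())
```

Note that rename does not deduplicate labels: if the same value appeared in two columns (say a car also named flash), the stripped frame would end up with two flash columns.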
Source: "python - Pandas - pivot multiple categorical columns", Stack Overflow: https://stackoverflow.com/questions/46733674/