I wrote the following function to convert a variable into dummy variables:
def convert_to_dummies(df, column):
    dummies = pd.get_dummies(df[column])
    df = pd.concat([df, dummies], axis=1)
    df = df.drop(column, axis=1)  # when dropping a column, don't forget axis=1
    return df
But when I apply it to the categorical variables in df:
for col in ['col1', 'col2', ....]:
    convert_to_dummies(df, col)
* 'col1', 'col2', ... are categorical columns in df.
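I get the original df back, and none of the categorical variables has been converted to dummy variables. What am I doing wrong?
Best answer
You need to assign the output back to df. Both pd.concat and drop return new DataFrames, and reassigning df inside the function only rebinds the local name, so the caller's df stays unchanged unless you capture the return value: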
for col in ['col1', 'col2', ....]:
    df = convert_to_dummies(df, col)
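To make the difference concrete, here is a minimal sketch that calls the function once without assignment and once with it. The small frame and the names original, unchanged_columns, converted and converted_columns are made up for illustration; only convert_to_dummies comes from the question.

```python
import pandas as pd

def convert_to_dummies(df, column):
    dummies = pd.get_dummies(df[column])
    df = pd.concat([df, dummies], axis=1)  # rebinds the local name only
    df = df.drop(column, axis=1)           # drop also returns a new frame
    return df

# Illustrative data, not from the question
original = pd.DataFrame({'col1': list('aba'), 'col3': [1, 2, 3]})

# Called without assignment: the returned frame is discarded
convert_to_dummies(original, 'col1')
unchanged_columns = list(original.columns)

# Called with assignment: the converted frame is kept
converted = convert_to_dummies(original, 'col1')
converted_columns = list(converted.columns)

print(unchanged_columns)   # still contains 'col1'
print(converted_columns)   # 'col1' replaced by its dummy columns
```

The first call does all the work and then throws the result away, which is exactly what happens in the loop from the question.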
Example:
df = pd.DataFrame({'col1': list('abcdef'),
                   'col2': list('abadec'),
                   'col3': list('aaadee'),
                   'col4': list('aabbcc')})
print(df)
  col1 col2 col3 col4
0    a    a    a    a
1    b    b    a    a
2    c    a    a    b
3    d    d    d    b
4    e    e    e    c
5    f    c    e    c
for col in ['col1', 'col2']:
    df = convert_to_dummies(df, col)
print(df)
  col3 col4  a  b  c  d  e  f  a  b  c  d  e
0    a    a  1  0  0  0  0  0  1  0  0  0  0
1    a    a  0  1  0  0  0  0  0  1  0  0  0
2    a    b  0  0  1  0  0  0  1  0  0  0  0
3    d    b  0  0  0  1  0  0  0  0  0  1  0
4    e    c  0  0  0  0  1  0  0  0  0  0  1
5    e    c  0  0  0  0  0  1  0  0  1  0  0
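Note that the looped result has duplicate column labels (a to e appear twice), so df['a'] would select two columns. If you want to keep the dummies of each source column apart rather than merge them, one option is to let pd.get_dummies encode the columns itself through its columns parameter, which prefixes each dummy with the source column name. This is a sketch of an alternative, not part of the original answer; the name encoded is made up, and dtype=int is passed only so the values print as 0/1 on pandas versions that default to booleans.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('abcdef'),
                   'col2': list('abadec'),
                   'col3': list('aaadee'),
                   'col4': list('aabbcc')})

# columns= selects which columns to encode. Each dummy is named
# <source column>_<category>, and the encoded source columns are dropped.
# dtype=int requests 0/1 integers instead of booleans.
encoded = pd.get_dummies(df, columns=['col1', 'col2'], dtype=int)

print(encoded.columns.tolist())
```

No loop and no reassignment inside a helper are needed here, and every resulting column label is unique.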
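If you need each category to appear as a single, unique column, it is better to drop the loop and build all the dummies at once: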
def convert_to_dummies_cols(df, cols):
    # create all dummies at once for the selected subset of columns
    dummies = pd.get_dummies(df[cols], prefix='', prefix_sep='')
    # merge columns that share a name by taking the max
    dummies = dummies.groupby(level=0, axis=1).max()
    # add to the original df and drop the source columns
    df = pd.concat([df, dummies], axis=1)
    df = df.drop(cols, axis=1)
    return df

# the parameter is the list of columns to convert to dummies
df = convert_to_dummies_cols(df, ['col1', 'col2'])
print(df)
  col3 col4  a  b  c  d  e  f
0    a    a  1  0  0  0  0  0
1    a    a  0  1  0  0  0  0
2    a    b  1  0  1  0  0  0
3    d    b  0  0  0  1  0  0
4    e    c  0  0  0  0  1  0
5    e    c  0  0  1  0  0  1
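On recent pandas versions, groupby(..., axis=1) is deprecated (it emits a FutureWarning from pandas 2.1), and get_dummies returns booleans by default since pandas 2.0. A possible variant that avoids both, assuming you want the same 0/1 output as above, is to transpose, group by the index labels, and transpose back. The function name convert_to_dummies_cols_t is hypothetical and this is a sketch rather than the answer's original code.

```python
import pandas as pd

def convert_to_dummies_cols_t(df, cols):
    # Hypothetical variant of convert_to_dummies_cols for newer pandas.
    # dtype=int keeps 0/1 integers instead of booleans.
    dummies = pd.get_dummies(df[cols], prefix='', prefix_sep='', dtype=int)
    # Transpose so the duplicate labels sit on the index, take the max
    # per label, then transpose back to get one column per category.
    dummies = dummies.T.groupby(level=0).max().T
    # Drop the source columns and append the merged dummies.
    return pd.concat([df.drop(cols, axis=1), dummies], axis=1)

df = pd.DataFrame({'col1': list('abcdef'),
                   'col2': list('abadec'),
                   'col3': list('aaadee'),
                   'col4': list('aabbcc')})

result = convert_to_dummies_cols_t(df, ['col1', 'col2'])
print(result)
```

As in the looped version, the return value has to be assigned; the input df is left untouched.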
Source: the Stack Overflow question about a Python function on a DataFrame not returning the expected result: https://stackoverflow.com/questions/46512661/