age income student credit_rating Class_buys_computer
0 youth high no fair no
1 youth high no excellent no
2 middle_aged high no fair yes
3 senior medium no fair yes
4 senior low yes fair yes
5 senior low yes excellent no
6 middle_aged low yes excellent yes
7 youth medium no fair no
8 youth low yes fair yes
9 senior medium yes fair yes
10 youth medium yes excellent yes
11 middle_aged medium no excellent yes
12 middle_aged high yes fair yes
13 senior medium no excellent no
我正在使用这个数据集,并希望拥有诸如年龄
、收入
等变量,就像R<中的
,我怎样才能在Python中做到这一点因子变量
一样
最佳答案
您可以使用astype
带参数类别
:
cols = ['age','income','student']
for col in cols:
df[col] = df[col].astype('category')
print (df.dtypes)
age category
income category
student category
credit_rating object
Class_buys_computer object
dtype: object
如果需要转换所有列:
for col in df.columns:
df[col] = df[col].astype('category')
print (df.dtypes)
age category
income category
student category
credit_rating category
Class_buys_computer category
dtype: object
你需要循环,因为如果使用:
df = df.astype('category')
NotImplementedError: > 1 ndim Categorical are not supported at this time
Pandas documentation about categorical .
按评论编辑:
如果需要排序,请使用另一个解决方案 pandas.Categorical
:
df['age']=pd.Categorical(df['age'],categories=["youth","middle_aged","senior"],ordered=True)
print (df.age)
0 youth
1 youth
2 middle_aged
3 senior
4 senior
5 senior
6 middle_aged
7 youth
8 youth
9 senior
10 youth
11 middle_aged
12 middle_aged
13 senior
Name: age, dtype: category
Categories (3, object): [youth < middle_aged < senior]
然后您可以按列年龄
对DataFrame进行排序:
df = df.sort_values('age')
print (df)
age income student credit_rating Class_buys_computer
0 youth high no fair no
1 youth high no excellent no
7 youth medium no fair no
8 youth low yes fair yes
10 youth medium yes excellent yes
2 middle_aged high no fair yes
6 middle_aged low yes excellent yes
11 middle_aged medium no excellent yes
12 middle_aged high yes fair yes
3 senior medium no fair yes
4 senior low yes fair yes
5 senior low yes excellent no
9 senior medium yes fair yes
13 senior medium no excellent no
关于python - 如何在Python中拥有分类因子变量,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/38954525/