I want to factorize a column of a pandas DataFrame and add the result as a new column. The values in the column are strings.
For example:
COL_1
'TRY A TEST'
'TRY A TEST'
'PLAY Q'
'PLAY Q'
I want to convert them to numbers, like this:
COL_1 NEW_COL
'TRY A TEST' 0
'TRY A TEST' 0
'PLAY Q' 1
'PLAY Q' 1
However, this is what I get:
x = 'TRY A TEST'
my_df['NEW_COL'] = my_df['COL_1'].apply(lambda x: pd.factorize(x)[0])
(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64), array(['TRY A TEST'], dtype=object))
It seems that each character is being converted to a number.
I also get an error:
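The numbering problem comes from `Series.apply`, which calls the lambda once per cell. `pd.factorize` therefore only ever sees a single value and starts counting from 0 on every call. A minimal sketch reproducing this on the example data (the one-element list wrapper is an addition here so that `factorize` receives an array-like rather than a bare string):

```python
import pandas as pd

my_df = pd.DataFrame({"COL_1": ["TRY A TEST", "TRY A TEST", "PLAY Q", "PLAY Q"]})

# apply() passes one cell at a time, so each factorize call sees a single
# value and always assigns it code 0.
per_cell = my_df["COL_1"].apply(lambda x: pd.factorize([x])[0][0])
print(per_cell.tolist())  # [0, 0, 0, 0]
```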
TypeError: 'float' object is not iterable
There are no floats in "COL_1"; it contains strings.
Any suggestions?
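One plausible cause of the `TypeError`, not confirmed by the question, is missing values: pandas stores them as `NaN`, which is a Python `float`, even in a column whose other entries are strings. A sketch with a hypothetical column containing one `NaN`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question does not show whether COL_1 has missing values.
s = pd.Series(["TRY A TEST", np.nan, "PLAY Q"])

# The missing entry is a float, consistent with a "'float' object ..." error.
print(type(s[1]))  # <class 'float'>

# Factorizing the whole Series assigns the sentinel code -1 to NaN.
print(pd.factorize(s)[0].tolist())  # [0, -1, 1]

# Filling NaN first gives missing values their own regular code.
print(pd.factorize(s.fillna("MISSING"))[0].tolist())  # [0, 1, 2]
```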
Best answer
A simple solution:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
my_df['NEW_COL'] = le.fit_transform(my_df['COL_1'].astype(str))
my_df
COL_1 NEW_COL
0 TRY A TEST 1
1 TRY A TEST 1
2 PLAY Q 0
3 PLAY Q 0
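If you would rather not depend on scikit-learn, `pd.factorize` does what the question intended when it is applied to the whole column instead of to each cell via `apply`. It numbers values in order of first appearance, so on the question's data it yields the 0/0/1/1 encoding from the desired output, whereas `LabelEncoder` sorts the labels alphabetically. A minimal sketch:

```python
import pandas as pd

my_df = pd.DataFrame({"COL_1": ["TRY A TEST", "TRY A TEST", "PLAY Q", "PLAY Q"]})

# Factorize the whole Series at once; codes follow order of first appearance.
codes, uniques = pd.factorize(my_df["COL_1"])
my_df["NEW_COL"] = codes

print(my_df)
print(list(uniques))  # ['TRY A TEST', 'PLAY Q']
```

`uniques` maps each code back to its label, which is handy if you need to decode the numbers later.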
For a large DataFrame or multiple columns, you can simply use a for loop, for example:
my_df
pets owner location
0 cat Champ San_Diego
1 dog Ron New_York
2 cat Brick New_York
3 monkey Champ San_Diego
4 dog Veronica San_Diego
5 dog Ron New_York
############
for column in ['pets', 'owner', 'location']:
    le = preprocessing.LabelEncoder()
    my_df[column + '_num'] = le.fit_transform(my_df[column].astype(str))
############
my_df
pets owner location pets_num owner_num location_num
0 cat Champ San_Diego 0 1 1
1 dog Ron New_York 1 2 0
2 cat Brick New_York 0 0 0
3 monkey Champ San_Diego 2 1 1
4 dog Veronica San_Diego 1 3 1
5 dog Ron New_York 1 2 0
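A pandas-only variant of the loop, offered as a sketch rather than part of the original answer, converts each column to the `category` dtype and takes `.cat.codes`. Inferred categories are sorted, so the codes should match the `LabelEncoder` numbering in the table above. One difference: missing values get code -1 rather than a regular label.

```python
import pandas as pd

my_df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey", "dog", "dog"],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "Ron"],
    "location": ["San_Diego", "New_York", "New_York",
                 "San_Diego", "San_Diego", "New_York"],
})

for column in ["pets", "owner", "location"]:
    # Categories inferred from the data are sorted, so the codes follow
    # alphabetical order, the same as LabelEncoder.
    my_df[column + "_num"] = my_df[column].astype("category").cat.codes

print(my_df)
```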
Regarding "python - pandas: converting string categories to numbers as one object but getting an array of numbers", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55707620/