python - Random forest with character data in scikit-learn/python

Tags: python machine-learning scipy scikit-learn ipython

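I have a column of characters alongside numeric columns, and I want to encode the character column as categories and then apply a random forest classifier. I know OneHotEncoder exists, but I could not find any examples. How do I encode characters, for example a gender column containing "f" and "m", as integers such as (0, 1)?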

Best answer

Use LabelEncoder. It takes an array of strings and converts it to an array of integers.

Example:

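from sklearn.preprocessing import LabelEncoder
import pandas as pd

# Small example frame with one numeric and one string column
data = pd.DataFrame()

data['age'] = [17, 33, 47]
data['gender'] = ['m', 'f', 'm']

enc = LabelEncoder()

print(data)
# Learn the mapping from the strings (classes are sorted, so 'f' -> 0, 'm' -> 1)
enc.fit(data['gender'])
# Replace the strings with their integer codes
data['gender'] = enc.transform(data['gender'])
print(data)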

Output:

   age gender
0   17      m
1   33      f
2   47      m
   age  gender
0   17       1
1   33       0
2   47       1
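
The question also mentions OneHotEncoder and the random forest itself, neither of which the answer shows. Below is a minimal sketch of one way to combine them, not the answerer's method. The extra rows and the 'label' target column are invented purely for illustration, and the use of ColumnTransformer and Pipeline is an assumption about how one might wire the steps together. Note that the scikit-learn documentation describes LabelEncoder as intended for target values (y), so for input features an encoder such as OneHotEncoder may be the more conventional choice.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: the extra rows and the 'label' column are
# made up for this sketch and do not come from the original question.
data = pd.DataFrame({
    "age": [17, 33, 47, 25, 52, 39],
    "gender": ["m", "f", "m", "f", "f", "m"],
    "label": [0, 1, 0, 1, 1, 0],
})

X = data[["age", "gender"]]
y = data["label"]

# One-hot encode 'gender' and pass 'age' through unchanged.
preprocess = ColumnTransformer(
    transformers=[
        ("gender", OneHotEncoder(handle_unknown="ignore"), ["gender"]),
    ],
    remainder="passthrough",
)

# Chain the encoding and the classifier so both are fit together.
model = Pipeline(steps=[
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=10, random_state=0)),
])

model.fit(X, y)
predictions = model.predict(X)

# Inspect the encoded feature matrix: two one-hot columns plus 'age'.
encoded = model.named_steps["preprocess"].transform(X)
print(encoded.shape)
print(predictions)
```

Keeping the encoder inside the pipeline means the same mapping learned during fit is reused at prediction time, so new data can be passed in with the raw "f"/"m" strings.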

Regarding "python - Random forest with character data in scikit-learn/python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35269969/
