python - Trying to understand the encoding algorithms in Scikit-Learn

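I want to encode an ordinal variable. For example, customer satisfaction has 4 levels: Very Good, Good, Moderate and Poor. I tried LabelEncoder from the scikit-learn library, but it encodes Very Good, which should be the highest value, as 2 instead of 3.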

Is it possible to assign a specific value to each level with LabelEncoder?
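To see where those codes come from, here is a minimal sketch (assuming a recent scikit-learn and the four English labels from the question). LabelEncoder numbers the classes in sorted (alphabetical) order, so a code reflects how a label is spelled, not its rank. With these exact labels, Poor ends up above Good and Moderate; the precise number assigned to Very Good depends on the spelling of all the labels.

```python
from sklearn.preprocessing import LabelEncoder

levels = ['Very Good', 'Good', 'Moderate', 'Poor']

le = LabelEncoder()
codes = le.fit_transform(levels)

# classes_ is sorted alphabetically, and each code is the index into it
print(le.classes_.tolist())  # ['Good', 'Moderate', 'Poor', 'Very Good']
print(codes.tolist())        # [3, 0, 1, 2]
```

LabelEncoder takes no argument for choosing this order, which is why a different encoder is needed.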

Best answer

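LabelEncoder has no option for this; it always numbers the classes in sorted order. You can use OrdinalEncoder instead and pass your own mapping through its categories parameter. The mapping is a list of lists, where the n-th list holds the values of the n-th column of the input data, in the order you want them encoded: the first value becomes 0, the second 1, and so on.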

from sklearn.preprocessing import OrdinalEncoder
import random
import pandas as pd

# the categorical values in the right order
satisfaction = ['Poor', 'Moderate', 'Good', 'Very Good']

# create the mapping list
mapping = [satisfaction]

# create some random but reproducible data
random.seed(42)
X = pd.DataFrame({'satisfaction': [random.choice(satisfaction) for _ in range(25)]})
print(X)

0          Poor 
1          Poor 
2          Good 
3          Moderate 
4          Moderate 
5          Moderate

[...]

# create the encoder
enc = OrdinalEncoder(categories=mapping)

# fit the encoder and transform the data
print(enc.fit_transform(X))

[[0.]
 [0.]
 [2.]
 [1.]
 [1.]
 [1.]
 ...
]
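The same mapping format extends to several columns. The sketch below assumes a second, hypothetical ordinal column called size with levels S < M < L (it is not part of the original question). It passes one ordered list per column, in the same order as the DataFrame columns, and then uses inverse_transform to map the codes back to the labels.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

satisfaction = ['Poor', 'Moderate', 'Good', 'Very Good']
sizes = ['S', 'M', 'L']  # hypothetical second ordinal column

X = pd.DataFrame({
    'satisfaction': ['Good', 'Poor', 'Very Good', 'Moderate'],
    'size': ['S', 'L', 'M', 'M'],
})

# one ordered list per column; a value's code is its index in that list
enc = OrdinalEncoder(categories=[satisfaction, sizes])
encoded = enc.fit_transform(X)
print(encoded)
# [[2. 0.]
#  [0. 2.]
#  [3. 1.]
#  [1. 1.]]

# map the numeric codes back to the original labels
decoded = enc.inverse_transform(encoded)
print(decoded)
```

With the default handle_unknown='error', a value that does not appear in the corresponding categories list makes fit raise a ValueError, so every level has to be listed.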

For "python - Trying to understand the encoding algorithms in Scikit-Learn", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57025433/
