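I have started using precision and recall to evaluate a random forest classifier. However, even though the training and test sets are identical for the CPU and GPU implementations of the classifier, the evaluation scores they return differ. Is this a known bug in the library?
The two code samples below are for reference.
Scikit-Learn (CPU)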
from sklearn.metrics import recall_score, precision_score
from sklearn.ensemble import RandomForestClassifier
rf_cpu = RandomForestClassifier(n_estimators=5000, n_jobs=-1)
rf_cpu.fit(X_train, y_train)
rf_cpu_pred = rf_cpu.predict(X_test)
recall_score(rf_cpu_pred, y_test)
precision_score(rf_cpu_pred, y_test)
CPU Recall: 0.807186
CPU Precision: 0.82095
H2O4GPU (GPU)
from h2o4gpu.metrics import recall_score, precision_score
from h2o4gpu import RandomForestClassifier
rf_gpu = RandomForestClassifier(n_estimators=5000, n_gpus=1)
rf_gpu.fit(X_train, y_train)
rf_gpu_pred = rf_gpu.predict(X_test)
recall_score(rf_gpu_pred, y_test)
precision_score(rf_gpu_pred, y_test)
GPU Recall: 0.714286
GPU Precision: 0.809988
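Best answer
Correction: I realized that the arguments to the precision and recall functions were passed in the wrong order. According to the Scikit-Learn documentation, the order is always (y_true, y_pred). With the arguments swapped, false positives and false negatives trade places, so for a binary problem the value reported as recall is actually the precision, and vice versa.
Corrected evaluation code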
recall_score(y_test, rf_gpu_pred)
precision_score(y_test, rf_gpu_pred)
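To see why the argument order matters, here is a minimal sketch in plain Python. The helpers `confusion_counts`, `precision`, and `recall` are hypothetical names written for this illustration, not part of Scikit-Learn or H2O4GPU. They only mirror the (y_true, y_pred) convention for a binary problem, and the labels are made-up toy data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def precision(y_true, y_pred):
    """TP / (TP + FP): how many predicted positives are correct."""
    tp, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)


def recall(y_true, y_pred):
    """TP / (TP + FN): how many actual positives are found."""
    tp, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


# Toy labels (illustrative only): 2 TP, 1 FP, 2 FN.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

correct_recall = recall(y_true, y_pred)        # 2 / (2 + 2)
correct_precision = precision(y_true, y_pred)  # 2 / (2 + 1)

# Passing (y_pred, y_true) swaps the roles of FP and FN,
# so each metric silently returns the other one.
swapped_recall = recall(y_pred, y_true)
swapped_precision = precision(y_pred, y_true)

print("recall   :", correct_recall, "swapped:", swapped_recall)
print("precision:", correct_precision, "swapped:", swapped_precision)
```

On this toy data the swapped recall equals the correct precision and the swapped precision equals the correct recall, which is the same mix-up the corrected calls above avoid.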
Source: the Stack Overflow question on classification scores differing between H2O4GPU and Scikit-Learn: https://stackoverflow.com/questions/53290612/