python - Do Seaborn regplot and Scikit-Learn compute logistic models differently?

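I'm using both Scikit-Learn's and Seaborn's logistic regression functions: the former to extract model information (i.e. log-odds, parameters, etc.), and the latter to plot the resulting sigmoid curve fitted to the probability estimates.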
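Both tools model the same quantity. Writing β0 for the intercept and β1 for the TOEFL coefficient (symbols introduced here only for illustration), the log-odds are linear in the score x, and the curve seaborn draws is the sigmoid of those log-odds:

```latex
\log\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x,
\qquad
p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
```

So if the regplot curve and predict_proba disagree, the two fits must have produced different values of β0 and β1; the formula itself is the same.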

Maybe my intuition about how to read this plot is wrong, but I don't seem to be getting the results I expect:

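import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression

# Build and visualize a simple logistic regression
# ap: DataFrame with 'TOEFL Score' and 'Chance of Admit' columns
ap_X = ap[['TOEFL Score']].values  # 2D feature matrix, shape (n_samples, 1)
ap_y = ap['Chance of Admit'].values

ap_lr = LogisticRegression()
ap_lr.fit(ap_X, ap_y)

def ap_log_regplot(ap_X, ap_y):
    plt.figure(figsize=(15, 10))
    # regplot expects 1D vectors passed as keywords; ravel() flattens the (n, 1) matrix
    sns.regplot(x=ap_X.ravel(), y=ap_y, logistic=True, color='green')

ap_log_regplot(ap_X, ap_y)
plt.xlabel('TOEFL Score')
plt.ylabel('Probability')
plt.title('Logistic Regression: Probability of High Chance by TOEFL Score')
plt.show()  # call the function; plt.show alone does nothing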

That looks fine, but then I used Scikit-Learn's predict_proba function to get the predicted probability of admission for a few arbitrary TOEFL scores (108, 104, and 112 here):

# predict_proba expects a 2D array of shape (n_samples, n_features)
eight = ap_lr.predict_proba([[108]])[:, 1]
four = ap_lr.predict_proba([[104]])[:, 1]
twelve = ap_lr.predict_proba([[112]])[:, 1]
print(eight, four, twelve)

This gives me:

[0.49939019] [0.44665597] [0.55213799]

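To me, this suggests that, according to this dataset, someone with a TOEFL score of 112 has a 55% chance of being admitted. But if I draw a vertical line from 112 on the x-axis up to the sigmoid curve, I would expect the intersection to be around 0.90.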
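The three printed probabilities already show how flat the Scikit-Learn curve is. Inverting the sigmoid (the logit) turns each probability back into log-odds, and because log-odds are linear in the score, two points are enough to recover the implied slope and intercept. A small sketch using only the numbers printed above:

```python
import math

def logit(p):
    # Inverse of the sigmoid: probability -> log-odds
    return math.log(p / (1 - p))

# (TOEFL score, predicted probability) pairs printed by predict_proba above
points = {104: 0.44665597, 108: 0.49939019, 112: 0.55213799}

# Log-odds are linear in the score, so two points give the slope
slope = (logit(points[112]) - logit(points[104])) / (112 - 104)
intercept = logit(points[108]) - slope * 108

# Score at which this fitted curve would reach a probability of 0.90
x_at_90 = (logit(0.90) - intercept) / slope
print(f"slope={slope:.4f}, intercept={intercept:.3f}, p=0.9 at TOEFL ~ {x_at_90:.0f}")
```

The slope works out to roughly 0.053 log-odds per TOEFL point (an odds ratio of about 1.05 per point), so this fitted curve would only reach 0.9 near a score of 150, beyond the 120-point TOEFL maximum. The Scikit-Learn model is therefore far flatter than the curve seaborn drew, which is why reading about 0.9 off the plot at 112 does not match predict_proba.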

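Am I interpreting/modeling this correctly? I realize I'm using two different packages to compute the model coefficients, but with another model on a different dataset, I seem to get correct predictions that match the logistic curve.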
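One known source of disagreement between the two packages: LogisticRegression applies an L2 penalty by default (C=1.0), and with the liblinear solver (the default before scikit-learn 0.22) the intercept is penalized as well, whereas seaborn's logistic=True fit goes through statsmodels' GLM with a Binomial family and is unpenalized. With unscaled scores around 100, shrinking the intercept toward zero also drags the slope toward zero, which flattens the curve. The numpy sketch below imitates that effect by putting a ridge penalty on both coefficients; the data are made up and the solver is a plain Newton iteration, not scikit-learn's.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function: 1 / (1 + exp(-z))
    return np.exp(-np.logaddexp(0.0, -z))

def fit_logit(x, y, C=None, n_iter=100):
    """Newton fit of a one-feature logistic regression; returns [intercept, slope].

    C=None  -> unpenalized maximum likelihood (what a binomial GLM computes).
    C=float -> minimizes 0.5*||beta||^2 + C * (summed log-loss), penalizing the
               intercept too (an imitation of liblinear-style regularization).
    """
    X = np.column_stack([np.ones_like(x), x])  # intercept column + raw scores
    beta = np.zeros(2)
    ridge = np.zeros((2, 2)) if C is None else np.eye(2) / C
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (p - y) + ridge @ beta                # gradient of the objective / C
        hess = X.T @ (X * (p * (1 - p))[:, None]) + ridge  # Hessian of the objective / C
        beta -= np.linalg.solve(hess, grad)
    return beta

# Hypothetical TOEFL-like scores and 0/1 labels drawn from a known logistic curve
rng = np.random.default_rng(0)
x = rng.uniform(92, 120, size=500)
y = (rng.uniform(size=x.size) < sigmoid(-40 + 0.37 * x)).astype(float)

b_mle = fit_logit(x, y)        # unpenalized, like the regplot curve
b_l2 = fit_logit(x, y, C=1.0)  # intercept and slope both shrunk

for name, (b0, b1) in [("unpenalized", b_mle), ("L2, C=1", b_l2)]:
    print(f"{name:12s} slope={b1:.3f}  P(y=1 | x=112)={sigmoid(b0 + b1 * 112):.2f}")
```

In scikit-learn itself, a very large C (or, in recent versions, disabling the penalty), ideally together with standardized features, should bring predict_proba much closer to the regplot curve.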

Any ideas? Or am I modeling/interpreting this completely wrong?

Best answer

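from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

# x: feature matrix, y: class labels (for example ap_X and ap_y from the question)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=4)
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

logreg = LogisticRegression()
logreg.fit(x_train, y_train)

# Predicted class labels for the held-out rows, then the fraction that are correct
y_pred = logreg.predict(x_test)
print('log: ', metrics.accuracy_score(y_test, y_pred))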

This makes it easy to measure a model's accuracy and decide which model to use for your application's data.
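The same procedure can be written out without scikit-learn: shuffle, hold out 40% of the rows (matching test_size=0.4 above), fit on the rest, and score the predictions whose log-odds are positive (probability above 0.5), which is how predict labels a binary LogisticRegression. The numpy sketch below uses made-up data and hypothetical helpers (fit_logit, train_test_split_np), so it shows the procedure rather than the asker's numbers. Accuracy only looks at which side of 0.5 each prediction falls on, so it says little about how well the predicted probabilities themselves match the data.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function: 1 / (1 + exp(-z))
    return np.exp(-np.logaddexp(0.0, -z))

def fit_logit(x, y, n_iter=100):
    """Hypothetical helper: unpenalized logistic regression via Newton's method."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, X.T @ (y - p))
    return beta

def train_test_split_np(x, y, test_size=0.4, seed=4):
    """Hypothetical helper: shuffle once, then carve off test_size of the rows."""
    idx = np.random.default_rng(seed).permutation(len(x))
    n_test = int(round(test_size * len(x)))
    test, train = idx[:n_test], idx[n_test:]
    return x[train], x[test], y[train], y[test]

# Made-up TOEFL-like scores and 0/1 labels
rng = np.random.default_rng(0)
x = rng.uniform(92, 120, size=500)
y = (rng.uniform(size=x.size) < sigmoid(-40 + 0.37 * x)).astype(float)

x_train, x_test, y_train, y_test = train_test_split_np(x, y)
b0, b1 = fit_logit(x_train, y_train)
y_pred = (b0 + b1 * x_test > 0).astype(float)  # positive log-odds -> class 1
acc = np.mean(y_pred == y_test)
print("accuracy:", acc)
```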

The original question, "python - Do Seaborn regplot and Scikit-Learn compute logistic models differently?", can be found on Stack Overflow: https://stackoverflow.com/questions/52048631/
