python - Multivariate logistic regression in Python shows an error

Tags: python machine-learning scikit-learn regression logistic-regression

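I am trying to make predictions with logistic regression and test their accuracy, using Python and the sklearn library. I am using data downloaded from here: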

http://archive.ics.uci.edu/ml/datasets/concrete+compressive+strength

It is an Excel file. I wrote some code, but I always get the same error:

ValueError: Unknown label type: 'continuous'

I used the same logic for linear regression, and it worked there.

Here is the code:

import numpy as np
import pandas as pd
import xlrd
from sklearn import linear_model
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

#Reading data from excel

data = pd.read_excel("DataSet.xls").round(2)
data_size = data.shape[0]
#print("Number of data:",data_size,"\n",data.head())

my_data = data[(data["Superpl"] == 0) & (data["FlyAsh"] == 0) & (data["BlastFurSlag"] == 0)].drop(columns=["Superpl","FlyAsh","BlastFurSlag"])
my_data = my_data[my_data["Days"]<=28]
my_data_size = my_data.shape[0]
#print("Size of dataset for 28 days or less:", my_data_size, "\n", my_data.head())


def logistic_regression(data_input, cement, water,
                          coarse_aggr, fine_aggr, days):

    variable_list = []
    result_list = []

    for column in data_input:

        variable_list.append(column)
        result_list.append(column)

    variable_list = variable_list[:-1]
    result_list = result_list[-1]

    variables = data_input[variable_list]
    results = data_input[result_list]

    #accuracy of prediction (splitting dataframe into train and test)
    var_train, var_test, res_train, res_test = train_test_split(variables, results, test_size = 0.3, random_state = 42)

    #making logistic model and fitting the data into logistic model
    log_regression = linear_model.LogisticRegression()
    model = log_regression.fit(var_train, res_train)

    input_values = [cement, water, coarse_aggr, fine_aggr, days]

    #predicting the outcome based on the input_values
    predicted_strength = log_regression.predict([input_values]) #adding values for prediction
    predicted_strength = round(predicted_strength[0], 2)

    #calculating accuracy score
    score = log_regression.score(var_test, res_test)
    score = round(score*100, 2)

    prediction_info = "\nPrediction of future strength: " + str(predicted_strength) + " MPa\n"
    accuracy_info = "Accuracy of prediction: " + str(score) + "%\n"
    full_info = prediction_info + accuracy_info

    return full_info

print(logistic_regression(my_data, 376.0, 214.6, 1003.5, 762.4, 3)) #true value affter 3 days: 16.28 MPa

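Best answer

Although you do not provide details about your data, judging from the error and from the comment in the last line of your code: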

#true value affter 3 days: 16.28 MPa

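I conclude that you are in a regression (i.e. numeric prediction) setting. Linear regression is an appropriate model for this task, but logistic regression is not: logistic regression is for classification problems, so it expects a binary (or categorical) target variable, not continuous values. That is why you get the error.

In short, you are trying to apply a model that is not appropriate for your problem.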
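To make the difference concrete, here is a minimal sketch on synthetic data. It is not the concrete dataset: the array shapes, coefficients, and noise level are assumptions chosen only for illustration. It shows that `LogisticRegression` rejects a continuous target with a `ValueError`, while `LinearRegression` fits it. The exact wording of the error message may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic stand-in for the real data (assumption, for illustration only):
# five numeric features and a continuous target built from them.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 4.0]) + rng.normal(0.0, 0.1, size=100)

# A classifier refuses a continuous target.
try:
    LogisticRegression().fit(X, y)
    logistic_error = None
except ValueError as exc:
    logistic_error = str(exc)
print("LogisticRegression:", logistic_error)

# A regressor accepts the same target.
lin_reg = LinearRegression().fit(X, y)
r2 = lin_reg.score(X, y)
print("LinearRegression R^2:", round(r2, 3))
```

Note that for a regressor `score()` returns the coefficient of determination R^2, not a classification accuracy, so reporting it as an "accuracy" percentage can be misleading.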

UPDATE (after the link to the data): Indeed, reading the dataset description closely, you will see (emphasis added):

The concrete compressive strength is the regression problem

From the scikit-learn User's Guide on logistic regression (again, emphasis added):

Logistic regression, despite its name, is a linear model for classification rather than regression.
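If you are unsure how scikit-learn will interpret a target, one option is to check it before fitting with `type_of_target` from `sklearn.utils.multiclass`. The sketch below uses made-up values (only 16.28 comes from the question); the variable names are hypothetical.

```python
from sklearn.utils.multiclass import type_of_target

# Illustrative values only: strengths in MPa (continuous) and class labels (discrete).
strengths = [16.28, 25.1, 33.76, 40.2]
labels = [0, 1, 1, 0]

continuous_kind = type_of_target(strengths)
binary_kind = type_of_target(labels)
print(continuous_kind, binary_kind)
```

A target reported as 'continuous' calls for a regressor; classifiers such as `LogisticRegression` expect a discrete kind such as 'binary' or 'multiclass'.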

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/54945032/
