python - NameError: name 'X' is not defined sklearn

Tags: python scikit-learn data-science

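I am working through a multiple regression problem by following this walkthrough, but I am stuck at the part of the code that starts with the comment about handling categorical variables with one-hot encoding, on this site: https://towardsdatascience.com/what-makes-a-movie-hit-a-jackpot-learning-from-data-with-multiple-linear-regression-339f6c1a7022

I have run the code up to this point, but it fails at (X).

Actual code: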

from sklearn.preprocessing import LabelEncoder


# LabelEncoder for a number of columns
class MultiColumnLabelEncoder:

    def __init__(self, columns=None):
        self.columns = columns  # list of columns to encode

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        '''
        Transforms columns of X specified in self.columns using
        LabelEncoder(). If no columns specified, transforms all
        columns in X.
        '''
        output = X.copy()

        if self.columns is not None:
            for col in self.columns:
                output[col] = LabelEncoder().fit_transform(output[col])
        else:
            # DataFrame.items() replaces iteritems(), which was removed in pandas 2.0
            for colname, col in output.items():
                output[colname] = LabelEncoder().fit_transform(col)

        return output

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


le = MultiColumnLabelEncoder()
X_train_le = le.fit_transform(X)  # X is never defined above, so this line fails

This is the error I get:

 Traceback (most recent call last):

  File "<ipython-input-63-581cea150670>", line 34, in <module>
    X_train_le = le.fit_transform(X)

NameError: name 'X' is not defined

Best answer

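Your code is not expected to work, because you left out the 40 lines of code she wrote before that snippet. She had already defined X earlier. The code is available on GitHub.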
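The error itself is ordinary Python behavior: a name has to be assigned before it is used, no matter which library is involved. Below is a minimal sketch of this using only the standard library. The helper `encode_labels` and the sample list are hypothetical and only stand in for the encoder and the data from the walkthrough.

```python
def encode_labels(values):
    """Hypothetical stand-in for a label encoder: map each distinct value to an integer code."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


# Using a name that has never been assigned raises NameError.
try:
    encoded = encode_labels(X)  # X does not exist yet
except NameError as exc:
    error_message = str(exc)
    print("Failed:", error_message)

# Once X is bound to some data, the same call succeeds.
X = ["Drama", "Comedy", "Drama", "Action"]
encoded = encode_labels(X)
print(encoded)
```

In the walkthrough, the lines that bind X come before the encoder snippet: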

# Importing the libraries
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
# IPython/Jupyter magic; remove this line when running as a plain script
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE
from sklearn.linear_model import RidgeCV, LassoCV, Ridge, Lasso
import pyreadr
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.metrics import explained_variance_score
from sklearn import metrics
from sklearn.preprocessing import StandardScaler

result = pyreadr.read_r('Movies.RData')# also works for Rds
print(result.keys())
df = pd.DataFrame(result['movies'], columns=result['movies'].keys() )
df.shape

df.shape[0]
df.set_index("title", inplace=True) #setting the index name
df_1 = df.loc[:, ['imdb_rating','genre', 'runtime', 'best_pic_nom',
                  'top200_box', 'director', 'actor1']]

#Let's also check the column-wise distribution of null values
print(df_1.isnull().values.sum())
print(df_1.isnull().sum())

#Dropping missing values from my dataset
df_1.dropna(how='any', inplace=True)
print(df_1.isnull().values.sum()) #checking for missing values after the dropna()

#Splitting for 2 matrices: independent variables used for prediction and dependent variables (that is predicted)
X = df_1.drop(["imdb_rating", 'runtime'], axis = 1)   #Feature Matrix
y = df_1["imdb_rating"] #Dependent Variables
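
With X defined this way, the label-encoding step from the question has something to work on. The sketch below shows the order of operations from start to finish. It assumes a small made-up DataFrame in place of Movies.RData (the values are invented for illustration) and uses a plain loop over the columns rather than the MultiColumnLabelEncoder class.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the movies dataset; values are made up.
df_1 = pd.DataFrame(
    {
        "imdb_rating": [7.1, 6.3, 8.0, 5.9],
        "genre": ["Drama", "Comedy", "Drama", "Action"],
        "runtime": [120, 95, 140, 101],
        "best_pic_nom": ["no", "no", "yes", "no"],
    }
)

# Same split as in the answer: X has to exist before any encoder touches it.
X = df_1.drop(["imdb_rating", "runtime"], axis=1)  # feature matrix
y = df_1["imdb_rating"]                            # dependent variable

# Label-encode every remaining column, one LabelEncoder per column.
X_train_le = X.copy()
for colname in X_train_le.columns:
    X_train_le[colname] = LabelEncoder().fit_transform(X_train_le[colname])

print(X_train_le)
```

Running the encoder cell only after the cell that assigns X is what resolves the NameError in the question.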

Regarding python - NameError: name 'X' is not defined sklearn, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56738929/
