python-3.x - ValueError: Found input variables with inconsistent numbers of samples: [2750, 1095]

Tags: python-3.x machine-learning scikit-learn linear-regression

It would be very helpful if someone could help me understand this error and how to fix it. I cannot change my data.

    X = train[['id', 'listing_type', 'floor', 'latitude', 'longitude',
               'beds', 'baths', 'total_rooms', 'square_feet', 'group', 'grades']]
    Y = test['price']
    n = pd.get_dummies(train.group)

The training data looks like this:

id  listing_type    floor   latitude    longitude   beds    baths   total_rooms square_feet grades  high_price_high_freq    high_price_low_freq low_price
265183  10  4   40.756224   -73.962506  1   1   3   790 2   1   0   0   0
270356  10  7   40.778010   -73.962547  5   5   9   4825    2   1   0   0
176718  10  25  40.764955   -73.963483  2   2   4   1645    2   1   0   0
234589  10  5   40.741448   -73.994216  3   3   5   2989    2   1   0   0
270372  10  5   40.837000   -73.947787  1   1   3   1045    2   0   0   1

The code that raises the error:

from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-479-ca78b7b5f096> in <module>()
      1 from sklearn.cross_validation import train_test_split
----> 2 X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)
      3 from sklearn.linear_model import LinearRegression
      4 regressor = LinearRegression()
      5 regressor.fit(X_train, y_train)

~\Anaconda3\lib\site-packages\sklearn\cross_validation.py in train_test_split(*arrays, **options)
   2057     if test_size is None and train_size is None:
   2058         test_size = 0.25
-> 2059     arrays = indexable(*arrays)
   2060     if stratify is not None:
   2061         cv = StratifiedShuffleSplit(stratify, test_size=test_size,

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
    227         else:
    228             result.append(np.array(X))
--> 229     check_consistent_length(*result)
    230     return result
    231 

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    202     if len(uniques) > 1:
    203         raise ValueError("Found input variables with inconsistent numbers of"
--> 204                          " samples: %r" % [int(l) for l in lengths])
    205 
    206 

ValueError: Found input variables with inconsistent numbers of samples: [2750, 1095]

Best answer

Y = test['price'] should probably be Y = train['price'] (or whatever the target column is actually called).
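A minimal sketch of that fix, assuming the target column is named price and lives in the same DataFrame as the features. The small train frame below is invented for illustration and is not the asker's data. The import uses sklearn.model_selection, which provides train_test_split in current scikit-learn releases in place of the older sklearn.cross_validation module.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the asker's `train` DataFrame (values invented).
train = pd.DataFrame({
    "beds": [1, 5, 2, 3, 1, 2, 4, 3, 2, 1],
    "baths": [1, 5, 2, 3, 1, 1, 3, 2, 2, 1],
    "square_feet": [790, 4825, 1645, 2989, 1045, 900, 3100, 2100, 1500, 700],
    "price": [500, 3000, 1100, 2000, 650, 600, 2100, 1400, 1000, 450],
})

# Features and target are both taken from the same frame,
# so they are guaranteed to have the same number of rows.
X = train[["beds", "baths", "square_feet"]]
Y = train["price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
```

The separate test frame is then only used for prediction, for example regressor.predict(test[feature_columns]), where feature_columns is the same list of columns used to build X.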

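The exception is raised because your X and Y have different numbers of samples (rows): X is built from train and has 2750 rows, while Y is built from test and has 1095. train_test_split requires every array it receives to have the same length.

This question, python-3.x - ValueError: Found input variables with inconsistent numbers of samples: [2750, 1095], was originally asked on Stack Overflow: https://stackoverflow.com/questions/51031746/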
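The failing check can be approximated in a few lines of plain Python. This is a simplified stand-in for the check_consistent_length step shown in the traceback, not scikit-learn's actual code. The function name same_length_or_raise and the placeholder lists are hypothetical; only the row counts come from the error message.

```python
def same_length_or_raise(*arrays):
    """Return the shared length, or raise ValueError if the lengths differ."""
    lengths = [len(a) for a in arrays]
    if len(set(lengths)) > 1:
        raise ValueError(
            "Found input variables with inconsistent numbers of samples: %r"
            % lengths
        )
    return lengths[0]


# Placeholder data with the row counts from the error message.
X = [[0.0]] * 2750  # stands in for the features taken from `train`
Y = [0.0] * 1095    # stands in for the target taken from `test`

try:
    same_length_or_raise(X, Y)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

With real DataFrames, printing X.shape and Y.shape before calling train_test_split surfaces the same mismatch.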
