python - ValueError when fitting a decision tree classifier on a dataset

Tags: python machine-learning scikit-learn random-forest

I have created features X and labels y for the dataset I'm working on.

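Now I want to train a random forest classifier on it. The code below uses a DecisionTreeClassifier. When I fit the classifier to the training data, I get ValueError: setting an array element with a sequence.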

Below are X, y, and the error details:

X:

(array([-8.1530527e-10,  8.9952795e-10, -9.1185753e-10, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-0.00050612, -0.00057967, -0.00035985, ...,  0.        ,
         0.        ,  0.        ], dtype=float32),
 array([ 6.8139506e-08, -2.3837963e-05, -2.4622474e-05, ...,
         3.1678758e-06, -2.4535689e-06,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
         6.9306935e-07, -6.6020442e-07,  0.0000000e+00], dtype=float32),
 array([-7.30260945e-05, -1.18022966e-04, -1.08280736e-04, ...,
         8.83421380e-05,  4.97258679e-06,  0.00000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 2.3406714e-05,  3.1186773e-05,  4.9467826e-06, ...,
         1.2180173e-07, -9.2944845e-08,  0.0000000e+00], dtype=float32),
 array([ 1.1845550e-06, -1.6399191e-06,  2.5565218e-06, ...,
        -8.7445065e-09,  5.9859917e-09,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-1.3284328e-05, -7.4090644e-07,  7.2679302e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
         5.0694009e-08, -3.4546797e-08,  0.0000000e+00], dtype=float32),
 array([ 1.5591205e-07, -1.5845627e-07,  1.5362870e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 1.1608539e-05,
        8.2463991e-09, 0.0000000e+00], dtype=float32),
 array([-3.6192148e-07, -1.4590451e-05, -5.3999561e-06, ...,
        -1.9935460e-05, -3.4417746e-05,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.5319534e-07,  2.6521766e-07,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.5055220e-08,  1.2936166e-08,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 1.3387315e-05,  6.0913658e-07, -5.6471418e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 1.7200684e-02,  3.2272514e-02,  3.2961801e-02, ...,
        -1.6286784e-06, -8.5592075e-07,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -3.3923173e-11,  2.8026699e-11,  0.0000000e+00], dtype=float32),
 array([-0.00103188, -0.00075814, -0.00051426, ...,  0.        ,
         0.        ,  0.        ], dtype=float32),
 array([ 7.6278877e-07,  2.1624428e-05,  1.1150542e-05, ...,
         1.8263392e-09, -1.5558380e-09,  0.0000000e+00], dtype=float32),
 array([-1.2111740e-07,  6.3130176e-07, -1.8378003e-06, ...,
         1.1309878e-05,  5.4562256e-06,  0.0000000e+00], dtype=float32),
 array([0.00026949, 0.00028119, 0.00020081, ..., 0.00032586, 0.00046612,
        0.        ], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -7.8796054e-09,  1.7431153e-08,  0.0000000e+00], dtype=float32),
 array([1.42000988e-06, 1.30781755e-05, 2.77493709e-05, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00], dtype=float32),
 array([ 2.9161662e-10, -6.3629275e-11, -3.0565092e-10, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 2.2051008e-05,  1.6838792e-05,  3.5639907e-05, ...,
         4.5767497e-06, -1.2002213e-05,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.0104826e-10,  1.6824393e-10,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -4.8303300e-06, -1.2008861e-05,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.7673337e-07,  2.8604177e-07,  0.0000000e+00], dtype=float32),
 array([-0.00066044, -0.0009837 , -0.00090796, ..., -0.00171516,
        -0.0017666 ,  0.        ], dtype=float32),
 array([ 3.2218946e-11, -5.5296181e-11,  8.9530647e-11, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-1.3284328e-05, -7.4090644e-07,  7.2679302e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 4.9886359e-05,  1.4642075e-04,  4.4365996e-04, ...,
         6.3584002e-07, -6.2395281e-07,  0.0000000e+00], dtype=float32),
 array([-3.2826196e-04,  4.5522624e-03, -8.2306744e-04, ...,
        -2.2519816e-07, -6.2417300e-08,  0.0000000e+00], dtype=float32),
 array([ 3.1686827e-04,  4.6282235e-04,  1.0160641e-04, ...,
        -1.4605960e-05,  6.6572487e-05,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -7.1763244e-09, -2.8297892e-08,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-2.5870585e-07,  4.6514080e-07, -9.5607948e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 5.788035e-07, -6.493598e-07,  7.111379e-07, ...,  0.000000e+00,
         0.000000e+00,  0.000000e+00], dtype=float32),
 array([ 2.5118000e-04,  1.4220485e-03,  3.9536849e-04, ...,
         4.5242754e-04, -3.1405249e-05,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 1.1985266e-07,  2.1360799e-07, -1.1951373e-06, ...,
        -1.3043609e-04,  1.2107374e-06,  0.0000000e+00], dtype=float32),
 array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 2.5944988e-08,
        1.2123945e-07, 0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-2.4280996e-06, -1.2362683e-05, -8.5034850e-07, ...,
        -1.0113516e-11,  5.1403621e-12,  0.0000000e+00], dtype=float32),
 array([9.6098862e-05, 1.6449913e-04, 1.1942573e-04, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00], dtype=float32),
 array([ 1.3284328e-05,  7.4090644e-07, -7.2679302e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 2.4700081e-05,  2.9454704e-05,  8.0751715e-06, ...,
         1.2746801e-07, -1.6574201e-06,  0.0000000e+00], dtype=float32),
 array([8.4619669e-06, 9.7476968e-06, 2.0182479e-05, ..., 2.1081217e-11,
        4.0220186e-10, 0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32))

y:

('08',
 '08',
 '06',
 '05',
 '05',
 '04',
 '06',
 '07',
 '01',
 '04',
 '03',
 '07',
 '03',
 '01',
 '03',
 '03',
 '02',
 '02',
 '02',
 '02',
 '05',
 '06',
 '04',
 '08',
 '07',
 '06',
 '04',
 '05',
 '07',
 '02',
 '08',
 '01',
 '08',
 '03',
 '08',
 '02',
 '03',
 '06',
 '04',
 '07',
 '04',
 '07',
 '05',
 '06',
 '08',
 '08',
 '04',
 '05',
 '05',
 '04',
 '06',
 '07',
 '05',
 '07',
 '01',
 '06',
 '02',
 '02',
 '03',
 '03')

Classifier code, including the train/test split:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-b6417fbfb8de> in <module>()
      1 from sklearn.tree import DecisionTreeClassifier
      2 dtree = DecisionTreeClassifier()
----> 3 dtree.fit(X_train, y_train)

/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    788             sample_weight=sample_weight,
    789             check_input=check_input,
--> 790             X_idx_sorted=X_idx_sorted)
    791         return self
    792 

/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    114         random_state = check_random_state(self.random_state)
    115         if check_input:
--> 116             X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    117             y = check_array(y, ensure_2d=False, dtype=None)
    118             if issparse(X):

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    431                                       force_all_finite)
    432     else:
--> 433         array = np.array(array, dtype=dtype, order=order, copy=copy)
    434 
    435         if ensure_2d:

ValueError: setting an array element with a sequence.
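You can reproduce the failing step without scikit-learn. The traceback ends in check_array calling np.array(array, dtype=dtype, ...) with a float dtype. That conversion fails when the rows do not all have the same length. The sketch below uses a hypothetical ragged tuple, rows, as a stand-in for the question's X. It assumes only that one row is longer than the others.

```python
import numpy as np

# Hypothetical ragged input standing in for the question's X:
# three 1-D float32 arrays, one of which has an extra element
rows = (np.zeros(6, dtype=np.float32),
        np.zeros(7, dtype=np.float32),
        np.zeros(6, dtype=np.float32))

message = None
try:
    # Same kind of call as in check_array: ask NumPy for a float32
    # array built from rows that cannot be stacked into a rectangle
    np.array(rows, dtype=np.float32)
except ValueError as exc:
    message = str(exc)

print(message)
```

Newer NumPy versions append a note about the "inhomogeneous shape" to the message. The message still begins with the same text as the one in the traceback.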

EDIT1: I converted both X and y to NumPy arrays, but I get the same error. Details below:

import numpy as np
X = np.asarray(X)
y = np.asarray(y)


X.shape, y.shape

Output:

((60,), (60,))
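If X were a proper feature matrix, its shape would be (60, n_features) rather than (60,). A shape of (60,) suggests the 60 arrays are not all the same length. The sketch below checks this by counting the row lengths and listing the rows that differ from the most common length. The helpers length_report and odd_rows are hypothetical, and X_demo is a toy stand-in for the real tuple.

```python
from collections import Counter

import numpy as np


def length_report(X):
    """Count how many rows of X have each length (hypothetical helper)."""
    return Counter(len(row) for row in X)


def odd_rows(X):
    """Indices of rows whose length differs from the most common length."""
    lengths = [len(row) for row in X]
    common = Counter(lengths).most_common(1)[0][0]
    return [i for i, n in enumerate(lengths) if n != common]


# Toy stand-in for the 60-array tuple in the question
X_demo = (np.zeros(6, dtype=np.float32),
          np.zeros(7, dtype=np.float32),
          np.zeros(6, dtype=np.float32))

print(length_report(X_demo))  # Counter({6: 2, 7: 1})
print(odd_rows(X_demo))       # [1]
```

On the real data, call length_report(X) on the original tuple before converting it. It shows at once whether all 60 arrays share one length.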

Best answer

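It looks like the problem is in your X. Probably one of the arrays it is made of has a different length from the others. When DecisionTreeClassifier processes the tuple you built, scikit-learn converts it to a NumPy array (the check_array call in the traceback). Because the rows differ in length, they cannot be stacked into a 2-D float matrix. The result is a 1-D array of array objects (dtype object), which is not what the decision tree expects. This is also why X.shape in your edit is (60,) instead of (60, n_features).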

Just try this code:

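import numpy as np
from numpy import array

X1 = (array([-8.1530527e-10,  8.9952795e-10, -9.1185753e-10,
             0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype='float32'),
      array([0., 0., 0., 0., 0., 0.], dtype='float32'),
      array([0., 0., 0., 0., 0., 0.], dtype='float32'))

# Same as X1, except the second array has one extra element
X2 = (array([-8.1530527e-10,  8.9952795e-10, -9.1185753e-10,
             0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype='float32'),
      array([0., 0., 0., 0., 0., 0., 1], dtype='float32'),
      array([0., 0., 0., 0., 0., 0.], dtype='float32'))

print("X1:", np.array(X1).dtype)  # float32: equal-length rows stack into a (3, 6) array
try:
    # Older NumPy returns a 1-D object array here (possibly with a warning);
    # NumPy 1.24 and later refuse ragged input and raise ValueError instead
    print("X2:", np.array(X2).dtype)
except ValueError as exc:
    print("X2:", exc)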

Adding one number to the second element of X2 is enough to turn X2 into an array of objects (dtype object) instead of a 2-D float array.
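One way to fix this is to make every row the same length before fitting, so that X becomes a real 2-D (n_samples, n_features) matrix. The sketch below pads short rows with zeros and truncates long ones. Its names are assumptions: to_feature_matrix is a hypothetical helper, and rows and y are toy data. Whether padding or truncation makes sense depends on what the features represent, which the question does not say. The sketch also shows that DecisionTreeClassifier accepts string labels such as '01', so y is not the cause of the error.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def to_feature_matrix(rows, length=None, fill=0.0):
    """Stack 1-D arrays into an (n_samples, length) float32 matrix.

    Hypothetical helper: shorter rows are padded with `fill`, longer rows
    are truncated, and `length` defaults to the longest row.
    """
    if length is None:
        length = max(len(r) for r in rows)
    out = np.full((len(rows), length), fill, dtype=np.float32)
    for i, r in enumerate(rows):
        vals = np.asarray(r, dtype=np.float32)[:length]  # truncate if too long
        out[i, :len(vals)] = vals                        # rest stays `fill`
    return out


# Toy ragged rows and string labels in the style of the question's '01'...'08'
rows = [np.array([0.1, 0.2], dtype=np.float32),
        np.array([0.3, 0.4, 0.5], dtype=np.float32),
        np.array([0.6], dtype=np.float32),
        np.array([0.7, 0.8, 0.9], dtype=np.float32)]
y = np.array(['01', '02', '01', '02'])

X = to_feature_matrix(rows)
print(X.shape)  # (4, 3)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict(X))
```

Passing length explicitly, for example the length of the shortest row, truncates every row to that size instead of padding.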

Regarding "python - ValueError when fitting a decision tree classifier on a dataset", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53351781/
