python - Getting error AttributeError: 'bool' object has no attribute 'transpose' when attempting to fit machine learning model

Tags: python pandas machine-learning scikit-learn model

I am trying to create a machine learning model to predict who would survive on the Titanic. Every time I try to fit the model, I get this error:

Traceback (most recent call last):
  File "c:\Users\seand\.vscode\extensions\ms-python.python-2020.6.89148\pythonFiles\ptvsd_launcher.py", line 48, in <module>
    main(ptvsdArgs)
  File "c:\Users\seand\.vscode\extensions\ms-python.python-2020.6.89148\pythonFiles\lib\python\old_ptvsd\ptvsd\__main__.py", line 432, in main
    run()
  File "c:\Users\seand\.vscode\extensions\ms-python.python-2020.6.89148\pythonFiles\lib\python\old_ptvsd\ptvsd\__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
    return _run_module_code(code, init_globals, run_name,
  File "D:\Python\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "D:\Python\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "d:\Kaggle\Titanic\titanic4.py", line 100, in <module>
    cat_cols2 = pd.DataFrame(OneHot1.fit_transform(new_df[cat_columns]))
  File "D:\Python\lib\site-packages\pandas\core\frame.py", line 2806, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "D:\Python\lib\site-packages\pandas\core\indexing.py", line 1552, in _get_listlike_indexer
    self._validate_read_indexer(
  File "D:\Python\lib\site-packages\pandas\core\indexing.py", line 1640, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked'], dtype='object')] are in the [columns]"
PS D:\Kaggle\Titanic>  cd 'd:\Kaggle\Titanic'; ${env:PYTHONIOENCODING}='UTF-8'; ${env:PYTHONUNBUFFERED}='1'; & 'D:\Python\python.exe' 'c:\Users\seand\.vscode\extensions\ms-python.python-2020.6.89148\pythonFiles\ptvsd_launcher.py' '--default' '--client' '--host' 'localhost' '--port' '60778' 'd:\Kaggle\Titanic\titanic4.py'
Traceback (most recent call last):
  File "c:\Users\seand\.vscode\extensions\ms-python.python-2020.6.89148\pythonFiles\ptvsd_launcher.py", line 48, in <module>
    main(ptvsdArgs)
  File "c:\Users\seand\.vscode\extensions\ms-python.python-2020.6.89148\pythonFiles\lib\python\old_ptvsd\ptvsd\__main__.py", line 432, in main
    run()
  File "c:\Users\seand\.vscode\extensions\ms-python.python-2020.6.89148\pythonFiles\lib\python\old_ptvsd\ptvsd\__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "D:\Python\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "D:\Python\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "D:\Python\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "d:\Kaggle\Titanic\titanic4.py", line 143, in <module>
    my_pipeline.fit(new_df,y)
  File "D:\Python\lib\site-packages\sklearn\pipeline.py", line 330, in fit
    Xt = self._fit(X, y, **fit_params_steps)
  File "D:\Python\lib\site-packages\sklearn\pipeline.py", line 292, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "D:\Python\lib\site-packages\joblib\memory.py", line 352, in __call__
    return self.func(*args, **kwargs)
  File "D:\Python\lib\site-packages\sklearn\pipeline.py", line 740, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "D:\Python\lib\site-packages\sklearn\compose\_column_transformer.py", line 531, in fit_transform
    result = self._fit_transform(X, y, _fit_transform_one)
  File "D:\Python\lib\site-packages\sklearn\compose\_column_transformer.py", line 458, in _fit_transform
    return Parallel(n_jobs=self.n_jobs)(
  File "D:\Python\lib\site-packages\joblib\parallel.py", line 1032, in __call__
    while self.dispatch_one_batch(iterator):
  File "D:\Python\lib\site-packages\joblib\parallel.py", line 847, in dispatch_one_batch
    self._dispatch(tasks)
  File "D:\Python\lib\site-packages\joblib\parallel.py", line 765, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "D:\Python\lib\site-packages\joblib\_parallel_backends.py", line 206, in apply_async
    result = ImmediateResult(func)
  File "D:\Python\lib\site-packages\joblib\_parallel_backends.py", line 570, in __init__
    self.results = batch()
  File "D:\Python\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "D:\Python\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
    res = transformer.fit_transform(X, y, **fit_params)
  File "D:\Python\lib\site-packages\sklearn\pipeline.py", line 367, in fit_transform
    Xt = self._fit(X, y, **fit_params_steps)
  File "D:\Python\lib\site-packages\sklearn\pipeline.py", line 292, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "D:\Python\lib\site-packages\joblib\memory.py", line 352, in __call__
    return self.func(*args, **kwargs)
  File "D:\Python\lib\site-packages\sklearn\pipeline.py", line 740, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "D:\Python\lib\site-packages\sklearn\base.py", line 693, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "D:\Python\lib\site-packages\sklearn\impute\_base.py", line 459, in transform
    coordinates = np.where(mask.transpose())[::-1]
AttributeError: 'bool' object has no attribute 'transpose'
The code I am running is below:

from xgboost import XGBClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectFromModel
from itertools import combinations
import pandas as pd 
import numpy as np

#read in data
training_data = pd.read_csv('train.csv')
testing_data = pd.read_csv('test.csv')




#separate X and y
X_train_full = training_data.copy()
y = X_train_full.Survived
X_train_full.drop(['Survived'], axis=1, inplace=True)

y_test = testing_data

#get all str columns
cat_columns1 = [cname for cname in X_train_full.columns if
                    X_train_full[cname].dtype == "object"]

interactions = pd.DataFrame(index= X_train_full)

#create new features
for combination in combinations(cat_columns1,2):
    imputer = SimpleImputer(strategy='constant')

    new_col_name = '_'.join(combination)
    col1 = X_train_full[combination[0]]
    col2 = X_train_full[combination[1]]
    col1 = np.array(col1).reshape(-1,1)
    col2 = np.array(col2).reshape(-1,1)
    col1 = imputer.fit_transform(col1)
    col2 = imputer.fit_transform(col2)


    new_vals = col1 + '_' + col2
    OneHot = OneHotEncoder()




    interactions[new_col_name] = OneHot.fit_transform(new_vals)
 

interactions = interactions.reset_index(drop = True)


#create new dataframe with new features included
new_df = X_train_full.join(interactions)
 

#do the same for the test file
interactions2 = pd.DataFrame(index= y_test)
for combination in combinations(cat_columns1,2):
    imputer = SimpleImputer(strategy='constant')

    new_col_name = '_'.join(combination)
    col1 = y_test[combination[0]]
    col2 = y_test[combination[1]]
    col1 = np.array(col1).reshape(-1,1)
    col2 = np.array(col2).reshape(-1,1)
    col1 = imputer.fit_transform(col1)
    col2 = imputer.fit_transform(col2)


    new_vals = col1 + '_' + col2

    OneHot = OneHotEncoder()




    interactions2[new_col_name] = OneHot.fit_transform(new_vals)


    interactions2[new_col_name] = new_vals
 

interactions2 = interactions2.reset_index(drop = True)
y_test = y_test.join(interactions2)


#get names of cat columns (with new features added)
cat_columns = [cname for cname in new_df.columns if
                    new_df[cname].dtype == "object"]

# Select numerical columns
num_columns = [cname for cname in new_df.columns if 
                new_df[cname].dtype in ['int64', 'float64']]



#set up pipeline
numerical_transformer = SimpleImputer(strategy = 'constant')


categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])


preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, num_columns),
        ('cat', categorical_transformer, cat_columns)
    ])
model = XGBClassifier()

my_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                              ('model', model)
                             ])
#fit model
my_pipeline.fit(new_df,y)

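The CSV files I am reading are available from Kaggle at the following link:
https://www.kaggle.com/c/titanic/data
I cannot figure out what is causing this problem. Any help would be greatly appreciated.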

Best answer

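This is likely because your data contains pd.NA values. pd.NA was introduced in pandas 1.0.0 but is still marked as experimental. SimpleImputer ultimately runs data == np.nan, which would usually return a numpy array. Instead, it returns a single bool scalar when data contains pd.NA values, and a bool has no transpose method, hence the AttributeError.
An example: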

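import pandas as pd
import numpy as np

test_pd_na = pd.DataFrame({"A": [1, 2, 3, pd.NA]})
test_np_nan = pd.DataFrame({"A": [1, 2, 3, np.nan]})

# With np.nan the comparison is element-wise and yields an array
test_np_nan.to_numpy() == np.nan
# > array([[False],
#          [False],
#          [False],
#          [False]])

# With pd.NA the answer reports a single bool instead
# (behavior may differ in other NumPy/pandas versions)
test_pd_na.to_numpy() == np.nan
# > False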
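The solution is to convert all pd.NA values to np.nan before running SimpleImputer. You can call .replace({pd.NA: np.nan}) on your DataFrame for this purpose. The obvious downside is that you lose the benefits pd.NA brings, such as keeping integer columns with missing data as integers rather than having them converted to float columns.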
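As a rough sketch of that fix, the snippet below first checks which columns actually hold pd.NA and then applies the replacement before imputing. The toy DataFrame and the helper name columns_with_pd_na are hypothetical and are not part of the original question or answer. The explicit astype("float64") step is an added assumption that only suits purely numeric columns.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def columns_with_pd_na(df):
    """Return the names of columns holding at least one pd.NA object.

    Hypothetical helper for illustration only.
    """
    # pd.NA is a singleton, so an identity check tells it apart from np.nan.
    return [c for c in df.columns if df[c].map(lambda v: v is pd.NA).any()]


# Hypothetical toy data, not the Titanic files from the question.
df = pd.DataFrame({
    "Age": [22.0, 38.0, pd.NA, 35.0],     # object column containing pd.NA
    "Fare": [7.25, 71.28, 7.92, np.nan],  # ordinary float column with np.nan
})

affected = columns_with_pd_na(df)
print("Columns containing pd.NA:", affected)

# Swap pd.NA for np.nan as the answer suggests, then make the numeric
# dtype explicit (assumption: every column here is numeric).
cleaned = df.replace({pd.NA: np.nan}).astype("float64")

# strategy="constant" fills numeric gaps with 0 by default.
imputer = SimpleImputer(strategy="constant")
imputed = imputer.fit_transform(cleaned)
print(imputed)
```

In the question's script the same replacement would presumably be applied to new_df (and to the test frame) before calling my_pipeline.fit, leaving out the float cast for the categorical columns.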

Regarding "python - Getting error AttributeError: 'bool' object has no attribute 'transpose' when attempting to fit machine learning model", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62520099/
