python - 类型错误 : unsupported operand type(s) for -: 'int' and 'StandardScaler

标签 python machine-learning scikit-learn xgboost

大家好,感谢您的光临, 当我尝试获取预测形式训练模型时遇到问题,我尝试了很多代码来解决这个问题,基本上,我的数据包含很多类型,这是一个 sample of my data我想对我的数据进行 StandardScaler... 这是原始代码...

# Load the training set
df = pd.read_csv('training_data/R_training_set.csv', 
low_memory=False)
df = df.dropna(how='any',axis=0)
df.shape

ohe_fields=['one_way','surface_type','street_type','hour','weekday','month']

# One-Hot encode a couple of variables
df_ohe = pd.get_dummies(df,columns=ohe_fields)

# Get the one-hot variable names
ohe_feature_names = pd.get_dummies(df[ohe_fields],columns=ohe_fields).columns.tolist()
df_ohe.head()

这就是我的数据的样子

enter image description here

 # Sinuosity is typically close to 1, even for moderately curvy roads. A high sinuosity means a longer road.
feature_transforms = {
    'sinuosity': np.log
}
for feature,transform in feature_transforms.items():
    df_ohe[feature] = transform(df_ohe[feature])

# Continuously valued features
float_feature_names = [
    'accident_counts',
    'speed_limit',
    'aadt',
    'surface_width',
    'sinuosity',
    'euclidean_length',
    'segment_length',
    'road_orient_approx',
    'Rain',
    'dust',
    'temperature',
    'visibility',
    'wind_speed',
    'proximity_to_billboard',
    'proximity_to_major_road',
    'proximity_to_signal',
    'proximity_to_nearest_intersection',
    'proximity_to_nearest_exit',
    'population_density',
    'Hopspot'
]
float_features = df_ohe.xs(float_feature_names,axis=1).values

# Use scikit-learn's StandardScaler
scaler = StandardScaler()
float_scaled = scaler.fit_transform(float_features)
#print (float_features.mean(axis=0))

df_ohe[float_feature_names] = float_scaled
with open('scalers_4.pkl','wb') as fp:
    pickle.dump(scaler,fp)


y = df['target'].values
binary_feature_names = [
    'dew_point_temperature',
    'EXTREME_AIR_TEMPERATURE',
    'at_exit',
    'at_intersection',
]
df_ohe = df_ohe.xs(float_feature_names+binary_feature_names+ohe_feature_names,axis=1)


X = df_ohe.values
y = df['target'].values
feature_names = df_ohe.columns.tolist()


wrangler = {
    'float_scaler_mean': float_features.mean(axis=0),
    'float_scaler_std': float_scaled,
    'float_feature_names': float_feature_names,
    'ohe_fields': ohe_fields,
    'feature_names': feature_names,
    'feature_transforms': feature_transforms 
}

P.S 我使用 xgboost 算法来训练数据

with open('wrangler_2.pkl','rb') as fp:
    wrangler = pickle.load(fp)
float_scaler_mean = wrangler['float_scaler_mean']
float_scaler_std = wrangler['float_scaler_std']
float_feature_names = wrangler['float_feature_names']
ohe_fields = wrangler['ohe_fields']
feature_names = wrangler['feature_names']
booster = xgboost.Booster()
booster.load_model('new_0003.model')


def make_test_set(df,wrangler):
    float_scaler_mean = wrangler['float_scaler_mean']
    float_scaler_std = wrangler['float_scaler_std']
    float_feature_names = wrangler['float_feature_names']
    ohe_fields = wrangler['ohe_fields']
    feature_names = wrangler['feature_names']
    print(len(feature_names))
    df_ohe = pd.get_dummies(df,columns=ohe_fields)

    float_features = df.xs(float_feature_names,axis=1).values
    float_features = (float_features - float_scaler_mean) / float_scaler_std
    for i,fname in enumerate(float_feature_names):
        df_ohe[fname] = float_features[:,i]

    empty_features = list(set(feature_names) - set(df_ohe.columns.tolist()))

    #for f in empty_features:
    #df_ohe[f] = 0

    df_ohe = df_ohe[feature_names]#.drop(columns=['SegID.1','geometry','SHAPE']+drop_columns,errors='ignore')

    print(df_ohe.columns)
    print(df_ohe.columns.tolist())
    X = df_ohe.values
    feature_names = df_ohe.columns.tolist()
    return X, feature_names

X,names = make_test_set(test_df,wrangler)
print (X.shape)
print (X[0])

我收到错误

TypeError: unsupported operand type(s) for -: 'int' and 'StandardScaler'

我做错了什么吗? 引导我走正确的道路,谢谢...

最佳答案

当您创建 wrangler 变量时,您已经传递了 scaler 值,它是一个 StandardScalar 对象。尝试假设您定义 wrangler 时想要做什么,我可以想到:

wrangler = {
    'float_scaler_mean': float_features.mean(axis=0),
    'float_scaler_std': float_scaled,
    'float_feature_names': float_feature_names,
    'ohe_fields': ohe_fields,
    'feature_names': feature_names,
    'feature_transforms': feature_transforms 
}

你到底想做什么?您不能将 StandardScaler 对象作为 float_scaler_mean

执行 float_features = (float_features - float_scaler_mean)/float_scaler_std

关于python - 类型错误 : unsupported operand type(s) for -: 'int' and 'StandardScaler,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58655461/

相关文章:

python - scikit-learn 中聚类超参数评估的网格搜索

python - 从命名元组列表中写入 numpy.array()

python - 绘制圆形部分并为其添加阴影

python - 如何使用 'set' 保持有序列表?

machine-learning - 如何找到最受欢迎的产品

python - 对 GridSearchCV 的工作感到困惑

python - Scikit-learn (0.22.1) KNNImputer 返回错误数量的值

python - python 中 SVM 的主动学习(例如池采样)

python-2.7 - 具有单个样本的 Sklearn 训练模型引发 DeprecationWarning

python - 如何在使用 Scikit-Learn 创建 ROC 曲线时使用预测分数