python - Keras accuracy does not improve beyond 59%

Tags: python tensorflow keras

Here is the code I tried:

from sklearn.model_selection import train_test_split
from tensorflow.keras import layers
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# `dataframe` is a pandas DataFrame loaded elsewhere (not shown in the question)
# normalizing the train data: min-max scale each column to [0, 1]
cols_to_norm = ["WORK_EDUCATION", "SHOP", "OTHER",'AM','PM','MIDDAY','NIGHT', 'AVG_VEH_CNT', 'work_traveltime', 'shop_traveltime','work_tripmile','shop_tripmile', 'TRPMILES_sum',
                'TRVL_MIN_sum', 'TRPMILES_mean', 'HBO', 'HBSHOP', 'HBW', 'NHB', 'DWELTIME_mean','TRVL_MIN_mean', 'work_dweltime', 'shop_dweltime', 'firsttrip_time', 'lasttrip_time']
dataframe[cols_to_norm] = dataframe[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
# labels
y = dataframe.R_SEX.values
# features: the question does not show how X was built; hypothetically, every column except the label
X = dataframe.drop(columns=["R_SEX"])
# splitting train and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

model = Sequential()
model.add(Dense(256, input_shape=(X_train.shape[1],), activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(layers.Dropout(0.3))
model.add(Dense(256, activation='relu'))
model.add(layers.Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
model.summary()  # summary() prints the table itself and returns None

model.fit(X_train, y_train, batch_size=128, epochs=30, validation_split=0.2)

Output (last epochs):
Epoch 23/30
1014/1014 [==============================] - 4s 4ms/step - loss: 0.6623 - acc: 0.5985 - val_loss: 0.6677 - val_acc: 0.5918
Epoch 24/30
1014/1014 [==============================] - 4s 4ms/step - loss: 0.6618 - acc: 0.5993 - val_loss: 0.6671 - val_acc: 0.5925
Epoch 25/30
1014/1014 [==============================] - 4s 4ms/step - loss: 0.6618 - acc: 0.5997 - val_loss: 0.6674 - val_acc: 0.5904
Epoch 26/30
1014/1014 [==============================] - 4s 4ms/step - loss: 0.6614 - acc: 0.6001 - val_loss: 0.6669 - val_acc: 0.5911
Epoch 27/30
1014/1014 [==============================] - 4s 4ms/step - loss: 0.6608 - acc: 0.6004 - val_loss: 0.6668 - val_acc: 0.5920
Epoch 28/30
1014/1014 [==============================] - 4s 4ms/step - loss: 0.6605 - acc: 0.6002 - val_loss: 0.6679 - val_acc: 0.5895
Epoch 29/30
1014/1014 [==============================] - 4s 4ms/step - loss: 0.6602 - acc: 0.6009 - val_loss: 0.6663 - val_acc: 0.5932
Epoch 30/30
1014/1014 [==============================] - 4s 4ms/step - loss: 0.6597 - acc: 0.6027 - val_loss: 0.6674 - val_acc: 0.5910
<tensorflow.python.keras.callbacks.History at 0x7fdd8143a278>
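The normalization step above computes each column's min and max over the whole dataframe before `train_test_split`, so statistics from the test rows leak into the training features. A minimal sketch of a leak-free variant, assuming the features are a numeric array `X` (random data stands in for the real columns here): fit scikit-learn's `MinMaxScaler` on the training split only and reuse it for the test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in data: 1,000 rows, 5 numeric features, binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 2, size=1000)

# Split first, then learn min/max from the training rows only.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # each training column mapped to [0, 1]
X_test_scaled = scaler.transform(X_test)        # same min/max; values may fall slightly outside [0, 1]

print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```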
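I have tried modifying the neural network and double-checked the data.
Is there anything I can do to improve the results? Is the model not deep enough? Is there an alternative model better suited to my data? Does this mean the features have no predictive value? I'm a bit confused about what to do next.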
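One way to test the last question directly, rather than reading it off the Keras curves, is a permutation test: refit a simple model many times on shuffled labels and see how often chance alone reaches the real score. A sketch with scikit-learn's `permutation_test_score`, using `LogisticRegression` as a swapped-in fast model and synthetic data in place of the real `X` and `y` (both assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import permutation_test_score

# Synthetic stand-in for the real feature matrix and R_SEX labels.
X, y = make_classification(n_samples=500, n_features=10, n_informative=3, random_state=0)

score, perm_scores, pvalue = permutation_test_score(
    LogisticRegression(max_iter=1000),
    X, y,
    cv=5,
    n_permutations=50,   # each permutation refits the model on shuffled labels
    scoring="accuracy",
    random_state=0,
)

print(f"accuracy on real labels:      {score:.3f}")
print(f"mean accuracy, shuffled labels: {perm_scores.mean():.3f}")
print(f"p-value: {pvalue:.3f}")
```

A small p-value only says the features carry some signal; it does not say the signal is strong enough for high accuracy.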
Thank you
Update:
I tried adding a new column to my dataframe holding the output of a KNN model for sex classification. Here is what I did:
# Import the k-nearest neighbors classifier
from sklearn.neighbors import KNeighborsClassifier

# Create the KNN classifier
knn = KNeighborsClassifier(n_neighbors=41)

# Train the model on all rows of X (this happens before the train/test split)
knn.fit(X, y)

# Predict sex for every row of X so it can be fed to the neural net
y_pred = knn.predict(X)

# Add the KNN output to the feature set as a new column
X = X.assign(KNN_result=y_pred)
This raised both training and validation accuracy to about 61%:
Epoch 26/30
1294/1294 [==============================] - 8s 6ms/step - loss: 0.6525 - acc: 0.6166 - val_loss: 0.6604 - val_acc: 0.6095
Epoch 27/30
1294/1294 [==============================] - 8s 6ms/step - loss: 0.6523 - acc: 0.6173 - val_loss: 0.6596 - val_acc: 0.6111
Epoch 28/30
1294/1294 [==============================] - 8s 6ms/step - loss: 0.6519 - acc: 0.6177 - val_loss: 0.6614 - val_acc: 0.6101
Epoch 29/30
1294/1294 [==============================] - 8s 6ms/step - loss: 0.6512 - acc: 0.6178 - val_loss: 0.6594 - val_acc: 0.6131
Epoch 30/30
1294/1294 [==============================] - 8s 6ms/step - loss: 0.6510 - acc: 0.6183 - val_loss: 0.6603 - val_acc: 0.6103
<tensorflow.python.keras.callbacks.History at 0x7fe981bbe438>
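In the update, the KNN model is fitted on all of `X` and `y` and then predicts those same rows before the split, so each row's KNN feature was computed with access to its own label and to the labels of rows that later become the test set; some of the improvement may therefore come from leakage. A common leak-free way to build such a stacked feature is out-of-fold prediction. The sketch below (synthetic data; only `n_neighbors=41` and the split settings come from the question) uses scikit-learn's `cross_val_predict` so each training row is predicted by a model that never saw it, and predicts the test rows with a model fitted on the training split only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real features and labels.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

knn = KNeighborsClassifier(n_neighbors=41)

# Out-of-fold predictions: each training row comes from a model fitted on the other folds.
knn_train = cross_val_predict(knn, X_train, y_train, cv=5)

# Test rows: fit on the whole training split, then predict rows it has never seen.
knn_test = knn.fit(X_train, y_train).predict(X_test)

# Append the KNN output as an extra feature column for the neural network.
X_train_aug = np.column_stack([X_train, knn_train])
X_test_aug = np.column_stack([X_test, knn_test])
print(X_train_aug.shape, X_test_aug.shape)
```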
Thanks

Best answer

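In my opinion, your data is not diverse enough for a neural network: the dataset contains many similar values, which may be why the accuracy is low. Try a simple regression model (for a binary label, logistic regression) instead of a neural network.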
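Before comparing models, it helps to know the majority-class rate: if one class is much more frequent, 59% accuracy may be only a few points above always predicting that class. A minimal sketch comparing scikit-learn's `DummyClassifier` (majority-class baseline) with a plain `LogisticRegression` (an assumed choice of "simple" model; the answer does not name one) under 5-fold cross-validation, on synthetic, mildly imbalanced data:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data with roughly 60% of rows in class 0.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=3,
                           weights=[0.6], random_state=0)

baseline = DummyClassifier(strategy="most_frequent")  # always predicts the majority class
simple = LogisticRegression(max_iter=1000)

baseline_acc = cross_val_score(baseline, X, y, cv=5, scoring="accuracy").mean()
simple_acc = cross_val_score(simple, X, y, cv=5, scoring="accuracy").mean()

print(f"majority-class baseline: {baseline_acc:.3f}")
print(f"logistic regression:     {simple_acc:.3f}")
```

If the neural network barely beats the baseline on the real data while logistic regression matches it, extra depth is unlikely to help.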
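If you still want to use a neural network, check the output layer. For regression, the last layer's activation is normally "linear" (or "relu" for non-negative targets). R_SEX, however, is a binary label trained with binary_crossentropy, and for that setup a single sigmoid output unit, as in your model, is the standard choice; in hidden layers, ReLU rather than sigmoid is the usual default nowadays.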
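To see why a sigmoid output pairs with `binary_crossentropy` (as in the question's `model.compile` call), here is the standard binary cross-entropy formula written out in NumPy; it is a sketch of the math, not Keras's own code. The loss expects probabilities in (0, 1), and the sigmoid maps any raw score onto that range, whereas a "linear" or "relu" output can emit values such as -0.3 or 1.7 that are not probabilities.

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    # Standard binary cross-entropy; clipping keeps log() away from 0.
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
raw_scores = np.array([2.0, -1.5, 0.3, 0.8])  # what a "linear" output unit might emit

probs = sigmoid(raw_scores)
print(probs)                               # every value strictly between 0 and 1
print(binary_crossentropy(y_true, probs))  # finite, well-defined loss
```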
If the accuracy still does not improve, try different strategies:

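  • Increase the batch size.
  • Increase the number of epochs.
  • Apply whitening to the dataset before training (in the preprocessing stage).
  • Lower the learning rate, ideally with a learning-rate scheduler.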

  • For whitening, you can do this:
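    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    
    # split first, so the whitening is learned from the training rows only
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
    
    pca = PCA(whiten=True)
    X_train = pca.fit_transform(X_train)  # fit the whitening on the training set and apply it
    X_test = pca.transform(X_test)  # use the same pca model for the test set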
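For the learning-rate suggestion in the list above, Keras provides `tf.keras.callbacks.LearningRateScheduler`, which calls a function `schedule(epoch, lr)` at the start of each epoch and uses the value it returns. A minimal step-decay sketch; the starting rate of 1e-3 (Adam's default in Keras), the halving factor and the 10-epoch interval are assumptions to tune. The schedule is plain Python, so it can be checked without TensorFlow.

```python
def step_decay(epoch, lr, initial_lr=1e-3, drop=0.5, epochs_per_drop=10):
    # Ignore the current lr and recompute it from the epoch index:
    # 1e-3 for epochs 0-9, 5e-4 for epochs 10-19, 2.5e-4 for epochs 20-29, ...
    return initial_lr * drop ** (epoch // epochs_per_drop)

# Usage with the model from the question (requires TensorFlow):
# import tensorflow as tf
# lr_callback = tf.keras.callbacks.LearningRateScheduler(step_decay, verbose=1)
# model.fit(X_train, y_train, batch_size=128, epochs=30,
#           validation_split=0.2, callbacks=[lr_callback])

print([step_decay(e, None) for e in (0, 9, 10, 20, 29)])
```

`tf.keras.callbacks.ReduceLROnPlateau` is an alternative that lowers the rate only when a monitored metric such as `val_loss` stops improving.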
    
    
    Your dataset contains a lot of zeros. Here is the fraction of zero values per column (between 0 and 1):
    0.6611697598907094 WORK_EDUCATION
    0.5906196483663051 SHOP
    0.15968546556987515 OTHER
    0.4517919980835284 AM
    0.3695455825652879 PM
    0.449195697003247 MIDDAY
    0.8160996565242585 NIGHT
    0.03156998520561604 AVG_VEH_CNT
    1.618641571247746e-05 work_traveltime
    2.2660981997468445e-05 shop_traveltime
    0.6930343378622924 work_tripmile
    0.605410795044367 shop_tripmile
    0.185622578107549 TRPMILES_sum
    3.237283142495492e-06 TRVL_MIN_sum
    0.185622578107549 TRPMILES_mean
    0.469645614614391 HBO
    0.5744850291841075 HBSHOP
    0.8137429143965219 HBW
    0.5307266729469959 NHB
    0.2017960446874565 DWELTIME_mean
    1.618641571247746e-05 TRVL_MIN_mean
    0.6959996892208183 work_dweltime
    0.6099365168775757 shop_dweltime
    0.0009258629787537107 firsttrip_time
    0.002949164942813393 lasttrip_time
    0.7442934791405661 age_2.0
    0.7541995655566023 age_3.0
    0.7081200773063214 age_4.0
    0.9401296855626884 age_5.0
    0.3490503429901489 KNN_result
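The answer does not show how these fractions were computed. With pandas, one straightforward way (assuming the features sit in a DataFrame `X`; the toy frame below is hypothetical) is to compare against zero and take the column mean, since `True` counts as 1:

```python
import pandas as pd

# Toy frame standing in for the real feature DataFrame X.
X = pd.DataFrame({
    "WORK_EDUCATION": [0, 0, 2, 1],
    "NIGHT": [0, 0, 0, 1],
    "TRVL_MIN_sum": [35, 12, 50, 8],
})

zero_fraction = (X == 0).mean()  # share of zero entries per column, between 0 and 1
print(zero_fraction.sort_values(ascending=False))
```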
    

Source: the original question "python - Keras accuracy does not improve beyond 59%" on Stack Overflow: https://stackoverflow.com/questions/63564017/
