python - Fashion MNIST with a CNN, overfitting?

Tags: python tensorflow keras neural-network conv-neural-network

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam

cnn_model = Sequential()

# Conv2D(64, 3, 3) was the Keras 1 spelling of 64 filters with a 3x3 kernel
cnn_model.add(Conv2D(64, (3, 3), input_shape = (28, 28, 1), activation='relu'))
cnn_model.add(MaxPooling2D(pool_size = (2, 2)))

cnn_model.add(Dropout(0.25))

cnn_model.add(Flatten())
cnn_model.add(Dense(units = 32, activation = 'relu'))   # 'units' replaces the deprecated 'output_dim'
cnn_model.add(Dense(units = 10, activation = 'sigmoid'))

cnn_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(lr=0.001), metrics=['accuracy'])

epochs = 50

history = cnn_model.fit(X_train,
                        y_train,
                        batch_size = 512,
                        epochs = epochs,   # 'epochs' replaces the deprecated 'nb_epoch'
                        verbose = 1,
                        validation_data = (X_validate, y_validate))

This is the result I end up with:

Epoch 50/50
48000/48000 [==============================] - 35s 728us/step - loss: 0.1265 - accuracy: 0.9537 - val_loss: 0.2425 - val_accuracy: 0.9167
training loss = 0.1265, validation loss = 0.2425
training accuracy = 95.37%, validation accuracy = 91.67%

My questions are:

  1. Is the model overfitting or underfitting?
  2. Should I increase the number of epochs?

Graph of Losses

Best answer

Since the model is overfitting, you can:

  1. Shuffle the data, by using shuffle=True in cnn_model.fit.
  2. Use early stopping.
  3. Use regularization.

A minimal sketch of these three changes follows; the full script is further below.
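The sketch below isolates the three changes, assuming tf.keras 2.x and the cnn_model, X_train, and X_validate variables from your snippet; restore_best_weights is an optional extra that rolls the model back to its best epoch:

from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.regularizers import l2

# (2) Stop once val_loss has not improved for 15 epochs
early_stop = EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True)

# (3) An L2 penalty is attached per layer, e.g.:
#     Dense(32, activation='relu', kernel_regularizer=l2(0.001))

# (1) Reshuffle every epoch and pass the callback to fit()
history = cnn_model.fit(X_train, y_train,
                        batch_size=512, epochs=50,
                        validation_data=(X_validate, y_validate),
                        callbacks=[early_stop],
                        shuffle=True)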

The complete code, using the same architecture as yours but with reduced overfitting, is given below. The loss is somewhat higher, but it can be improved by adding more convolutional and pooling layers.

# Common imports
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Dropout, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
import matplotlib.pyplot as plt

print(tf.__version__)

# Instantiate an L2 regularizer: adds 0.001 * the sum of squared weights
# (and of activations, via activity_regularizer) to the loss
regularizer = l2(0.001)

cnn_model = Sequential()

cnn_model.add(Conv2D(64, (3, 3), input_shape = (28, 28, 1), activation='relu', data_format='channels_last',
                     activity_regularizer=regularizer, kernel_regularizer=regularizer))

cnn_model.add(MaxPool2D(pool_size = (2, 2)))

cnn_model.add(Dropout(0.25))

cnn_model.add(Flatten())

cnn_model.add(Dense(units = 32, activation = 'relu',
                    activity_regularizer=regularizer, kernel_regularizer=regularizer))

# Note: 'softmax' is the conventional output activation for 10-class
# classification; 'sigmoid' is kept only to match the architecture in the question
cnn_model.add(Dense(units = 10, activation = 'sigmoid',
                    activity_regularizer=regularizer, kernel_regularizer=regularizer))

cnn_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(lr=0.001), metrics=['accuracy'])

epochs = 50

cnn_model.summary()

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Scale pixels to [0, 1] and add the channel axis expected by Conv2D
X_train_reshaped = X_train.astype(np.float32).reshape(-1, 28, 28, 1) / 255.0
X_test_reshaped = X_test.astype(np.float32).reshape(-1, 28, 28, 1) / 255.0

y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)

# Manual shuffling helper (not actually used below: shuffle=True in fit()
# already reshuffles the training data at the start of every epoch)
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        yield X[batch_idx], y[batch_idx]

# With NumPy inputs and an explicit batch_size, Keras computes the number of
# steps per epoch on its own, so no manual steps_per_epoch is needed

# Stop training once val_loss has not improved for 15 consecutive epochs
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)

history = cnn_model.fit(x = X_train_reshaped,
                        y = y_train,
                        batch_size = 512,
                        epochs = epochs, callbacks=[callback],
                        verbose = 1, validation_data = (X_test_reshaped, y_test),
                        shuffle = True)

print(history.history.keys())

#  "Accuracy"
plt.plot(history.history['accuracy'])       # TF2 logs the metric as 'accuracy', not 'acc'
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# "Loss"
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylim(0.2, 1.5)   # plt.axes() would create a fresh, empty axes in current Matplotlib
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
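If you want to quantify the gap that remains after regularization, a minimal follow-up check (using the arrays defined above; evaluate() returns the loss plus each compiled metric) is:

# Compare final train vs. test metrics to measure the remaining overfit
train_loss, train_acc = cnn_model.evaluate(X_train_reshaped, y_train, verbose=0)
test_loss, test_acc = cnn_model.evaluate(X_test_reshaped, y_test, verbose=0)
print(f"train acc: {train_acc:.4f}  test acc: {test_acc:.4f}  gap: {train_acc - test_acc:.4f}")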

This question, "python - Fashion MNIST with a CNN, overfitting?", corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/58858517/
