python - Neural network to recognise handwritten digits: Dealing with multiple outputs

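I'm trying to write my own neural network as a learning exercise. Specifically, I'm trying to build a neural network that recognises handwritten digits. I'm using sklearn's digits dataset, but I wrote the neural network myself.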

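Simple tests such as an OR gate or an AND gate have been successful, so I'm confident that backpropagation is implemented correctly. After training, however, the network performs very poorly on the 8x8 pixel images of handwritten digits. I currently have 64 inputs (the 8x8 image) and 10 outputs (one per digit), with 2 hidden layers of size 4 each. I suspect the multiple outputs are causing the problem: the network usually settles on activations of [0.1, 0.1, 0.1, ...] (i.e. the mean of 0.0 * 9 + 1.0 * 1).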
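
The arithmetic behind that [0.1, 0.1, ...] plateau can be checked with a few lines of NumPy. This is only a sketch on a made-up, perfectly balanced label set (the `labels` array is hypothetical, not the real digits data). It shows that a network which ignores its input and emits a constant does best by emitting the per-column mean of the one-hot targets.

```python
import numpy as np

# Hypothetical balanced label set: each digit 0-9 appears exactly 5 times.
labels = np.tile(np.arange(10), 5)

# One-hot encode: row i has a 1 in column labels[i] and 0 elsewhere.
one_hot = np.eye(10)[labels]

# The best input-independent guess is the mean of each target column,
# which is 1/10 for every output unit when the classes are balanced.
constant_best_guess = one_hot.mean(axis=0)
print(constant_best_guess)
```

Outputs stuck near this value suggest the network has learned the class frequencies but nothing about the images themselves.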

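Possible ideas:

1) Could the multiple outputs be causing the problem?

2) Do I need a better error function?

3) Do I just need to train the system for longer with a smaller learning rate?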

Image showing the error over iterations

Image showing the prediction of a 1 (i.e. output should be ~[0,1,0,0,0,0,0,0,0,0]) after training

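Has anyone run into a similar problem, or can anyone suggest where I might be going wrong? Thanks for your patience if this has been asked before and I failed to find it! The code is below: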

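Edit: charlesreid1 and jdehesa were both right; my network architecture was in fact too simple for this task. More specifically, I had 2 layers of 4 neurons each trying to handle 64 inputs. Changing my hidden layers to 3 layers of 100 neurons each let me reach an accuracy score of 90% (counting an output > 0.7 as a positive result).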
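
The edit does not say exactly how that accuracy was computed, so the following is one plausible reading, sketched under assumptions: an example counts as correct when the output unit for its true digit exceeds 0.7. The function name `thresholded_accuracy` and the tiny 3-class arrays are made up for illustration.

```python
import numpy as np

def thresholded_accuracy(outputs, one_hot_labels, threshold=0.7):
    """Fraction of examples whose true-class output exceeds `threshold`."""
    outputs = np.asarray(outputs)
    one_hot_labels = np.asarray(one_hot_labels)
    true_class = one_hot_labels.argmax(axis=1)
    # For each row, pick the activation of the output unit for the true class.
    true_class_output = outputs[np.arange(len(outputs)), true_class]
    return float(np.mean(true_class_output > threshold))

# Made-up example: 3 samples, 3 classes, true classes 0, 1, 2.
outputs = np.array([[0.9, 0.1, 0.0],
                    [0.2, 0.6, 0.1],
                    [0.1, 0.0, 0.8]])
labels = np.eye(3)[[0, 1, 2]]
accuracy = thresholded_accuracy(outputs, labels)
print(accuracy)
```

A stricter variant could also require every other output to stay below the threshold; which rule was used is not stated in the post.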

    # Import our dependencies

    import numpy as np
    from sklearn import datasets

    class Neural_Network():

        # Initialising function
        def __init__(self, input_size, output_size, niteration = 100000):

            np.random.seed(1)

            self.niteration = niteration
            self.layer_sizes = np.array([input_size, output_size])
            self.weights = list()
            self.error = np.array([])

            # initialise random weights
            self._recreate_weights()


        def _recreate_weights(self):
            # Recreate the weights after adding a hidden layer
            self.weights = list()

            for i in np.arange(len(self.layer_sizes) - 1):

                weights = np.random.rand(self.layer_sizes[i], self.layer_sizes[i+1]) * 2 - 1
                self.weights.append(weights)
            self.momentum = [i * 0 for i in self.weights]


        def add_hidden_layer(self,size):
            # Add a new hidden layer to our neural network
            self.layer_sizes = np.insert(self.layer_sizes, -1, size)
            self._recreate_weights()



        def _sigmoid(self, x, deriv=False):

            if deriv:
                return self._sigmoid(x, deriv=False)*(1-self._sigmoid(x, deriv=False))
            else:
                return 1.0/(1+np.exp(-x))



        def predict(self, input_single, deriv=False, layer_output = False):

            data_current_layer = input_single
            output_list = list()
            output_list.append(np.array([data_current_layer]))
            for i in np.arange(len(self.layer_sizes) - 1):
                data_current_layer = self._sigmoid(np.dot(data_current_layer, self.weights[i]), deriv)
                output_list.append(np.array([data_current_layer]))

            return(output_list)



        def train2(self, input_training_data, input_training_labels):

            for iterations in np.arange(self.niteration):
                # Loop over all training sets niteration times

                updates = [i * 0 for i in self.weights] # Used for storing the update to the weights
                mean_error = np.array([]) # used for calculating the mean error

                for i in np.arange(len(input_training_data)): # For each training example

                    activations = list() # Store all my activations in a list
                    activations.append(np.array([input_training_data[i]]))

                    for j in np.arange(len(self.layer_sizes) - 1):
                        # Calculate all the activations for every layer

                        z = np.dot(activations[-1], self.weights[j])
                        a = self._sigmoid(z, deriv = False)
                        activations.append(a)

                    error = list()
                    error.append(a[-1] - np.array([input_training_labels[i]]))

                    for j in np.arange(len(self.layer_sizes) - 2):
                        # Calculate the error term for each layer

                        j2 = (-1 * j) - 1
                        j3 = j2 - 1
                        d = np.dot(error[j], self.weights[j2].T) * activations[j3] * (1 - activations[j3])
                        error.append(d)

                    for j in np.arange(len(self.layer_sizes) - 1):
                        # calculate the gradient for the error with respect to the weights

                        j2 = (-1 * j) - 1
                        updates[j] += np.dot(activations[j].T, error[j2])


                    mean_error = np.append(mean_error, np.sum(np.abs(error[0])))

                updates = [0.001*i/len(input_training_data) for i in updates] # Add in a learning rate
                self.error = np.append(self.error,np.mean(mean_error))

                for i in np.arange(len(self.weights)):
                    # update using a momentum term
                    self.momentum[i] -= updates[i]
                    self.weights[i]  += self.momentum[i]
                    self.momentum[i] *= 0.9

                if np.mod(iterations, 1000) == 0:
                    # Visually keep track of the error
                    print(iterations, self.error[-1])


    # Main Loop


    # Read in the dataset and divide into a training and test set
    data = datasets.load_digits()
    images = data.images
    labels = data.target
    targets = data.target_names

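    # 80/20 split; the 0.8 must multiply the length, not the array itself
    training_images = images[:int(len(labels)*0.8)]
    training_labels = labels[:int(len(labels)*0.8)]

    # NOTE: this overrides the split above and trains on the first 10 examples only
    training_images = images[:10]
    training_labels = labels[:10]

    test_images = images[int(len(labels)*0.8):]
    test_labels = labels[int(len(labels)*0.8):]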

    # Flatten the training and test images using ravel. CAN PROBABLY DO THIS BEFORE DIVIDING THEM UP.
    training_images_list = list()
    for i in training_images:
        training_images_list.append(np.ravel(i))

    test_images_list = list()
    for i in test_images:
        test_images_list.append(np.ravel(i))


    # Change the training and test labels into a more usable format.

    training_labels_temp=np.zeros([np.size(training_labels), 10])
    for i in np.arange(np.size(training_labels)):
        training_labels_temp[i, training_labels[i]] = 1
    training_labels = training_labels_temp

    test_labels_temp=np.zeros([np.size(test_labels), 10])
    for i in np.arange(np.size(test_labels)):
        test_labels_temp[i, test_labels[i]] = 1
    test_labels = test_labels_temp


    # Build the neural network: input - three hidden layers - output

    if True:
        network = Neural_Network(input_size=64, output_size=10)

        network.add_hidden_layer(size=4)
        network.add_hidden_layer(size=4)
        network.add_hidden_layer(size=4)



        # Train the network on our training set
        #print(network.weights)
        network.train2(input_training_data = training_images_list, input_training_labels = training_labels)
        #print(network.weights)

        # Calculate the error on our test set

        #network.calculate_error(test_set = test_images, test_labels = test_labels)
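
As the comment in the listing suggests, the flattening and label encoding can be done in one step before the data is split. Here is a minimal sketch, assuming random stand-in arrays with the same shapes as the digits data, so that `images`, `labels` and the `train_*`/`test_*` names are illustrative only:

```python
import numpy as np

# Stand-in for datasets.load_digits(): 20 fake 8x8 images with pixel values 0-16.
rng = np.random.default_rng(0)
images = rng.integers(0, 17, size=(20, 8, 8)).astype(float)
labels = rng.integers(0, 10, size=20)

# Flatten every image in one call, before splitting.
flat_images = images.reshape(len(images), -1)    # shape (20, 64)

# One-hot encode all labels at once.
one_hot_labels = np.eye(10)[labels]              # shape (20, 10)

# 80/20 split; the 0.8 multiplies the length, i.e. len(labels) * 0.8.
split = int(len(labels) * 0.8)
train_x, test_x = flat_images[:split], flat_images[split:]
train_y, test_y = one_hot_labels[:split], one_hot_labels[split:]
```

Indexing `np.eye(10)` with the label array replaces the explicit loop that fills `training_labels_temp`, and `reshape` replaces the per-image `np.ravel` loop.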

Best answer

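The problem is most certainly your network architecture, and specifically the first hidden layer. You're feeding an 8x8 input into a hidden layer with 4 neurons. First, there aren't enough neurons: the information contained in the 64 pixels gets washed out by passing through only four neurons. The other problem (which might go away with enough neurons) is that every neuron is fully connected to the input, because the predict() function uses a dot product.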

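The task of recognising handwritten digits is inherently tied to the spatial arrangement of pixels, so your network should exploit that knowledge. You should be feeding different parts of the input image to different neurons in the first layer. That gives those neurons a chance to amplify stronger signals or dampen weaker ones based on the arrangement of pixels in the image (for example, if you see a large signal in the corner, it is unlikely to be a 1; if you see a large signal right in the centre, it is unlikely to be a 0; and so on).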
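
To make "different parts of the image to different neurons" concrete, here is a rough sketch with untrained random weights. It assumes the 8x8 image is cut into four 4x4 quadrants, each feeding its own neuron. The names `patches` and `patch_weights` and the choice of quadrants are illustrative, not something taken from the question's code.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))          # stand-in for one 8x8 digit image

# Cut the image into four non-overlapping 4x4 quadrants
# (top-left, top-right, bottom-left, bottom-right).
patches = [image[r:r + 4, c:c + 4] for r in (0, 4) for c in (0, 4)]

# One weight vector per quadrant: each first-layer neuron sees only its own patch.
patch_weights = [rng.standard_normal(16) for _ in patches]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Activations of the four locally connected neurons.
local_activations = np.array([
    sigmoid(patch.ravel() @ w) for patch, w in zip(patches, patch_weights)
])
print(local_activations.shape)
```

Contrast this with `np.dot(input, weights)` on the flattened image, where every neuron receives all 64 pixels and has to discover any locality by itself.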

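Generalising this idea is what convolutional neural networks are all about, and why they are so effective at image recognition tasks. O'Reilly Publishers also has a nice article called Not Another MNIST Tutorial, which truly is not another tutorial but shows some really useful visualisations that help in understanding what is going on.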

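In short, AND/OR is a very simple task, but you have jumped to a very complex one, and your neural network architecture should make a corresponding jump in complexity. Convolutional neural networks typically follow this architectural pattern: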

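Divide up portions of the image, assigning different portions to different neurons (convolutional layer)
Recombine information from different parts of the image (pooling layer)
Filter out weak signals (dropout layer)
Convert the spatial information into a vector signal (flatten layer)
Create another layer of neurons that is fully connected to the neurons of the previous layer (dense layer)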
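
The layer sequence above can be sketched as a single forward pass in plain NumPy. This is an untrained toy under stated assumptions, not a recommended implementation: the kernel count, kernel size, dropout rate and all function names are made up, and dropout is written in its usual form, randomly zeroing activations during training only.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernels):
    """Valid cross-correlation of one 2-D image with a stack of kernels, then ReLU."""
    n_k, kh, kw = kernels.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((n_k, out_h, out_w))
    for k in range(n_k):
        for r in range(out_h):
            for c in range(out_w):
                out[k, r, c] = np.sum(image[r:r + kh, c:c + kw] * kernels[k])
    return np.maximum(out, 0.0)

def max_pool(feature_maps, size=2):
    """Non-overlapping max pooling over size x size windows."""
    n_k, h, w = feature_maps.shape
    h, w = h - h % size, w - w % size          # drop rows/cols that do not fit
    trimmed = feature_maps[:, :h, :w]
    return trimmed.reshape(n_k, h // size, size, w // size, size).max(axis=(2, 4))

def dropout(x, rate, training):
    """Inverted dropout: randomly zero activations, during training only."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

image = rng.random((8, 8))                         # stand-in for one digit image
kernels = rng.standard_normal((4, 3, 3)) * 0.1     # 4 hypothetical 3x3 filters
dense_w = rng.standard_normal((4 * 3 * 3, 10)) * 0.1

features = conv2d(image, kernels)                  # convolutional layer -> (4, 6, 6)
pooled = max_pool(features)                        # pooling layer       -> (4, 3, 3)
dropped = dropout(pooled, 0.25, training=False)    # dropout layer (inactive here)
flat = dropped.ravel()                             # flatten layer       -> (36,)
probs = softmax(flat @ dense_w)                    # dense layer         -> (10,)
```

In practice a framework would supply these layers along with the backward pass; the point here is only how the shapes flow from an 8x8 image to 10 class scores.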

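Larger CNNs for more complex tasks combine these layers into bigger nested architectures and sub-networks. Knowing which combination of layers to use is an art and takes a lot of experimentation (hence the popularity of GPUs, which make iterating and experimenting faster). For greyscale handwritten digits, though, you should see a big improvement simply by using what you already know about the task at hand, namely that the network should exploit the spatial structure.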

Regarding python - Neural network to recognise handwritten digits: Dealing with multiple outputs, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46406559/
