python - Correct backpropagation in a simple perceptron

Tags: python machine-learning backpropagation gradient-descent perceptron

Given the simple OR gate problem:

or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T

If we train a simple single-layer perceptron (without backpropagation), we could do it like this:
import numpy as np
np.random.seed(0)

def sigmoid(x): # Logistic sigmoid; squashes each value into the range (0, 1).
    return 1 / (1 + np.exp(-x))

def cost(predicted, truth):
    return (truth - predicted)**2

or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T

# Define the shape of the weight vector.
num_data, input_dim = or_input.shape
# Define the shape of the output vector. 
output_dim = len(or_output.T)

num_epochs = 50 # No. of times to iterate.
learning_rate = 0.03 # How large a step to take per iteration.

# Lets standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))

for _ in range(num_epochs):
    layer0 = X
    # Forward propagation.
    # Inside the perceptron, Step 2. 
    layer1 = sigmoid(np.dot(X, W))

    # How much did we miss in the predictions?
    cost_error = cost(layer1, Y)

    # update weights
    W +=  - learning_rate * np.dot(layer0.T, cost_error)

# Expected output.
print(Y.tolist())
# On the training data
print([[int(prediction > 0.5)] for prediction in layer1])

[Out]:
[[0], [1], [1], [1]]
[[0], [1], [1], [1]]

To do backpropagation and compute d(cost)/d(X), are the following steps correct?
  • compute the layer1 error by multiplying the cost error by the derivative of the cost
  • then compute the layer1 delta by multiplying the layer1 error by the derivative of the sigmoid
  • then do a dot product between the inputs and the layer1 delta to get the differential, i.e. d(cost)/d(X)

  • then the d(cost)/d(X) is multiplied by the negative of the learning rate to perform gradient descent. (A numerical check of this chain-rule gradient is sketched right after the code below.)
    num_epochs = 50 # No. of times to iterate.
    learning_rate = 0.03 # How large a step to take per iteration.
    
    # Lets standardize and call our inputs X and outputs Y
    X = or_input
    Y = or_output
    W = np.random.random((input_dim, output_dim))
    
    for _ in range(num_epochs):
        layer0 = X
        # Forward propagation.
        # Inside the perceptron, Step 2. 
        layer1 = sigmoid(np.dot(X, W))
    
        # How much did we miss in the predictions?
        cost_error = cost(layer1, Y)
    
        # Back propagation.
        # multiply how much we missed from the gradient/slope of the cost for our prediction.
        layer1_error = cost_error * cost_derivative(cost_error)
    
        # multiply how much we missed by the gradient/slope of the sigmoid at the values in layer1
        layer1_delta = layer1_error * sigmoid_derivative(layer1)
    
        # update weights
        W +=  - learning_rate * np.dot(layer0.T, layer1_delta)
    

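As a sanity check on those steps (this sketch is my addition, not part of the original question): for the squared cost, the chain rule gives the gradient with respect to the weights as X.T · (-2 · (Y - ŷ) · ŷ · (1 - ŷ)), and a finite-difference estimate should agree with that form. `numerical_gradient` below is a hypothetical helper written only for this check:

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def squared_cost(X, Y, W):
        return np.sum((Y - sigmoid(np.dot(X, W)))**2)

    def numerical_gradient(X, Y, W, eps=1e-6):
        # Central finite differences of the summed squared cost w.r.t. each weight.
        grad = np.zeros_like(W)
        for i in range(W.shape[0]):
            for j in range(W.shape[1]):
                W_plus, W_minus = W.copy(), W.copy()
                W_plus[i, j] += eps
                W_minus[i, j] -= eps
                grad[i, j] = (squared_cost(X, Y, W_plus) - squared_cost(X, Y, W_minus)) / (2 * eps)
        return grad

    X = np.array([[0,0], [0,1], [1,0], [1,1]], dtype=float)
    Y = np.array([[0,1,1,1]], dtype=float).T
    W = np.random.random((2, 1))

    yhat = sigmoid(np.dot(X, W))
    analytic = np.dot(X.T, -2 * (Y - yhat) * yhat * (1 - yhat))  # chain-rule gradient
    print(np.allclose(analytic, numerical_gradient(X, Y, W)))    # True
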
In that case, should the implementations of cost_derivative and sigmoid_derivative look like the following?
    import numpy as np
    np.random.seed(0)
    
    def sigmoid(x): # Logistic sigmoid; squashes each value into the range (0, 1).
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_derivative(sx):
        # See https://math.stackexchange.com/a/1225116
        return sx * (1 - sx)
    
    def cost(predicted, truth):
        return (truth - predicted)**2
    
    def cost_derivative(y):
        # If the cost is:
        # cost = y - y_hat
        # What's the derivative of d(cost)/d(y)
        # d(cost)/d(y) = 1
        return 2*y
    
    
    or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
    or_output = np.array([[0,1,1,1]]).T
    
    # Define the shape of the weight vector.
    num_data, input_dim = or_input.shape
    # Define the shape of the output vector. 
    output_dim = len(or_output.T)
    
    num_epochs = 5 # No. of times to iterate.
    learning_rate = 0.03 # How large a step to take per iteration.
    
    # Lets standardize and call our inputs X and outputs Y
    X = or_input
    Y = or_output
    W = np.random.random((input_dim, output_dim))
    
    for _ in range(num_epochs):
        layer0 = X
        # Forward propagation.
        # Inside the perceptron, Step 2. 
        layer1 = sigmoid(np.dot(X, W))
    
        # How much did we miss in the predictions?
        cost_error = cost(layer1, Y)
    
        # Back propagation.
        # multiply how much we missed from the gradient/slope of the cost for our prediction.
        layer1_error = cost_error * cost_derivative(cost_error)
    
        # multiply how much we missed by the gradient/slope of the sigmoid at the values in layer1
        layer1_delta = layer1_error * sigmoid_derivative(layer1)
    
        # update weights
        W +=  - learning_rate * np.dot(layer0.T, layer1_delta)
    
    # Expected output.
    print(Y.tolist())
    # On the training data
    print([[int(prediction > 0.5)] for prediction in layer1])
    

    [Out]:
    [[0], [1], [1], [1]]
    [[0], [1], [1], [1]]
    

Incidentally, given the random seed, even without training W with gradient descent or the perceptron, the predictions still come out right:
    import numpy as np
    np.random.seed(0)
    
    # Lets standardize and call our inputs X and outputs Y
    X = or_input
    Y = or_output
    W = np.random.random((input_dim, output_dim))
    
    # On the training data
    predictions = sigmoid(np.dot(X, W))
    [[int(prediction > 0.5)] for prediction in predictions]
    

Best Answer

You are almost correct. In your implementation, you define the cost as the square of the error, which has the unfortunate consequence of always being positive. As a result, if you plot mean(cost_error), it creeps up slowly with each iteration, and your weights slowly decrease.
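
To see this concretely, here is a minimal sketch (my addition, not part of the original answer) that reuses the question's update `W += -learning_rate * np.dot(layer0.T, cost_error)`: because the squared cost is never negative, every epoch can only push both weights down, and with enough epochs they eventually go negative, which is the failure mode described in the next paragraph.

    import numpy as np
    np.random.seed(0)

    X = np.array([[0,0], [0,1], [1,0], [1,1]], dtype=float)
    Y = np.array([[0,1,1,1]], dtype=float).T
    W = np.random.random((2, 1))

    for epoch in range(301):
        yhat = 1 / (1 + np.exp(-np.dot(X, W)))
        cost_error = (Y - yhat)**2            # always >= 0
        W += -0.03 * np.dot(X.T, cost_error)  # so this update can only shrink the weights
        if epoch % 100 == 0:
            print(epoch, W.ravel())           # both weights decrease monotonically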

In your particular case, any weights greater than 0 happen to make it work: if you run your implementation for enough epochs, the weights turn negative and the network stops working.
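
Why do positive weights suffice for OR? A short illustration (again my note, not from the original answer): with strictly positive weights, np.dot(x, W) is 0 only for x = [0, 0], and sigmoid(0) = 0.5 fails the strict `> 0.5` test, so [0,0] maps to 0 while every other input has a positive pre-activation and maps to 1.

    import numpy as np

    W = np.array([[0.7], [0.2]])              # any strictly positive weights
    X = np.array([[0,0], [0,1], [1,0], [1,1]])
    scores = np.dot(X, W)                     # pre-activations: 0.0, 0.2, 0.7, 0.9
    print([[int(1 / (1 + np.exp(-s)) > 0.5)] for s in scores.ravel()])
    # [[0], [1], [1], [1]]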

You can remove the square in your cost function:

    def cost(predicted, truth):
        return (truth - predicted)
    

Now, to update your weights, you need to evaluate the gradient at the "position" of the error. So what you need is:
    d_predicted = output_errors * sigmoid_derivative(predicted_output)
    

Then, to update the weights:
    W += np.dot(X.T, d_predicted) * learning_rate
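
For what it's worth (my note, not part of the original answer), this update is the classic delta rule, and it points in the same direction as gradient descent on the squared error: the chain-rule gradient of (Y - yhat)**2 differs from it only by a factor of -2, as this sketch checks.

    import numpy as np
    np.random.seed(0)

    X = np.array([[0,0], [0,1], [1,0], [1,1]], dtype=float)
    Y = np.array([[0,1,1,1]], dtype=float).T
    W = np.random.random((2, 1))

    yhat = 1 / (1 + np.exp(-np.dot(X, W)))
    delta_rule_step = np.dot(X.T, (Y - yhat) * yhat * (1 - yhat))          # the update used above
    squared_error_grad = np.dot(X.T, -2 * (Y - yhat) * yhat * (1 - yhat))  # d/dW of (Y - yhat)**2
    print(np.allclose(delta_rule_step, -0.5 * squared_error_grad))         # True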
    

Full code, with the error plotted:
    import numpy as np
    import matplotlib.pyplot as plt
    np.random.seed(0)
    
    def sigmoid(x): # Logistic sigmoid; squashes each value into the range (0, 1).
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_derivative(sx):
        # See https://math.stackexchange.com/a/1225116
        return sx * (1 - sx)
    
    def cost(predicted, truth):
        return (truth - predicted)
    
    or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
    or_output = np.array([[0,1,1,1]]).T
    
    # Define the shape of the weight vector.
    num_data, input_dim = or_input.shape
    # Define the shape of the output vector. 
    output_dim = len(or_output.T)
    
    num_epochs = 50 # No. of times to iterate.
    learning_rate = 0.1 # How large a step to take per iteration.
    
    # Lets standardize and call our inputs X and outputs Y
    X = or_input
    Y = or_output
    W = np.random.random((input_dim, output_dim))
    
    # W = [[-1],[1]] # you can try to set bad weights to see the training process
    error_list = []
    
    for _ in range(num_epochs):
        layer0 = X
        # Forward propagation.
        layer1 = sigmoid(np.dot(X, W))
    
        # How much did we miss in the predictions?
        cost_error = cost(layer1, Y)
        error_list.append(np.mean(cost_error)) # save the loss to plot later
    
        # Back propagation.
        # evaluate the gradient at the prediction (as described above)
        d_predicted = cost_error * sigmoid_derivative(layer1)
    
        # update weights
        W = W + np.dot(X.T, d_predicted) * learning_rate
    
    
    # Expected output.
    print(Y.tolist())
    # On the training data
    print([[int(prediction > 0.5)] for prediction in layer1])
    
    # plot error curve : 
    plt.plot(range(num_epochs), error_list, '+b')
    plt.xlabel('Epoch')
    plt.ylabel('mean error')
    plt.show()
    

I also added a (commented-out) line to set the initial weights manually, so you can see how the network learns.

Regarding "python - Correct backpropagation in a simple perceptron", the original question can be found on Stack Overflow: https://stackoverflow.com/questions/56071569/
