python - Can't learn XOR representation with a 2-layer Multi-Layer Perceptron (MLP)

Tags: python neural-network xor pytorch perceptron

Using a PyTorch nn.Sequential model, I can't get it to learn all four representations of the XOR boolean values:

import numpy as np

import torch
from torch import nn
from torch.autograd import Variable
from torch import FloatTensor
from torch import optim

use_cuda = torch.cuda.is_available()

X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = xor_output = np.array([[0,1,1,0]]).T

# Converting the X to PyTorch-able data structure.
X_pt = Variable(FloatTensor(X))
X_pt = X_pt.cuda() if use_cuda else X_pt
# Converting the Y to PyTorch-able data structure.
Y_pt = Variable(FloatTensor(Y), requires_grad=False)
Y_pt = Y_pt.cuda() if use_cuda else Y_pt

input_dim = 2
hidden_dim = 5
output_dim = 1

model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())
criterion = nn.L1Loss()
learning_rate = 0.03
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
num_epochs = 10000

for _ in range(num_epochs):
    predictions = model(X_pt)
    loss_this_epoch = criterion(predictions, Y_pt)
    loss_this_epoch.backward()
    optimizer.step()
    print([int(_pred > 0.5) for _pred in predictions], list(map(int, Y_pt)), loss_this_epoch.data[0])

After training:

for _x, _y in zip(X_pt, Y_pt):
    prediction = model(_x)
    print('Input:\t', list(map(int, _x)))
    print('Pred:\t', int(prediction))
    print('Output:\t', int(_y))
    print('######')

[out]:

Input:   [0, 0]
Pred:    0
Output:  0
######
Input:   [0, 1]
Pred:    1
Output:  1
######
Input:   [1, 0]
Pred:    0
Output:  1
######
Input:   [1, 1]
Pred:    0
Output:  0
######

I've tried running the same code with several random seeds, but it never manages to learn all of the XOR representations.

Without PyTorch, I can easily train a model with custom derivative functions and perform the backpropagation manually, see https://www.kaggle.io/svf/2342536/635025ecf1de59b71ea4fa03eb84f9f9/results.html#After-some-enlightenment

Why doesn't the 2-layer MLP in PyTorch learn the XOR representation?


How is this model in PyTorch:

hidden_dim = 5

model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())

different from the one with hand-written derivatives and the hand-written backpropagation and optimizer step from https://www.kaggle.com/alvations/xor-with-mlp ?

Isn't it the same single-hidden-layer perceptron network?


Updated

Strangely, adding nn.Sigmoid() between the nn.Linear layers doesn't work:

hidden_dim = 5

model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.Sigmoid(),
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())
criterion = nn.L1Loss()
learning_rate = 0.03
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
num_epochs = 10000

for _ in range(num_epochs):
    predictions = model(X_pt)
    loss_this_epoch = criterion(predictions, Y_pt)
    loss_this_epoch.backward()
    optimizer.step()

for _x, _y in zip(X_pt, Y_pt):
    prediction = model(_x)
    print('Input:\t', list(map(int, _x)))
    print('Pred:\t', int(prediction))
    print('Output:\t', int(_y))
    print('######')

[out]:

Input:   [0, 0]
Pred:    0
Output:  0
######
Input:   [0, 1]
Pred:    1
Output:  1
######
Input:   [1, 0]
Pred:    1
Output:  1
######
Input:   [1, 1]
Pred:    1
Output:  0
######

But adding nn.ReLU() does:

model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.ReLU(), 
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())

...
for _x, _y in zip(X_pt, Y_pt):
    prediction = model(_x)
    print('Input:\t', list(map(int, _x)))
    print('Pred:\t', int(prediction))
    print('Output:\t', int(_y))
    print('######')

[out]:

Input:   [0, 0]
Pred:    0
Output:  0
######
Input:   [0, 1]
Pred:    1
Output:  1
######
Input:   [1, 0]
Pred:    1
Output:  1
######
Input:   [1, 1]
Pred:    1
Output:  0
######

Is sigmoid not good enough for the non-linear activation?

I understand that ReLU fits the task of boolean outputs, but shouldn't the Sigmoid function produce the same/similar effect?
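Comparing the local gradients of the two activations directly hints at part of an answer; here is a minimal standalone sketch (just standard autograd, not tied to the training code above):

import torch

# Gradients of the two activations at a few pre-activation values.
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0], requires_grad=True)

torch.sigmoid(x).sum().backward()
print('sigmoid grads:', x.grad)  # peaks at 0.25 (at x = 0), smaller everywhere else

x.grad = None                    # reset before the second backward pass
torch.relu(x).sum().backward()
print('relu grads:   ', x.grad)  # 1 for positive inputs, 0 otherwise

The sigmoid's gradient never exceeds 0.25 and shrinks away from zero, while ReLU passes gradients through unchanged for positive inputs, which may be why the ReLU variant trains more reliably here.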


Update 2

Running the same training 100 times:

from collections import Counter 
import random
random.seed(100)

import torch
from torch import nn
from torch.autograd import Variable
from torch import FloatTensor
from torch import optim
use_cuda = torch.cuda.is_available()


input_dim = 2
output_dim = 1

all_results = []

for _ in range(100):
    hidden_dim = 2

    model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                          nn.ReLU(), # Does the sigmoid have a built-in bias?
                          nn.Linear(hidden_dim, output_dim),
                          nn.Sigmoid())

    criterion = nn.MSELoss()
    learning_rate = 0.03
    optimizer = optim.SGD(model.parameters(), lr=learning_rate)
    num_epochs = 3000

    for _ in range(num_epochs):
        predictions = model(X_pt)
        loss_this_epoch = criterion(predictions, Y_pt)
        loss_this_epoch.backward()
        optimizer.step()
        ##print([float(_pred) for _pred in predictions], list(map(int, Y_pt)), loss_this_epoch.data[0])

    x_pred = [int(model(_x)) for _x in X_pt]
    y_truth = list([int(_y[0]) for _y in Y_pt])
    all_results.append([x_pred == y_truth, x_pred, loss_this_epoch.data[0]])


tf, outputsss, losses__ = zip(*all_results)
print(Counter(tf))

It only learned the XOR representation 18 times out of 100... -_-|||

Best Answer

That's because nn.Linear has no activation built in, so your model is effectively just a linear classifier, and XOR is the canonical example of a problem that linear classifiers cannot solve.
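You can check this numerically: composing two Linear layers gives a single affine map, W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). A minimal sketch (layer sizes chosen arbitrarily for illustration):

import torch
from torch import nn

torch.manual_seed(0)
lin1 = nn.Linear(2, 5)
lin2 = nn.Linear(5, 1)
x = torch.randn(4, 2)

# Two stacked Linear layers...
stacked = lin2(lin1(x))

# ...equal one Linear layer with W = W2 @ W1 and b = W2 @ b1 + b2.
W = lin2.weight @ lin1.weight
b = lin2.weight @ lin1.bias + lin2.bias
collapsed = x @ W.t() + b

print(torch.allclose(stacked, collapsed))  # True (up to float tolerance)

With the final Sigmoid on top, the model is just logistic regression: a single linear decision boundary, and no line separates {(0,1), (1,0)} from {(0,0), (1,1)}.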

Change this:

model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())

to this:

model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.Sigmoid(),
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())

Only then does your model become equivalent to the one in the linked Kaggle notebook.
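For completeness, here is a minimal end-to-end sketch of the fix. Two details are my additions rather than part of the question's code: explicit input_dim/output_dim values, and an optimizer.zero_grad() call, without which gradients accumulate across iterations; the MSE loss follows the question's second update, and the learning rate of 0.3 is an assumption to speed up convergence:

import torch
from torch import nn, optim

# XOR inputs and targets.
X_pt = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]])
Y_pt = torch.FloatTensor([[0], [1], [1], [0]])

input_dim, hidden_dim, output_dim = 2, 5, 1

model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.Sigmoid(),  # the non-linearity between the Linear layers
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.3)

for _ in range(10000):
    optimizer.zero_grad()  # reset gradients; the question's loop omits this
    loss = criterion(model(X_pt), Y_pt)
    loss.backward()
    optimizer.step()

print([int(p > 0.5) for p in model(X_pt)])  # typically converges to [0, 1, 1, 0]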

The original question, python - Can't learn XOR representation with a 2-layer Multi-Layer Perceptron (MLP), can be found on Stack Overflow: https://stackoverflow.com/questions/48619928/
