neural-network - CrossEntropyLoss shows poor accuracy on 2D output

Tags: neural-network pytorch softmax cross-entropy

I am experimenting with a simple neural network that just tries to learn the squares of some random numbers, represented as arrays of decimal digits. The code is reproduced below, with the change indicated by comments.

The version using nn.Softmax(dim=2) with criterion = nn.BCELoss() works fine.

But for a case like this, where the output is an N-way classification (here, an array of outputs, each representing one of ten decimal digits), CrossEntropyLoss is considered ideal, so I made that change. nn.CrossEntropyLoss does the softmax for you, so I also commented out the nn.Softmax line.
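As a sanity check on that assumption, here is a minimal standalone sketch (with made-up logits and targets, unrelated to the model below) confirming that nn.CrossEntropyLoss already applies the softmax internally, so the model should emit raw logits:

import torch
from torch import nn

logits = torch.randn(4, 10)           # a batch of 4 ten-way predictions (raw scores)
targets = torch.randint(0, 10, (4,))  # class indices, one per prediction

# CrossEntropyLoss is LogSoftmax followed by NLLLoss, so no Softmax layer is needed
ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))  # True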

Instead of performing slightly better, the result performs much worse; it now reaches only about 76% accuracy on the training set, whereas before it reached 100%.

What am I doing wrong? The same substitution worked fine on a simpler test case: https://github.com/russellw/ml/blob/main/compound_output/single.py The main difference is that that case produces only a single N-way output, whereas this one produces an array of them. Am I misunderstanding how CrossEntropyLoss handles shapes, or something along those lines?

import random
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader


def oneHot(n, i, s):
    for j in range(n):
        s.append(float(i == j))


size = 12


class Dataset1(Dataset):
    def __init__(self):
        s = []
        for _ in range(1000):
            a = random.randrange(10 ** size)

            x = []
            for c in str(a).zfill(size):
                oneHot(10, int(c), x)

            y = []
            for c in str(a ** 2).zfill(size * 2):
                y1 = []
                oneHot(10, int(c), y1)
                y.append(y1)

            x = torch.as_tensor(x)
            y = torch.as_tensor(y)
            s.append((x, y))
        self.s = s

    def __len__(self):
        return len(self.s)

    def __getitem__(self, i):
        return self.s[i]


trainDs = Dataset1()
testDs = Dataset1()

batchSize = 20
trainDl = DataLoader(trainDs, batch_size=batchSize)
testDl = DataLoader(testDs, batch_size=batchSize)
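# sanity check on shapes: each x batch should be (batchSize, size * 10) = (20, 120)
# and each y batch should be (batchSize, size * 2, 10) = (20, 24, 10)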
for x, y in trainDl:
    print(x.shape)
    print(y.shape)
    break


class View(nn.Module):
    def __init__(self, *shape):
        super(View, self).__init__()
        self.shape = shape

    def forward(self, x):
        batchSize = x.data.size(0)
        shape = (batchSize,) + self.shape
        return x.view(*shape)


hiddenSize = 100


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(size * 10, hiddenSize),
            nn.ReLU(),
            nn.Linear(hiddenSize, hiddenSize),
            nn.Tanh(),
            nn.Linear(hiddenSize, hiddenSize),
            nn.ReLU(),
            nn.Linear(hiddenSize, size * 2 * 10),
            View(size * 2, 10),
            #nn.Softmax(dim=2),
        )

    def forward(self, x):
        return self.layers(x)


device = torch.device("cpu")
model = Net().to(device)
print(sum(p.numel() for p in model.parameters()))


def accuracy(model, ds):
    n = 0
    for x, y in ds:
        # make input sample shape match a mini batch
        # for the sake of things like softmax that cause the model
        # to expect a specific shape
        x = x.unsqueeze(0)

        # this is just for reporting, not part of training
        # so we don't need to track gradients here
        with torch.no_grad():
            z = model(x)

            # conversely, the model will return a batch-shaped output
            # so unwrap it for comparison with the unwrapped expected output
            z = z[0]

        # at this point, if the output were a scalar mapped to one-hot
        # we could use a simple argmax comparison
        # but it is an array thereof
        # which makes comparison a little more complex
        assert y.shape[0] == size * 2
        assert z.shape[0] == size * 2
        for i in range(0, size * 2):
            if torch.argmax(y[i]) == torch.argmax(z[i]):
                n += 1
    return n / (len(ds) * size * 2)


#criterion = nn.BCELoss()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
epochs = 10000
interval = epochs // 10
for epoch in range(epochs + 1):
    for bi, (x, y) in enumerate(trainDl):
        x = x.to(device)
        y = y.to(device)

        loss = criterion(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if epoch % interval == 0 and not bi:
            print(
                f"{epoch}\t{loss}\t{accuracy(model, trainDs)}\t{accuracy(model, testDs)}"
            )

Best answer

Even though it does not raise an error, torch.nn.BCELoss is not actually what you want to minimize here, because it wrongly interprets your tensors as a collection of independent binary classifications. So switching to torch.nn.CrossEntropyLoss is indeed the right move.
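A minimal sketch of the difference, using a single made-up "digit" prediction: nn.BCELoss over a softmaxed one-hot vector scores ten independent yes/no decisions and averages them, while nn.CrossEntropyLoss scores one ten-way decision given the class index.

import torch
from torch import nn

logits = torch.randn(1, 10)   # raw scores for one digit position
one_hot = torch.zeros(1, 10)
one_hot[0, 3] = 1.0           # the true digit is 3

# BCELoss treats each of the 10 entries as a separate binary problem
bce = nn.BCELoss()(torch.softmax(logits, dim=1), one_hot)

# CrossEntropyLoss treats the 10 entries as one 10-way classification over a class index
ce = nn.CrossEntropyLoss()(logits, torch.tensor([3]))
print(bce.item(), ce.item())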

As you can see in its documentation, this function takes class indices as its target (not one-hot encodings), and in its basic form it expects a 2-D (N, C) input together with a 1-D target of class indices. So you can try:

x = x.to(device)
y = y.to(device)
# Flatten the per-digit predictions across the batch
pred = model(x).reshape(-1, 10)  # shape (batch_size * 2 * size, 10)
# Convert the one-hot targets to class indices and flatten them the same way
y = torch.argmax(y, dim=2).reshape(-1)  # shape (batch_size * 2 * size,)
loss = criterion(pred, y)
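If you would rather keep the per-digit layout instead of flattening, my understanding (an alternative not used for the results below, so double-check it against your PyTorch version) is that nn.CrossEntropyLoss also accepts an input of shape (N, C, d) with an index target of shape (N, d), where C is the class dimension. A sketch under that assumption:

x = x.to(device)
y = y.to(device)
# model output is (batch_size, 2*size, 10); move the 10 classes to dim 1 -> (batch_size, 10, 2*size)
pred = model(x).permute(0, 2, 1)
# one-hot targets (batch_size, 2*size, 10) -> class indices (batch_size, 2*size)
y = torch.argmax(y, dim=2)
loss = criterion(pred, y)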

As you can see, using your configuration (same architecture, CPU, batch size 20), I reach 100% training accuracy at around epoch 1100:

[Figure: training accuracy curve reaching 100% around epoch 1100]

Note that the model actually overfits the training data in this setting, but that's another issue...

The original question, "neural-network - CrossEntropyLoss shows poor accuracy on 2D output", is on Stack Overflow: https://stackoverflow.com/questions/73591918/
