python - Tensorflow ValueError : logits and labels must have the same shape ((None, 2) vs (None, 1))

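I'm new to machine learning and thought I'd start with Keras. Here I'm classifying movie reviews into three classes (positive as 1, neutral as 0, negative as -1) using binary cross-entropy. When I try to wrap my Keras model with a TensorFlow estimator, I get an error.
The code is as follows: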

import tensorflow as tf
import numpy as np
import pandas as pd
import numpy as K

csvfilename_train = 'train(cleaned).csv'
csvfilename_test = 'test(cleaned).csv'

# Read .csv files as pandas dataframes
df_train = pd.read_csv(csvfilename_train)
df_test = pd.read_csv(csvfilename_test)

train_sentences  = df_train['Comment'].values
test_sentences  = df_test['Comment'].values

# Extract labels from dataframes
train_labels = df_train['Sentiment'].values
test_labels = df_test['Sentiment'].values

vocab_size = 10000
embedding_dim = 16
max_length = 30
trunc_type = 'post'
oov_tok = '<OOV>'

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words = vocab_size, oov_token = oov_tok)
tokenizer.fit_on_texts(train_sentences)
word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(train_sentences)
padded = pad_sequences(sequences, maxlen = max_length, truncating = trunc_type)

test_sequences = tokenizer.texts_to_sequences(test_sentences)
test_padded = pad_sequences(test_sequences, maxlen = max_length)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length = max_length),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation = 'relu'),
    tf.keras.layers.Dense(2, activation = 'sigmoid'),
])
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

num_epochs = 10
model.fit(padded, train_labels, epochs = num_epochs, validation_data = (test_padded, test_labels))

The error is as follows:

---> 10 model.fit(padded, train_labels, epochs = num_epochs, validation_data = (test_padded, test_labels))

And at the end, this:

ValueError: logits and labels must have the same shape ((None, 2) vs (None, 1))

Best answer

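Your code has several problems. The error itself arises because the final Dense layer outputs 2 values per sample, shape (None, 2), while each label is a single value, shape (None, 1), and binary cross-entropy requires the two shapes to match.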

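You are using the wrong loss function. Binary cross-entropy is meant for binary classification problems, but here you are doing multi-class classification (3 classes: positive, negative, neutral).
Using a sigmoid activation in the last layer is wrong because sigmoid maps logit values to the range between 0 and 1, whereas your class labels are 0, 1, and -1. The network can therefore never output a negative value, so it can never learn to predict the negative class.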
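
The point about the sigmoid range can be checked with a few lines of plain Python. This is only an illustrative sketch: the `sigmoid` helper and the sample logits are assumptions made for the demonstration, not part of the original code.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, written in a numerically stable form."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Hypothetical logits covering strongly negative to strongly positive values
logits = [-20.0, -2.0, 0.0, 2.0, 20.0]
outputs = [sigmoid(x) for x in logits]

for x, y in zip(logits, outputs):
    print(f"logit {x:6.1f} -> sigmoid {y:.6f}")
```

Every output stays strictly between 0 and 1, so a target of -1 is out of reach no matter what the network learns.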

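The correct approach is to treat this as a multi-class classification problem and use categorical cross-entropy loss with a softmax activation in your last Dense layer, which should have 3 units (one per class). Note that one-hot encoded labels must be used with the categorical cross-entropy loss, while integer labels can be used with the sparse categorical cross-entropy loss.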
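
To make the difference between the two losses concrete, here is a small NumPy sketch. It assumes the labels have already been mapped to class indices 0, 1, 2, and the helper functions and sample logits are hypothetical stand-ins written for illustration, not the Keras implementations.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(one_hot_labels, probs):
    # Expects one-hot labels of shape (batch, num_classes)
    return -(one_hot_labels * np.log(probs)).sum(axis=1)

def sparse_categorical_crossentropy(int_labels, probs):
    # Expects integer class indices of shape (batch,)
    return -np.log(probs[np.arange(len(int_labels)), int_labels])

# Hypothetical logits for 2 samples and 3 classes
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
probs = softmax(logits)

int_labels = np.array([0, 2])           # integer class indices
one_hot_labels = np.eye(3)[int_labels]  # the same labels, one-hot encoded

loss_dense = categorical_crossentropy(one_hot_labels, probs)
loss_sparse = sparse_categorical_crossentropy(int_labels, probs)
print(loss_dense, loss_sparse)
```

Both functions compute the same per-sample loss; they differ only in the label format they accept, which is why the choice between them comes down to how the labels are stored.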

Below is an example that uses categorical cross-entropy loss.

tf.keras.layers.Dense(3, activation = 'softmax')

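Note the 3 changes:

The loss function is changed to categorical cross-entropy
The number of units in the final Dense layer is 3
One-hot encoding of the labels is required and can be done with tf.one_hot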

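# Shift labels from {-1, 0, 1} to {0, 1, 2}: tf.one_hot encodes an out-of-range index such as -1 as an all-zero row
tf.one_hot(train_labels + 1, 3)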
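
Putting the changes together, a corrected model might look like the sketch below. It is not the asker's exact pipeline: the random `padded` and `train_labels` arrays are synthetic stand-ins for the tokenized reviews, `num_classes` is an illustrative name, and the labels are shifted by 1 on the assumption that the Sentiment column holds only -1, 0, and 1. An explicit `tf.keras.Input` is used in place of the `input_length` argument, which is a choice made here for compatibility across Keras versions.

```python
import numpy as np
import tensorflow as tf

vocab_size = 10000
embedding_dim = 16
max_length = 30
num_classes = 3

# Synthetic stand-ins for the padded sequences and sentiment labels (assumption)
rng = np.random.default_rng(0)
padded = rng.integers(0, vocab_size, size=(64, max_length))
train_labels = rng.integers(-1, 2, size=64)  # values in {-1, 0, 1}

# Shift labels to {0, 1, 2}, then one-hot encode them
one_hot_labels = tf.one_hot(train_labels + 1, num_classes).numpy()

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_length,)),
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation='relu'),
    # One unit per class, with softmax producing class probabilities
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Output shape (None, 3) now matches the one-hot label shape (None, 3)
history = model.fit(padded, one_hot_labels, epochs=1, verbose=0)
```

To skip the one-hot step, passing `train_labels + 1` directly with `loss='sparse_categorical_crossentropy'` should work the same way.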

Regarding python - Tensorflow ValueError : logits and labels must have the same shape ((None, 2) vs (None, 1)), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63375201/
