I am going through this link to understand the multichannel CNN model for text classification.
The code is based on this tutorial.
I have understood most of it, but I cannot understand how Keras defines the output shapes of certain layers.
Here is the code:
Define a model with three input channels for processing 4-gram, 6-gram, and 8-gram sequences of movie review text.
# imports (skipped in the original post; restored here following the tutorial)
from pickle import load
from numpy import array
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Model
from keras.layers import Input, Dense, Flatten, Dropout, Embedding
from keras.layers.convolutional import Conv1D, MaxPooling1D
from keras.layers.merge import concatenate
from keras.utils.vis_utils import plot_model

# load a clean dataset
def load_dataset(filename):
    return load(open(filename, 'rb'))

# fit a tokenizer
def create_tokenizer(lines):
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(lines)
    return tokenizer

# calculate the maximum document length
def max_length(lines):
    return max([len(s.split()) for s in lines])

# encode a list of lines
def encode_text(tokenizer, lines, length):
    # integer encode
    encoded = tokenizer.texts_to_sequences(lines)
    # pad encoded sequences
    padded = pad_sequences(encoded, maxlen=length, padding='post')
    return padded

# define the model
def define_model(length, vocab_size):
    # channel 1
    inputs1 = Input(shape=(length,))
    embedding1 = Embedding(vocab_size, 100)(inputs1)
    conv1 = Conv1D(filters=32, kernel_size=4, activation='relu')(embedding1)
    drop1 = Dropout(0.5)(conv1)
    pool1 = MaxPooling1D(pool_size=2)(drop1)
    flat1 = Flatten()(pool1)
    # channel 2
    inputs2 = Input(shape=(length,))
    embedding2 = Embedding(vocab_size, 100)(inputs2)
    conv2 = Conv1D(filters=32, kernel_size=6, activation='relu')(embedding2)
    drop2 = Dropout(0.5)(conv2)
    pool2 = MaxPooling1D(pool_size=2)(drop2)
    flat2 = Flatten()(pool2)
    # channel 3
    inputs3 = Input(shape=(length,))
    embedding3 = Embedding(vocab_size, 100)(inputs3)
    conv3 = Conv1D(filters=32, kernel_size=8, activation='relu')(embedding3)
    drop3 = Dropout(0.5)(conv3)
    pool3 = MaxPooling1D(pool_size=2)(drop3)
    flat3 = Flatten()(pool3)
    # merge
    merged = concatenate([flat1, flat2, flat3])
    # interpretation
    dense1 = Dense(10, activation='relu')(merged)
    outputs = Dense(1, activation='sigmoid')(dense1)
    model = Model(inputs=[inputs1, inputs2, inputs3], outputs=outputs)
    # compile
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize
    print(model.summary())
    plot_model(model, show_shapes=True, to_file='multichannel.png')
    return model

# load training dataset
trainLines, trainLabels = load_dataset('train.pkl')
# create tokenizer
tokenizer = create_tokenizer(trainLines)
# calculate max document length
length = max_length(trainLines)
# calculate vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Max document length: %d' % length)
print('Vocabulary size: %d' % vocab_size)
# encode data
trainX = encode_text(tokenizer, trainLines, length)
print(trainX.shape)
# define model
model = define_model(length, vocab_size)
# fit model
model.fit([trainX,trainX,trainX], array(trainLabels), epochs=10, batch_size=16)
# save the model
model.save('model.h5')
Running the code:
Running the example first prints a summary of the prepared training dataset:
Max document length: 1380
Vocabulary size: 44277
(1800, 1380)
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_1 (InputLayer) (None, 1380) 0
____________________________________________________________________________________________________
input_2 (InputLayer) (None, 1380) 0
____________________________________________________________________________________________________
input_3 (InputLayer) (None, 1380) 0
____________________________________________________________________________________________________
embedding_1 (Embedding) (None, 1380, 100) 4427700 input_1[0][0]
____________________________________________________________________________________________________
embedding_2 (Embedding) (None, 1380, 100) 4427700 input_2[0][0]
____________________________________________________________________________________________________
embedding_3 (Embedding) (None, 1380, 100) 4427700 input_3[0][0]
____________________________________________________________________________________________________
conv1d_1 (Conv1D) (None, 1377, 32) 12832 embedding_1[0][0]
____________________________________________________________________________________________________
conv1d_2 (Conv1D) (None, 1375, 32) 19232 embedding_2[0][0]
____________________________________________________________________________________________________
conv1d_3 (Conv1D) (None, 1373, 32) 25632 embedding_3[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout) (None, 1377, 32) 0 conv1d_1[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout) (None, 1375, 32) 0 conv1d_2[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout) (None, 1373, 32) 0 conv1d_3[0][0]
____________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D) (None, 688, 32) 0 dropout_1[0][0]
____________________________________________________________________________________________________
max_pooling1d_2 (MaxPooling1D) (None, 687, 32) 0 dropout_2[0][0]
____________________________________________________________________________________________________
max_pooling1d_3 (MaxPooling1D) (None, 686, 32) 0 dropout_3[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten) (None, 22016) 0 max_pooling1d_1[0][0]
____________________________________________________________________________________________________
flatten_2 (Flatten) (None, 21984) 0 max_pooling1d_2[0][0]
____________________________________________________________________________________________________
flatten_3 (Flatten) (None, 21952) 0 max_pooling1d_3[0][0]
____________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 65952) 0 flatten_1[0][0]
flatten_2[0][0]
flatten_3[0][0]
____________________________________________________________________________________________________
dense_1 (Dense) (None, 10) 659530 concatenate_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 1) 11 dense_1[0][0]
====================================================================================================
Total params: 14,000,337
Trainable params: 14,000,337
Non-trainable params: 0
____________________________________________________________________________________________________
And during training:
Epoch 6/10
1800/1800 [==============================] - 30s - loss: 9.9093e-04 - acc: 1.0000
Epoch 7/10
1800/1800 [==============================] - 29s - loss: 5.1899e-04 - acc: 1.0000
Epoch 8/10
1800/1800 [==============================] - 28s - loss: 3.7958e-04 - acc: 1.0000
Epoch 9/10
1800/1800 [==============================] - 29s - loss: 3.0534e-04 - acc: 1.0000
Epoch 10/10
1800/1800 [==============================] - 29s - loss: 2.6234e-04 - acc: 1.0000
My interpretation of the layers and output shapes is below. Please help me check whether it is correct, because I am getting lost in the many dimensions.

input_1 (InputLayer) (None, 1380): ---> 1380 is the total number of features (that is, 1380 input neurons) per data point. 1800 is the total number of documents or data points.

embedding_1 (Embedding) (None, 1380, 100) 4427700 ----> The embedding layer is: 1380 features (words), each feature being a vector of dimension 100. How is the number of parameters here 4427700?

conv1d_1 (Conv1D) (None, 1377, 32) 12832 ------> The Conv1D has kernel_size=4. Is it a 1*4 filter applied 32 times? Then how does the dimension become (None, 1377, 32) with 12832 parameters?

max_pooling1d_1 (MaxPooling1D) (None, 688, 32) with MaxPooling1D(pool_size=2): how does the dimension become (None, 688, 32)?

flatten_1 (Flatten) (None, 22016): is this just the multiplication of 688 and 32?

**Does every epoch train on all 1800 data points at once?**

Please tell me how the output dimensions are calculated. Any reference or help would be appreciated.
Best answer
See the answers below:
input_1 (InputLayer) (None, 1380)
: ---> 1380 is the total number of features ( that is 1380 input neurons) per data point. 1800 is the total number of documents or data points.
Yes. model.fit([trainX,trainX,trainX], array(trainLabels), epochs=10, batch_size=16) means that you want the network to be trained on the whole training dataset with a batch size of 16. That means that for every 16 data points, the backpropagation algorithm is run and the weights are updated. This happens 1800/16 times (rounded up), and one full pass over the data is called an epoch.
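This batch arithmetic can be sketched in a few lines of Python (a sketch; it assumes Keras's behavior of running one final, smaller batch when the dataset size is not divisible by the batch size, and `batches_per_epoch` is a hypothetical helper, not a Keras function):

```python
import math

def batches_per_epoch(num_samples, batch_size):
    # Keras runs a final partial batch for the leftover samples, so round up.
    return math.ceil(num_samples / batch_size)

# 1800 training documents with batch_size=16 -> 113 weight updates per epoch
print(batches_per_epoch(1800, 16))  # 113
```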
1380 is the number of neurons of the first layer.
embedding_1 (Embedding) (None, 1380, 100) | 4427700
----> Embedding layer is : 1380 as features(words) and each feature is a vector of dimension 100.
1380 is the size of the input (the number of neurons of the previous layer) and 100 is the size (length) of the embedding vector.
The number of parameters here is vocabulary_size * 100, because for every word v in the vocabulary you need to train 100 parameters. The embedding layer is in fact a matrix built of vocabulary_size vectors of size 100, where each row represents the vector representation of one word of the vocabulary.
conv1d_1 (Conv1D) (None, 1377, 32) | 12832
------> Conv1d is of kernel size=4. Is it 1*4 filter which is used 32 times. Then how the dimension became (None, 1377, 32) with 12832 parameters?
1380 becomes 1377 because of the size of the kernel. Imagine the following input (of size 10, to simplify) with a kernel of size 4:
0123456789 # input
KKKK456789
0KKKK56789
01KKKK6789
012KKKK789
0123KKKK89
01234KKKK9
012345KKKK
See, the kernel cannot move any further to the right, so for an input of size 10 and a kernel of size 4 the output shape is 7.
In general, for an input shape of n and a kernel shape of k, the output shape is n - k + 1, so for n=1380, k=4 the result is 1377.
The number of parameters is equal to 12832 because the number of parameters equals output_channels * (input_channels * window_size + 1). In your case it is 32*(100*4 + 1).
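Both rules can be verified against the model summary for all three channels with two small helpers (hypothetical names; plain Python, no Keras required):

```python
def conv1d_output_length(n, kernel_size):
    # 'valid' convolution: the kernel stops at the right edge of the input.
    return n - kernel_size + 1

def conv1d_params(filters, input_channels, kernel_size):
    # Each filter spans all input channels across the kernel window,
    # plus one bias per filter.
    return filters * (input_channels * kernel_size + 1)

for k in (4, 6, 8):
    print(k, conv1d_output_length(1380, k), conv1d_params(32, 100, k))
# kernel 4 -> length 1377, 12832 params
# kernel 6 -> length 1375, 19232 params
# kernel 8 -> length 1373, 25632 params
```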
max_pooling1d_1 (MaxPooling1D) (None, 688, 32) with MaxPooling1D(pool_size=2):
how did the dimension become (None, 688, 32)?
max_pooling takes every two consecutive numbers and replaces them with the maximum of them, so you end up with original_size/pool_size values, rounded down (which is why the odd length 1377 becomes 688).
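A sketch of that floor division, checked against all three channels (`pool1d_output_length` is a hypothetical helper name; this assumes the default non-overlapping windows of MaxPooling1D):

```python
def pool1d_output_length(n, pool_size):
    # Non-overlapping pooling windows; a trailing leftover element is dropped.
    return n // pool_size

print(pool1d_output_length(1377, 2))  # 688
print(pool1d_output_length(1375, 2))  # 687
print(pool1d_output_length(1373, 2))  # 686
```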
flatten_1 (Flatten) (None, 22016)
This is just multiplication of 688, 32?
Yes, it is just the multiplication of 688 and 32. This is because the flatten operation does the following:
1234
5678 -> 123456789012
9012
so it takes all values of all dimensions and puts them into a one-dimensional vector.
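The flattened sizes of the three channels, and the width of the Concatenate layer that joins them, can be reproduced the same way (a sketch with a hypothetical helper name):

```python
def flatten_size(steps, channels):
    # Flatten multiplies all non-batch dimensions together.
    return steps * channels

flats = [flatten_size(688, 32), flatten_size(687, 32), flatten_size(686, 32)]
print(flats)       # [22016, 21984, 21952], matching flatten_1..flatten_3
print(sum(flats))  # 65952 -- the Concatenate layer's output width
```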
Does every epoch train on all 1800 data points at once?
No. As pointed out in the first answer, it works in batches of 16. Each epoch goes through all 1800 data points, in a random order, 16 at a time. "Epoch" is a term meaning one full pass over the data, after which we start reading the data again.
EDIT:
Let me clarify how the 1d convolutional layer is applied to the output of the embedding layer.
The output of the embedding layer should be interpreted as a vector of width 1380 with 100 channels.
Similarly to 2d images with RGB input, where the input has three channels and the shape is (width, height, 3), when you apply a convolutional layer built of 32 filters (the filter size is irrelevant here), the convolution operation is applied to all channels simultaneously and the output shape is (new_width, new_height, 32). Note that the last output dimension equals the number of filters.
Back to your example: treat the output shape of the embedding layer as (width, channels). So the 1d convolutional layer with 32 filters and kernel size equal to 4 is applied to a vector of width 1380 and depth 100. As a result, you get an output of shape (1377, 32).
Regarding "python - Understanding the shapes of Keras layers", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58832191/