Note:
I am new to MXNet. It seems that the Gluon module is meant to replace (?) the Symbol module as the high-level neural network (nn) interface, so this question specifically seeks answers that use the Gluon module.
Context
Residual neural networks (res-NNs) are a fairly popular architecture (the link provides a review of res-NNs). In short, res-NNs are an architecture in which the input passes through a (series of) transformation(s) (e.g. standard nn layers) and is finally combined with its unaltered self just before the activation function:
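In symbols (the original post showed a figure here; this is my own summary of it), writing the chain of transformations as F and the final activation as ramp, a residual block computes:

y = ramp(x + F(x))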
So the main question here is "How to implement a res-NN structure with a custom gluon.Block?" What follows:
Usually subquestions are treated as concurrent main questions and get a post flagged as too broad. In this case they are legitimate subquestions, because my inability to solve the main question stems from them, and the partial / first-draft documentation of the Gluon module is insufficient to answer them.
Main question
"How to implement a res-NN structure with a custom gluon.Block?"
First, let's do some imports:
import mxnet as mx
import numpy as np
import math
import random
gpu_device=mx.gpu()
ctx = gpu_device
Before defining our res-NN structure, let's first define a generic convolutional NN (cnn) architecture; namely convolution → batch norm → ramp.
class CNN1D(mx.gluon.Block):
    def __init__(self, channels, kernel, stride=1, padding=0, **kwargs):
        super(CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            # convolution -> batch norm -> ramp (relu)
            self.conv = mx.gluon.nn.Conv1D(channels=channels, kernel_size=kernel,
                                           strides=stride, padding=padding)
            self.bn = mx.gluon.nn.BatchNorm()
            self.ramp = mx.gluon.nn.Activation(activation='relu')

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.ramp(x)
        return x
Subquestion: mx.gluon.nn.Activation vs the NDArray module's nd.relu? When to use which, and why? In all the MXNet tutorials / demos I saw in their documentation, custom gluon.Blocks use nd.relu(x) in the forward function.

Subquestion: self.ramp(self.conv(x)) vs mx.gluon.nn.Conv1D(activation='relu')(x)? i.e. what is the consequence of adding the activation argument to a layer? Does that mean the activation is automatically applied in the forward function when that layer is called?
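For reference, here is a minimal sketch of the three interchangeable ways a ReLU can be attached (my own illustration, not from the original post; the input shape is arbitrary):

import mxnet as mx
from mxnet import nd

x = nd.random.uniform(shape=(1, 4, 10))  # (batch, channels, width)

# 1) a standalone Activation block, usable like any other layer
ramp = mx.gluon.nn.Activation(activation='relu')
y1 = ramp(x)

# 2) the functional NDArray op, commonly seen inside forward()
y2 = nd.relu(x)  # same values as y1: both compute max(0, x)

# 3) the activation fused into the layer via its activation argument,
#    applied automatically to that layer's output
conv = mx.gluon.nn.Conv1D(channels=4, kernel_size=3, activation='relu')
conv.initialize()
y3 = conv(x)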
Now that we have a reusable cnn chunk, let's define a res-NN where:
chain_length = the number of cnn chunks
So here is my attempt:
class RES_CNN1D(mx.gluon.Block):
    def __init__(self, channels, kernel, initial_stride, chain_length=1, stride=1, padding=0, **kwargs):
        super(RES_CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            num_rest = chain_length - 1
            self.ramp = mx.gluon.nn.Activation(activation='relu')
            self.init_cnn = CNN1D(channels, kernel, initial_stride, padding)
            # I am guessing this is how to correctly add an arbitrary number of chunks
            self.rest_cnn = mx.gluon.nn.Sequential()
            for i in range(num_rest):
                self.rest_cnn.add(CNN1D(channels, kernel, stride, padding))

    def forward(self, x):
        # make a copy of the untouched input to send through the chunks
        y = x.copy()
        y = self.init_cnn(y)
        # I am guessing that if I call a mx.gluon.nn.Sequential object, all nets
        # inside are called / the input gets passed along all of them?
        y = self.rest_cnn(y)
        y += x
        y = self.ramp(y)
        return y
Subquestion: when adding a variable number of layers, should one use the hacky
eval("self.layer" + str(i) + " = mx.gluon.nn.Conv1D()")
or is this what mx.gluon.nn.Sequential is meant for?

Subquestion: when defining the forward function in a custom gluon.Block which has an instance of mx.gluon.nn.Sequential (let us refer to it as self.seq), does self.seq(x) just pass the argument x down the line? e.g. if this is self.seq:

self.seq = mx.gluon.nn.Sequential()
self.conv1 = mx.gluon.nn.Conv1D()
self.conv2 = mx.gluon.nn.Conv1D()
self.seq.add(self.conv1)
self.seq.add(self.conv2)

is self.seq(x) equivalent to self.conv2(self.conv1(x))?
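A quick self-contained way to check that Sequential behavior (my own sketch; the layer sizes are arbitrary):

import mxnet as mx
from mxnet import nd

seq = mx.gluon.nn.Sequential()
conv1 = mx.gluon.nn.Conv1D(channels=4, kernel_size=3)
conv2 = mx.gluon.nn.Conv1D(channels=4, kernel_size=3)
seq.add(conv1)
seq.add(conv2)
seq.initialize()

x = nd.random.uniform(shape=(1, 4, 10))
# Sequential feeds the output of each child block into the next,
# so these two results should be identical
out_seq = seq(x)
out_manual = conv2(conv1(x))
print((out_seq - out_manual).abs().sum())  # expect 0.0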
Is this correct?
Desired outcome
RES_CNN1D(10, 3, 2, chain_length=3)
should look like this:
Conv1D(10, 3, stride=2) -----
BatchNorm |
Ramp |
Conv1D(10, 3) |
BatchNorm |
Ramp |
Conv1D(10, 3) |
BatchNorm |
Ramp |
| |
(+)<-------------------------
v
Ramp
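One caveat worth noting when sanity-checking this structure: with initial_stride=2 (as in the diagram), the skip connection y += x will fail with a shape mismatch, since the conv chain halves the width while x keeps its original size; standard res-NNs downsample the skip path too in that case. A minimal forward-pass sketch that does run (my own addition; stride 1 and padding 1 keep all shapes equal, and the input channel count must match `channels` for the addition):

net = RES_CNN1D(10, 3, 1, chain_length=3, padding=1)
net.initialize()
x = mx.nd.random.uniform(shape=(1, 10, 20))  # (batch, channels=10, width)
y = net(x)
print(y.shape)  # (1, 10, 20): shapes preserved, so the residual addition is valid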
Original question on Stack Overflow: https://stackoverflow.com/questions/46306782/