python - Batch normalization with 3D convolutions in TensorFlow

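I'm implementing a model that relies on 3D convolutions (for a task similar to action recognition), and I want to use batch normalization (see [Ioffe & Szegedy 2015]). I could not find any tutorial focusing on 3D convs, so I'm writing a short one here that I'd like to review with you.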

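The code below refers to TensorFlow r0.12 and instantiates variables explicitly. By that I mean I'm not using tf.contrib.learn, with the exception of the tf.contrib.layers.batch_norm() function. I'm doing this both to better understand how things work under the hood and to have more implementation freedom (e.g., variable summaries).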

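I will work up to the 3D convolution case gradually: first an example for a fully-connected layer, then one for a 2D convolution, and finally the 3D case. While going through the code, it would be great if you could check that everything is done correctly. The code runs, but I'm not 100% sure about the way I apply batch normalization. I end this post with a more detailed question.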

import tensorflow as tf

# This flag is used to allow/prevent batch normalization params updates
# depending on whether the model is being trained or used for prediction.
training = tf.placeholder_with_default(True, shape=())
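
To make the role of the training flag concrete, here is a minimal NumPy sketch (not TensorFlow code) of what a batch normalization layer does in each mode. The class name ToyBatchNorm, the decay value and the epsilon are assumptions chosen for illustration. In training mode the statistics of the current batch are used and the moving averages are updated; in prediction mode the stored moving averages are used and left untouched.

```python
import numpy as np


class ToyBatchNorm:
    """Illustrative batch norm over the last axis (hypothetical helper)."""

    def __init__(self, num_features, decay=0.9, eps=1e-3):
        self.decay = decay
        self.eps = eps
        self.beta = np.zeros(num_features)    # learned offset
        self.gamma = np.ones(num_features)    # learned scale
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)

    def __call__(self, x, training):
        # Reduce over every axis except the last (feature) axis.
        axes = tuple(range(x.ndim - 1))
        if training:
            mean = x.mean(axis=axes)
            var = x.var(axis=axes)
            # Exponential moving averages, updated only while training.
            self.moving_mean = (self.decay * self.moving_mean
                                + (1 - self.decay) * mean)
            self.moving_var = (self.decay * self.moving_var
                               + (1 - self.decay) * var)
        else:
            mean, var = self.moving_mean, self.moving_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta


rng = np.random.default_rng(0)
bn = ToyBatchNorm(num_features=4)
batch = rng.normal(loc=5.0, scale=2.0, size=(32, 4))

train_out = bn(batch, training=True)     # batch statistics, averages updated
mean_after_training = bn.moving_mean.copy()
predict_out = bn(batch, training=False)  # moving averages, no update
```

As far as I know, tf.contrib.layers.batch_norm places its moving-average updates in the tf.GraphKeys.UPDATE_OPS collection by default, so those ops have to be run together with the training op; otherwise the statistics used at prediction time never change.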

Fully-connected (FC) case

# Input.
INPUT_SIZE = 512
u = tf.placeholder(tf.float32, shape=(None, INPUT_SIZE))

# FC params: weights only, no bias as per [Ioffe & Szegedy 2015].
FC_OUTPUT_LAYER_SIZE = 1024
w = tf.Variable(tf.truncated_normal(
    [INPUT_SIZE, FC_OUTPUT_LAYER_SIZE], dtype=tf.float32, stddev=1e-1))

# Layer output with no activation function (yet).
fc = tf.matmul(u, w)

# Batch normalization.
fc_bn = tf.contrib.layers.batch_norm(
    fc,
    center=True,
    scale=True,
    is_training=training,
    scope='fc-batch_norm')

# Activation function.
fc_bn_relu = tf.nn.relu(fc_bn)
print(fc_bn_relu)  # Tensor("Relu:0", shape=(?, 1024), dtype=float32)

2D convolutional (CNN) layer case

# Input: 640x480 RGB images (whitened input, hence tf.float32).
INPUT_HEIGHT = 480
INPUT_WIDTH = 640
INPUT_CHANNELS = 3
u = tf.placeholder(tf.float32, shape=(None, INPUT_HEIGHT, INPUT_WIDTH, INPUT_CHANNELS))

# CNN params: weights only, no bias as per [Ioffe & Szegedy 2015].
CNN_FILTER_HEIGHT = 3  # Space dimension.
CNN_FILTER_WIDTH = 3  # Space dimension.
CNN_FILTERS = 128
w = tf.Variable(tf.truncated_normal(
    [CNN_FILTER_HEIGHT, CNN_FILTER_WIDTH, INPUT_CHANNELS, CNN_FILTERS],
    dtype=tf.float32, stddev=1e-1))

# Layer output with no activation function (yet).
CNN_LAYER_STRIDE_VERTICAL = 1
CNN_LAYER_STRIDE_HORIZONTAL = 1
CNN_LAYER_PADDING = 'SAME'
cnn = tf.nn.conv2d(
    input=u, filter=w,
    strides=[1, CNN_LAYER_STRIDE_VERTICAL, CNN_LAYER_STRIDE_HORIZONTAL, 1],
    padding=CNN_LAYER_PADDING)

# Batch normalization.
cnn_bn = tf.contrib.layers.batch_norm(
    cnn,
    data_format='NHWC',  # Matching the "cnn" tensor which has shape (?, 480, 640, 128).
    center=True,
    scale=True,
    is_training=training,
    scope='cnn-batch_norm')

# Activation function.
cnn_bn_relu = tf.nn.relu(cnn_bn)
print(cnn_bn_relu)  # Tensor("Relu_1:0", shape=(?, 480, 640, 128), dtype=float32)

3D convolutional (CNN3D) layer case

# Input: sequence of 9 160x120 RGB images (whitened input, hence tf.float32).
INPUT_SEQ_LENGTH = 9
INPUT_HEIGHT = 120
INPUT_WIDTH = 160
INPUT_CHANNELS = 3
u = tf.placeholder(tf.float32, shape=(None, INPUT_SEQ_LENGTH, INPUT_HEIGHT, INPUT_WIDTH, INPUT_CHANNELS))

# CNN3D params: weights only, no bias as per [Ioffe & Szegedy 2015].
CNN3D_FILTER_LENGTH = 3  # Time dimension.
CNN3D_FILTER_HEIGHT = 3  # Space dimension.
CNN3D_FILTER_WIDTH = 3  # Space dimension.
CNN3D_FILTERS = 96
w = tf.Variable(tf.truncated_normal(
    [CNN3D_FILTER_LENGTH, CNN3D_FILTER_HEIGHT, CNN3D_FILTER_WIDTH, INPUT_CHANNELS, CNN3D_FILTERS],
    dtype=tf.float32, stddev=1e-1))

# Layer output with no activation function (yet).
CNN3D_LAYER_STRIDE_TEMPORAL = 1
CNN3D_LAYER_STRIDE_VERTICAL = 1
CNN3D_LAYER_STRIDE_HORIZONTAL = 1
CNN3D_LAYER_PADDING = 'SAME'
cnn3d = tf.nn.conv3d(
    input=u, filter=w,
    strides=[1, CNN3D_LAYER_STRIDE_TEMPORAL, CNN3D_LAYER_STRIDE_VERTICAL, CNN3D_LAYER_STRIDE_HORIZONTAL, 1],
    padding=CNN3D_LAYER_PADDING)

# Batch normalization.
cnn3d_bn = tf.contrib.layers.batch_norm(
    cnn3d,
    data_format='NHWC',  # Matching the "cnn3d" tensor which has shape (?, 9, 120, 160, 96).
    center=True,
    scale=True,
    is_training=training,
    scope='cnn3d-batch_norm')

# Activation function.
cnn3d_bn_relu = tf.nn.relu(cnn3d_bn)
print(cnn3d_bn_relu)  # Tensor("Relu_2:0", shape=(?, 9, 120, 160, 96), dtype=float32)

What I would like to make sure of is whether the code above exactly implements batch normalization as described in [Ioffe & Szegedy 2015], at the end of Sec. 3.2:

For convolutional layers, we additionally want the normalization to obey the convolutional property – so that different elements of the same feature map, at different locations, are normalized in the same way. To achieve this, we jointly normalize all the activations in a minibatch, over all locations. [...] Alg. 2 is modified similarly, so that during inference the BN transform applies the same linear transformation to each activation in a given feature map.
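
The convolutional property quoted above can be illustrated with a small NumPy sketch (again, not TensorFlow code). The function name batch_norm_channels_last and the epsilon value are assumptions made for this example. With a channels-last layout, the mean and variance of each feature map are computed over the batch axis and over every location axis, so the same function covers the FC (2D tensor), CNN (4D tensor) and CNN3D (5D tensor) cases.

```python
import numpy as np


def batch_norm_channels_last(x, gamma, beta, eps=1e-3):
    """Normalize each feature map over the batch and all locations."""
    # Every axis but the last one: the batch axis plus all
    # spatial/temporal location axes.
    reduce_axes = tuple(range(x.ndim - 1))
    mean = x.mean(axis=reduce_axes, keepdims=True)
    var = x.var(axis=reduce_axes, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


rng = np.random.default_rng(0)
channels = 6
# Small stand-in for the (?, 9, 120, 160, 96) tensor of the 3D case.
x = rng.normal(loc=3.0, scale=2.0, size=(4, 9, 12, 16, channels))
gamma = np.ones(channels)   # one scale per feature map
beta = np.zeros(channels)   # one offset per feature map

y = batch_norm_channels_last(x, gamma, beta)
per_channel_mean = y.mean(axis=(0, 1, 2, 3))  # close to 0 for each channel
per_channel_std = y.std(axis=(0, 1, 2, 3))    # close to 1 for each channel
```

Because the statistics have one entry per channel, every activation in a given feature map is shifted and scaled in the same way, regardless of its position in time or space.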

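Update: I guess the code above is also correct for the 3D conv case. In fact, when I define my model and print all the trainable variables, I see the expected number of beta and gamma variables. For instance: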

Tensor("conv3a/conv3d_weights/read:0", shape=(3, 3, 3, 128, 256), dtype=float32)
Tensor("BatchNorm_2/beta/read:0", shape=(256,), dtype=float32)
Tensor("BatchNorm_2/gamma/read:0", shape=(256,), dtype=float32)

This looks OK to me: thanks to BN, one pair of beta and gamma is learned for each feature map (256 pairs in total).
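
As a quick sanity check on those shapes, the following plain-Python sketch counts the parameters. The dictionary simply restates the three shapes printed above, and the helper num_elements is introduced here for illustration only.

```python
from functools import reduce
from operator import mul

# Shapes copied from the trainable variables listed above.
variable_shapes = {
    "conv3a/conv3d_weights": (3, 3, 3, 128, 256),
    "BatchNorm_2/beta": (256,),
    "BatchNorm_2/gamma": (256,),
}


def num_elements(shape):
    """Number of scalars in a tensor of the given shape."""
    return reduce(mul, shape, 1)


# The last dimension of the filter tensor is the number of feature maps.
num_filters = variable_shapes["conv3a/conv3d_weights"][-1]
counts = {name: num_elements(shape) for name, shape in variable_shapes.items()}
bn_params = counts["BatchNorm_2/beta"] + counts["BatchNorm_2/gamma"]

print(counts["conv3a/conv3d_weights"])  # 884736
print(bn_params)                        # 512
```

The batch normalization parameters scale with the number of feature maps only (2 x 256), not with the temporal or spatial size of the activations, which is consistent with normalizing jointly over all locations.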

[Ioffe & Szegedy 2015]: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Best answer

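This is a great post about 3D batchnorm; it often goes unnoticed that batchnorm can be applied to any tensor of rank greater than 1. Your code is correct, but I couldn't help adding a few important notes: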

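"Standard" 2D batchnorm (which accepts a 4D tensor) can be significantly faster in TensorFlow than 3D or higher, because it supports the fused_batch_norm implementation, which performs the whole computation in one kernel operation: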

Fused batch norm combines the multiple operations needed to do batch normalization into a single kernel. Batch norm is an expensive process that for some models makes up a large percentage of the operation time. Using fused batch norm can result in a 12%-30% speedup.

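There is an issue on GitHub asking for 3D filters to be supported as well, but it has seen no recent activity, and at this point it is closed unresolved.

Although the original paper prescribes applying batchnorm before the ReLU activation (which is what you do in the code above), there is evidence that applying batchnorm after the activation may work better. Here is a comment by François Chollet on the Keras GitHub: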

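... I can guarantee that recent code written by Christian [Szegedy] applies relu before BN. It is still occasionally a topic of debate, though.

For anyone interested in applying normalization in practice, there have been more recent research developments of this idea, namely weight normalization and layer normalization. They fix certain disadvantages of the original batchnorm; for example, they work better for LSTMs and recurrent networks.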
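
To show how layer normalization differs from batchnorm, here is a minimal NumPy sketch under simplifying assumptions: the learned scale and offset are omitted, the function names are chosen for this example, and the input is a plain (batch, features) matrix. Batchnorm computes statistics per feature across the batch, while layer normalization computes them per example across the features, so its output for one example does not depend on the other examples in the batch.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    # Statistics per feature, computed across the batch (axis 0).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def layer_norm(x, eps=1e-5):
    # Statistics per example, computed across the features (axis 1).
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))

# Layer norm: the first example gets the same output alone or in a batch.
same_for_layer_norm = np.allclose(layer_norm(x[:1]), layer_norm(x)[:1])

# Batchnorm: the first example's output changes with the batch contents.
same_for_batch_norm = np.allclose(batch_norm(x)[:1], batch_norm(x[:4])[:1])
```

This independence from the batch is one reason layer normalization is easier to use in recurrent networks, where statistics would otherwise have to be tracked per time step. Weight normalization reparameterizes the weights instead of normalizing activations and is not sketched here.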

This Q&A on batch normalization with 3D convolutions in TensorFlow is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/41830723/
