python - col2im implementation in ConvNet

I am trying to implement a CNN using only numpy.

While doing backpropagation, I found that I needed col2im to reshape dx, so I looked at the implementation at https://github.com/huyouare/CS231n/blob/master/assignment2/cs231n/im2col.py:

import numpy as np


def get_im2col_indices(x_shape, field_height, field_width, padding=1, stride=1):
  # First figure out what the size of the output should be
  N, C, H, W = x_shape
  assert (H + 2 * padding - field_height) % stride == 0
  assert (W + 2 * padding - field_width) % stride == 0
  # Integer division: np.repeat and np.tile need integer counts in Python 3
  out_height = (H + 2 * padding - field_height) // stride + 1
  out_width = (W + 2 * padding - field_width) // stride + 1

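  # Row offset of each element inside one receptive field, for every channel
  i0 = np.repeat(np.arange(field_height), field_width)
  i0 = np.tile(i0, C)
  # Row of the top-left corner of each receptive field
  i1 = stride * np.repeat(np.arange(out_height), out_width)
  # Same for columns: offset inside the field, then top-left corner
  j0 = np.tile(np.arange(field_width), field_height * C)
  j1 = stride * np.tile(np.arange(out_width), out_height)
  # Broadcast to shape (C * field_height * field_width, out_height * out_width)
  i = i0.reshape(-1, 1) + i1.reshape(1, -1)
  j = j0.reshape(-1, 1) + j1.reshape(1, -1)

  # Channel index for each row of the column matrix
  k = np.repeat(np.arange(C), field_height * field_width).reshape(-1, 1)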

  return (k, i, j)


def im2col_indices(x, field_height, field_width, padding=1, stride=1):
  """ An implementation of im2col based on some fancy indexing """
  # Zero-pad the input
  p = padding
  x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant')

  k, i, j = get_im2col_indices(x.shape, field_height, field_width, padding,
                               stride)

  cols = x_padded[:, k, i, j]
  C = x.shape[1]
  cols = cols.transpose(1, 2, 0).reshape(field_height * field_width * C, -1)
  return cols


def col2im_indices(cols, x_shape, field_height=3, field_width=3, padding=1,
                   stride=1):
  """ An implementation of col2im based on fancy indexing and np.add.at """
  N, C, H, W = x_shape
  H_padded, W_padded = H + 2 * padding, W + 2 * padding
  x_padded = np.zeros((N, C, H_padded, W_padded), dtype=cols.dtype)
  k, i, j = get_im2col_indices(x_shape, field_height, field_width, padding,
                               stride)
  cols_reshaped = cols.reshape(C * field_height * field_width, -1, N)
  cols_reshaped = cols_reshaped.transpose(2, 0, 1)
  np.add.at(x_padded, (slice(None), k, i, j), cols_reshaped)
  if padding == 0:
    return x_padded
  return x_padded[:, :, padding:-padding, padding:-padding]


I expected that passing X to im2col_indices and then passing that output back to col2im_indices would return the same X, but it does not.

I don't understand what col2im actually does.

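Best answer

If I understand correctly, the output is not the same X because im2col_indices copies each cell of X into several cols, and col2im_indices adds all of those copies back together, so each cell ends up multiplied by the number of cols it appears in.

Suppose you have a simple image X like this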

1 2 3
4 5 6
7 8 9

and you convert it with kernel size 3, stride 1, and same padding. The result will be

0 0 0 0 1 2 0 4 5
0 0 0 1 2 3 4 5 6
0 0 0 2 3 0 5 6 0
0 1 2 0 4 5 0 7 8
1 2 3 4 5 6 7 8 9
2 3 0 5 6 0 8 9 0
0 4 5 0 7 8 0 0 0
4 5 6 7 8 9 0 0 0
5 6 0 8 9 0 0 0 0
* *   * *

As you can see, the first cell, with value 1, shows up in four cols: 0, 1, 3, and 4.
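The column layout above can be reproduced with a minimal sketch. It assumes NumPy 1.20 or later for `sliding_window_view`, a single image with a single channel, and it is an illustration only, not the code from the linked repository.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# The 3x3 example image, zero-padded by 1 on every side ("same" padding).
x = np.arange(1, 10).reshape(3, 3)
x_padded = np.pad(x, 1, mode='constant')

# windows[oi, oj] is the 3x3 patch whose top-left corner is at (oi, oj).
windows = sliding_window_view(x_padded, (3, 3))

# One column per output position, one row per kernel offset.
cols = windows.reshape(9, 9).T

# Indices of the columns that contain the value 1 (the first cell of x).
cols_with_one = np.where((cols == 1).any(axis=0))[0]

print(cols)
print(cols_with_one)  # [0 1 3 4]
```

For this example the 9x9 matrix happens to be symmetric, so reading it by rows or by columns gives the same picture.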

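col2im_indices first initializes a zero image of the padded size, then adds each col into it. Focusing on the first cell, the process looks like this

1. Zero-initialize the image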

0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

2. Add col 0

0 0 0 0 0     0 0 0 - -     0 0 0 0 0
0 0 0 0 0     0 1 2 - -     0 1 2 0 0
0 0 0 0 0  +  0 4 5 - -  =  0 4 5 0 0
0 0 0 0 0     - - - - -     0 0 0 0 0
0 0 0 0 0     - - - - -     0 0 0 0 0

3. Add col 1

0 0 0 0 0     - 0 0 0 -     0 0  0 0 0
0 1 2 0 0     - 1 2 3 -     0 2  4 3 0
0 4 5 0 0  +  - 4 5 6 -  =  0 8 10 6 0
0 0 0 0 0     - - - - -     0 0  0 0 0
0 0 0 0 0     - - - - -     0 0  0 0 0

4. Add col 3

0 0  0 0 0     - - - - -     0  0  0 0 0
0 2  4 3 0     0 1 2 - -     0  3  6 3 0
0 8 10 6 0  +  0 4 5 - -  =  0 12 15 6 0
0 0  0 0 0     0 7 8 - -     0  7  8 0 0
0 0  0 0 0     - - - - -     0  0  0 0 0

5. Add col 4

0  0  0 0 0     - - - - -     0  0  0  0 0
0  3  6 3 0     - 1 2 3 -     0  4  8  6 0
0 12 15 6 0  +  - 4 5 6 -  =  0 16 20 12 0
0  7  8 0 0     - 7 8 9 -     0 14 16  9 0
0  0  0 0 0     - - - - -     0  0  0  0 0
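The accumulation steps above can be sketched with plain slicing. This is a loop-based stand-in for the np.add.at call in col2im_indices, assuming one image, one channel, kernel size 3, stride 1, and padding 1. The helper names `patch` and `accumulate` are made up for this illustration.

```python
import numpy as np

x = np.arange(1, 10).reshape(3, 3)
x_padded = np.pad(x, 1, mode='constant')  # 5x5 padded image


def patch(c):
    """Return col c as a 3x3 patch plus the top-left corner it came from."""
    oi, oj = divmod(c, 3)
    return x_padded[oi:oi + 3, oj:oj + 3], (oi, oj)


def accumulate(column_ids):
    """Add the given cols back into a zero-initialized padded image."""
    acc = np.zeros_like(x_padded)
    for c in column_ids:
        p, (oi, oj) = patch(c)
        acc[oi:oi + 3, oj:oj + 3] += p
    return acc


after_four = accumulate([0, 1, 3, 4])     # the four cols that touch cell 1
full = accumulate(range(9))[1:-1, 1:-1]   # all nine cols, padding cropped

print(after_four)
print(full)
```

`after_four` matches the right-hand side of step 5, and `full` is what the round trip returns once the remaining five cols are added and the padding is removed.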

After converting back, the first cell has been multiplied by 4. For this simple image, col2im_indices(im2col_indices(X)) should give you

 4 12 12
24 45 36
28 48 36

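Compared with the original image, the four corner cells 1, 3, 7, and 9 are multiplied by 4, the four edge cells 2, 4, 6, and 8 are multiplied by 6, and the center cell 5 is multiplied by 9.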
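One way to check these multipliers is to run the same round trip on an image of ones, which counts how many cols cover each cell. The sketch below assumes stride 1 and a single channel. The function `roundtrip` is a hypothetical loop-based equivalent of col2im_indices(im2col_indices(X)), not the repository code.

```python
import numpy as np


def roundtrip(img, k=3, pad=1):
    """Sum every k x k patch of the padded image back into place (stride 1)."""
    padded = np.pad(img, pad, mode='constant')
    acc = np.zeros_like(padded)
    out_h = padded.shape[0] - k + 1
    out_w = padded.shape[1] - k + 1
    for oi in range(out_h):
        for oj in range(out_w):
            acc[oi:oi + k, oj:oj + k] += padded[oi:oi + k, oj:oj + k]
    return acc[pad:-pad, pad:-pad] if pad > 0 else acc


x = np.arange(1, 10, dtype=float).reshape(3, 3)
counts = roundtrip(np.ones_like(x))  # number of cols covering each cell
recovered = roundtrip(x) / counts    # dividing by the counts undoes the scaling

print(counts)
print(recovered)
```

Dividing by `counts` recovers X only because the cols here came straight from im2col. For arbitrary cols, such as gradients, the summed result is generally not a scaled copy of any single image.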

For a large image, most cells are multiplied by 9, which I think roughly means your learning rate is effectively 9 times larger than you expect.
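To put a number on "most cells", the per-cell multiplier can be computed directly. This is a rough sketch assuming an odd kernel size, stride 1, and same padding. The function `coverage_counts` and the 32x32 size are illustrative choices, not taken from the question.

```python
import numpy as np


def coverage_counts(h, w, k=3):
    """Number of k x k windows covering each cell (stride 1, same padding)."""
    # Coverage along one axis is k in the interior and smaller near the border;
    # the 2D count is the outer product of the two 1D counts.
    rows = np.convolve(np.ones(h), np.ones(k), mode='same')
    cols = np.convolve(np.ones(w), np.ones(k), mode='same')
    return np.outer(rows, cols)


counts = coverage_counts(32, 32)
fraction_nine = np.mean(counts == 9)

print(fraction_nine)  # 0.87890625, i.e. 900 of 1024 cells
```

For a 32x32 image, only the one-cell border has a multiplier below 9, and that share shrinks as the image grows.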

A similar question about python - col2im implementation in ConvNet can be found on Stack Overflow: https://stackoverflow.com/questions/51703367/
