python - Keras: writing a recurrent layer that accepts images

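This question also exists as an issue: https://github.com/fchollet/keras/issues/4266

I am trying to implement a convolutional LSTM. It is a recurrent layer that accepts images as input and uses convolutions to compute the gates of the LSTM. So I tried subclassing Recurrent and changing the input dimensions.

To do this, I read the documentation on writing a custom layer and, as suggested there, read the source code to understand what happens under the hood.

I read the code of recurrent.py and I think the structure is clear: you inherit from Recurrent but do not override call. Instead you provide a custom step function, and Recurrent takes care of applying that step to every entry in the sequence.

As a starting point, I took the GRU code and tried to adapt it to my needs. I want to combine 2D convolution with a GRU (normally it would be an LSTM, but that does not matter here; I decided to implement a C-GRU).

The idea is to use an ordinary 2D convolution in the model that outputs 3 features. These 3 features are used as the r, z and h activations in the GRU. In the custom layer I only have to keep track of the state. My layer does not even have trainable weights; those are contained in the convolution.

The notable changes to the original GRU code are: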

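def step(self, x, states):
    # the previous state is a 2D map per sample: (samples, x_dim, y_dim)
    h_tm1 = states[0]  # previous memory

    # x holds the three convolution features for this timestep
    z = self.inner_activation(x[:, 0, :, :])  # update gate
    r = self.inner_activation(x[:, 1, :, :])  # reset gate (computed but not used below)
    hh = self.activation(x[:, 2, :, :])       # candidate state

    # element-wise blend of the previous state and the candidate
    h = z * h_tm1 + (1 - z) * hh
    return h, [h]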

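As you can see, I am just reusing the features from the convolution. The multiplications should be element-wise. I will debug this to make sure it behaves as intended.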
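
To make the element-wise update concrete, here is a minimal NumPy sketch of the same step outside Keras. It assumes the "th" layout, with one timestep shaped (samples, 3, rows, cols), and it assumes hard_sigmoid is the piecewise-linear clip(0.2 * x + 0.5, 0, 1). The names hard_sigmoid and cgru_step are illustrative and not part of the original code. Channel 1 (r) is left out because the step above computes it but never uses it.

```python
import numpy as np

def hard_sigmoid(x):
    # Piecewise-linear sigmoid approximation (assumed form, see the text above).
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def cgru_step(x, h_tm1):
    """One C-GRU update for a single timestep.

    x:     (samples, 3, rows, cols) convolution output for this timestep
    h_tm1: (samples, rows, cols) previous state
    """
    z = hard_sigmoid(x[:, 0, :, :])  # update gate
    hh = np.tanh(x[:, 2, :, :])      # candidate state
    # Element-wise blend of the previous state and the candidate.
    return z * h_tm1 + (1.0 - z) * hh

x = np.zeros((2, 3, 4, 4))
x[:, 0] = 10.0                     # saturates z at 1, so the old state is kept
h_prev = np.full((2, 4, 4), 0.3)
h = cgru_step(x, h_prev)
```

With the update gate saturated at 1, the new state equals the previous one, which is a quick sanity check that the blend is applied per pixel and not as a matrix product.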

Since the state becomes 2D, I also changed initial_state:

def get_initial_states(self, x):
    initial_state=K.zeros_like(x)   # (samples, timesteps, input_dim)
                                    # input_dim = (3, x_dim, y_dim)
    initial_state=K.sum(initial_state, axis=(1,2)) # (samples, x_dim, y_dim)
    return initial_state
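
The shape bookkeeping in this function can be checked in NumPy. This is only a sketch: it assumes np.zeros_like and sum over axes (1, 2) behave like K.zeros_like and K.sum, and the sizes are illustrative.

```python
import numpy as np

samples, timesteps, x_dim, y_dim = 2, 40, 5, 5  # illustrative sizes
x = np.random.rand(samples, timesteps, 3, x_dim, y_dim)

initial_state = np.zeros_like(x)                # (samples, timesteps, 3, x_dim, y_dim)
initial_state = initial_state.sum(axis=(1, 2))  # (samples, x_dim, y_dim)
```

Summing zeros over the time and feature axes yields an all-zero tensor with one 2D map per sample, matching the shape of h in the step function.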

output_shape seems to be hard-coded for recurrent networks. I override it:

def get_output_shape_for(self, input_shape):
    #TODO: this is hardcoding for th layout
    return (input_shape[0],1,input_shape[2],input_shape[3])

Another hard-coded thing is input_spec. In the constructor, after calling super, I override it with my input dimensions:

class CGRU(Recurrent):
    def __init__(self,
                 init='glorot_uniform', inner_init='orthogonal',
                 activation='tanh', inner_activation='hard_sigmoid', **kwargs):

        self.init = initializations.get(init)
        self.inner_init = initializations.get(inner_init)
        self.activation = activations.get(activation)
        self.inner_activation = activations.get(inner_activation)

        #removing the regularizers and the dropout

        super(CGRU, self).__init__(**kwargs)

        # this seems necessary in order to accept 5 input dimensions
        # (samples, timesteps, features, x, y)
        self.input_spec=[InputSpec(ndim=5)]

There are a few other small changes. You can find the full code here: http://pastebin.com/60ztPis3

Running it produces the following error message:

theano.tensor.var.AsTensorError: ('Cannot convert [None] to TensorType', )

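The full error message is on pastebin: http://pastebin.com/Cdmr20Yn

I am trying to debug the code, but that is rather difficult because it goes deep into the Keras source. One thing I noticed: execution never reaches my custom step function, so apparently something is wrong in the configuration. In the call function of Recurrent, input_shape is a tuple with the entries (None, 40, 1, 40, 40), which is correct. My sequence has 40 elements, each of which is an image with 1 feature and a resolution of 40x40. I am using the "th" layout.

This is the call function of Recurrent. My code reaches the call to K.rnn, and the setup looks fine to me. input_spec seems to be correct. But it crashes inside K.rnn, without ever reaching my step function.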

def call(self, x, mask=None):
    # input shape: (nb_samples, time (padded with zeros), input_dim)
    # note that the .build() method of subclasses MUST define
    # self.input_spec with a complete input shape.
    input_shape = self.input_spec[0].shape
    if self.stateful:
        initial_states = self.states
    else:
        initial_states = self.get_initial_states(x)
    constants = self.get_constants(x)
    preprocessed_input = self.preprocess_input(x)

    last_output, outputs, states = K.rnn(self.step, preprocessed_input,
                                         initial_states,
                                         go_backwards=self.go_backwards,
                                         mask=mask,
                                         constants=constants,
                                         unroll=self.unroll,
                                         input_length=input_shape[1])

At this point I am lost. It seems to me that I am missing some part of the configuration.
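
For orientation, the contract between K.rnn and the step function can be sketched in plain Python. This is a simplified stand-in and not the Keras implementation: it ignores masking, constants, go_backwards and unrolling, and simple_rnn and running_sum_step are hypothetical names. The point is that the states travel as a list and the step returns an (output, new_states) pair.

```python
def simple_rnn(step, inputs, initial_states):
    """Apply `step` to each timestep; `initial_states` must be a list."""
    states = list(initial_states)
    outputs = []
    for x_t in inputs:                      # iterate over the time axis
        output, states = step(x_t, states)  # step returns (output, [new states])
        outputs.append(output)
    return outputs[-1], outputs, states

def running_sum_step(x_t, states):
    # Toy step: the state is the running sum of the inputs seen so far.
    h = states[0] + x_t
    return h, [h]

last_output, outputs, final_states = simple_rnn(running_sum_step, [1, 2, 3], [0])
```

The toy step indexes states[0] in the same way as the C-GRU step does, so the loop only works when the initial states arrive as a list.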

Update:

Well, now I have a strange problem. My code is now:

# this is the actual input, fed to the network
inputs = Input((1, 40, 40, 40))

# now reshape to a sequence
reshaped = Reshape((40, 1, 40, 40))(inputs)

conv_inputs = Input((1, 40, 40))
conv1 = Convolution2D(3, 3, 3, activation='relu', border_mode='same')(conv_inputs)
convmodel = Model(input=conv_inputs, output=conv1)
convmodel.summary()

#apply the segmentation to each layer
time_dist=TimeDistributed(convmodel)(reshaped)

from cgru import CGRU

up=CGRU(go_backwards=False, return_sequences=True, name="up")

up=up(time_dist)

output=Reshape([1,40,40,40])(up)

model=Model(input=inputs, output=output)
print(model.summary())

On a machine with Theano as the backend, this works. The model summary is:

________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_1 (InputLayer)             (None, 1, 40, 40, 40) 0                                            
____________________________________________________________________________________________________
reshape_1 (Reshape)              (None, 40, 1, 40, 40) 0           input_1[0][0]                    
____________________________________________________________________________________________________
timedistributed_1 (TimeDistribute(None, 40, 3, 40, 40) 30          reshape_1[0][0]                  
____________________________________________________________________________________________________
up (CGRU)                        (None, 40, 1, 40, 40) 0           timedistributed_1[0][0]          
____________________________________________________________________________________________________
reshape_2 (Reshape)              (None, 1, 40, 40, 40) 0           up[0][0]                         
====================================================================================================
Total params: 30
____________________________________________________________________________________________________

But on a machine with TensorFlow as the backend, the code fails. I added a model.summary() call for convmodel. Up to that point it works:

Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_4 (InputLayer)             (None, 1, 40, 40)     0                                            
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 3, 40, 40)     30          input_4[0][0]                    
====================================================================================================
Total params: 30

But then the program crashes:

ValueError: Shapes (?, ?, 40, 40) and (40, ?, 40) are not compatible

Do Theano and TensorFlow have different (and incompatible) placeholders for batch_size? Note that I configured Keras to use the "th" image layout in both cases.

Best answer

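I think the problem is solved. initial_states needs to be a list, and the output dimension had to be fixed. It seems to work now. There are some other issues with the underlying backends (e.g. Theano vs. TensorFlow), but they seem unrelated to this question.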
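
The answer does not include code. As a rough sketch of what the two fixes might look like, the following plain-Python helpers assume the "th" layout and the (None, 40, 1, 40, 40) output shape that the model summary in the question reports for the CGRU layer. The names cgru_output_shape and as_state_list are hypothetical, and this is not the author's actual code.

```python
def cgru_output_shape(input_shape, return_sequences=True):
    """Output shape for an input of (samples, timesteps, features, rows, cols)."""
    samples, timesteps, _features, rows, cols = input_shape
    if return_sequences:
        # One single-feature map per timestep.
        return (samples, timesteps, 1, rows, cols)
    # Only the last timestep is returned.
    return (samples, 1, rows, cols)

def as_state_list(initial_state):
    # The step function reads states[0], so a single state is wrapped in a list.
    return [initial_state]

shape = cgru_output_shape((None, 40, 3, 40, 40))
```

In the layer itself, the same logic would live in get_output_shape_for and get_initial_states, with the latter returning a list holding the zero tensor.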

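Once I am sure that the problem is really solved and that the layer is able to learn, I will update this answer with all the necessary steps.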

Regarding "python - Keras: writing a recurrent layer that accepts images", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40381101/
