To generate Google-Dream-like images, I am trying to modify input images by gradient ascent, optimizing them against an InceptionV3 network.
Desired effect: https://github.com/google/deepdream/blob/master/dream.ipynb
(for more info on this, refer to https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html)
For this, I fine-tuned an Inception network using the transfer-learning method and generated the model: inceptionv3-ft.model
model.summary()
prints the following architecture (shortened here due to space limitations):
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, None, None, 3 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, None, None, 3 864 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, None, None, 3 96 conv2d_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, None, None, 3 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, None, None, 3 9216 activation_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, None, None, 3 96 conv2d_2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, None, None, 3 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, None, None, 6 18432 activation_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, None, None, 6 192 conv2d_3[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, None, None, 6 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, None, None, 6 0 activation_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, None, None, 8 5120 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, None, None, 8 240 conv2d_4[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, None, None, 8 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, None, None, 1 138240 activation_4[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, None, None, 1 576 conv2d_5[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, None, None, 1 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, None, None, 1 0 activation_5[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, None, None, 6 12288 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, None, None, 6 192 conv2d_9[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, None, None, 6 0 batch_normalization_9[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, None, None, 4 9216 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, None, None, 9 55296 activation_9[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, None, None, 4 144 conv2d_7[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, None, None, 9 288 conv2d_10[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, None, None, 4 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
activation_10 (Activation) (None, None, None, 9 0 batch_normalization_10[0][0]
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (None, None, None, 1 0 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, None, None, 6 12288 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
(...)
mixed9_1 (Concatenate) (None, None, None, 7 0 activation_88[0][0]
activation_89[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (None, None, None, 7 0 activation_92[0][0]
activation_93[0][0]
__________________________________________________________________________________________________
activation_94 (Activation) (None, None, None, 1 0 batch_normalization_94[0][0]
__________________________________________________________________________________________________
mixed10 (Concatenate) (None, None, None, 2 0 activation_86[0][0]
mixed9_1[0][0]
concatenate_2[0][0]
activation_94[0][0]
__________________________________________________________________________________________________
global_average_pooling2d_1 (Glo (None, 2048) 0 mixed10[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 1024) 2098176 global_average_pooling2d_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 1) 1025 dense_1[0][0]
==================================================================================================
Total params: 23,901,985
Trainable params: 18,315,137
Non-trainable params: 5,586,848
____________________________________
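For context, here is a minimal sketch of how such a fine-tuned model could have been assembled with Keras transfer learning. Only the layer shapes and parameter counts are confirmed by the summary above; the relu/sigmoid activations and the frozen-layer split are my assumptions:

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# InceptionV3 base without its ImageNet head; with include_top=False the
# input defaults to (None, None, 3), matching the summary's flexible shapes.
base = InceptionV3(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(base.output)   # global_average_pooling2d_1 -> (None, 2048)
x = Dense(1024, activation='relu')(x)       # dense_1: 2048*1024 + 1024 = 2,098,176 params
out = Dense(1, activation='sigmoid')(x)     # dense_2: single unit, 1,025 params
# Some base layers were presumably frozen during fine-tuning (the summary
# reports 5,586,848 non-trainable params); the exact cut-off is not shown.
model = Model(inputs=base.input, outputs=out)
model.save('inceptionv3-ft.model')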
Now, I use the following settings and code to try to tune and activate specific high-level layers so that whole objects emerge on the input image:
import numpy as np
import scipy.ndimage
import scipy.misc

from keras import backend as K
from keras.models import load_model

settings = {
    'features': {
        'mixed2': 0.,
        'mixed3': 0.,
        'mixed4': 0.,
        'mixed10': 0.,  # highest
    },
}

model = load_model('inceptionv3-ft.model')
# The "dream" is the symbolic input tensor we optimize (missing in the
# original snippet but required by K.gradients below).
dream = model.input

# Get the symbolic outputs of each "key" layer (we gave them unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers])

# Define the loss.
loss = K.variable(0.)
for layer_name in settings['features']:
    # Add the L2 norm of the features of a layer to the loss.
    assert layer_name in layer_dict.keys(), 'Layer ' + layer_name + ' not found in model.'
    coeff = settings['features'][layer_name]
    x = layer_dict[layer_name].output
    print(x)
    # We avoid border artifacts by only involving non-border pixels in the loss.
    scaling = K.prod(K.cast(K.shape(x), 'float32'))
    if K.image_data_format() == 'channels_first':
        loss += coeff * K.sum(K.square(x[:, :, 2: -2, 2: -2])) / scaling
    else:
        loss += coeff * K.sum(K.square(x[:, 2: -2, 2: -2, :])) / scaling

# Compute the gradients of the dream wrt the loss.
grads = K.gradients(loss, dream)[0]
# Normalize gradients.
grads /= K.maximum(K.mean(K.abs(grads)), K.epsilon())

# Set up function to retrieve the value
# of the loss and gradients given an input image.
outputs = [loss, grads]
fetch_loss_and_grads = K.function([dream], outputs)
def eval_loss_and_grads(x):
    outs = fetch_loss_and_grads([x])
    loss_value = outs[0]
    grad_values = outs[1]
    return loss_value, grad_values

def resize_img(img, size):
    img = np.copy(img)
    if K.image_data_format() == 'channels_first':
        factors = (1, 1,
                   float(size[0]) / img.shape[2],
                   float(size[1]) / img.shape[3])
    else:
        factors = (1,
                   float(size[0]) / img.shape[1],
                   float(size[1]) / img.shape[2],
                   1)
    return scipy.ndimage.zoom(img, factors, order=1)

def gradient_ascent(x, iterations, step, max_loss=None):
    for i in range(iterations):
        loss_value, grad_values = eval_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break
        print('..Loss value at', i, ':', loss_value)
        x += step * grad_values
    return x

def save_img(img, fname):
    pil_img = deprocess_image(np.copy(img))
    scipy.misc.imsave(fname, pil_img)
"""Process:
- Load the original image.
- Define a number of processing scales (i.e. image shapes),
from smallest to largest.
- Resize the original image to the smallest scale.
- For every scale, starting with the smallest (i.e. current one):
- Run gradient ascent
- Upscale image to the next scale
- Reinject the detail that was lost at upscaling time
- Stop when we are back to the original size.
To obtain the detail lost during upscaling, we simply
take the original image, shrink it down, upscale it,
and compare the result to the (resized) original image.
"""
# Playing with these hyperparameters will also allow you to achieve new effects
step = 0.01         # Gradient ascent step size
num_octave = 3      # Number of scales at which to run gradient ascent
octave_scale = 1.4  # Size ratio between scales
iterations = 20     # Number of ascent steps per scale
max_loss = 10.

img = preprocess_image(base_image_path)
if K.image_data_format() == 'channels_first':
    original_shape = img.shape[2:]
else:
    original_shape = img.shape[1:3]

successive_shapes = [original_shape]
for i in range(1, num_octave):
    shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
    successive_shapes.append(shape)
successive_shapes = successive_shapes[::-1]

original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0])

for shape in successive_shapes:
    print('Processing image shape', shape)
    img = resize_img(img, shape)
    img = gradient_ascent(img,
                          iterations=iterations,
                          step=step,
                          max_loss=max_loss)
    upscaled_shrunk_original_img = resize_img(shrunk_original_img, shape)
    same_size_original = resize_img(original_img, shape)
    lost_detail = same_size_original - upscaled_shrunk_original_img
    img += lost_detail
    shrunk_original_img = resize_img(original_img, shape)

save_img(img, fname=result_prefix + '.png')
But whatever settings values I tweak, I only seem to activate low-level features, like edges and curves, or, at best, mixed features.
Ideally, the settings should be able to reach individual layers down to channels and units, i.e., Layer4c - Unit 0, but I found no way in the Keras documentation to achieve that:
see this: https://distill.pub/2017/feature-visualization/appendix/googlenet/4c.html
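For illustration, this is the kind of handle I am after — a hypothetical sketch, assuming a single channel of a layer's output can be slotted into the loss the same way as the whole-layer L2 norm above (the layer name and channel index are made up):

# Hypothetical: maximize one channel ("unit") of one layer instead of a whole layer.
layer_output = layer_dict['mixed4'].output  # layer picked only for illustration
channel_index = 0                           # the "Unit 0" of that layer
if K.image_data_format() == 'channels_first':
    channel = layer_output[:, channel_index, :, :]
else:
    channel = layer_output[..., channel_index]
loss = K.mean(channel)  # ascending this gradient amplifies that single channel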
I have learned that the Caffe framework gives you more flexibility here, but installing it system-wide is dependency hell.
So, how do I activate individual classes on this network within the Keras framework, or in any framework other than Caffe?
Best Answer
What worked for me was the following:
To avoid installing all the dependencies and Caffe on my machine, I pulled this Docker image, which comes with all the major deep learning frameworks. Within minutes I had Caffe (along with Keras, TensorFlow, CUDA, Theano, Lasagne, Torch, and OpenCV) installed in a container with a folder shared with the host machine.
I then ran this Caffe script --> Deep Dream, and voilà.
The models available for Caffe are richer and allow the classes mentioned above to be "printed" onto input images or onto noise.
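That said, staying inside Keras, the "individual class" part of the question can at least be sketched by making the loss the class score at the model's output rather than intermediate feature norms. This is my own sketch, not part of what I ran; class_index is hypothetical, and the multi-scale loop from the question is reused unchanged:

# Sketch: gradient ascent on a single output-class score in Keras.
model = load_model('inceptionv3-ft.model')
dream = model.input
class_index = 0  # hypothetical index of the class to amplify
loss = K.mean(model.output[:, class_index])
grads = K.gradients(loss, dream)[0]
grads /= K.maximum(K.mean(K.abs(grads)), K.epsilon())
fetch_loss_and_grads = K.function([dream], [loss, grads])
# eval_loss_and_grads / gradient_ascent from the question work as-is on top of this.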
A similar question about machine-learning - Keras - visualizing classes on a CNN network can be found on Stack Overflow: https://stackoverflow.com/questions/48955104/