TensorFlow in Action: Implementing Classic Convolutional Neural Networks with TensorFlow

May 11, 2019 · Source: mov觉得高数好难

This chapter covers AlexNet, VGGNet, Google Inception Net, and ResNet.
AlexNet
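Its main new techniques were:
(1) It successfully used ReLU as the CNN activation function, which addresses the vanishing gradient problem that Sigmoid suffers from in deeper networks;
(2) It used Dropout during training to randomly ignore a portion of the neurons and so avoid overfitting;
(3) It used overlapping max pooling in the CNN, avoiding the blurring effect of average pooling. It also proposed making the stride smaller than the pooling kernel, so that the outputs of the pooling layer overlap, which enriches the features;
(4) It proposed the LRN layer, which creates competition among the activities of local neurons: relatively large responses become relatively larger while neurons with weaker responses are suppressed, which improves generalization;
(5) It used CUDA to accelerate training of the deep convolutional network, and AlexNet was designed so that the GPUs communicate only at certain layers of the network, limiting the performance cost of communication;
(6) It used data augmentation, which greatly reduces overfitting and improves generalization.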
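To make point (1) concrete, here is a minimal sketch in plain Python (no TensorFlow). It assumes a toy chain of identical layers with all weights fixed at 1, so the backpropagated gradient is simply the product of the local activation gradients; the helper names are illustrative and not part of the original code.

```python
import math

def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25, at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_grad(grad_fn, pre_activation, depth):
    """Product of `depth` identical local gradients (toy model, weights = 1)."""
    return grad_fn(pre_activation) ** depth

# Even in the best case for sigmoid (x = 0), ten layers shrink the gradient a lot.
sigmoid_chain = chained_grad(sigmoid_grad, 0.0, 10)
# An active ReLU passes the gradient through unchanged.
relu_chain = chained_grad(relu_grad, 1.0, 10)
print(sigmoid_chain, relu_chain)
```

Real networks also multiply by the weights at every layer, so this only isolates the contribution of the activation function.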
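Because training takes too long, this chapter does not involve training on real data; it only measures the speed of the forward and backward computation for each batch, using random images.
First import a few libraries, then define the main parameters: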

from datetime import datetime
import math
import time
import tensorflow as tf
batch_size=32
num_batches=100

A helper prints the structure of each layer, showing its name and shape:

def print_activations(t):
    print(t.op.name, ' ', t.get_shape().as_list())

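Next we design AlexNet's network structure. We first define inference, which takes images as input and returns the last layer pool5 (the fifth pooling layer) and parameters (the parameters to be trained). The first convolutional layer is conv1. tf.name_scope() automatically names the Variables created inside the scope conv1/xxx. We initialize the kernel parameters, then perform the convolution with an 11×11 kernel and a stride of 4×4. The biases of the layer are all initialized to 0 and added to the convolution result to give bias, which is passed through the ReLU activation for the non-linearity. Finally we print conv1 and add kernel and biases to parameters: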

def inference(images):
    parameters = []
    # conv1
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
        print_activations(conv1)
        parameters += [kernel, biases]

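Next add the LRN layer and the max pooling layer. The LRN parameters are essentially the values recommended in the AlexNet paper. Max pooling follows. VALID means the sampling window may not extend past the border, unlike SAME, which pads points outside the boundary: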

    # pool1
    lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='lrn1')
    pool1 = tf.nn.max_pool(lrn1,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID',
                           name='pool1')
    print_activations(pool1)
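As a rough illustration of what the LRN step computes, the sketch below applies the normalization to the channel vector of a single pixel in plain Python. It follows the formula documented for tf.nn.lrn (each value is divided by (bias + alpha * sum of squares over neighbouring channels) ** beta); the function name and the sample values are assumptions made for this example, and depth_radius is set to 1 only to keep the window small.

```python
def lrn(channels, depth_radius=4, bias=1.0, alpha=0.001 / 9.0, beta=0.75):
    """Normalize one pixel's channel values using their neighbouring channels."""
    out = []
    for d, value in enumerate(channels):
        lo = max(0, d - depth_radius)
        hi = min(len(channels), d + depth_radius + 1)
        sqr_sum = sum(c * c for c in channels[lo:hi])
        out.append(value / (bias + alpha * sqr_sum) ** beta)
    return out

# One strong response (50.0) surrounded by weak ones.
pixel = [0.0, 1.0, 50.0, 1.0, 0.0]
normalized = lrn(pixel, depth_radius=1)
print(normalized)
```

The weak channels next to the strong one are divided by a larger factor than they would be on their own, which is the "competition" effect described for AlexNet.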

Next comes the second convolutional layer, which differs only in a few parameters:

    # conv2
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
    print_activations(conv2)

As before, apply LRN first and then max pooling, with exactly the same parameters:

    # pool2
    lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='lrn2')
    pool2 = tf.nn.max_pool(lrn2,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID',
                           name='pool2')
    print_activations(pool2)

The third convolutional layer again differs only in its parameters:

    # conv3
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384],
                                                 dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv3)

The fourth and fifth layers likewise only change the parameters:

    # conv4
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256],
                                                 dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv4)

    # conv5
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256],
                                                 dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv5)

Then comes a max pooling layer:

    # pool5
    pool5 = tf.nn.max_pool(conv5,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID',
                           name='pool5')
    print_activations(pool5)

    return pool5, parameters
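The shapes printed by print_activations can be checked by hand. The sketch below traces the spatial size through the layers listed above, assuming the usual output-size rules for the two padding modes (SAME: ceil(size / stride); VALID: floor((size - kernel) / stride) + 1) and a 224×224 input; out_size and the layers table are illustrative helpers, not part of the original code.

```python
import math

def out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    if padding == 'SAME':
        return math.ceil(size / stride)
    if padding == 'VALID':
        return (size - kernel) // stride + 1
    raise ValueError('unknown padding: %s' % padding)

# (name, kernel size, stride, padding) as used in inference() above.
layers = [
    ('conv1', 11, 4, 'SAME'),
    ('pool1', 3, 2, 'VALID'),
    ('conv2', 5, 1, 'SAME'),
    ('pool2', 3, 2, 'VALID'),
    ('conv3', 3, 1, 'SAME'),
    ('conv4', 3, 1, 'SAME'),
    ('conv5', 3, 1, 'SAME'),
    ('pool5', 3, 2, 'VALID'),
]

size = 224
sizes = {}
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes[name] = size
    print(name, size)
```

Under these assumptions pool5 comes out as 6×6 with 256 channels.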

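That completes the function, which builds the convolutional part of AlexNet. The full network would still need three fully connected layers with 4096, 4096, and 1000 hidden units respectively.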
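For a sense of scale, the following sketch counts the parameters of the five convolutional layers built above and of the three fully connected layers that are not built in this listing. It assumes pool5 has shape 6×6×256 (which is what a 224×224 input gives with the padding used here) and one bias per output unit; layer_params is a hypothetical helper.

```python
def layer_params(shape):
    """Weights (product of the shape) plus one bias per output channel."""
    weights = 1
    for dim in shape:
        weights *= dim
    return weights + shape[-1]

# Kernel shapes [height, width, in_channels, out_channels] from inference().
conv_shapes = [(11, 11, 3, 64), (5, 5, 64, 192), (3, 3, 192, 384),
               (3, 3, 384, 256), (3, 3, 256, 256)]
# Fully connected shapes [inputs, outputs]; the first input is the flattened pool5.
fc_shapes = [(6 * 6 * 256, 4096), (4096, 4096), (4096, 1000)]

conv_params = sum(layer_params(s) for s in conv_shapes)
fc_params = sum(layer_params(s) for s in fc_shapes)
print('conv:', conv_params, 'fc:', fc_params)
```

Under these assumptions the fully connected layers hold far more parameters than the convolutional layers.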
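Next we implement time_tensorflow_run, a function that measures the time AlexNet takes per iteration. The first argument is the TensorFlow session, the second is the operation to be timed, and the third is the name of the test: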

def time_tensorflow_run(session, target, info_string):
    """Run the computation to obtain the target tensor and print timing stats.

    Args:
      session: the TensorFlow session to run the computation under.
      target: the target Tensor that is passed to the session's run() function.
      info_string: a string summarizing this run, to be printed with the stats.

    Returns:
      None
    """
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0

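We run num_batches + num_steps_burn_in iterations, timing each with time.time() and executing it with session.run(target). The first num_steps_burn_in iterations are warm-up and are not counted. After that, the duration is printed every 10 iterations, and each iteration's duration is accumulated into total_duration and total_duration_squared so that the mean and standard deviation per iteration can be computed afterwards: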

    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print ('%s: step %d, duration = %.3f' %
                       (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration

After the loop, compute the mean duration mn and the standard deviation sd:

    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print ('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
           (datetime.now(), info_string, num_batches, mn, sd))
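The mean and standard deviation above rely on the identity Var(x) = E[x²] − (E[x])², which is why only two running sums are needed. The sketch below checks that identity against the statistics module on made-up durations; the sample values are purely illustrative, and the max(vr, 0.0) guard is an addition for this example, since rounding can push the difference slightly below zero.

```python
import math
import statistics

# Hypothetical per-batch durations in seconds.
durations = [0.021, 0.019, 0.020, 0.024, 0.018]

total_duration = 0.0
total_duration_squared = 0.0
for duration in durations:
    total_duration += duration
    total_duration_squared += duration * duration

n = len(durations)
mn = total_duration / n
vr = total_duration_squared / n - mn * mn  # E[x^2] - (E[x])^2
sd = math.sqrt(max(vr, 0.0))               # guard against tiny negative rounding
print('%.4f +/- %.4f sec / batch' % (mn, sd))
```

This is the population standard deviation (dividing by n), matching what the benchmark code computes.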

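Next is the main function. We first use with tf.Graph().as_default() to define the default Graph for later use. We then build the whole AlexNet network with the inference function defined earlier, obtaining the output of the last pooling layer, pool5, and the set of trainable parameters, parameters. Then we initialize all the parameters: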

def run_benchmark():
    """Run the benchmark on AlexNet."""
    with tf.Graph().as_default():
        # Generate some dummy images.
        image_size = 224
        # Note that our padding definition is slightly different from cuda-convnet.
        # The original note describes adding 3 to image_size and using VALID
        # padding; this listing keeps image_size = 224 and SAME padding in conv1.
        images = tf.Variable(tf.random_normal([batch_size,
                                           image_size,
                                           image_size, 3],
                                          dtype=tf.float32,
                                          stddev=1e-1))

        # Build a Graph that computes the logits predictions from the
        # inference model.
        pool5, parameters = inference(images)

        # Build an initialization operation.
        init = tf.global_variables_initializer()

        # Start running operations on the Graph.
        config = tf.ConfigProto()
        config.gpu_options.allocator_type = 'BFC'
        sess = tf.Session(config=config)
        sess.run(init)

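Now run the forward benchmark, using time_tensorflow_run directly to measure the time, with pool5 as the target. Then benchmark the backward pass, i.e. the training process. grad computes the gradients with respect to all model parameters, which simulates training. Finally, call the main function: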

        # Run the forward benchmark.
        time_tensorflow_run(sess, pool5, "Forward")

        # Add a simple objective so we can calculate the backward pass.
        objective = tf.nn.l2_loss(pool5)
        # Compute the gradient with respect to all the parameters.
        grad = tf.gradients(objective, parameters)
        # Run the backward benchmark.
        time_tensorflow_run(sess, grad, "Forward-backward")
run_benchmark()

The main bottleneck in applying CNNs is still training; using a CNN for prediction is much less of a problem.
VGGNet
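The VGGNet paper uses 3×3 convolution kernels and 2×2 pooling kernels throughout, and improves performance by making the network progressively deeper. Several identical 3×3 convolutional layers often appear in a row. Two 3×3 convolutional layers in series are equivalent to one 5×5 convolutional layer in that one output pixel is connected to a 5×5 region of input pixels, i.e. the receptive field is 5×5.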
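The receptive-field claim can be verified with a few lines of arithmetic. The sketch below assumes stride-1 convolutions and the same number of channels in and out; both helper functions are illustrative and not taken from the original code.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

def conv_weights(kernel, channels):
    """Weights of one kernel x kernel convolution with `channels` in and out."""
    return kernel * kernel * channels * channels

print(receptive_field([3, 3]))     # two stacked 3x3 layers
print(receptive_field([3, 3, 3]))  # three stacked 3x3 layers
print(2 * conv_weights(3, 64), conv_weights(5, 64))
```

Two 3×3 layers cover the same 5×5 region with 18C² weights instead of 25C², and they apply two non-linearities instead of one.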
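VGG uses a small trick during training: first train the simple level-A network, then reuse the weights of network A to initialize the later, more complex models, which makes training converge faster.
The authors drew the following conclusions:
① The LRN layer is of little use;
② The deeper the network, the better the results;
③ 1×1 convolutions are also effective, but not as good as 3×3 convolutions; larger kernels can learn larger spatial features.
(Below, only code that differs from the other models or is otherwise distinctive is recorded.)
conv_op creates a convolutional layer and stores the layer's parameters in the parameter list p. get_shape()[-1].value gets the number of channels of the input input_op.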

def conv_op(input_op, name, kh, kw, n_out, dh, dw, p):
    n_in = input_op.get_shape()[-1].value

    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope+"w",
                                 shape=[kh, kw, n_in, n_out],
                                 dtype=tf.float32, 
                                 initializer=tf.contrib.layers.xavier_initializer_conv2d())
        conv = tf.nn.conv2d(input_op, kernel, (1, dh, dw, 1), padding='SAME')
        bias_init_val = tf.constant(0.0, shape=[n_out], dtype=tf.float32)
        biases = tf.Variable(bias_init_val, trainable=True, name='b')
        z = tf.nn.bias_add(conv, biases)
        activation = tf.nn.relu(z, name=scope)
        p += [kernel, biases]
        return activation

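Next is fc_op(), the function that creates a fully connected layer. It first gets the number of channels of the input input_op, then uses tf.get_variable to create the layer's parameters: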

def fc_op(input_op, name, n_out, p):
    n_in = input_op.get_shape()[-1].value

    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope+"w",
                                 shape=[n_in, n_out],
                                 dtype=tf.float32, 
                                 initializer=tf.contrib.layers.xavier_initializer())
        biases = tf.Variable(tf.constant(0.1, shape=[n_out], dtype=tf.float32), name='b')
        activation = tf.nn.relu_layer(input_op, kernel, biases, name=scope)
        p += [kernel, biases]
        return activation

We flatten the output of the fifth convolutional segment, using tf.reshape to turn each sample into a one-dimensional vector of length 7×7×512 = 25088:

    # Inside inference_op, after the fifth convolutional segment:
    shp = pool5.get_shape()
    flattened_shape = shp[1].value * shp[2].value * shp[3].value
    resh1 = tf.reshape(pool5, [-1, flattened_shape], name="resh1")
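The figure 25088 follows from the layer layout. The short sketch below assumes a 224×224 input, five 2×2 max pooling stages with stride 2 (each halving the spatial size), 3×3 SAME convolutions that leave the size unchanged, and 512 channels in the last segment.

```python
size = 224
for _ in range(5):       # five pooling stages, each halves height and width
    size //= 2

channels = 512           # output channels of the fifth convolutional segment
flattened_shape = size * size * channels
print(size, flattened_shape)
```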

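Next define the main benchmark function run_benchmark. The goal is again only to measure forward and backward performance, not to do any real training or prediction. First generate random 224×224 images in the same way as for AlexNet, using tf.random_normal to draw normally distributed values with a standard deviation of 0.1: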

def run_benchmark():
    with tf.Graph().as_default():
        image_size = 224
        images = tf.Variable(tf.random_normal([batch_size,
                                               image_size,
                                               image_size, 3],
                                               dtype=tf.float32,
                                               stddev=1e-1))

Next create the placeholder keep_prob and call inference_op to build the VGGNet-16 structure, obtaining predictions, softmax, fc8, and the parameter list p:

        keep_prob = tf.placeholder(tf.float32)
        predictions, softmax, fc8, p = inference_op(images, keep_prob)

Google Inception Net
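Inception V1 reduces the number of parameters for two reasons. First, the more parameters a model has, the larger it is and the more data it needs to learn from. Second, more parameters consume more computing resources. Besides being deeper and more expressive, Inception V1 achieves good results with few parameters for two further reasons. One is that it removes the final fully connected layer and replaces it with a global average pooling layer. The other is that the carefully designed Inception Module makes more efficient use of the parameters.
Where Inception V1 goes beyond NIN is in adding branch networks; NIN consists mainly of cascaded convolutional layers and MLPConv layers.
The Inception Module (IM) has four branches in its basic structure. The first branch is a 1×1 convolution, which is a very effective structure: it can increase or reduce the number of output channels. The second branch first applies a 1×1 convolution and then a 3×3 convolution, which amounts to two feature transformations. The third branch is similar.
A 1×1 convolution is very cost-effective: with very little computation it adds one more layer of feature transformation and non-linearity. The four branches of the IM are merged at the end by an aggregation operation. The IM contains convolutions of three different sizes and one max pooling, which makes the network more adaptable to different scales.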
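The cost argument for 1×1 convolutions can be illustrated with a multiplication count. The sketch below assumes a 28×28×192 input and branch widths in the style of an early GoogLeNet module (64 / 128 / 32 / 32 output channels, with the 5×5 branch reduced to 16 channels first); these numbers and the helper conv_mults are used for illustration only and do not come from the original text.

```python
def conv_mults(height, width, kernel, c_in, c_out):
    """Multiplications for a stride-1 SAME convolution over a height x width map."""
    return height * width * kernel * kernel * c_in * c_out

h = w = 28
c_in = 192

# 5x5 convolution applied directly to all 192 input channels.
direct = conv_mults(h, w, 5, c_in, 32)
# 1x1 convolution down to 16 channels first, then the 5x5 convolution.
reduced = conv_mults(h, w, 1, c_in, 16) + conv_mults(h, w, 5, 16, 32)

# The four branches are concatenated along the channel axis.
branch_channels = {'1x1': 64, '3x3': 128, '5x5': 32, 'pool_proj': 32}
out_channels = sum(branch_channels.values())
print(direct, reduced, out_channels)
```

Under these assumptions the reduced branch needs roughly a tenth of the multiplications of the direct 5×5 convolution.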
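If the probability distribution of a dataset can be represented by a very large, very sparse neural network, then the best way to construct that network is layer by layer: cluster the highly correlated nodes of the previous layer and connect each resulting small cluster together.
Inception V2 borrows from VGGNet, replacing a large 5×5 convolution with two 3×3 convolutions (which reduces the parameter count and mitigates overfitting). It also uses Batch Normalization (BN). When applied to a layer of a neural network, BN standardizes the values within each mini-batch so that the output is normalized towards an N(0, 1) distribution, reducing Internal Covariate Shift (changes in the distribution of internal neuron activations).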
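A minimal sketch of the BN computation for one feature over one mini-batch, in plain Python. The scale gamma, shift beta, and the small constant eps are the usual BN parameters, fixed here to illustrative defaults; running statistics for inference are left out.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize a mini-batch of scalars, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]
normed = batch_norm(batch)

normed_mean = sum(normed) / len(normed)
normed_var = sum((v - normed_mean) ** 2 for v in normed) / len(normed)
print(normed, normed_mean, normed_var)
```

After normalization the batch has mean close to 0 and variance close to 1; gamma and beta let the layer learn a different scale and offset if that helps.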
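BN alone does not yet yield a clear gain; some accompanying adjustments are needed: increase the learning rate and speed up learning-rate decay to suit the BN-normalized data; remove Dropout and reduce L2 regularization; remove LRN; shuffle the training samples more thoroughly; and reduce photometric distortions in data augmentation.
Inception V3 makes two main changes. The first is the idea of Factorization into small convolutions, which splits a larger two-dimensional convolution into two smaller one-dimensional convolutions. This saves a large number of parameters, speeds up computation, and reduces overfitting, while adding one more non-linear layer that extends the model's expressive power.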
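The parameter saving from this factorization is easy to quantify. The sketch below compares an n×n kernel with a 1×n kernel followed by an n×1 kernel, counting weights per input/output channel pair; the function names are illustrative.

```python
def full_kernel_weights(n):
    """Weights of one n x n kernel (per input/output channel pair)."""
    return n * n

def factorized_kernel_weights(n):
    """Weights of a 1 x n kernel followed by an n x 1 kernel."""
    return n + n

for n in (3, 7):
    print(n, full_kernel_weights(n), factorized_kernel_weights(n))
```

The saving grows with n: a 7×7 kernel has 49 weights, while the 1×7 and 7×1 pair has 14.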
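The second is that Inception V3 optimizes the structure of the Inception Module.
Compared with V3, Inception V4 mainly incorporates Microsoft's ResNet.
Below, only the code that differs is recorded.
First, tf.contrib.slim: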

slim = tf.contrib.slim

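slim.arg_scope can automatically give certain default values to a function's parameters. For example, passing weights_regularizer=slim.l2_regularizer(weight_decay) with [slim.conv2d, slim.fully_connected] sets the default value of the weights_regularizer parameter of those two functions to slim.l2_regularizer(weight_decay). With slim.arg_scope there is no need to repeat these settings on every call; a parameter only has to be set where it changes. A second slim.arg_scope is then nested inside to give default values to the parameters of the convolution layer function slim.conv2d: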

  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      weights_regularizer=slim.l2_regularizer(weight_decay)):
    with slim.arg_scope(
        [slim.conv2d],
        weights_initializer=trunc_normal(stddev),
        activation_fn=tf.nn.relu,
        normalizer_fn=slim.batch_norm,
        normalizer_params=batch_norm_params) as sc:
      return sc
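To show the idea behind scoped default arguments without depending on TensorFlow, here is a small pure-Python sketch. It is not slim's implementation: the stack _scope_stack, the decorator, and the stand-in conv2d are all hypothetical, and they only mimic the behaviour described above (nested scopes inherit outer defaults, explicit arguments win).

```python
import contextlib
import functools

# Hypothetical stack of {function name: default kwargs}; innermost scope last.
_scope_stack = [{}]

def add_arg_scope(func):
    """Let `func` pick up defaults from the innermost active arg_scope."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        merged = dict(_scope_stack[-1].get(func.__name__, {}))
        merged.update(kwargs)  # explicit arguments override scoped defaults
        return func(*args, **merged)
    return wrapper

@contextlib.contextmanager
def arg_scope(funcs, **defaults):
    """Push defaults for `funcs`, inheriting those of any enclosing scope."""
    scope = {name: dict(kw) for name, kw in _scope_stack[-1].items()}
    for func in funcs:
        scope.setdefault(func.__name__, {}).update(defaults)
    _scope_stack.append(scope)
    try:
        yield scope
    finally:
        _scope_stack.pop()

@add_arg_scope
def conv2d(inputs, num_outputs, activation_fn=None, weights_regularizer=None):
    """Stand-in layer that just reports the arguments it received."""
    return {'num_outputs': num_outputs, 'activation_fn': activation_fn,
            'weights_regularizer': weights_regularizer}

with arg_scope([conv2d], weights_regularizer='l2'):
    with arg_scope([conv2d], activation_fn='relu'):
        inner = conv2d('images', 64)
        overridden = conv2d('images', 64, activation_fn=None)
outside = conv2d('images', 64)
print(inner, overridden, outside)
```

Inside the nested scopes the call receives both defaults, an explicit keyword still overrides them, and outside the scopes the function behaves as originally defined.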

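The Inception V3 paper also puts forward the Factorization into small convolutions idea: two one-dimensional convolutions emulate a large two-dimensional convolution, reducing the parameter count while adding non-linearity. In the first few layers there is also a 1×1 convolution, one of the structures frequently used in the Inception Module mentioned earlier, which combines features across channels at low cost.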
ResNet
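Residual Neural Network (ResNet) was proposed by four Chinese researchers at Microsoft Research, including Kaiming He. The ResNet structure greatly speeds up the training of very deep neural networks and also improves accuracy considerably. The goal of the Highway Network is likewise to solve the problem that extremely deep networks are hard to train: a certain proportion of the information from the previous layer can pass directly to the next layer without going through matrix multiplication and non-linear transformation. ResNet was originally inspired by the following problem: as the depth of a neural network keeps increasing, Degradation appears, that is, accuracy first rises, then saturates, and then falls as depth continues to increase. This is not overfitting, because the error grows not only on the test set but on the training set as well. Suppose the input to some segment of the network is x and the desired output is H(x). If we pass the input x directly to the output as the initial result, the target we need to learn becomes F(x) = H(x) - x.
Traditional convolutional or fully connected layers lose or degrade information to some degree as they pass it on. ResNet addresses this to some extent: by routing the input directly around to the output, it protects the integrity of the information, and the network only has to learn the part by which the output differs from the input, which simplifies the learning target and reduces its difficulty.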
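The relation F(x) = H(x) - x can be made concrete with a toy residual unit on plain Python lists. The function residual_unit and the sample vectors are assumptions for illustration; a real unit would compute F with convolutions.

```python
def residual_unit(x, residual_fn):
    """Return x + F(x): the shortcut carries x, the branch adds a correction."""
    fx = residual_fn(x)
    return [xi + fi for xi, fi in zip(x, fx)]

x = [1.0, -2.0, 3.0]

# If the branch learns nothing (F = 0), the unit is exactly the identity.
identity_out = residual_unit(x, lambda v: [0.0] * len(v))

# To reach a desired output H(x), the branch only has to supply H(x) - x.
target = [1.5, -2.0, 2.0]
needed = [h - xi for h, xi in zip(target, x)]
out = residual_unit(x, lambda v: needed)
print(identity_out, needed, out)
```

When the desired mapping is close to the identity, the correction the branch must learn is small.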
As before, only the distinctive code is recorded below.

class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
    """A named tuple describing a ResNet block (data only, no methods)."""

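This uses collections.namedtuple to define the named tuple for ResNet's basic Block module group and uses it to create the Block class, which contains only the data structure and no methods.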
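A short usage sketch of such a named tuple, runnable with the standard library alone. The placeholder unit function and the (depth, depth_bottleneck, stride) values are illustrative assumptions, not definitions taken from this article.

```python
import collections

class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
    """A named tuple describing a ResNet block (data only, no methods)."""

def placeholder_unit(net, depth, depth_bottleneck, stride):
    """Stand-in for a residual unit function; returns its input unchanged."""
    return net

# Three units: two with stride 1, then one with stride 2.
block1 = Block('block1', placeholder_unit, [(256, 64, 1)] * 2 + [(256, 64, 2)])

print(block1.scope)            # fields are accessible by name
print(len(block1.args))        # one args tuple per residual unit
print(block1[0] == block1.scope)  # and still by position, like a tuple
```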
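Next define a downsampling method, subsample, whose parameters are inputs (the input), factor (the sampling factor), and scope. If factor is 1, inputs is returned unchanged; otherwise the downsampling is done with slim.max_pool2d, using a 1×1 pooling window and a stride of factor: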

def subsample(inputs, factor, scope=None):
    if factor == 1:
        return inputs
    else:
        return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope)
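Max pooling with a 1×1 window takes the maximum of a single value, so the operation amounts to keeping every factor-th row and column. The sketch below shows that effect on a nested Python list; subsample_2d is a hypothetical stand-in that ignores the batch and channel dimensions.

```python
def subsample_2d(grid, factor):
    """Keep every `factor`-th row and column of a 2-D grid."""
    if factor == 1:
        return grid
    return [row[::factor] for row in grid[::factor]]

# A 4x4 grid holding the values 0..15 in row-major order.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
sub = subsample_2d(grid, 2)
print(sub)
```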

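Next define the function that stacks Blocks. The parameter net is the input, and outputs_collections is the collection used to gather the end_points. Two nested loops stack the network Block by Block and Residual Unit by Residual Unit; two tf.variable_scope calls name each residual learning unit in the form block/unit_1. In the inner loop we take the args of each Residual Unit in each Block and unpack them into depth, depth_bottleneck, and stride, whose meaning was explained earlier when the Block class was defined. unit_fn (the function that generates a residual learning unit) is then used to create and connect all the residual units in order: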

@slim.add_arg_scope
def stack_blocks_dense(net, blocks, outputs_collections=None):
    for block in blocks:
        with tf.variable_scope(block.scope, 'block', [net]) as sc:
            for i, unit in enumerate(block.args):
                with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                    unit_depth, unit_depth_bottleneck, unit_stride = unit
                    net = block.unit_fn(net,
                                        depth=unit_depth,
                                        depth_bottleneck=unit_depth_bottleneck,
                                        stride=unit_stride)
            net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)
    return net
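The control flow of the stacking loop can be traced without TensorFlow. In the sketch below, net is reduced to a (spatial_size, channels) pair and recording_unit is a hypothetical stand-in for a residual unit that only updates that pair; the block definitions are illustrative values, and size // stride is a simplification that holds for the even sizes used here.

```python
import collections

Block = collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])

def recording_unit(net, depth, depth_bottleneck, stride):
    """Stand-in unit: shrink the spatial size by `stride`, output `depth` channels."""
    size, _ = net
    return (size // stride, depth)

def stack_blocks(net, blocks):
    """Same two nested loops as the stacking function, minus TensorFlow scopes."""
    trace = []
    for block in blocks:
        for i, (depth, depth_bottleneck, stride) in enumerate(block.args):
            net = block.unit_fn(net, depth=depth,
                                depth_bottleneck=depth_bottleneck, stride=stride)
            trace.append(('%s/unit_%d' % (block.scope, i + 1), net))
    return net, trace

blocks = [
    Block('block1', recording_unit, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
    Block('block2', recording_unit, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
]
net, trace = stack_blocks((56, 64), blocks)
for name, shape in trace:
    print(name, shape)
```

Each entry of the trace corresponds to one residual unit, named in the block/unit_N form that the variable scopes produce.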

    Original author: mov觉得高数好难
    Original article: https://www.jianshu.com/p/c91b382d69c2