Implementing YOLO v3 from Scratch (Part 2)

May 7, 2024 · Source: 深度智能

(For academic exchange only. Please do not repost without permission.)

(This article is translated from: Tutorial on implementing YOLO v3 from scratch in PyTorch)

(The original author put real care into this tutorial, and it shows. Consider giving his repository a star on GitHub ✨)

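This is Part 2 of the tutorial on implementing a YOLO v3 detector from scratch. The previous part explained how YOLO works; in this part we implement the layers used by YOLO in PyTorch. In other words, this is where we create the building blocks of the model.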

The code for this tutorial is designed to run on Python 3.5 and PyTorch 0.4. It can be found in the tutorial's GitHub repository.

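This tutorial is split into 5 parts:

Part 1: Understanding how YOLO works

Part 2 (this one): Creating the layers of the network architecture

Part 3: Implementing the forward pass of the network

Part 4: Objectness score thresholding and non-maximum suppression

Part 5: Designing the input and output pipelines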

Prerequisites

Part 1 of this tutorial, on how YOLO works.
A basic working knowledge of PyTorch, including how to create custom architectures with the nn.Module, nn.Sequential and torch.nn.parameter classes.

I assume you have used PyTorch before. If you have not, I recommend getting briefly acquainted with the framework before working through this post.

Getting started

First, create a directory where the code for the detector will live.

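Then create a file named darknet.py (Darknet is the name of the underlying architecture of YOLO). This file will contain the code that creates the YOLO network. We will supplement it with a file called util.py, which will contain the code for various helper functions. Save both files in your detector folder. You can use git to keep track of the changes.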

Configuration file

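The official code (written in C) uses a configuration file to build the network. The cfg file describes the layout of the network block by block. If you come from a Caffe background, it is equivalent to the .prototxt file used to describe the network.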

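We will use the official cfg file released by the author to build our network. Download it and place it in a folder called cfg inside your detector directory. If you are on Linux, cd into your detector directory and type: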

mkdir cfg
cd cfg
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg

If you open the configuration file, you will see something like this.

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

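We see 4 blocks above. Three of them describe convolutional layers, followed by a shortcut layer. A shortcut layer is a skip connection, like the one used in ResNet. There are 5 types of layers used in YOLO: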

Convolutional

[convolutional]
batch_normalize=1  
filters=64  
size=3  
stride=1  
pad=1  
activation=leaky

Shortcut

[shortcut]
from=-3  
activation=linear

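A shortcut layer is a skip connection, similar to the one used in ResNet. The from parameter is -3, which means the output of the shortcut layer is obtained by adding the feature maps of the previous layer and of the 3rd layer backwards from the shortcut layer.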
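
To make the indexing concrete, here is a minimal sketch of the addition a shortcut layer performs. It uses plain Python lists as stand-ins for feature maps, and the helper name apply_shortcut and the outputs dictionary are illustrative assumptions, not part of the tutorial's code.

```python
def apply_shortcut(outputs, index, from_offset):
    """Element-wise sum of layer index-1 and layer index+from_offset.

    `outputs` maps a layer index to its (stand-in) feature map.
    """
    previous = outputs[index - 1]            # the layer right before the shortcut
    earlier = outputs[index + from_offset]   # e.g. from=-3 -> three layers back
    return [a + b for a, b in zip(previous, earlier)]


# Hypothetical outputs of layers 0, 1 and 2; the shortcut sits at index 3.
outputs = {0: [1.0, 2.0], 1: [0.5, 0.5], 2: [3.0, 4.0]}
result = apply_shortcut(outputs, index=3, from_offset=-3)
print(result)  # [4.0, 6.0]
```

With real tensors the same addition only works when both feature maps have the same shape, which is why a shortcut never changes the channel count.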

Upsample

[upsample]
stride=2

Upsamples the feature map of the previous layer by a factor of stride using bilinear upsampling.

Route

[route]
layers = -4

[route]
layers = -1, 61

The route layer has a layers attribute, which can have either one or two values.

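When the layers attribute has only one value, the layer outputs the feature maps of the layer indexed by that value. In our example it is -4, so the layer outputs the feature map from the 4th layer backwards from the route layer.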

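When layers has two values, the layer returns the concatenated feature maps of the layers indexed by those values. In our example it is -1, 61, so the layer outputs the feature maps from the previous layer (-1) and the 61st layer, concatenated along the depth dimension.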
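
The difference between relative (negative) and absolute (positive) indices can be sketched as follows. The helper resolve_route and the layer positions 83 and 86 are assumptions chosen for illustration only.

```python
def resolve_route(layers_value, index):
    """Turn a cfg 'layers' string into absolute layer indices.

    Negative values are relative to the route layer at position `index`;
    positive values are treated as absolute indices already.
    """
    resolved = []
    for item in layers_value.split(","):
        value = int(item.strip())
        resolved.append(index + value if value < 0 else value)
    return resolved


single = resolve_route("-4", index=83)
double = resolve_route("-1, 61", index=86)
print(single)  # [79]
print(double)  # [85, 61]
```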

YOLO

[yolo]
mask = 0,1,2
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=80
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1

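The YOLO layer corresponds to the detection layer described in Part 1. The anchors attribute describes 9 anchors, but only the anchors indexed by the mask attribute are used. Here the value of mask is 0,1,2, which means the first, second and third anchors are used. This makes sense, since each cell of the detection layer predicts 3 boxes. In total there are three detection layers, which accounts for the 9 anchors.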
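
As a small illustration of how mask selects anchors, the sketch below pairs up the numbers in anchors and picks the pairs named by mask. The function name select_anchors is an assumption made for this example.

```python
def select_anchors(anchors_value, mask_value):
    """Pair up the anchor numbers and keep only the pairs indexed by mask."""
    numbers = [int(v) for v in anchors_value.split(",")]   # int() ignores padding spaces
    pairs = [(numbers[i], numbers[i + 1]) for i in range(0, len(numbers), 2)]
    mask = [int(m) for m in mask_value.split(",")]
    return [pairs[m] for m in mask]


anchors_value = "10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326"
selected = select_anchors(anchors_value, "0,1,2")
print(selected)  # [(10, 13), (16, 30), (33, 23)]
```

A detection layer whose mask is 6,7,8 would receive the three largest anchors from the same string.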

Net

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width= 320
height = 320
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

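There is another type of block in the cfg called net, but I would not call it a layer, since it only describes information about the network input and the training parameters. It is not used in the forward pass of YOLO. However, it does provide information such as the network input size, which we use to adjust the anchors in the forward pass.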

Parsing the configuration file

Before we begin, add the necessary imports at the top of the darknet.py file.

from __future__ import division

import torch 
import torch.nn as nn
import torch.nn.functional as F 
from torch.autograd import Variable
import numpy as np

We define a function called parse_cfg, which takes the path of the configuration file as input.

def parse_cfg(cfgfile):
    """
    Takes a configuration file
    
    Returns a list of blocks. Each block describes a block in the neural
    network to be built. Block is represented as a dictionary in the list
    
    """

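The idea is to parse the cfg and store every block as a dict. The attributes of a block and their values are stored as key-value pairs in the dictionary. As we parse the cfg, we keep appending these dicts, held in the variable block in the code, to a list called blocks. The function returns this blocks list. (The original text says block, but what is actually returned is the blocks list.)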

We begin by saving the content of the cfg file in a list of strings. The following code performs some preprocessing on this list.

with open(cfgfile, 'r') as file:                       # close the file when done
    lines = file.read().split('\n')                    # store the lines in a list
lines = [x.strip() for x in lines]                     # get rid of fringe whitespace
lines = [x for x in lines if len(x) > 0]               # get rid of the empty lines
lines = [x for x in lines if x[0] != '#']              # get rid of comments

Then we loop over the resulting list to get the blocks.

block = {}
blocks = []

for line in lines:
    if line[0] == "[":               # This marks the start of a new block
        if len(block) != 0:          # A non-empty block holds the values of the previous block.
            blocks.append(block)     # add it to the blocks list
            block = {}               # re-init the block
        block["type"] = line[1:-1].rstrip()     
    else:
        key,value = line.split("=") 
        block[key.rstrip()] = value.lstrip()
blocks.append(block)

return blocks
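
To see the parsing logic end to end without needing the real cfg file on disk, here is a self-contained sketch that applies the same steps to an in-memory string. The name parse_cfg_text and the shortened sample config are assumptions made for this illustration; the tutorial's own function reads from a file path.

```python
def parse_cfg_text(text):
    """Parse Darknet-style cfg text into a list of dicts, one per block."""
    blocks = []
    block = {}
    for raw in text.split("\n"):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                          # skip blank lines and comments
        if line.startswith("["):              # start of a new block
            if block:
                blocks.append(block)
            block = {"type": line[1:-1].strip()}
        else:
            key, value = line.split("=", 1)
            block[key.strip()] = value.strip()
    if block:
        blocks.append(block)                  # do not forget the last block
    return blocks


sample = """
[net]
# Testing
batch=1
width= 320
height = 320

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear
"""

blocks = parse_cfg_text(sample)
types = [b["type"] for b in blocks]
print(types)                      # ['net', 'convolutional', 'shortcut']
print(int(blocks[0]["width"]))    # 320
```

Note that every value is stored as a string, so numeric attributes such as width or filters have to be cast with int() before use.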

Creating the building blocks

Now we will use the list returned by parse_cfg above to construct PyTorch modules for the blocks present in the config file.

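We have 5 types of layers in the list (described above). PyTorch provides pre-built layers for the convolutional and upsample types. We will write our own modules for the remaining layers by extending the nn.Module class.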

The create_modules function takes the blocks list returned by parse_cfg as input.

def create_modules(blocks):
    net_info = blocks[0]     #Captures the information about the input and pre-processing    
    module_list = nn.ModuleList()
    prev_filters = 3
    output_filters = []

Before we iterate over the list of blocks, we define a variable net_info to store information about the network.

nn.ModuleList

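Our function will return an nn.ModuleList. This class is almost like a normal list containing nn.Module objects. However, when we add an nn.ModuleList as a member of an nn.Module object (i.e. when we add modules to our network), all the parameters of the nn.Module objects (modules) inside the nn.ModuleList are also registered as parameters of that nn.Module object (i.e. our network, which holds the nn.ModuleList as a member).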
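
The registration behaviour can be checked with a small experiment. The two classes below are hypothetical and exist only for this comparison; the sketch assumes PyTorch is installed.

```python
import torch.nn as nn


class PlainListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list: the layers are NOT registered with the module.
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 2)]


class ModuleListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: each layer's parameters become parameters of the net.
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])


plain_count = len(list(PlainListNet().parameters()))
registered_count = len(list(ModuleListNet().parameters()))
print(plain_count, registered_count)  # 0 4
```

Each nn.Linear contributes a weight and a bias, hence 4 parameter tensors in the second case. Parameters that are not registered would be invisible to an optimizer and to calls such as .cuda() or state_dict().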

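When we define a new convolutional layer, we must define the dimensions of its kernel. While the height and width of the kernel are provided by the cfg file, the depth of the kernel is precisely the number of filters (or the depth of the feature map) in the previous layer. This means we need to keep track of the number of filters in the layer to which the convolution is applied. We use the variable prev_filters to do this. We initialise it to 3, as the image has 3 channels corresponding to RGB.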
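
A short sketch of why prev_filters is needed: the weight tensor of a 2D convolution in PyTorch has shape (out_channels, in_channels, kernel_height, kernel_width), so each layer's depth comes from the layer before it. The helper conv_weight_shape and the three example layers are assumptions for illustration.

```python
def conv_weight_shape(prev_filters, filters, kernel_size):
    """Shape of a square conv kernel bank: (out, in, k, k)."""
    return (filters, prev_filters, kernel_size, kernel_size)


prev_filters = 3          # RGB input
shapes = []
for filters, size in [(32, 3), (64, 3), (32, 1)]:   # hypothetical (filters, size) pairs
    shapes.append(conv_weight_shape(prev_filters, filters, size))
    prev_filters = filters                           # the next layer sees this many channels

print(shapes)  # [(32, 3, 3, 3), (64, 32, 3, 3), (32, 64, 1, 1)]
```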

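The route layer brings (possibly concatenated) feature maps from previous layers. If there is a convolutional layer right after a route layer, its kernel is applied to the feature maps of those previous layers, which are precisely the ones the route layer brings. Therefore we need to keep track of the number of filters not only in the previous layer, but in each of the preceding layers. As we iterate, we append the number of output filters of each block to the list output_filters.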

Now, the idea is to iterate over the list of blocks and create a PyTorch module for each block as we go.

for index, x in enumerate(blocks[1:]):
        module = nn.Sequential()

        #check the type of block
        #create a new module for the block
        #append to module_list

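The nn.Sequential class is used to sequentially execute a number of nn.Module objects. If you look at the cfg, you will realise that a block may contain more than one layer. For example, a block of type convolutional has a batch norm layer and a leaky ReLU activation layer in addition to the convolutional layer. We string these layers together using nn.Sequential and its add_module function. For example, this is how we create the convolutional and upsample layers.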

if (x["type"] == "convolutional"):
            #Get the info about the layer
            activation = x["activation"]
            try:
                batch_normalize = int(x["batch_normalize"])
                bias = False
            except KeyError:
                batch_normalize = 0
                bias = True

            filters= int(x["filters"])
            padding = int(x["pad"])
            kernel_size = int(x["size"])
            stride = int(x["stride"])

            if padding:
                pad = (kernel_size - 1) // 2
            else:
                pad = 0

            #Add the convolutional layer
            conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias = bias)
            module.add_module("conv_{0}".format(index), conv)

            #Add the Batch Norm Layer
            if batch_normalize:
                bn = nn.BatchNorm2d(filters)
                module.add_module("batch_norm_{0}".format(index), bn)

            #Check the activation. 
            #It is either Linear or a Leaky ReLU for YOLO
            if activation == "leaky":
                activn = nn.LeakyReLU(0.1, inplace = True)
                module.add_module("leaky_{0}".format(index), activn)

        #If it's an upsampling layer
        #We use Bilinear2dUpsampling
        elif (x["type"] == "upsample"):
            stride = int(x["stride"])
            upsample = nn.Upsample(scale_factor = stride, mode = "bilinear")
            module.add_module("upsample_{}".format(index), upsample)
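
The padding rule in the code above, pad = (kernel_size - 1) // 2, keeps the spatial size unchanged for stride 1 and halves it for stride 2. The sketch below applies the standard convolution output-size formula to check this; the function name conv_output_size and the 320-pixel input are assumptions for illustration.

```python
def conv_output_size(input_size, kernel_size, stride, pad_flag):
    """Spatial output size of a convolution using the tutorial's padding rule."""
    pad = (kernel_size - 1) // 2 if pad_flag else 0
    return (input_size + 2 * pad - kernel_size) // stride + 1


same_3x3 = conv_output_size(320, kernel_size=3, stride=1, pad_flag=1)
half_3x3 = conv_output_size(320, kernel_size=3, stride=2, pad_flag=1)
same_1x1 = conv_output_size(320, kernel_size=1, stride=1, pad_flag=1)
print(same_3x3, half_3x3, same_1x1)  # 320 160 320
```

For a 1 x 1 kernel the rule yields pad = 0 even though the cfg says pad=1, which is why the pad attribute is treated as a flag rather than as the padding amount.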

Route layer / Shortcut layer

Next, we write the code for creating the route and shortcut layers.

#If it is a route layer
        elif (x["type"] == "route"):
            x["layers"] = x["layers"].split(',')
            #Start  of a route
            start = int(x["layers"][0])
            #end, if there exists one.
            try:
                end = int(x["layers"][1])
            except IndexError:
                end = 0
            #Positive annotation
            if start > 0: 
                start = start - index
            if end > 0:
                end = end - index
            route = EmptyLayer()
            module.add_module("route_{0}".format(index), route)
            if end < 0:
                filters = output_filters[index + start] + output_filters[index + end]
            else:
                filters= output_filters[index + start]

        #shortcut corresponds to skip connection
        elif x["type"] == "shortcut":
            shortcut = EmptyLayer()
            module.add_module("shortcut_{}".format(index), shortcut)

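The code for creating the route layer deserves a bit of explanation. First, we extract the value of the layers attribute, split it into a list and cast its entries to integers.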

Then we have a new layer called EmptyLayer which, as the name suggests, is just an empty layer.

route = EmptyLayer()

It is defined as

class EmptyLayer(nn.Module):
    def __init__(self):
        super(EmptyLayer, self).__init__()

Wait, an empty layer?

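An empty layer might seem strange, given that it does nothing. The route layer, just like any other layer, performs an operation (bringing forward a previous layer, or concatenating). In PyTorch, when we define a new layer, we subclass nn.Module and write the operation the layer performs in the forward function of the nn.Module object.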

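To design a layer for the route block, we would have to build an nn.Module object that is initialised with the values of the layers attribute as its members. Then we could write the code to concatenate or bring forward the feature maps in its forward function. Finally, we would execute this layer in the forward function of our network.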

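But since the concatenation code is fairly short and simple (calling torch.cat on the feature maps), designing a layer as above would lead to unnecessary abstraction that only adds boilerplate code. Instead, we can put a dummy layer in place of the proposed route layer and perform the concatenation directly in the forward function of the nn.Module object representing darknet. (If this does not make much sense to you, I suggest reading up on how the nn.Module class is used in PyTorch; a link can be found at the end of this post.)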
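
A rough sketch of that design, assuming PyTorch is installed: the placeholder module holds no logic, and the concatenation happens in ordinary code that stands in for the network's forward function. The helper run_route, the outputs dictionary and the tensor shapes are hypothetical; the actual forward pass is the subject of the next part.

```python
import torch
import torch.nn as nn


class EmptyLayer(nn.Module):
    """Placeholder that only marks the position of a route/shortcut block."""
    def __init__(self):
        super().__init__()


def run_route(outputs, index, layers):
    """Fetch (and, for two entries, concatenate) cached feature maps."""
    maps = [outputs[index + l] if l < 0 else outputs[l] for l in layers]
    if len(maps) == 1:
        return maps[0]
    return torch.cat(maps, dim=1)   # dim 1 is the channel (depth) dimension


# Hypothetical cached outputs of layers 0 and 1 (batch, channels, height, width).
outputs = {0: torch.zeros(1, 32, 8, 8), 1: torch.zeros(1, 64, 8, 8)}
route = EmptyLayer()                # sits in the module list, does nothing itself
merged = run_route(outputs, index=2, layers=[-1, 0])
print(tuple(merged.shape))  # (1, 96, 8, 8)
```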

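The convolutional layer just after a route layer applies its kernel to (possibly concatenated) feature maps from previous layers. The following code updates the filters variable to hold the number of filters output by the route layer.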

if end < 0:
    #If we are concatenating maps
    filters = output_filters[index + start] + output_filters[index + end]
else:
    filters= output_filters[index + start]

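The shortcut layer also uses an empty layer, since it performs a very simple operation (addition). There is no need to update the filters variable, as it merely adds the feature maps of an earlier layer to those of the layer just before it.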
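
The bookkeeping for filters across convolutional, shortcut and route blocks can be sketched without PyTorch. The function track_output_filters and the five-block list are assumptions for illustration; the route handling mirrors the positive-to-relative conversion used in the code above.

```python
def track_output_filters(blocks, input_channels=3):
    """Return the number of output channels after each block."""
    prev_filters = input_channels
    output_filters = []
    for index, block in enumerate(blocks):
        if block["type"] == "convolutional":
            filters = int(block["filters"])
        elif block["type"] == "route":
            offsets = [int(v) for v in block["layers"].split(",")]
            # Convert positive (absolute) indices to offsets relative to this layer.
            offsets = [v - index if v > 0 else v for v in offsets]
            filters = sum(output_filters[index + v] for v in offsets)
        else:
            # shortcut and upsample leave the channel count unchanged
            filters = prev_filters
        output_filters.append(filters)
        prev_filters = filters
    return output_filters


blocks = [
    {"type": "convolutional", "filters": "64"},   # index 0
    {"type": "convolutional", "filters": "32"},   # index 1
    {"type": "convolutional", "filters": "64"},   # index 2
    {"type": "shortcut", "from": "-3"},           # index 3: 64 + 64 -> still 64 channels
    {"type": "route", "layers": "-1, 1"},         # index 4: concat layer 3 (64) and layer 1 (32)
]
channel_counts = track_output_filters(blocks)
print(channel_counts)  # [64, 32, 64, 64, 96]
```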

YOLO layer

Finally, we write the code for creating the YOLO layer.

#Yolo is the detection layer
        elif x["type"] == "yolo":
            mask = x["mask"].split(",")
            mask = [int(x) for x in mask]

            anchors = x["anchors"].split(",")
            anchors = [int(a) for a in anchors]
            anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors),2)]
            anchors = [anchors[i] for i in mask]

            detection = DetectionLayer(anchors)
            module.add_module("Detection_{}".format(index), detection)

We define a new layer, DetectionLayer, that holds the anchors used to detect bounding boxes.

DetectionLayer is defined as

class DetectionLayer(nn.Module):
    def __init__(self, anchors):
        super(DetectionLayer, self).__init__()
        self.anchors = anchors

At the end of the loop, we do some bookkeeping.

 module_list.append(module)
 prev_filters = filters
 output_filters.append(filters)

That concludes the body of the loop. At the end of the function create_modules, we return a tuple containing net_info and module_list.

return (net_info, module_list)

Testing the code

You can test your code by adding the following lines at the end of darknet.py and running the file.

blocks = parse_cfg("cfg/yolov3.cfg")
print(create_modules(blocks))

You will see a long list (containing exactly 106 items), whose elements look like this:

.
.

  (9): Sequential(
     (conv_9): Conv2d (128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
     (batch_norm_9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
     (leaky_9): LeakyReLU(0.1, inplace)
   )
   (10): Sequential(
     (conv_10): Conv2d (64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (batch_norm_10): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
     (leaky_10): LeakyReLU(0.1, inplace)
   )
   (11): Sequential(
     (shortcut_11): EmptyLayer(
     )
   )
.
.
.

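That is it for this part. In the next part, we will assemble the building blocks we have created and use them to produce output from an image.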

Further reading

PyTorch tutorial

nn.Module, nn.Parameter classes

nn.ModuleList and nn.Sequential

    Original author: 深度智能
    Original URL: https://zhuanlan.zhihu.com/p/36920744