A Practical Guide to PyTorch

Translated from "Some important PyTorch tasks – A concise summary from a vision researcher"

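I used to enjoy writing Keras, but ever since I tried PyTorch I have been hooked. I recently came across a fairly practical PyTorch guide online, so I took the time to translate it, partly for my own learning.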


import torch.nn as nn
import torch
from torch.autograd.variable import Variable
from torchvision import datasets, models, transforms

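# pretrained=False builds the architecture with random weights;
# pass pretrained=True to load ImageNet weights for actual fine-tuning.
model = models.resnet18(pretrained=False)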

Section 1 Fine-tuning a pretrained ResNet

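We first inspect the layers of the ResNet model and then decide which ones to fine-tune. The point of starting from pretrained weights is that we can keep the parameters of some layers fixed (translator's note: often only the final fully connected layers are optimized). Put simply, fine-tuning means taking a model that was pretrained on a large dataset (translator's note: in computer vision this is usually ImageNet) and continuing to train it on our target dataset. We could also skip fine-tuning and train from scratch, but that amounts to reinventing the wheel. Compared with training from scratch, fine-tuning has two advantages:

  1. Training is faster.
  2. Less data is needed.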


Now let's look at the internal structure of the ResNet-18 model, using the .children() method.

child_counter = 0
for child in model.children():
    print(" child", child_counter, "is -")
    # Print the layer itself, as shown in the output below.
    print(child)
    child_counter += 1


child 0 is -
Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
 child 1 is -
BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
 child 2 is -
ReLU (inplace)
 child 3 is -
MaxPool2d (size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1))
 child 4 is -
Sequential (
  (0): BasicBlock (
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (relu): ReLU (inplace)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
  (1): BasicBlock (
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (relu): ReLU (inplace)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
 child 5 is -
Sequential (
  (0): BasicBlock (
    (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (relu): ReLU (inplace)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (downsample): Sequential (
      (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
  (1): BasicBlock (
    (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (relu): ReLU (inplace)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
 child 6 is -
Sequential (
  (0): BasicBlock (
    (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (relu): ReLU (inplace)
    (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (downsample): Sequential (
      (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
  (1): BasicBlock (
    (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (relu): ReLU (inplace)
    (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
 child 7 is -
Sequential (
  (0): BasicBlock (
    (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (relu): ReLU (inplace)
    (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (downsample): Sequential (
      (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
  (1): BasicBlock (
    (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (relu): ReLU (inplace)
    (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
 child 8 is -
AvgPool2d (
 child 9 is -
Linear (512 -> 1000)

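Next, we use the .parameters() method to get the parameters of each layer. Every parameter has a .requires_grad attribute that determines whether it stays fixed or is trained (the default is True, meaning the parameter is updated on every training step; setting it to False keeps the parameter fixed).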

for child in model.children():
    for param in child.parameters():
        print("This is what a parameter looks like - \n",param)


This is what a parameter looks like - 
 Parameter containing:
(0 ,0 ,.,.) = 
  1.8160e-02  2.1680e-02  5.6358e-02  ...  -1.2987e-02 -6.1262e-02 -4.8870e-02
  2.6440e-02  1.0603e-02  1.9794e-02  ...  -4.2643e-02 -4.5565e-03 -4.8300e-02
  9.0205e-03  1.9536e-03  1.9925e-04  ...   1.1413e-02  1.1395e-02  2.8418e-03
                 ...                   ⋱                   ...                
 -2.4830e-02  8.1022e-03 -4.9934e-02  ...   2.2573e-02  1.6346e-02  3.9666e-02
 -2.3857e-02 -1.6275e-02  2.9058e-02  ...   3.0488e-02  2.0294e-02 -5.1073e-03
 -1.6848e-04  5.9266e-02 -5.8456e-03  ...   1.9757e-02 -7.8441e-02  1.3667e-02

(0 ,1 ,.,.) = 
 -1.6319e-02  3.3193e-02 -2.2146e-04  ...   1.2571e-03 -1.3313e-02 -4.7580e-02
 -4.9329e-02  3.2548e-02  5.4202e-03  ...  -4.5771e-02 -2.6863e-03 -3.6992e-03
  8.7714e-03  2.4772e-02  1.0026e-02  ...   1.6512e-02 -7.4382e-03  6.0990e-02
                 ...                   ⋱                   ...                
 -4.0751e-02  3.3605e-04 -2.1426e-02  ...   1.1318e-02 -1.5222e-04 -3.5020e-02
 -4.1432e-02 -9.1312e-03 -1.7572e-02  ...   1.6974e-03  5.9792e-03  1.2868e-02
 -4.4471e-02 -1.1013e-02  4.9902e-03  ...  -2.1241e-02  2.2371e-02 -2.1672e-02

(0 ,2 ,.,.) = 
  1.0826e-02 -4.4230e-02 -1.5594e-02  ...  -1.3197e-03  6.1211e-03 -1.6262e-02
 -1.3989e-02 -3.2357e-02  2.0250e-02  ...   7.5012e-03  2.8761e-04 -2.1318e-02
 -7.8574e-04  1.7702e-02  1.0301e-02  ...  -2.0074e-02  4.4735e-02  1.0149e-02
                 ...                   ⋱                   ...                
 -2.4707e-02  2.3952e-03  6.5615e-04  ...   4.4371e-02 -1.0678e-02  2.3425e-02
 -2.4330e-02  1.3018e-02  1.1473e-02  ...  -3.6666e-03 -2.1145e-02 -1.5511e-02
 -3.0876e-02 -1.6071e-02 -2.4506e-02  ...   2.7417e-03  6.2566e-03  1.6208e-02
[torch.FloatTensor of size 64x3x7x7]
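To see the effect of the requires_grad flag in isolation, here is a minimal sketch on a toy nn.Sequential model (an assumption made for brevity, not the ResNet above). It freezes the first layer and counts how many parameter values remain trainable.

```python
import torch.nn as nn

# Toy model standing in for a larger network (illustrative assumption).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer: its weight and bias will no longer receive gradients.
for param in model[0].parameters():
    param.requires_grad = False

# Count all parameter values and those that are still trainable.
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

print("total parameters:", total)
print("trainable parameters:", trainable)
```

The first layer holds 4*8 + 8 = 40 values and the second 8*2 + 2 = 18, so 18 of the 58 values remain trainable.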

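Clearly, training involves a large amount of computation. If we freeze the parameters of the first six children (and, in the code below, also the first block of child 6), training becomes noticeably faster.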

child_counter = 0
for child in model.children():
    if child_counter < 6:
        # Freeze every parameter of the first six children.
        print("child ",child_counter," was frozen")
        for param in child.parameters():
            param.requires_grad = False
    elif child_counter == 6:
        # Inside child 6, freeze only the first block.
        children_of_child_counter = 0
        for children_of_child in child.children():
            if children_of_child_counter < 1:
                for param in children_of_child.parameters():
                    param.requires_grad = False
                print('child ', children_of_child_counter, 'of child',child_counter,' was frozen')
            else:
                print('child ', children_of_child_counter, 'of child',child_counter,' was not frozen')
            children_of_child_counter += 1
    else:
        print("child ",child_counter," was not frozen")
    child_counter += 1


child  0  was frozen
child  1  was frozen
child  2  was frozen
child  3  was frozen
child  4  was frozen
child  5  was frozen
child  0 of child 6  was frozen
child  1 of child 6  was not frozen
child  7  was not frozen
child  8  was not frozen
child  9  was not frozen



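# Before freezing, the optimizer receives every parameter:
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.1)

# After freezing, pass only the parameters that still require gradients:
optimizer = torch.optim.RMSprop(filter(lambda p: p.requires_grad, model.parameters()), lr=0.1)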
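The following sketch checks, on a toy model, that an optimizer built from the filtered parameter list leaves frozen weights untouched while still updating the rest. The toy nn.Sequential model and the variable names are assumptions for illustration. RMSprop and lr=0.1 mirror the line above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer.
for param in model[0].parameters():
    param.requires_grad = False

# Hand only the trainable parameters to the optimizer.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.RMSprop(trainable_params, lr=0.1)

# Keep copies so we can compare after one update step.
frozen_weight_before = model[0].weight.detach().clone()
head_bias_before = model[2].bias.detach().clone()

# One training step on random input with a dummy loss.
loss = model(torch.randn(4, 4)).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

frozen_unchanged = torch.equal(model[0].weight, frozen_weight_before)
head_changed = not torch.equal(model[2].bias, head_bias_before)
print("frozen layer unchanged:", frozen_unchanged)
print("last layer updated:", head_changed)
```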

Section 2 Saving and loading models

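There are two ways to save a model in PyTorch. The recommended way is to use "state dictionaries", which is faster and takes less space. A state dictionary stores only the parameter values, not the model's structure, so you must recreate the model architecture and then load the parameters into it.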

# Let's assume we will save/load from a path MODEL_PATH

# Saving a model
torch.save(model.state_dict(), MODEL_PATH)

# Loading the model.

# First create a model and define its architecture as done above in this notebook.
# If you want a custom architecture,
# read on: it is covered below.
checkpoint = torch.load(MODEL_PATH)
model.load_state_dict(checkpoint)
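As a self-contained sketch of the state-dictionary round trip, the example below saves a toy model to a temporary file, rebuilds the same architecture, loads the weights, and checks that both models produce identical outputs. The helper build_model, the toy architecture, and the file name are assumptions for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # The architecture must be recreated in code before loading weights.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


model = build_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model_state.pt")
    # Save only the parameter values, not the model structure.
    torch.save(model.state_dict(), model_path)

    # A fresh model has different random weights until the state is loaded.
    restored = build_model()
    restored.load_state_dict(torch.load(model_path))

x = torch.randn(3, 4)
same_output = torch.equal(model(x), restored(x))
print("outputs match after reload:", same_output)
```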

Section 3 Modifying, removing, or adding the last layer

Unlike in Keras, you cannot remove the last layer in PyTorch with a .pop() function. Let's see how to do it in PyTorch.


# Load the model
model = models.resnet18(pretrained = False)

# Get the number of input features of the last layer.
# We need this to replace the final layer.
num_final_in = model.fc.in_features

# The final layer of the model is model.fc, so we can simply overwrite it
# with a layer whose output size is the number of classes we need, say 300.
NUM_CLASSES = 300
model.fc = nn.Linear(num_final_in, NUM_CLASSES)

Removing the last layer (typically done when you want the features from the layer before it)

# Load the model
model = models.resnet18(pretrained = False)

We can use model.children() to obtain the model's layers. Converting them to a list lets us use list operations to remove the last layer. Here we use PyTorch's nn.Sequential() to pack the modified list back into a model.

new_model = nn.Sequential(*list(model.children())[:-1])


Adding layers is covered in the next section, on creating custom models.

Section 4 Custom models: combining Sections 1-3 to add layers on top of a model

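Let's define a custom model of a commonly used kind. As mentioned earlier, some of this model's parameters come from a pretrained model and the rest are learned during its own training. The example below should make this clear.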

import torch
import torch.nn as nn
from torchvision import models

# New models are defined as classes.
# To create a model, we instantiate the class.
class Resnet_Added_Layers_Half_Frozen(nn.Module):
    def __init__(self, LOAD_VIS_URL=None):
        # The class passed to super() must be this class.
        super(Resnet_Added_Layers_Half_Frozen, self).__init__()
        # Start with the resnet model and swap out the final layer,
        # because that's the model we defined above.
        model = models.resnet18(pretrained=False)
        num_final_in = model.fc.in_features
        model.fc = nn.Linear(num_final_in, 300)
        # Now that the architecture is the same as above,
        # load the weights we would have trained above.
        # MODEL_PATH is the checkpoint path assumed in Section 2.
        checkpoint = torch.load(MODEL_PATH)
        model.load_state_dict(checkpoint)
        # Freeze the same layers as above.
        # Same code as above without the print statements.
        child_counter = 0
        for child in model.children():
            if child_counter < 6:
                for param in child.parameters():
                    param.requires_grad = False
            elif child_counter == 6:
                children_of_child_counter = 0
                for children_of_child in child.children():
                    if children_of_child_counter < 1:
                        for param in children_of_child.parameters():
                            param.requires_grad = False
                    children_of_child_counter += 1
            child_counter += 1
        # Now, let's define the new layers that we want to add on top.
        # These are just objects we define here.
        # The "adding on top" is defined by the forward() function,
        # which decides the flow of the input data through the model.
        # NOTE - Even the model above needs to be assigned to self.
        # The final fc layer is dropped so that vismodel outputs the
        # 512 features that self.projective expects.
        self.vismodel = nn.Sequential(*list(model.children())[:-1])
        self.projective = nn.Linear(512, 400)
        self.nonlinearity = nn.ReLU(inplace=True)
        self.projective2 = nn.Linear(400, 300)

    # The forward function defines the flow of the input data
    # and thus decides which layer/chunk goes on top of what.
    def forward(self, x):
        x = self.vismodel(x)
        # Flatten (N, 512, 1, 1) to (N, 512); unlike squeeze(),
        # this keeps the batch dimension when N == 1.
        x = torch.flatten(x, 1)
        x = self.projective(x)
        x = self.nonlinearity(x)
        x = self.projective2(x)
        return x

Section 5 Custom loss functions


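  • Define the loss as a class that inherits from torch.nn.Module, just like a model.
  • Use view() to change the shape of the input.
  • Use unsqueeze() to add a dimension to a tensor.
  • The value returned by the loss function must be a scalar, not a vector or a higher-dimensional tensor.
  • The returned value must be a Variable so that it can be used to update the parameters, which requires x and y to be Variables as well. (Since PyTorch 0.4, Variable has been merged into Tensor, so ordinary tensors that track gradients serve this purpose.)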
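The shape operations named in the list can be tried out on a small tensor. This sketch uses made-up sizes (5 by 10) and illustrative variable names to show how view() and unsqueeze() change the shape, and that a full sum yields a zero-dimensional (scalar) tensor.

```python
import torch

# A 5 x 10 tensor holding the values 0..49.
x = torch.arange(50, dtype=torch.float32).view(5, 10)

# view() reinterprets the same data with a different shape.
flat = x.view(-1)

# unsqueeze(1) inserts a new dimension of size 1 at position 1.
x_with_extra_dim = x.unsqueeze(1)

# repeat() copies the data along the new dimension.
x_repeated = x_with_extra_dim.repeat(1, 5, 1)

# Summing over every dimension gives a scalar tensor.
total = x_repeated.sum()

print(tuple(flat.shape), tuple(x_with_extra_dim.shape), tuple(x_repeated.shape), total.dim())
```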

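Here is an example called Regress_Loss. The inputs x and y have different shapes. We reshape x so that it matches the shape of y, and then return the squared L2 distance between x and y as the loss. Once you understand this example, defining other loss functions is straightforward.

For example, suppose x has shape (5,10) and y has shape (5,5,10). We need to add a dimension to x so that it matches y. (x-y) then has shape (5,5,10), and we sum the squared differences over all three dimensions to obtain a scalar.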

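class Regress_Loss(torch.nn.Module):
    def __init__(self):
        super(Regress_Loss, self).__init__()

    def forward(self, x, y):
        # x: (batch, features), y: (batch, k, features)
        y_shape = y.size()[1]
        # (batch, features) -> (batch, 1, features)
        x_added_dim = x.unsqueeze(1)
        # Repeat along dimension 1 to match y: (batch, k, features)
        x_stacked_along_dimension1 = x_added_dim.repeat(1, y_shape, 1)
        # Squared differences summed over the feature dimension: (batch, k)
        diff = torch.sum((y - x_stacked_along_dimension1)**2, 2)
        # Sum the remaining dimensions to obtain a scalar.
        totloss = torch.sum(diff)
        return totloss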
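The explicit repeat() in the loss above is not strictly required, because PyTorch broadcasting expands a dimension of size 1 automatically. The sketch below compares the two formulations on random data with the shapes from the example, (5,10) and (5,5,10), and confirms that gradients flow back to x. The function names are assumptions for illustration, not part of the original article.

```python
import torch


def regress_loss_repeat(x, y):
    # Explicitly tile x along dimension 1 to match y.
    x_repeated = x.unsqueeze(1).repeat(1, y.size(1), 1)
    return torch.sum((y - x_repeated) ** 2)


def regress_loss_broadcast(x, y):
    # Let broadcasting expand the size-1 dimension instead.
    return torch.sum((y - x.unsqueeze(1)) ** 2)


torch.manual_seed(0)
x = torch.randn(5, 10, requires_grad=True)
y = torch.randn(5, 5, 10)

loss_repeat = regress_loss_repeat(x, y)
loss_broadcast = regress_loss_broadcast(x, y)

# The loss is a scalar, so backward() can be called on it directly.
loss_broadcast.backward()

print(torch.allclose(loss_repeat, loss_broadcast), tuple(x.grad.shape))
```

The broadcast version avoids materializing the repeated copy of x, which can matter for large tensors.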
