PyTorch Code Structure

Yesterday I was training an LSTM model. Everything looked fine on the training set, but the validation results were wrong. The main problem was that I used model.eval() together with with torch.no_grad(); removing model.eval() fixed it, which left me completely confused. I had never really sorted out how PyTorch code should be structured and kept stepping into pitfalls (silly me), so first, here is the code structure that actually runs:

import torch
import torch.optim as optim
import torch.nn as nn
# from my code
from model import Mymodel
from dataloader import MydataLoader
from args import get_args
args = get_args()

# load data
train_loader = MydataLoader(args.train_file, args.gpu)
valid_loader = MydataLoader(args.valid_file, args.gpu)

model = Mymodel(args)
if torch.cuda.is_available():
    model.cuda()
# show model parameters
for name, param in model.named_parameters():
    print(name, param.size())
criterion = nn.MarginRankingLoss(args.loss_margin) # Max margin ranking loss function
optimizer = optim.Adam(model.parameters(), lr=args.lr)

iterations = 0
early_stop = False   # flipped to True when dev accuracy stops improving (that update is omitted in this snippet)
best_dev_acc = 0.0

for epoch in range(1, args.epochs+1):
    if early_stop:
        print("Early stopping. Epoch: {}, Best Dev. Acc: {}".format(epoch, best_dev_acc))
        break

    n_correct, n_total = 0, 0
    losses = []
    model.train()
    for batch_idx, batch in enumerate(train_loader.next_batch()):
        iterations += 1
        ques, rels, neg_rels = batch
        neg_size = neg_rels.size(1)
        model.zero_grad()
        #optimizer.zero_grad()

        pos_score, neg_score = model(ques, rels, neg_rels, is_train=True)
        n_correct += (torch.sum(torch.gt(pos_score, neg_score), 1).data == neg_size).sum().item()
        n_total += len(ques)
        train_acc = 100. * n_correct / n_total

        ones = torch.ones(neg_score.size(0), neg_score.size(1)).cuda(args.gpu)
        loss = criterion(pos_score, neg_score, ones)
        losses.append(loss.item())
        loss.backward()

        # clip the gradient
        torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip_gradient)
        optimizer.step()
        
        if iterations % args.dev_every == 0:
            #model.eval()
            with torch.no_grad():
            #if True:
                # model.eval()
                dev_acc = 0
                n_dev_correct = 0
                n_dev_total = 0
                for valid_batch_idx, valid_batch in enumerate(valid_loader.next_batch()):
                    val_ques, val_rels, val_neg_rels = valid_batch
                    val_neg_size = val_neg_rels.size(1)
                    val_ps, val_ns = model(val_ques, val_rels, val_neg_rels, is_train=True)
                    n_dev_correct += (torch.sum(torch.gt(val_ps, val_ns), 1).data == val_neg_size).sum().item()
                    n_dev_total += len(val_ques)
                print("n_dev_correct,n_dev_total:",n_dev_correct,n_dev_total)
                dev_acc = 100 * n_dev_correct/n_dev_total
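
A quick note on the loss used above: nn.MarginRankingLoss(x1, x2, y) computes the mean of max(0, -y * (x1 - x2) + margin), so passing a target of all ones tells it that pos_score should exceed neg_score by at least the margin. A minimal standalone sketch (the scores below are made-up numbers, not outputs of my model):

import torch
import torch.nn as nn

criterion = nn.MarginRankingLoss(margin=0.5)
# pretend scores for 2 questions with 3 negative relations each
pos_score = torch.tensor([[0.9, 0.9, 0.9], [0.2, 0.2, 0.2]])
neg_score = torch.tensor([[0.1, 0.3, 0.8], [0.4, 0.1, 0.3]])
target = torch.ones_like(neg_score)   # 1 = "the first argument should rank higher"
loss = criterion(pos_score, neg_score, target)
print(loss)   # mean over all pairs of max(0, neg - pos + 0.5)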

The following are the questions I had never quite figured out:

1. Where exactly should model.train() and model.eval() be called?

2. What is the difference between optimizer.zero_grad() and model.zero_grad()?

3. If I already use with torch.no_grad(), do I still need model.eval()?

First question: I don't evaluate once per epoch, but every fixed number of iterations:

for epoch in range(1, args.epochs+1):

    n_correct, n_total = 0, 0
    losses = []
    model.train()
    for batch_idx, batch in enumerate(train_loader.next_batch()):
        # Look here: some people like to put model.train() here instead, which is also fine
        # model.train()
        ....
        # Look here: this model.eval() is what made my evaluation go wrong. Why? See the third question
        # model.eval()
        if iterations % args.dev_every == 0:
            with torch.no_grad():
                ....
    
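
For completeness, the pattern most PyTorch examples follow is: model.train() before the training batches, model.eval() plus torch.no_grad() around validation, and model.train() again before resuming training. A self-contained toy sketch of that skeleton (the tiny model and the fake data are invented purely for illustration):

import torch
import torch.nn as nn
import torch.optim as optim

# a toy model with dropout, so train()/eval() actually changes its behaviour
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 1))
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# fake data standing in for the real train/valid loaders
train_batches = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(10)]
valid_batches = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(3)]

iterations = 0
model.train()                          # training mode before the training loop
for epoch in range(1, 3):
    for x, y in train_batches:
        iterations += 1
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

        if iterations % 5 == 0:        # evaluate every few iterations, like dev_every
            model.eval()               # dropout off, batch norm would use running stats
            with torch.no_grad():      # no autograd bookkeeping during validation
                val_loss = sum(criterion(model(x), y).item() for x, y in valid_batches)
            model.train()              # switch back before resuming training

The detail that matters for my bug is the last line: if you forget to switch back with model.train(), dropout and batch norm stay in inference mode for the rest of training.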

Second question: here is an explanation I found online:

If optimizer = optim.Optimizer(net.parameters()), they are the same.

There might be use cases where you would like to use different optimizers for different parts of your model. In such a case, model.zero_grad() would clear the gradients of all parameters of the model, while the optimizerX.zero_grad() call will just clear the gradients of the parameters that were passed to it.
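
To make that concrete, here is a small sketch with two optimizers over different parts of one model (the encoder/classifier split is invented for illustration):

import torch
import torch.nn as nn
import torch.optim as optim

class TwoPartModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 16)
        self.classifier = nn.Linear(16, 2)

    def forward(self, x):
        return self.classifier(torch.relu(self.encoder(x)))

model = TwoPartModel()
opt_enc = optim.SGD(model.encoder.parameters(), lr=0.1)
opt_cls = optim.Adam(model.classifier.parameters(), lr=1e-3)

loss = model(torch.randn(4, 8)).sum()
loss.backward()

opt_enc.zero_grad()                               # clears only the encoder's gradients
print(model.classifier.weight.grad.abs().sum())   # classifier gradients are still there

model.zero_grad()                                 # clears the gradients of every parameter
print(model.classifier.weight.grad)               # None (or zeros, depending on the PyTorch version)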

Third question:

https://stackoverflow.com/questions/55627780/evaluating-pytorch-models-with-torch-no-grad-vs-model-eval

They do different things, and have different scopes.

  • with torch.no_grad – disables tracking of gradients in autograd.
  • model.eval() changes the forward() behaviour of the module it is called upon
    • eg, it disables dropout and has batch norm use the entire population statistics

with torch.no_grad

The torch.autograd.no_grad documentation says:

Context-manager that disabled [sic] gradient calculation.

Disabling gradient calculation is useful for inference, when you are sure that you will not call
Tensor.backward(). It will reduce memory consumption for computations that would otherwise have
requires_grad=True. In this mode, the result of every computation will have
requires_grad=False, even when the inputs have
requires_grad=True.
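
A quick standalone check makes that behaviour visible (not tied to the model above):

import torch

w = torch.randn(3, requires_grad=True)
y = (w * 2).sum()
print(y.requires_grad)   # True: the computation was tracked by autograd

with torch.no_grad():
    z = (w * 2).sum()
print(z.requires_grad)   # False: calling z.backward() here would raise an error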

model.eval()

The nn.Module.eval documentation says:

Sets the module in evaluation mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g.
Dropout,
BatchNorm, etc

Reading this, I finally understood: my model uses Dropout and BatchNorm, and calling model.eval() switches both of them to inference behaviour. But why would that make the results differ so much? Time to take a closer look at my model.
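
For the record, this is what model.eval() actually changes for those two layer types (a tiny standalone check, unrelated to my LSTM model):

import torch
import torch.nn as nn

layer = nn.Sequential(nn.BatchNorm1d(4), nn.Dropout(p=0.5))
x = torch.randn(8, 4)

layer.train()
out_train = layer(x)      # batch norm uses this batch's statistics, dropout zeroes about half the values

layer.eval()
out_eval = layer(x)       # batch norm uses its running statistics, dropout is a no-op

print(torch.allclose(out_train, out_eval))   # almost certainly False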

Found the problem in the model! My loss is a max-margin loss, i.e. the training objective is to push the positive samples' scores as far above the negative samples' scores as possible. In my model, however, the positive and negative samples did not share the LSTM encoder: the positives went through one LSTM and the negatives through another, so the model could satisfy the objective simply by learning larger weights for the positive-sample LSTM than for the negative-sample one. That is wrong; switching to a shared LSTM encoder fixed it.
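
What "sharing the LSTM encoder" means in practice: instead of two nn.LSTM modules, one scoring positives and one scoring negatives, both relation inputs go through the same module, so the margin has to come from the content of the inputs rather than from one branch simply having larger weights. A rough sketch of the fixed layout (the names and sizes are invented and multiple negatives per question are omitted; my real model is more involved):

import torch
import torch.nn as nn

class SharedEncoderRanker(nn.Module):
    def __init__(self, emb_dim=50, hidden=64):
        super().__init__()
        self.ques_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        # one relation encoder, reused for positives AND negatives
        self.rel_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

    def encode(self, lstm, seq):
        _, (h, _) = lstm(seq)          # use the final hidden state as the sequence vector
        return h[-1]

    def forward(self, ques, pos_rel, neg_rel):
        q = self.encode(self.ques_lstm, ques)
        p = self.encode(self.rel_lstm, pos_rel)      # shared weights ...
        n = self.encode(self.rel_lstm, neg_rel)      # ... for both relation inputs
        pos_score = torch.cosine_similarity(q, p, dim=-1)
        neg_score = torch.cosine_similarity(q, n, dim=-1)
        return pos_score, neg_score

With shared weights, the only way to enlarge the margin is to move each question's representation closer to its gold relation than to the sampled negatives, which is what the ranking loss was supposed to learn in the first place.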

    Original author: Joyce Ng
    Original link: https://zhuanlan.zhihu.com/p/64411611