3. Deploying a deep learning model on mobile with Caffe2 & PyTorch (full walkthrough!)

If you're an expert, feel free to just read this with a smile~

Demo video at the bottom.

I chose Android as the mobile platform, since an Android phone is all I have on hand. I'm writing this up because, for deploying Caffe2 on Android, the only existing example is the official 1000-class one, which uses a pre-trained model, and there is no clear, detailed deployment tutorial. So this post records my own learning process and gives a feel for Caffe2's cross-platform story.

I've used Caffe before, and TensorFlow as well; this time I picked Caffe2 and PyTorch + ONNX. The idea behind ONNX is a good one:

"ONNX stands for Open Neural Network Exchange. As the name suggests, the goal of the project is to make different neural network frameworks interoperable. So far, Microsoft Cognitive Toolkit, PyTorch, and Caffe2 have announced support for ONNX."

I'm following its progress closely; for this post I picked the small but capable squeezenet1.1 as the model.

Time permitting, I may test other models later, and I'd also like to benchmark against Baidu's and Tencent's mobile frameworks. We'll see; exams and other things are coming up. (runs away)

The model I deployed here is a 7-class classification model, with the classes:

bottle, chair, desk, laptop, glasses, phone, mouse

All everyday objects within arm's reach. The images were crawled from Baidu by keyword and then lightly filtered, so the quality isn't great, but the final results turned out fine. The training set has roughly 500 images per class and the validation set roughly 100 per class.

OK, now let's get hands-on and build our own deep learning model on an Android phone:

1. Building the SqueezeNet network

First, try building the network yourself. This part isn't hard and there are plenty of examples online. In the end, the demo doesn't actually use a model trained from scratch on this network; it fine-tunes a pre-trained one, which converges much faster. I'm walking through the network definition anyway, so that the parameter and module changes I make during fine-tuning later will make sense.

Before implementing it, it's of course worth reading the paper:

https://arxiv.org/pdf/1602.07360.pdf

especially the Fire module and the network architecture diagram:

[Figure: Fire module and network architecture, from the paper]

import torch
import torch.nn as nn
import torch.nn.init as init

# Define model
class Fire(nn.Module):
    def __init__(self, inchn, sqzout_chn, exp1x1out_chn, exp3x3out_chn):
        super(Fire, self).__init__()
        self.inchn = inchn
        # squeeze: 1x1 conv that reduces the channel count
        self.squeeze = nn.Conv2d(inchn, sqzout_chn, kernel_size=1)
        self.squeeze_act = nn.ReLU(inplace=True)
        # expand: parallel 1x1 and 3x3 convs whose outputs are concatenated
        self.expand1x1 = nn.Conv2d(sqzout_chn, exp1x1out_chn, kernel_size=1)
        self.expand1x1_act = nn.ReLU(inplace=True)
        self.expand3x3 = nn.Conv2d(sqzout_chn, exp3x3out_chn, kernel_size=3, padding=1)
        self.expand3x3_act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.squeeze_act(self.squeeze(x))
        # concatenate the two expand branches along the channel dimension
        return torch.cat([
                self.expand1x1_act(self.expand1x1(x)),
                self.expand3x3_act(self.expand3x3(x))
                ], 1)
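
A quick sanity check of the Fire module's output shape (a throwaway snippet; Fire(64, 16, 64, 64) takes 64 input channels and emits 64 + 64 = 128 concatenated expand channels):

from torch.autograd import Variable
# spatial size is preserved: 1x1 conv, and 3x3 conv with padding=1
fire = Fire(64, 16, 64, 64)
y = fire(Variable(torch.randn(1, 64, 56, 56)))
print(y.size())  # torch.Size([1, 128, 56, 56])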

class Sqznet(nn.Module):
    # This demo only uses 7 classes: bottle, chair, desk, laptop, glasses, phone, mouse
    def __init__(self, num_class=7):
        super(Sqznet, self).__init__()
        self.num_class = num_class
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2),
            nn.ReLU(inplace=True),
            # ceil_mode must be set to False here, otherwise fine-tuning/export fails;
            # you'll see I change this again when fine-tuning,
            # because ONNX currently does not support squeezenet's ceil_mode=True!!
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
            Fire(64, 16, 64, 64),
            Fire(128, 16, 64, 64),
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
            Fire(128, 32, 128, 128),
            Fire(256, 32, 128, 128),
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
            Fire(256, 48, 192, 192),
            Fire(384, 48, 192, 192),
            Fire(384, 64, 256, 256),
            Fire(512, 64, 256, 256),
        )
        final_conv = nn.Conv2d(512, self.num_class, kernel_size=1)
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            final_conv,
            nn.ReLU(inplace=True),
            nn.AvgPool2d(13)
        )
        # Parameter initialization, following the official implementation
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if m is final_conv:
                    init.normal(m.weight.data, mean=0.0, std=0.01)
                else:
                    init.kaiming_uniform(m.weight.data)
                if m.bias is not None:
                    m.bias.data.zero_()

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x.view(x.size(0), self.num_class)
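
Why AvgPool2d(13)? With a 224x224 input and ceil_mode=False everywhere, the feature map reaching the classifier is 13x13, so the average pool collapses it to a single value per class. A quick end-to-end shape check (a sketch, not part of training):

from torch.autograd import Variable
# a 224x224 input should come out as [batch, 7]: one score per class
model = Sqznet(num_class=7)
out = model(Variable(torch.randn(1, 3, 224, 224)))
print(out.size())  # torch.Size([1, 7])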

Then you can start training, and save the model when you're done. Or do what I did here and load a pre-trained model, which is how fine-tuning always works anyway. Either way, in the end you need a PyTorch model in hand!
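
For reference, a minimal save/restore sketch (the filename sqznet.pth is just a placeholder of mine):

# Save just the weights...
torch.save(model.state_dict(), 'sqznet.pth')
# ...and load them back into a freshly constructed network later
model = Sqznet(num_class=7)
model.load_state_dict(torch.load('sqznet.pth'))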

2. Dataset and data preprocessing

To train a model, you of course need to prepare the data properly. The preprocessing here follows the official PyTorch tutorial, since I haven't been using PyTorch for long; the PyTorch documentation covers it in detail. (runs away)

The dataset directory structure looks like this; each subdirectory name is the corresponding class label, i.e. bottle, chair, desk, laptop, glasses, phone, mouse.

[Figure: dataset directory layout]

First, dataset loading and preprocessing:

Data loading uses the DataLoader; preprocessing applies some common data augmentation and normalization via torchvision's transforms package. Once this is set up, you can iterate over batches from the loader.

import os
import torch
from torchvision import datasets, transforms

# Data augmentation and normalization for training
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        # ImageNet channel means and stds, matching the pre-trained model
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'datadir'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=16,
                                              shuffle=True, num_workers=4)
               for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes
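
One thing worth noting for step 5 later: ImageFolder assigns label indices in sorted order of the subdirectory names, and the label file shipped with the Android app has to follow the same order. You can print the mapping to be sure:

# ImageFolder sorts class subdirectories alphabetically;
# index i in the model's output corresponds to class_names[i]
print(class_names)
print(image_datasets['train'].class_to_idx)  # explicit name -> index mapping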

Read one batch to see what the data looks like, using torchvision:

With batch_size 16, images and labels line up one-to-one. You can see that the images crawled from Baidu are of acceptable quality, at least for a classification task.

# Have a look at the data
import torchvision
inputs, classes = next(iter(dataloaders['train']))
out = torchvision.utils.make_grid(inputs)  # tile the batch into one grid image
imshow(out, title=[class_names[x] for x in classes])
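
The imshow above is a small helper, not a library function; mine looks roughly like the one in the official PyTorch transfer-learning tutorial, un-normalizing the tensor and displaying it:

import matplotlib.pyplot as plt
import numpy as np

def imshow(inp, title=None):
    # CHW tensor -> HWC numpy image, then undo the Normalize transform
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = np.clip(std * inp + mean, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.show()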

[Figure: a 16-image training batch with its labels]

OK, by now the data is ready and the SqueezeNet network has been built and trained; in other words, we have a SqueezeNet model ready for fine-tuning. Next comes the fine-tuning itself, which has quite a few gotchas.

3. Fine-tuning a SqueezeNet model

from torchvision import datasets, models, transforms

# Start fine-tuning from the pre-trained squeezenet1_1
model_ft = models.squeezenet1_1(pretrained=True)
# First take a look at what the model looks like
print(model_ft)

SqueezeNet(
  (features): Sequential(
    (0): Conv2d (3, 64, kernel_size=(3, 3), stride=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1))
    (3): Fire(
      (squeeze): Conv2d (64, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (4): Fire(
      (squeeze): Conv2d (128, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (5): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1))
    (6): Fire(
      (squeeze): Conv2d (128, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (7): Fire(
      (squeeze): Conv2d (256, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (8): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1))
    (9): Fire(
      (squeeze): Conv2d (256, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (10): Fire(
      (squeeze): Conv2d (384, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (11): Fire(
      (squeeze): Conv2d (384, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (12): Fire(
      (squeeze): Conv2d (512, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d (512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AvgPool2d(kernel_size=13, stride=1, padding=0, ceil_mode=False, count_include_pad=True)
  )
)

The full network structure is laid out very clearly. For fine-tuning, we mainly care about the last layer, initializing all the earlier layers from the pre-trained model's parameters. The main points to watch here:

① As mentioned earlier, nn.MaxPool2d's ceil_mode must be set to False, otherwise fine-tuning will error out, because ONNX currently does not support squeezenet's ceil_mode=True. So we patch the model by hand, setting ceil_mode=False:

model_ft.features._modules["2"] = nn.MaxPool2d(kernel_size=3, stride=2, dilation=1,ceil_mode=False)
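
The snippet above patches the pooling layer at index "2". If the other pooling layers (indices "5" and "8" in squeezenet1_1's features) also carry ceil_mode=True in your torchvision version, the same replacement would presumably be needed for them too; a hedged sketch:

# Assumption: all three MaxPool2d layers may have ceil_mode=True in the
# pre-trained model; replace each one with a ceil_mode=False equivalent
for idx in ["2", "5", "8"]:
    model_ft.features._modules[idx] = nn.MaxPool2d(kernel_size=3, stride=2,
                                                   dilation=1, ceil_mode=False)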

② Then the usual change to the number of output classes: take the corresponding module out of the classifier and swap the original 1000 classes for 7, so easy:

num_ftrs = 512  # input channels of the classifier's final conv (see the printout above)
model_ft.classifier._modules["1"] = nn.Conv2d(num_ftrs, 7, kernel_size=(1, 1), stride=(1, 1))
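
One more detail: torchvision's SqueezeNet reshapes its output in forward using self.num_classes, so that attribute should be kept in sync with the new head:

# SqueezeNet.forward ends with x.view(x.size(0), self.num_classes),
# so update the attribute to match the 7-class head
model_ft.num_classes = 7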

So before fine-tuning, it pays to print the model structure as above and take a look; it makes the model much easier to understand and to modify.

OK, so far we've sorted out the data, obtained the pre-trained model, and patched the model for fine-tuning. Next is the fine-tuning training itself:

First, define the loss function and training policy; this should be clear at a glance:

import torch.optim as optim
from torch.optim import lr_scheduler

criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.92)
# Decay the learning rate by a factor of 10 every 15 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=15, gamma=0.1)

Then define the training function, which is the usual PyTorch training loop:

import copy

use_gpu = torch.cuda.is_available()

# Define training pipeline
def train_model(model, criterion, optimizer, scheduler, num_epochs=1):
    best_model_wts = copy.deepcopy(model.state_dict())  # snapshot, not a live reference
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        # Each epoch has a training and a validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                scheduler.step()
                model.train(True)   # training mode
            else:
                model.train(False)  # evaluation mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over data.
            iter = 0
            for data in dataloaders[phase]:
                inputs, labels = data
                # out = torchvision.utils.make_grid(inputs)  # have a look at the training images
                # imshow(out, title=[class_names[x] for x in labels])
                # I trained on CPU here
                if use_gpu:
                    inputs = Variable(inputs.cuda())
                    labels = Variable(labels.cuda())
                else:
                    inputs, labels = Variable(inputs), Variable(labels)
                optimizer.zero_grad()
                # forward
                outputs = model(inputs)
                _, preds = torch.max(outputs.data, 1)
                loss = criterion(outputs, labels)
                print("phase:%s, epoch:%d/%d Iter %d: loss=%s" % (phase, epoch, num_epochs - 1, iter, str(loss.data.numpy())))
                # backward + optimize only in the training phase
                if phase == 'train':
                    loss.backward()
                    optimizer.step()
                running_loss += loss.data[0]
                running_corrects += torch.sum(preds == labels.data)
                iter += 1
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects / dataset_sizes[phase]
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            # deep copy the model if it is the best so far
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
        print('-' * 10)
    print('Best val Acc: {:4f}'.format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

Now start training:

# Only 30 epochs here: about 210 training iterations and about 40 validation
# iterations per epoch, which is plenty for this 7-class demo
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=30)

Training results:

[Figure: training log screenshots]

Validation results:

[Figure: validation log screenshots]

OK, after training, the model looks good enough for this demo, so next we convert it to a Caffe2 model.

4. Converting to Caffe2: getting init_net.pb and predict_net.pb

Converting to Caffe2 is quite easy; ONNX provides a convenient interface:

Note: the reason we set up an input x here is that ONNX export works by tracing: it runs an arbitrary input of the right size through the network once to record the graph structure.

The export first produces the ONNX object sqz.onnx:

from torch.autograd import Variable
import torch

batch_size = 1  # any value works; it is only used for the trace
x = Variable(torch.randn(batch_size, 3, 224, 224), requires_grad=True)
torch_out = torch.onnx._export(model_ft,
                               x,
                               "sqz.onnx",
                               export_params=True  # store the trained weights in the exported file
                               )
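
Before converting further, it doesn't hurt to validate the exported graph; a small check using the onnx package's checker, along the lines of the official ONNX tutorials:

import onnx
model = onnx.load("sqz.onnx")
onnx.checker.check_model(model)  # raises an exception if the graph is malformed
print(onnx.helper.printable_graph(model.graph))  # human-readable graph dump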

Next, convert it into the init_net.pb and predict_net.pb that Caffe2 needs:

import onnx
import onnx_caffe2.backend
from onnx_caffe2.backend import Caffe2Backend as c2

# load the onnx object
model = onnx.load("sqz.onnx")
prepared_backend = onnx_caffe2.backend.prepare(model)
# split the graph into Caffe2's two protobufs:
# init_net holds the weights, predict_net holds the operators
init_net, predict_net = c2.onnx_graph_to_caffe2_net(model.graph)
with open("squeeze_init_net.pb", "wb") as f:
    f.write(init_net.SerializeToString())
with open("squeeze_predict_net.pb", "wb") as f:
    f.write(predict_net.SerializeToString())
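
Before shipping the .pb files, it's also worth sanity-checking that the Caffe2 backend reproduces the PyTorch output. A sketch in the style of the ONNX tutorials, reusing x and torch_out from the export step (note: depending on your onnx version, graph.input[0] may be a plain name rather than an object with a .name field):

import numpy as np
# feed the traced input to the Caffe2 backend and compare with PyTorch
W = {model.graph.input[0].name: x.data.numpy()}
c2_out = prepared_backend.run(W)[0]
np.testing.assert_almost_equal(torch_out.data.numpy(), c2_out, decimal=3)
print("PyTorch and Caffe2 outputs match")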

OK, we now have the 7-class model converted to Caffe2 and could already run classification with it. But since this post is about learning deployment on the Android platform, let's run it directly on an Android phone.

5. Finally, deploying on Android

Note: before playing with this, install Android Studio on Linux first; there are plenty of tutorials for that online.

Since I did a bit of Android programming a while back, I could more or less directly follow the official caffe2 AICamera example. For this demo I did essentially no UI work, just swapped in my own model; the two places to change are:

Replace the model here with your trained one:

/home/xxx/AICamera/app/src/main/assets/

Replace the label file here with the one matching the trained 7-class model:

/home/xxx/Android/AICamera/app/src/main/cpp/

Once that's done: -> Build APK -> install. Done!

6. Demo:

[Video demo] https://www.zhihu.com/video/928940105270439936

Isn't that simple and fun?

Original author: SuperHui
Original article: https://zhuanlan.zhihu.com/p/32342366