If you're an expert, feel free to just smile and move on~
Demo video at the bottom.
I chose Android as the mobile platform here because an Android phone is all I have at the moment. The reason for this demo: so far the only Caffe2 deployment example on Android is the official 1000-class one, which uses a pre-trained model and comes with no detailed deployment tutorial. So this post records my own learning process and gives a taste of Caffe2's cross-platform support.
I've used Caffe and TensorFlow before; this time I chose Caffe2 and PyTorch + ONNX. The idea behind ONNX is a good one:
"ONNX stands for 'Open Neural Network Exchange'. As the name suggests, the project aims to make different neural network frameworks interoperable. So far, Microsoft Cognitive Toolkit, PyTorch, and Caffe2 have announced support for ONNX."
I'm following its progress closely; for this post I picked the small but capable SqueezeNet 1.1 as the model.
I may test other models later, and I'd also like to benchmark against Baidu's and Tencent's mobile frameworks. Time permitting, though; exams and other things are coming up. (runs away)
What I'm deploying here is a 7-class classification model, with the classes:
water bottle, chair, desk, laptop, glasses, phone, mouse.
All things within arm's reach. The images were crawled from Baidu by keyword and lightly filtered, so the quality isn't great, but the final results are decent: roughly 500 training images and 100 validation images per class.
OK, now let's start building our own deep learning model on an Android phone:
1. Building the SqueezeNet network
First, try building the network yourself. This part isn't hard and there are plenty of examples online. In the end this demo doesn't actually use a model trained from this hand-built network; instead it fine-tunes a pre-trained one, which converges much faster. I'm walking through this step so that the parameter and module changes I make during fine-tuning later will make sense.
Before implementing it, I of course recommend reading the paper:
https://arxiv.org/pdf/1602.07360.pdf
especially the Fire module and the network architecture diagram:
(figure from the paper)
# Define model
import torch
import torch.nn as nn
import torch.nn.init as init

class Fire(nn.Module):
    def __init__(self, inchn, sqzout_chn, exp1x1out_chn, exp3x3out_chn):
        super(Fire, self).__init__()
        self.inchn = inchn
        # 1x1 squeeze layer reduces the channel count
        self.squeeze = nn.Conv2d(inchn, sqzout_chn, kernel_size=1)
        self.squeeze_act = nn.ReLU(inplace=True)
        # parallel 1x1 and 3x3 expand layers
        self.expand1x1 = nn.Conv2d(sqzout_chn, exp1x1out_chn, kernel_size=1)
        self.expand1x1_act = nn.ReLU(inplace=True)
        self.expand3x3 = nn.Conv2d(sqzout_chn, exp3x3out_chn, kernel_size=3, padding=1)
        self.expand3x3_act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.squeeze_act(self.squeeze(x))
        # concatenate the two expand branches along the channel dimension
        return torch.cat([
            self.expand1x1_act(self.expand1x1(x)),
            self.expand3x3_act(self.expand3x3(x))
        ], 1)
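The channel bookkeeping behind the Fire stack used below is worth checking: each Fire block outputs exp1x1out_chn + exp3x3out_chn channels (the two branches are concatenated), and that must equal the inchn of the next block. A quick sketch in plain Python:

```python
# (inchn, sqzout_chn, exp1x1out_chn, exp3x3out_chn) for each Fire block,
# in the order they appear in the SqueezeNet 1.1 feature stack
fire_cfgs = [
    (64, 16, 64, 64), (128, 16, 64, 64),
    (128, 32, 128, 128), (256, 32, 128, 128),
    (256, 48, 192, 192), (384, 48, 192, 192),
    (384, 64, 256, 256), (512, 64, 256, 256),
]

# output channels = sum of the two expand branches
out_channels = [e1 + e3 for (_, _, e1, e3) in fire_cfgs]
print(out_channels)  # [128, 128, 256, 256, 384, 384, 512, 512]

# every block's output must match the next block's expected input
# (MaxPool layers in between change spatial size, not channels)
for prev_out, cfg in zip(out_channels, fire_cfgs[1:]):
    assert cfg[0] == prev_out
```

This is why the final conv sees 512 input channels.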
class Sqznet(nn.Module):
    # This demo only uses 7 classes: water bottle, chair, desk, laptop, glasses, phone, mouse
    def __init__(self, num_class=7):
        super(Sqznet, self).__init__()
        self.num_class = num_class
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2),
            nn.ReLU(inplace=True),
            # ceil_mode must be set to False here, otherwise fine-tuning will error out;
            # you'll see I change this during fine-tuning too,
            # because ONNX currently does not support SqueezeNet with ceil_mode=True!!
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
            Fire(64, 16, 64, 64),
            Fire(128, 16, 64, 64),
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
            Fire(128, 32, 128, 128),
            Fire(256, 32, 128, 128),
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
            Fire(256, 48, 192, 192),
            Fire(384, 48, 192, 192),
            Fire(384, 64, 256, 256),
            Fire(512, 64, 256, 256),
        )
        final_conv = nn.Conv2d(512, self.num_class, kernel_size=1)
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            final_conv,
            nn.ReLU(inplace=True),
            nn.AvgPool2d(13)
        )
        # Parameter initialization, following the official implementation
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if m is final_conv:
                    init.normal(m.weight.data, mean=0.0, std=0.01)
                else:
                    init.kaiming_uniform(m.weight.data)
                if m.bias is not None:
                    m.bias.data.zero_()

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x.view(x.size(0), self.num_class)
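Why AvgPool2d(13) in the classifier? With a 224x224 input, the floor-mode (ceil_mode=False) conv/pool chain shrinks the feature map to exactly 13x13, so a 13x13 average pool reduces it to 1x1 per class. The arithmetic can be sketched as:

```python
def out_size(n, kernel, stride):
    # spatial output size of a conv/pool with no padding and ceil_mode=False
    return (n - kernel) // stride + 1

# the four spatially-reducing layers in the feature stack: the first conv
# and the three max pools, all kernel 3, stride 2
n = 224
for layer in ["conv 3x3/2", "maxpool 3x3/2", "maxpool 3x3/2", "maxpool 3x3/2"]:
    n = out_size(n, 3, 2)
    print(layer, "->", n)

# 224 -> 111 -> 55 -> 27 -> 13, hence AvgPool2d(13)
```

With ceil_mode=True the intermediate sizes would differ, which is one more reason the ceil_mode setting matters later during export.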
Then you can start training, and save the model when done. Or, as I do here, load a pre-trained model; that's just how fine-tuning works. Either way, you end up with a PyTorch model!
2. Dataset and data preprocessing
To train a model, you of course need the data in good shape. The data handling here follows the PyTorch website, since I haven't been using PyTorch for long; for details, go straight to the PyTorch documentation, it's very thorough. (runs away)
The dataset directory structure looks like this: each subdirectory name is the class label, corresponding respectively to water bottle, chair, desk, laptop, glasses, phone, mouse.
First, dataset loading and preprocessing:
Data loading mainly uses the DataLoader; preprocessing applies some common data augmentation and normalization via torchvision's transforms package. Once that's done, you can iterate over the loader to fetch batches.
# Data augmentation and normalization for training
import os
import torch
from torchvision import datasets, transforms

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
data_dir = 'datadir'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=16,
                                              shuffle=True, num_workers=4)
               for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes
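One thing to be aware of: ImageFolder assigns label indices by sorting the class subdirectory names. The mapping it builds can be sketched like this (the English folder names below are hypothetical stand-ins for the 7 classes):

```python
# ImageFolder sorts the subdirectory names and numbers them in order;
# 'bottle', 'chair', ... are assumed folder names for illustration only
folders = ['bottle', 'chair', 'desk', 'glasses', 'laptop', 'mouse', 'phone']
classes = sorted(folders)
class_to_idx = {c: i for i, c in enumerate(classes)}
print(class_to_idx)
```

The label file shipped with the Android app later has to list the classes in this same sorted order, or the on-device predictions will be shuffled.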
Read one batch to see what it looks like, using torchvision:
batch_size is 16, and images match their labels one-to-one. For a classification task, the images crawled from Baidu turn out to be usable quality.
# Have a look at data (imshow here is the small display helper from the
# PyTorch transfer-learning tutorial, not a built-in)
inputs, classes = next(iter(dataloaders['train']))
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_names[x] for x in classes])
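The imshow helper is not defined above; the essential work it does is undo transforms.Normalize before display. A minimal numpy sketch of that unnormalization (a matplotlib plt.imshow call would then show the resulting array):

```python
import numpy as np

# the same ImageNet mean/std used in data_transforms above
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

def unnormalize(img_chw):
    # img_chw: float array shaped (3, H, W), as produced by ToTensor + Normalize
    img = img_chw.transpose(1, 2, 0)  # CHW -> HWC for display
    img = std * img + mean            # invert (x - mean) / std
    return np.clip(img, 0, 1)

demo = np.zeros((3, 2, 2))        # a "normalized" all-zero image
restored = unnormalize(demo)      # every pixel returns to the channel mean
print(restored[0, 0])             # [0.485 0.456 0.406]
```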
OK, at this point we've prepared the data and built and trained the SqueezeNet network; in other words, we have a SqueezeNet model ready for fine-tuning. Next comes the fine-tuning itself, which still has quite a few gotchas.
3. Fine-tuning a SqueezeNet model
from torchvision import datasets, models, transforms
# Start Fine tuning
model_ft = models.squeezenet1_1(pretrained=True)
# Take a look at the model first
print(model_ft)
SqueezeNet(
  (features): Sequential(
    (0): Conv2d (3, 64, kernel_size=(3, 3), stride=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1))
    (3): Fire(
      (squeeze): Conv2d (64, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (4): Fire(
      (squeeze): Conv2d (128, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (5): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1))
    (6): Fire(
      (squeeze): Conv2d (128, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (7): Fire(
      (squeeze): Conv2d (256, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (8): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1))
    (9): Fire(
      (squeeze): Conv2d (256, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (10): Fire(
      (squeeze): Conv2d (384, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (11): Fire(
      (squeeze): Conv2d (384, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (12): Fire(
      (squeeze): Conv2d (512, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d (512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AvgPool2d(kernel_size=13, stride=1, padding=0, ceil_mode=False, count_include_pad=True)
  )
)
The printed output shows the full network structure clearly. For fine-tuning we mainly care about the last layer, while the pre-trained model's earlier layers serve as parameter initialization. Pay attention to the following points:
① As mentioned earlier, the ceil_mode of the nn.MaxPool2d layers must be set to False, otherwise fine-tuning will error out, because ONNX currently does not support SqueezeNet with ceil_mode=True. So we can modify the model as follows, manually setting ceil_mode=False:
model_ft.features._modules["2"] = nn.MaxPool2d(kernel_size=3, stride=2, dilation=1, ceil_mode=False)
# the same replacement applies to the other two pooling layers, at indices "5" and "8"
② Then there's the usual change to the number of output classes: take the corresponding module and change the original 1000 classes to 7. So easy:
num_ftrs = 512  # input channels of the classifier conv, as shown in the printout above
model_ft.classifier._modules["1"] = nn.Conv2d(num_ftrs, 7, kernel_size=(1, 1), stride=(1, 1))
model_ft.num_classes = 7  # the forward pass reshapes by num_classes, so update it too
So before fine-tuning, print the model structure like this first; it makes the model much easier to understand and modify.
OK, so far we've sorted out the data, obtained the pre-trained model, and modified the model to be fine-tuned. Next comes the fine-tuning training itself.
First, define the loss function and training schedule; these should be self-explanatory:
import torch.optim as optim
from torch.optim import lr_scheduler

criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.92)
# decay the learning rate by a factor of 10 every 15 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=15, gamma=0.1)
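StepLR multiplies the learning rate by gamma every step_size epochs. With the values above, the schedule over the 30 training epochs can be sketched as follows (lr_at is a hypothetical helper for illustration, not part of the training code):

```python
def lr_at(epoch, base_lr=0.001, step_size=15, gamma=0.1):
    # learning rate StepLR would use at the given epoch
    return base_lr * gamma ** (epoch // step_size)

# 0.001 for epochs 0-14, then 0.0001 for epochs 15-29
for e in [0, 14, 15, 29]:
    print(e, lr_at(e))
```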
Then define the training function, the usual PyTorch training loop:
# Define training Pipeline
def train_model(model, criterion, optimizer, scheduler, num_epochs=1):
    best_model_wts = model.state_dict()
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                scheduler.step()
                model.train(True)   # training mode
            else:
                model.train(False)  # evaluation mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over data.
            iter = 0
            for data in dataloaders[phase]:
                inputs, labels = data
                # out = torchvision.utils.make_grid(inputs)  # have a look at train img
                # imshow(out, title=[class_names[x] for x in labels])
                # I trained on CPU here
                if use_gpu:
                    inputs = Variable(inputs.cuda())
                    labels = Variable(labels.cuda())
                else:
                    inputs, labels = Variable(inputs), Variable(labels)
                optimizer.zero_grad()
                # forward
                outputs = model(inputs)
                _, preds = torch.max(outputs.data, 1)
                loss = criterion(outputs, labels)
                print("phase:%s, epoch:%d/%d Iter %d: loss=%s"
                      % (phase, epoch, num_epochs - 1, iter, str(loss.data.numpy())))
                # backward + optimize only if in training phase
                if phase == 'train':
                    loss.backward()
                    optimizer.step()
                running_loss += loss.data[0]
                running_corrects += torch.sum(preds == labels.data)
                iter += 1
            epoch_loss = running_loss / dataset_sizes[phase]
            # cast to float so the division doesn't truncate to 0
            epoch_acc = float(running_corrects) / dataset_sizes[phase]
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = model.state_dict()
        print('-' * 10)
    print('Best val Acc: {:4f}'.format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
Now start training:
# Only 30 epochs here: 210 training iterations per epoch and 40 validation
# iterations per epoch is plenty for this 7-class demo
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=30)
Training results:
Validation results:
OK, after training, the model seems good enough for this demo, so next we convert it to Caffe2.
4. Converting to Caffe2: getting init_net.pb and predict_net.pb
Converting to Caffe2 is easy; ONNX provides a convenient interface.
Note: the reason we set up an input x here is that ONNX uses a trace mechanism: it runs the model once on arbitrary data of the right input size to capture the network structure.
The conversion first produces an ONNX object, sqz.onnx:
from torch.autograd import Variable
import torch

batch_size = 1  # any value works; it is only used for tracing
x = Variable(torch.randn(batch_size, 3, 224, 224), requires_grad=True)
torch_out = torch.onnx._export(model_ft,
                               x,
                               "sqz.onnx",
                               export_params=True)
Next, convert it into the init_net.pb and predict_net.pb that Caffe2 needs:
import onnx
import onnx_caffe2.backend
# load onnx object
model = onnx.load("sqz.onnx")
prepared_backend = onnx_caffe2.backend.prepare(model)
from onnx_caffe2.backend import Caffe2Backend as c2
init_net, predict_net = c2.onnx_graph_to_caffe2_net(model.graph)
with open("squeeze_init_net.pb", "wb") as f:
f.write(init_net.SerializeToString())
with open("squeeze_predict_net.pb", "wb") as f:
f.write(predict_net.SerializeToString())
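The exported network ends at the average-pooled conv logits, so the on-device code still has to turn the 7 raw scores into probabilities. A numpy sketch of that post-processing (the class-name list below is a hypothetical label-file ordering, matching the 7 classes in this post):

```python
import numpy as np

# hypothetical class names in label-file order (must match the app's label file)
CLASSES = ['bottle', 'chair', 'desk', 'glasses', 'laptop', 'mouse', 'phone']

def softmax(logits):
    # subtract the max for numerical stability before exponentiating
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

logits = np.array([0.1, 2.5, 0.3, 0.2, 0.0, 0.4, 0.1])  # fake network output
probs = softmax(logits)
top = int(np.argmax(probs))
print(CLASSES[top], probs[top])  # 'chair' gets the highest probability here
```

The AICamera example does the equivalent in C++; this just shows the math.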
OK, at this point we have the 7-class model converted to Caffe2 and could already run classification with Caffe2 on desktop. But this post is about learning the mobile Android side, so let's run it directly on an Android phone.
5. Deploying on Android
Note: before playing with this, install Android Studio on Linux first; there are plenty of tutorials for that online.
Since I did some Android programming a while back, I could mostly follow the official Caffe2 AICamera example directly. For this demo I did essentially no UI programming, just swapped in my model, with two places to change:
Put your trained model here:
/home/xxx/AICamera/app/src/main/assets/
Put the label file matching the trained 7-class model here:
/home/xxx/Android/AICamera/app/src/main/cpp/
When that's done: -> Build APK -> install, and you're set!
6. Demo:
https://www.zhihu.com/video/928940105270439936
Isn't it simple and fun!