3. Caffe2 & PyTorch: deploying a deep learning model on mobile (the full process!)

If you're an expert, feel free to just smile and skip ahead~

Demo video at the bottom.

For the mobile platform I chose Android, simply because an Android phone is all I have at the moment. The reason for this walkthrough is that the only Caffe2 deployment example for Android so far is the official 1000-class demo, which uses a pre-trained model, and there is no clear, detailed deployment tutorial. So this post records my own learning process and gives a feel for Caffe2's cross-platform story.

I have used Caffe and TensorFlow before; here I'm going with Caffe2 and PyTorch + ONNX. The idea behind ONNX is a good one:

"ONNX 的全稱為「Open Neural Network Exchange」,即「開放的神經網路切換」。顧名思義,該項目的目的是讓不同的神經網路開發框架做到互通互用。目前,Microsoft Cognitive Toolkit,PyTorch 和 Caffe2 已宣布支持 ONNX。"

I'm keeping an eye on its progress. For this post I picked the small but capable SqueezeNet 1.1 as the model.

If time permits I may test other models later, and I'd also like to run a comparison against Baidu's and Tencent's mobile frameworks, but that depends on time, with exams and other things coming up. (runs away)


What I'm deploying here is a 7-class classification model, with the following classes:

water bottle, chair, desk, laptop, glasses, phone, mouse

All of them are everyday objects within arm's reach. The images were crawled from Baidu by keyword and then lightly filtered, so the quality isn't great, but the final results turned out fine. The training set has roughly 500 images per class and the validation set roughly 100 per class.

OK, now let's get hands-on and put our own deep learning model on an Android phone:

1. Building the SqueezeNet network

First, let's try building the network ourselves. This part isn't hard and there are plenty of examples online. In the end the demo doesn't actually use a model trained from this hand-built network; it fine-tunes a pre-trained one instead, which converges much faster. I still show this step so that later, when I modify the model's parameters and modules during fine-tuning, it's clear why I change what I change.

Before implementing it, I of course recommend reading the paper:

arxiv.org/pdf/1602.0736

especially the Fire module and the network architecture diagram:

(Figure from the paper.)

import torch
import torch.nn as nn
import torch.nn.init as init

# Define model
class Fire(nn.Module):
    def __init__(self, inchn, sqzout_chn, exp1x1out_chn, exp3x3out_chn):
        super(Fire, self).__init__()
        self.inchn = inchn
        self.squeeze = nn.Conv2d(inchn, sqzout_chn, kernel_size=1)
        self.squeeze_act = nn.ReLU(inplace=True)
        self.expand1x1 = nn.Conv2d(sqzout_chn, exp1x1out_chn, kernel_size=1)
        self.expand1x1_act = nn.ReLU(inplace=True)
        self.expand3x3 = nn.Conv2d(sqzout_chn, exp3x3out_chn, kernel_size=3, padding=1)
        self.expand3x3_act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.squeeze_act(self.squeeze(x))
        return torch.cat([
            self.expand1x1_act(self.expand1x1(x)),
            self.expand3x3_act(self.expand3x3(x))
        ], 1)


class Sqznet(nn.Module):
    # The demo only uses 7 classes: water bottle, chair, desk, laptop, glasses, phone, mouse
    def __init__(self, num_class=7):
        super(Sqznet, self).__init__()
        self.num_class = num_class
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2),
            nn.ReLU(inplace=True),
            # ceil_mode must be set to False here, otherwise you'll hit an error later;
            # you'll see I change the same thing when fine-tuning,
            # because ONNX currently does not support SqueezeNet's ceil_mode=True!!
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
            Fire(64, 16, 64, 64),
            Fire(128, 16, 64, 64),
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
            Fire(128, 32, 128, 128),
            Fire(256, 32, 128, 128),
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
            Fire(256, 48, 192, 192),
            Fire(384, 48, 192, 192),
            Fire(384, 64, 256, 256),
            Fire(512, 64, 256, 256),
        )
        final_conv = nn.Conv2d(512, self.num_class, kernel_size=1)
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            final_conv,
            nn.ReLU(inplace=True),
            nn.AvgPool2d(13)
        )
        # Parameter initialization, following the official implementation
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if m is final_conv:
                    init.normal(m.weight.data, mean=0.0, std=0.01)
                else:
                    init.kaiming_uniform(m.weight.data)
                if m.bias is not None:
                    m.bias.data.zero_()

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x.view(x.size(0), self.num_class)

Then you can start training. Save the model when training is done, or, as I do here, load a pre-trained one — that's how fine-tuning usually works anyway. Either way, you end up with a PyTorch model!
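For completeness, a minimal sketch of saving and reloading the trained weights (the file name here is just an example, not from the original project):

import torch

# Save only the weights after training
torch.save(model.state_dict(), "sqznet_weights.pth")

# Later: rebuild the network and load the weights back
model = Sqznet(num_class=7)
model.load_state_dict(torch.load("sqznet_weights.pth"))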

2. Dataset and data preprocessing

To train a model you of course need to handle the data properly. The data handling here follows the official PyTorch tutorials — I haven't been using PyTorch for long. For details, go straight to the PyTorch documentation, which covers this thoroughly. (runs away)

The dataset directory structure looks like the layout sketched below: each subdirectory name is the corresponding class label, i.e. water bottle, chair, desk, laptop, glasses, phone, mouse.
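Since the original directory screenshot doesn't survive here, this is roughly the layout torchvision's ImageFolder expects (the folder names below are illustrative only; whatever names you use become the class labels):

datadir/
    train/
        bottle/  chair/  desk/  laptop/  glasses/  phone/  mouse/
    val/
        bottle/  chair/  desk/  laptop/  glasses/  phone/  mouse/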

First, dataset loading and preprocessing:

Data loading mainly uses the DataLoader, and preprocessing does the usual data augmentation and normalization with torchvision's transforms package. Once this is set up, you can iterate over batches straight from the loaders.

import os
import torch
from torchvision import datasets, transforms

# Data augmentation and normalization for training
data_transforms = {
    "train": transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    "val": transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = "datadir"
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ["train", "val"]}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=16,
                                              shuffle=True, num_workers=4)
               for x in ["train", "val"]}
dataset_sizes = {x: len(image_datasets[x]) for x in ["train", "val"]}
class_names = image_datasets["train"].classes

Let's read one batch and take a look, using torchvision:

With a batch size of 16, each image lines up with its class label. You can see that images crawled from Baidu are good enough if all you need is classification.

import torchvision

# Have a look at the data
inputs, classes = next(iter(dataloaders["train"]))
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_names[x] for x in classes])  # imshow: small plotting helper, see below
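imshow isn't defined in the snippet above; a minimal helper in the spirit of the PyTorch transfer-learning tutorial might look like this (it assumes matplotlib and the normalization constants used earlier):

import numpy as np
import matplotlib.pyplot as plt

def imshow(inp, title=None):
    # Tensor (C, H, W) -> numpy (H, W, C), then undo the normalization
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = np.clip(std * inp + mean, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # give the plot window a moment to render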

OK, at this point the data is ready and we've walked through building and training the SqueezeNet network; in other words, we have a SqueezeNet model to fine-tune. Next comes the fine-tuning itself, which has quite a few things to watch out for.

3. Fine-tuning a SqueezeNet model

from torchvision import datasets, models, transforms

# Start fine-tuning
model_ft = models.squeezenet1_1(pretrained=True)
# First take a look at what the model looks like
print(model_ft)

SqueezeNet(
  (features): Sequential(
    (0): Conv2d (3, 64, kernel_size=(3, 3), stride=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1))
    (3): Fire(
      (squeeze): Conv2d (64, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (4): Fire(
      (squeeze): Conv2d (128, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (5): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1))
    (6): Fire(
      (squeeze): Conv2d (128, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (7): Fire(
      (squeeze): Conv2d (256, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (8): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1))
    (9): Fire(
      (squeeze): Conv2d (256, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (10): Fire(
      (squeeze): Conv2d (384, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (11): Fire(
      (squeeze): Conv2d (384, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (12): Fire(
      (squeeze): Conv2d (512, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d (64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d (64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d (512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AvgPool2d(kernel_size=13, stride=1, padding=0, ceil_mode=False, count_include_pad=True)
  )
)

You can see the full network structure very clearly. For fine-tuning we mainly care about the last layer, while the earlier layers are initialized from the pre-trained weights. The main points to pay attention to:

① As mentioned above, the nn.MaxPool2d layers must have ceil_mode=False, otherwise the fine-tuned model will throw an error when you export it, because ONNX currently does not support SqueezeNet's ceil_mode=True. So we can patch the model and manually set ceil_mode=False:

model_ft.features._modules["2"] = nn.MaxPool2d(kernel_size=3, stride=2, dilation=1, ceil_mode=False)
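A hedged sketch in case the export still complains: in this torchvision release the other two pooling layers ("5" and "8" in the printout above) may also have been created with ceil_mode=True, in which case the same replacement can be applied to all three:

# Patch all three max-pooling layers, assuming they were built with ceil_mode=True
for idx in ["2", "5", "8"]:
    model_ft.features._modules[idx] = nn.MaxPool2d(kernel_size=3, stride=2,
                                                    dilation=1, ceil_mode=False)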

② The usual change to the number of output classes: grab the relevant module and change the original 1000 classes to 7, so easy:

num_ftrs = 512  # SqueezeNet 1.1 feeds 512 channels into the classifier conv
model_ft.classifier._modules["1"] = nn.Conv2d(num_ftrs, 7, kernel_size=(1, 1), stride=(1, 1))
model_ft.num_classes = 7  # torchvision's SqueezeNet uses this in forward()

So before fine-tuning, it helps to print the model structure as above, both to understand it and to see what needs changing.


OK, so far we have the data sorted out, a pre-trained model in hand, and the model patched for fine-tuning. Next up is the fine-tuning training itself:

First define the loss function and training policy; these should be self-explanatory:

import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.92)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=15, gamma=0.1)

Then define the training function — the usual PyTorch training loop:

import torch
from torch.autograd import Variable

use_gpu = torch.cuda.is_available()  # I trained on CPU here

# Define training pipeline
def train_model(model, criterion, optimizer, scheduler, num_epochs=1):
    best_model_wts = model.state_dict()
    best_acc = 0.0
    for epoch in range(num_epochs):
        print("Epoch {}/{}".format(epoch, num_epochs - 1))
        # Each epoch has a training and validation phase
        for phase in ["train", "val"]:
            if phase == "train":
                scheduler.step()
                model.train(True)   # training mode
            else:
                model.train(False)  # evaluation mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over data.
            iter = 0
            for data in dataloaders[phase]:
                inputs, labels = data
                # out = torchvision.utils.make_grid(inputs)  # have a look at the training images
                # imshow(out, title=[class_names[x] for x in labels])
                if use_gpu:
                    inputs = Variable(inputs.cuda())
                    labels = Variable(labels.cuda())
                else:
                    inputs, labels = Variable(inputs), Variable(labels)
                optimizer.zero_grad()
                # forward
                outputs = model(inputs)
                _, preds = torch.max(outputs.data, 1)
                loss = criterion(outputs, labels)
                print("phase:%s, epoch:%d/%d Iter %d: loss=%s"
                      % (phase, epoch, num_epochs - 1, iter, str(loss.data.numpy())))
                # backward + optimize only if in training phase
                if phase == "train":
                    loss.backward()
                    optimizer.step()
                running_loss += loss.data[0]
                running_corrects += torch.sum(preds == labels.data)
                iter += 1
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects / dataset_sizes[phase]
            print("{} Loss: {:.4f} Acc: {:.4f}".format(phase, epoch_loss, epoch_acc))
            # deep copy the model
            if phase == "val" and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = model.state_dict()
        print("-" * 10)
    print("Best val Acc: {:4f}".format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

Now start training:

# Only 30 epochs here; with about 210 training iterations and 40 validation iterations
# per epoch, that's plenty for this 7-class demo.
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=30)

Training results (screenshot):

Validation results (screenshot):

OK, after training, the model seems good enough for this demo, so next we convert it to a Caffe2 model.

4. Converting to Caffe2: getting init_net.pb and predict_net.pb

Converting to Caffe2 is quite easy; ONNX provides a convenient interface.

Note: we need to set up an input x because the ONNX export is trace-based: it runs a dummy tensor of the right input size through the network once to record the graph structure.

The export first produces an ONNX file: sqz.onnx

from torch.autograd import Variable
import torch

batch_size = 1  # any number will do
x = Variable(torch.randn(batch_size, 3, 224, 224), requires_grad=True)
torch_out = torch.onnx._export(model_ft, x, "sqz.onnx", export_params=True)

Next, convert it into the init_net.pb and predict_net.pb that Caffe2 needs:

import onnx
import onnx_caffe2.backend

# load the onnx object
model = onnx.load("sqz.onnx")
prepared_backend = onnx_caffe2.backend.prepare(model)

from onnx_caffe2.backend import Caffe2Backend as c2
init_net, predict_net = c2.onnx_graph_to_caffe2_net(model.graph)
with open("squeeze_init_net.pb", "wb") as f:
    f.write(init_net.SerializeToString())
with open("squeeze_predict_net.pb", "wb") as f:
    f.write(predict_net.SerializeToString())
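Not strictly required, but before moving to the phone it's worth checking that the Caffe2 backend reproduces PyTorch's output. A minimal sketch along the lines of the official ONNX tutorial, assuming x and torch_out from the export step above are still in scope (the way the graph input name is looked up may differ between onnx versions):

import numpy as np

# Feed the same dummy input through the Caffe2 backend that was traced in PyTorch
W = {model.graph.input[0].name: x.data.numpy()}
c2_out = prepared_backend.run(W)[0]

# The two outputs should agree up to small numerical differences
np.testing.assert_almost_equal(torch_out.data.numpy(), c2_out, decimal=3)
print("PyTorch and Caffe2 outputs match")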

OK, at this point we have the 7-class model converted to Caffe2 and could already run classification with Caffe2 on the desktop. But since this post is about the Android side, let's run it directly on an Android phone.
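For reference, a desktop sanity check using the two .pb files might look like the sketch below (assumptions: Caffe2's Python bindings are installed, and a real image would need the same resize/normalization as the "val" transforms instead of the random tensor used here):

import numpy as np
from caffe2.python import workspace

with open("squeeze_init_net.pb", "rb") as f:
    init_net = f.read()
with open("squeeze_predict_net.pb", "rb") as f:
    predict_net = f.read()

# Build a predictor from the serialized nets and run one NCHW float32 input
p = workspace.Predictor(init_net, predict_net)
img = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in for a preprocessed image
results = p.run([img])
print("predicted class index:", np.asarray(results[0]).argmax())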

5. Finally, deploying on Android

Note: before playing with this, make sure Android Studio is installed on your Linux machine; there are plenty of tutorials online for that part.

Since I did some Android programming a while back, I could follow the official Caffe2 AICamera example pretty much directly. For this demo I did essentially no UI work and only swapped in my own model, focusing on two places:

Replace the model here with the one we trained:

/home/xxx/AICamera/app/src/main/assets/

Replace the label file here with one matching our 7-class model:

/home/xxx/Android/AICamera/app/src/main/cpp/

Once that's done: Build APK -> install -> done!

6. Demo:

https://www.zhihu.com/video/928940105270439936

Simple and fun, isn't it?

