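Deep Learning Object Detection (YoloV5) Project: From Zero to Production Deployment
Preface
The training and development environment is Win10 with an RTX3080 GPU; cuda10.2, cudnn7.1; OpenCV4.5. yolov5 uses the 5s model from the v3.0 release of August 13, 2020. The ncnn version is 20210525, the C++ IDE is vs2019, and Anaconda is 3.5.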
I. Environment Setup
1. Anaconda environment
- Create an environment
- Exit the environment
- List installed environments
conda info --env
- Delete an environment
2. Install dependencies
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt
or
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
pip install cython matplotlib tqdm opencv-python tensorboard scipy pillow onnx pyyaml pandas seaborn
On Windows, avoid cuda11 if possible: in several attempts it either could not find the GPU or crashed partway through a run.
II. Data Processing
1. Annotate the data with labelme. For the ID card data, I collected some public template data from the internet, generated a batch of samples with adversarial generation, and annotated them, about 300 samples in total. The annotations produced by labelme are in xml format.
2. Under yolo/data, create a directory for the dataset with two subdirectories: JPEGImages holds the original images and Annotations holds the label files.
3. The data is annotated as .xml with labelme, but yolo expects labels in .txt format, so the data has to be converted.
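- Generate the training and validation sets: train.txt and val.txt are created under data/xxxx, all annotated class names are printed, and a .txt file matching each image file name is generated under JPEGImages.
Run the command:
- Sample output of the annotated class names: ['ida', 'idb'].
- Generated .txt file:
class name followed by the normalized target coordinates
- Data processing code
- generate_txt.py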
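The conversion script itself (generate_txt.py) is not reproduced here. As a rough sketch of the step it performs, assuming Pascal VOC style XML (a size element plus object/bndbox entries with pixel corners) and the usual YOLO line layout of class index followed by normalized center x, center y, width and height, the conversion might look like this. The function names, CLASS_NAMES and SAMPLE_XML are illustrative and are not taken from the original script.

```python
import xml.etree.ElementTree as ET

# Assumed class list; the article's dataset uses these two names.
CLASS_NAMES = ["ida", "idb"]


def voc_box_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Convert pixel corner coordinates to normalized (cx, cy, w, h)."""
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h


def xml_to_yolo_lines(xml_text, class_names=CLASS_NAMES):
    """Return one 'class_id cx cy w h' line per annotated object."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in class_names:
            continue  # skip labels outside the configured class list
        box = obj.find("bndbox")
        coords = [float(box.find(tag).text)
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        cx, cy, w, h = voc_box_to_yolo(img_w, img_h, *coords)
        class_id = class_names.index(name)
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


# Hypothetical annotation used only to exercise the functions above.
SAMPLE_XML = """
<annotation>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>ida</name>
    <bndbox><xmin>160</xmin><ymin>120</ymin><xmax>480</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>
"""

if __name__ == "__main__":
    for line in xml_to_yolo_lines(SAMPLE_XML):
        print(line)
```

Each returned line would be written to the .txt file that shares the image's base name; normalizing by image width and height keeps the labels valid after the image is resized for training.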
III. Model Training
1. In models/yolov5s.yaml, change the value of nc.
# parameters
nc: 2  # total number of detection classes
depth_multiple: 0.33  # model depth multiple (network depth coefficient)
width_multiple: 0.50  # layer channel multiple (coefficient for the number of convolution kernels)

# anchors: candidate boxes; they can be changed to match your own target sizes, and more can be added
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 backbone
backbone:  # feature extraction module
  # [from, number, module, args]
  # from   - where the input comes from; -1 means the output of the previous layer
  # number - how many times the layer repeats; multiplied by the depth coefficient, with a minimum of 1
  #          source: n = max(round(n * gd), 1) if n > 1 else n
  # module - name of the layer
  # args   - number of convolution kernels
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2; 64 is multiplied by the width coefficient: 64*0.5 = 32 feature maps
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]],  # 13
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)
   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5) on layers 17, 20 and 23
  ]
2. Add an xxx.yaml training data configuration file under the data directory.
# download command/URL (optional)
download: bash data/scripts/get_voc.sh
# paths to the training-set txt and the validation-set txt
train: data/xxx/train.txt
val: data/xxx/val.txt
# total number of classes
nc: 2
# class names
names: ['ida', 'idb']
3. Training parameters
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--weights', type=str, default='yolov5s.pt', help='initial weights path')  # weights file; whether to start from pretrained weights
parser.add_argument('--cfg', type=str, default='', help='model.yaml path')  # network configuration file
parser.add_argument('--data', type=str, default='data/coco128.yaml', help='data.yaml path')  # training dataset configuration
parser.add_argument('--hyp', type=str, default='data/hyp.scratch.yaml', help='hyperparameters path')  # hyperparameter configuration file
parser.add_argument('--epochs', type=int, default=300)  # number of training epochs
parser.add_argument('--batch-size', type=int, default=32, help='total batch size for all GPUs')  # batch size
parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='[train, test] image sizes')  # training image size
parser.add_argument('--rect', action='store_true', help='rectangular training')  # rectangular training
parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')  # whether to continue training from the previous run's weights
parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')  # do not save intermediate checkpoints
parser.add_argument('--notest', action='store_true', help='only test final epoch')  # do not test intermediate epochs
parser.add_argument('--noautoanchor', action='store_true', help='disable autoanchor check')
parser.add_argument('--evolve', action='store_true', help='evolve hyperparameters')  # hyperparameter ranges
parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
parser.add_argument('--cache-images', action='store_true', help='cache images for faster training')  # whether to cache images
parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')  # train on GPU or CPU
parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')  # whether to train at multiple scales
parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset')  # whether there is only one class
parser.add_argument('--adam', action='store_true', help='use torch.optim.Adam() optimizer')  # optimizer choice
parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')
parser.add_argument('--log-imgs', type=int, default=16, help='number of images for W&B logging, max 100')
parser.add_argument('--workers', type=int, default=8, help='maximum number of dataloader workers')  # do not change on Windows; on Windows it tends to crash either way
parser.add_argument('--project', default='runs/train', help='save to project/name')
parser.add_argument('--name', default='exp', help='save to project/name')
parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
opt = parser.parse_args()
4. Training command
- Single GPU:
- Multiple GPUs:
5. Test the model
python test.py --weights runs/train/exp/weights/best.pt --data data/ODID.yaml --device 0 --verbose
--weights: the trained model
--data: the data configuration file (.yaml)
--device: the GPU used for evaluation
--verbose: whether to print the evaluation metrics for each class
OpenCV DNN C++ Inference
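1. The slice layer in OpenCV DNN does not support a step of 2, so the code must be modified before converting the model. The change is in the Focus class in models/common.py.
- Before:
- After:
2. Convert the model
python models/export.py --weights runs/exp/weights/best.pt
# --weights: the trained model
After it runs, the onnx model is saved as runs/exp/weights/best.onnx. This model can be used for inference with OpenCV DNN.
3. DNN C++ inference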
#include <algorithm>
#include <cmath>
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <sstream>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>

// Show an image in a resizable window capped at 800 pixels on the longer side.
void imshow(std::string name, const cv::Mat& cv_src)
{
    cv::namedWindow(name, 0);
    int max_rows = 800;
    int max_cols = 800;
    if (cv_src.rows >= cv_src.cols && cv_src.rows > max_rows)
    {
        cv::resizeWindow(name, cv::Size(cv_src.cols * max_rows / cv_src.rows, max_rows));
    }
    else if (cv_src.cols >= cv_src.rows && cv_src.cols > max_cols)
    {
        cv::resizeWindow(name, cv::Size(max_cols, cv_src.rows * max_cols / cv_src.cols));
    }
    cv::imshow(name, cv_src);
}

inline float sigmoid(float x)
{
    return 1.f / (1.f + exp(-x));
}

// Focus slicing done outside the network: 1x3xHxW -> 1x12x(H/2)x(W/2).
void sliceAndConcat(cv::Mat& img, cv::Mat* input)
{
    const float* srcData = img.ptr<float>();
    float* dstData = input->ptr<float>();
    for (int i = 0; i < input->size[2]; i++)
    {
        for (int j = 0; j < input->size[3]; j++)
        {
            // even rows, even columns
            for (int k = 0; k < 3; ++k)
            {
                dstData[k * input->size[2] * input->size[3] + i * input->size[3] + j] =
                    srcData[k * img.size[2] * img.size[3] + 2 * i * img.size[3] + 2 * j];
            }
            // odd rows, even columns
            for (int k = 0; k < 3; ++k)
            {
                dstData[(3 + k) * input->size[2] * input->size[3] + i * input->size[3] + j] =
                    srcData[k * img.size[2] * img.size[3] + (2 * i + 1) * img.size[3] + 2 * j];
            }
            // even rows, odd columns
            for (int k = 0; k < 3; ++k)
            {
                dstData[(6 + k) * input->size[2] * input->size[3] + i * input->size[3] + j] =
                    srcData[k * img.size[2] * img.size[3] + 2 * i * img.size[3] + 2 * j + 1];
            }
            // odd rows, odd columns
            for (int k = 0; k < 3; ++k)
            {
                dstData[(9 + k) * input->size[2] * input->size[3] + i * input->size[3] + j] =
                    srcData[k * img.size[2] * img.size[3] + (2 * i + 1) * img.size[3] + 2 * j + 1];
            }
        }
    }
}

std::vector<cv::String> getOutputNames(const cv::dnn::Net& net)
{
    static std::vector<cv::String> names;
    if (names.empty())
    {
        std::vector<int> outLayers = net.getUnconnectedOutLayers();
        std::vector<cv::String> layersNames = net.getLayerNames();
        names.resize(outLayers.size());
        for (size_t i = 0; i < outLayers.size(); i++)
        {
            names[i] = layersNames[outLayers[i] - 1];
        }
    }
    return names;
}

void drawPred(int classId, float conf, int left, int top, int right, int bottom, cv::Mat& frame,
              const std::vector<std::string>& classes)
{
    cv::rectangle(frame, cv::Point(left, top), cv::Point(right, bottom), cv::Scalar(0, 255, 0), 3);
    std::string label = cv::format("%.2f", conf);
    if (!classes.empty())
    {
        CV_Assert(classId < (int)classes.size());
        label = classes[classId] + ": " + label;
    }
    int baseLine;
    cv::Size labelSize = cv::getTextSize(label, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
    top = std::max(top, labelSize.height);
    cv::rectangle(frame, cv::Point(left, top - round(1.5 * labelSize.height)),
                  cv::Point(left + round(1.5 * labelSize.width), top + baseLine), cv::Scalar(0, 255, 0), cv::FILLED);
    cv::putText(frame, label, cv::Point(left, top), cv::FONT_HERSHEY_SIMPLEX, 0.75, cv::Scalar(), 2);
}

void postprocess(cv::Mat& cv_src, std::vector<cv::Mat>& outs, const std::vector<std::string>& classes, int net_size)
{
    float confThreshold = 0.4f;
    float nmsThreshold = 0.5f;
    std::vector<int> classIds;
    std::vector<float> confidences;
    std::vector<cv::Rect> boxes;
    int strides[] = { 8, 16, 32 };
    std::vector<std::vector<int> > anchors =
    {
        { 10,13, 16,30, 33,23 },
        { 30,61, 62,45, 59,119 },
        { 116,90, 156,198, 373,326 }
    };
    for (size_t k = 0; k < outs.size(); k++)
    {
        float* data = outs[k].ptr<float>();
        int stride = strides[k];
        int num_classes = outs[k].size[4] - 5;
        for (int i = 0; i < outs[k].size[2]; i++)
        {
            for (int j = 0; j < outs[k].size[3]; j++)
            {
                for (int a = 0; a < outs[k].size[1]; ++a)
                {
                    float* record = data + a * outs[k].size[2] * outs[k].size[3] * outs[k].size[4] +
                        i * outs[k].size[3] * outs[k].size[4] + j * outs[k].size[4];
                    float* cls_ptr = record + 5;
                    for (int cls = 0; cls < num_classes; cls++)
                    {
                        float score = sigmoid(cls_ptr[cls]) * sigmoid(record[4]);
                        if (score > confThreshold)
                        {
                            float cx = (sigmoid(record[0]) * 2.f - 0.5f + (float)j) * (float)stride;
                            float cy = (sigmoid(record[1]) * 2.f - 0.5f + (float)i) * (float)stride;
                            float w = pow(sigmoid(record[2]) * 2.f, 2) * anchors[k][2 * a];
                            float h = pow(sigmoid(record[3]) * 2.f, 2) * anchors[k][2 * a + 1];
                            float x1 = std::max(0, std::min(cv_src.cols, int((cx - w / 2.f) * (float)cv_src.cols / (float)net_size)));
                            float y1 = std::max(0, std::min(cv_src.rows, int((cy - h / 2.f) * (float)cv_src.rows / (float)net_size)));
                            float x2 = std::max(0, std::min(cv_src.cols, int((cx + w / 2.f) * (float)cv_src.cols / (float)net_size)));
                            float y2 = std::max(0, std::min(cv_src.rows, int((cy + h / 2.f) * (float)cv_src.rows / (float)net_size)));
                            classIds.push_back(cls);
                            confidences.push_back(score);
                            boxes.push_back(cv::Rect(cv::Point(x1, y1), cv::Point(x2, y2)));
                        }
                    }
                }
            }
        }
    }
    std::vector<int> indices;
    cv::dnn::NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); i++)
    {
        int idx = indices[i];
        cv::Rect box = boxes[idx];
        drawPred(classIds[idx], confidences[idx], box.x, box.y,
                 box.x + box.width, box.y + box.height, cv_src, classes);
    }
}

int main(int argc, char* argv[])
{
    std::string path = "images";
    std::vector<std::string> filenames;
    cv::glob(path, filenames, false);
    for (auto name : filenames)
    {
        cv::Mat cv_src = cv::imread(name);
        if (cv_src.empty())
        {
            continue;
        }
        std::vector<std::string> class_names{ "ida","idb" };
        int net_size = 640;
        cv::Mat blob = cv::dnn::blobFromImage(cv_src, 1.0 / 255, cv::Size(net_size, net_size),
                                              cv::Scalar(0, 0, 0), true, false);
        cv::dnn::Net net = cv::dnn::readNet("model/ODID_DNN.onnx");
        const int sz[] = { 1, 12, net_size / 2, net_size / 2 };
        cv::Mat input = cv::Mat(4, sz, blob.type());
        sliceAndConcat(blob, &input);
        net.setInput(input);
        auto t0 = cv::getTickCount();
        std::vector<cv::Mat> outs;
        net.forward(outs, getOutputNames(net));
        postprocess(cv_src, outs, class_names, net_size);
        auto t1 = cv::getTickCount();
        std::cout << "elapsed time: " << (t1 - t0) * 1000.0 / cv::getTickFrequency() << "ms" << std::endl;
        imshow("img", cv_src);
        cv::waitKey();
    }
    return 0;
}
IV. NCNN Inference
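NCNN is the best inference acceleration library I have used so far, and also the easiest to get for free. For mobile deployment in particular it really could not be better, and I am deeply grateful to nihui for the selfless contribution. The version used here is the prebuilt ncnn-20210525-windows-vs2019 release.
For yolov5 inference with ncnn, see nihui's posts on Zhihu.
1. Simplify the model
https://github.com/daquexian/onnx-simplifier
2. Convert onnx to an ncnn model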
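- Converting onnx to an ncnn model prints many "Unsupported slice step!" messages. These are errors from converting the Focus module.
- In v5, the Focus module slices the image before it enters the backbone. It takes one value at every other pixel of the image, similar to nearest-neighbor downsampling, which yields four images. The four images are complementary and look much alike, but no information is lost. The W and H information is thereby concentrated into the channel dimension and the input channels expand 4 times: the concatenated image has 12 channels instead of the original 3 RGB channels. A convolution is then applied to the new image, giving a 2x downsampled feature map with no information loss. Taking yolov5s as an example, the original 640 × 640 × 3 image enters the Focus structure, is sliced into a 320 × 320 × 12 feature map, and after one convolution becomes a 320 × 320 × 32 feature map (64 × width_multiple 0.50 = 32 channels for yolov5s).
- yolov5 Focus module implementation
The corresponding model structure: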
Split splitncnn_input0 1 4 images images_splitncnn_0 images_splitncnn_1 images_splitncnn_2 images_splitncnn_3
Crop Slice_4 1 1 images_splitncnn_3 171 -23309=1,0 -23310=1,2147483647 -23311=1,1
Crop Slice_9 1 1 171 176 -23309=1,0 -23310=1,2147483647 -23311=1,2
Crop Slice_14 1 1 images_splitncnn_2 181 -23309=1,1 -23310=1,2147483647 -23311=1,1
Crop Slice_19 1 1 181 186 -23309=1,0 -23310=1,2147483647 -23311=1,2
Crop Slice_24 1 1 images_splitncnn_1 191 -23309=1,0 -23310=1,2147483647 -23311=1,1
Crop Slice_29 1 1 191 196 -23309=1,1 -23310=1,2147483647 -23311=1,2
Crop Slice_34 1 1 images_splitncnn_0 201 -23309=1,1 -23310=1,2147483647 -23311=1,1
Crop Slice_39 1 1 201 206 -23309=1,1 -23310=1,2147483647 -23311=1,2
Concat Concat_40 4 1 176 186 196 206 207 0=0
Visualization:
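- Advantages of the Focus module:
Focus simply moves the W and H information into the channels while downsampling the image, without losing information, and then applies a 3 × 3 convolution for feature extraction, so that features are extracted more thoroughly.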
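To make the slicing concrete, here is a minimal pure-Python sketch of the rearrangement on a toy CHW image stored as nested lists. The channel-group order (even row/even column, odd row/even column, even row/odd column, odd row/odd column) follows the sliceAndConcat function in the OpenCV DNN listing of this article; focus_slice and make_image are illustrative names, and a real implementation would operate on tensors rather than lists.

```python
def focus_slice(image):
    """Rearrange a CHW image (nested lists) into 4*C channels at half resolution.

    Each output group takes every other pixel starting from a different
    (row offset, col offset), so every input pixel appears exactly once.
    """
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (row offset, col offset)
    out = []
    for row_off, col_off in offsets:
        for channel in image:
            out.append([row[col_off::2] for row in channel[row_off::2]])
    return out


def make_image(channels, height, width):
    """Build a toy image whose pixel value encodes (channel, row, col)."""
    return [[[c * 100 + r * 10 + col for col in range(width)]
             for r in range(height)]
            for c in range(channels)]


if __name__ == "__main__":
    img = make_image(3, 4, 4)
    sliced = focus_slice(img)
    # channels, height, width of the sliced result
    print(len(sliced), len(sliced[0]), len(sliced[0][0]))
    print(sliced[0])  # channel 0, even rows and even columns
    print(sliced[3])  # channel 0, odd rows and even columns
```

A 3-channel 4 × 4 input becomes 12 channels of 2 × 2, which is the same 3 → 12 channel expansion and 2x spatial reduction described for the 640 × 640 × 3 input.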
3. Replace the Focus module
- Modify the .param file
Before:
After:
Input images 0 1 images
YoloV5Focus focus 1 1 images 207
Convolution Conv_41 1 1 207 208 0=32 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=3456
4. Changes for dynamic-size inference
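- Static-size inference: scale by the long side to 640xH or Wx640, pad to 640x640, then detect. If H or W is small, a lot of computation is wasted on the padding.
- Dynamic-size inference: scale by the long side to 640xH or Wx640, pad to 640xH2 or W2x640, then detect, where H2 or W2 is H or W rounded up to a multiple of 32. This needs less computation and runs faster.
- yolov5 supports dynamic-size inference, but the Reshape layers here hard-code the number of output grid cells. If these three parameters are not changed to -1, detection either finds no objects or fills the whole image with boxes.
Before:
After: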
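The difference between the static and dynamic sizing described in this step can be sketched numerically. The helper below is an illustration only: letterbox_shape is a hypothetical name, and it uses integer arithmetic for the scaled short side, whereas the C++ inference code in this article scales with a float factor.

```python
def letterbox_shape(width, height, target_size=640, multiple=32, dynamic=True):
    """Return (scaled_w, scaled_h, padded_w, padded_h) for one input image.

    The long side is scaled to target_size. With dynamic=True each side is
    padded up to the next multiple of `multiple`; otherwise both sides are
    padded to target_size.
    """
    if width > height:
        scaled_w = target_size
        scaled_h = height * target_size // width
    else:
        scaled_h = target_size
        scaled_w = width * target_size // height
    if dynamic:
        padded_w = (scaled_w + multiple - 1) // multiple * multiple
        padded_h = (scaled_h + multiple - 1) // multiple * multiple
    else:
        padded_w = padded_h = target_size
    return scaled_w, scaled_h, padded_w, padded_h


if __name__ == "__main__":
    for dynamic in (False, True):
        sw, sh, pw, ph = letterbox_shape(1280, 720, dynamic=dynamic)
        mode = "dynamic" if dynamic else "static"
        print(f"{mode}: scaled {sw}x{sh}, padded {pw}x{ph}, pixels {pw * ph}")
```

For a 1280x720 input the static path processes 640x640 pixels while the dynamic path processes 640x384, which is 60% of the static pixel count.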
5. Update the layer count so that it matches the current number of layers.
6. Convert to an FP16 model
6. yolov5s model output
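The anchor (prior box) information is in yolov5/models/yolov5s.yaml. The pytorch post-processing is in the forward function of the Detect class in yolov5/models/yolo.py, and it has to be ported to C++ accordingly.
The model has 3 output blobs, corresponding to the outputs at stride 8/16/32.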
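As a small sketch of what that port has to compute, the snippet below decodes one raw prediction using the two formulas from Detect forward that are quoted in the comments of the NCNN inference code: xy = (sigmoid * 2 - 0.5 + grid) * stride and wh = (sigmoid * 2) ** 2 * anchor. The function names and the sample numbers are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(raw, grid_x, grid_y, stride, anchor_w, anchor_h):
    """Decode raw (tx, ty, tw, th) logits of one cell and one anchor.

    Returns (cx, cy, w, h) in pixels of the padded network input.
    """
    dx, dy, dw, dh = (sigmoid(v) for v in raw)
    cx = (dx * 2.0 - 0.5 + grid_x) * stride
    cy = (dy * 2.0 - 0.5 + grid_y) * stride
    w = (dw * 2.0) ** 2 * anchor_w
    h = (dh * 2.0) ** 2 * anchor_h
    return cx, cy, w, h


def confidence(box_logit, class_logit):
    """Final score is objectness times class probability."""
    return sigmoid(box_logit) * sigmoid(class_logit)


if __name__ == "__main__":
    # All-zero logits give sigmoid = 0.5: the box sits at the cell center
    # and has exactly the anchor's size.
    print(decode_box((0.0, 0.0, 0.0, 0.0), grid_x=3, grid_y=2,
                     stride=8, anchor_w=10.0, anchor_h=13.0))
    print(confidence(0.0, 0.0))
```

Because the sigmoid output lies in (0, 1), the decoded center can move at most 1.5 cells from the cell origin and the box can be at most 4 times the anchor size.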
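The shape of each output is in WHC format:
- w=n+5, corresponding to the bbox dx, dy, dw, dh, the bbox confidence, and the confidences of the n classes.
- h=6400, corresponding to the xy positions of all anchors across the whole image. This 6400 is the stride=8 case: with a 640 input image, width and height are each divided into 640/8=80 blocks, and 80x80 is 6400.
- c=3, corresponding to the three anchors.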
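The same arithmetic extends to the other two strides. The following sketch computes the (w, h, c) shape of each output blob from the input size; output_shapes is an illustrative helper, and it assumes the padded input sides are multiples of the largest stride.

```python
def output_shapes(input_w, input_h, num_classes, strides=(8, 16, 32), num_anchors=3):
    """Return {stride: (w, h, c)} for each output blob.

    w = num_classes + 5 values per prediction,
    h = number of grid cells at that stride,
    c = number of anchors per cell.
    """
    shapes = {}
    for stride in strides:
        cells = (input_w // stride) * (input_h // stride)
        shapes[stride] = (num_classes + 5, cells, num_anchors)
    return shapes


if __name__ == "__main__":
    for stride, (w, h, c) in output_shapes(640, 640, num_classes=2).items():
        print(f"stride {stride}: w={w} h={h} c={c}")
```

With dynamic-size inference the h value changes with the padded input, for example a 640x384 input gives 80 x 48 = 3840 cells at stride 8, which is why the grid count cannot stay hard-coded in the Reshape layers.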
7. NCNN inference code, with the YoloV5Focus layer registered dynamically.
#include <cfloat>
#include <cmath>
#include "YoloV5Detect.h"

class YoloV5Focus : public ncnn::Layer
{
public:
    YoloV5Focus()
    {
        one_blob_only = true;
    }

    virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob, const ncnn::Option& opt) const
    {
        int w = bottom_blob.w;
        int h = bottom_blob.h;
        int channels = bottom_blob.c;
        int outw = w / 2;
        int outh = h / 2;
        int outc = channels * 4;
        top_blob.create(outw, outh, outc, 4u, 1, opt.blob_allocator);
        if (top_blob.empty())
            return -100;

        #pragma omp parallel for num_threads(opt.num_threads)
        for (int p = 0; p < outc; p++)
        {
            const float* ptr = bottom_blob.channel(p % channels).row((p / channels) % 2) + ((p / channels) / 2);
            float* outptr = top_blob.channel(p);
            for (int i = 0; i < outh; i++)
            {
                for (int j = 0; j < outw; j++)
                {
                    *outptr = *ptr;
                    outptr += 1;
                    ptr += 2;
                }
                ptr += w;
            }
        }
        return 0;
    }
};

DEFINE_LAYER_CREATOR(YoloV5Focus)

int initYolov5Net(std::string& param_path, std::string& bin_path, ncnn::Net& yolov5_net, bool use_gpu)
{
    bool has_gpu = false;
    yolov5_net.clear();
    // CPU settings (only implemented for Android)
    // 0 = all cores enabled (default)
    // 1 = only little clusters enabled
    // 2 = only big clusters enabled
    //ncnn::set_cpu_powersave(2);
    //ncnn::set_omp_num_threads(ncnn::get_big_cpu_count());
#if NCNN_VULKAN
    ncnn::create_gpu_instance();
    has_gpu = ncnn::get_gpu_count() > 0;
#endif
    yolov5_net.opt.use_vulkan_compute = (use_gpu && has_gpu);
    yolov5_net.opt.use_bf16_storage = true;
    // register the custom layer dynamically
    yolov5_net.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator);
    // load the model
    int rp = yolov5_net.load_param(param_path.c_str());
    int rb = yolov5_net.load_model(bin_path.c_str());
    if (rp < 0 || rb < 0)
    {
        return -1;
    }
    return 0;
}

static inline float sigmoid(float x)
{
    return static_cast<float>(1.f / (1.f + exp(-x)));
}

static void generateProposals(const ncnn::Mat& anchors, int stride, const ncnn::Mat& in_pad,
                              const ncnn::Mat& feat_blob, float prob_threshold, std::vector<Object>& objects)
{
    const int num_grid = feat_blob.h;
    int num_grid_x;
    int num_grid_y;
    if (in_pad.w > in_pad.h)
    {
        num_grid_x = in_pad.w / stride;
        num_grid_y = num_grid / num_grid_x;
    }
    else
    {
        num_grid_y = in_pad.h / stride;
        num_grid_x = num_grid / num_grid_y;
    }
    const int num_class = feat_blob.w - 5;
    const int num_anchors = anchors.w / 2;
    for (int q = 0; q < num_anchors; q++)
    {
        const float anchor_w = anchors[q * 2];
        const float anchor_h = anchors[q * 2 + 1];
        const ncnn::Mat feat = feat_blob.channel(q);
        for (int i = 0; i < num_grid_y; i++)
        {
            for (int j = 0; j < num_grid_x; j++)
            {
                const float* featptr = feat.row(i * num_grid_x + j);
                // find class index with max class score
                int class_index = 0;
                float class_score = -FLT_MAX;
                for (int k = 0; k < num_class; k++)
                {
                    float score = featptr[5 + k];
                    if (score > class_score)
                    {
                        class_index = k;
                        class_score = score;
                    }
                }
                float box_score = featptr[4];
                float confidence = sigmoid(box_score) * sigmoid(class_score);
                if (confidence >= prob_threshold)
                {
                    // yolov5/models/yolo.py Detect forward
                    // y = x[i].sigmoid()
                    // y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i].to(x[i].device)) * self.stride[i]  # xy
                    // y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                    float dx = sigmoid(featptr[0]);
                    float dy = sigmoid(featptr[1]);
                    float dw = sigmoid(featptr[2]);
                    float dh = sigmoid(featptr[3]);
                    float pb_cx = (dx * 2.f - 0.5f + j) * stride;
                    float pb_cy = (dy * 2.f - 0.5f + i) * stride;
                    float pb_w = pow(dw * 2.f, 2) * anchor_w;
                    float pb_h = pow(dh * 2.f, 2) * anchor_h;
                    float x0 = pb_cx - pb_w * 0.5f;
                    float y0 = pb_cy - pb_h * 0.5f;
                    float x1 = pb_cx + pb_w * 0.5f;
                    float y1 = pb_cy + pb_h * 0.5f;
                    Object obj;
                    obj.rect.x = x0;
                    obj.rect.y = y0;
                    obj.rect.width = x1 - x0;
                    obj.rect.height = y1 - y0;
                    obj.label = class_index;
                    obj.prob = confidence;
                    objects.push_back(obj);
                }
            }
        }
    }
}

static inline float intersectionArea(const Object& a, const Object& b)
{
    cv::Rect_<float> inter = a.rect & b.rect;
    return inter.area();
}

static void qsortDescentInplace(std::vector<Object>& faceobjects, int left, int right)
{
    int i = left;
    int j = right;
    float p = faceobjects[(left + right) / 2].prob;
    while (i <= j)
    {
        while (faceobjects[i].prob > p)
            i++;
        while (faceobjects[j].prob < p)
            j--;
        if (i <= j)
        {
            // swap
            std::swap(faceobjects[i], faceobjects[j]);
            i++;
            j--;
        }
    }
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            if (left < j) qsortDescentInplace(faceobjects, left, j);
        }
        #pragma omp section
        {
            if (i < right) qsortDescentInplace(faceobjects, i, right);
        }
    }
}

static void qsortDescentInplace(std::vector<Object>& faceobjects)
{
    if (faceobjects.empty())
        return;
    qsortDescentInplace(faceobjects, 0, faceobjects.size() - 1);
}

static void nmsSortedBboxes(const std::vector<Object>& faceobjects, std::vector<int>& picked, float nms_threshold)
{
    picked.clear();
    const int n = faceobjects.size();
    std::vector<float> areas(n);
    for (int i = 0; i < n; i++)
    {
        areas[i] = faceobjects[i].rect.area();
    }
    for (int i = 0; i < n; i++)
    {
        const Object& a = faceobjects[i];
        int keep = 1;
        for (int j = 0; j < (int)picked.size(); j++)
        {
            const Object& b = faceobjects[picked[j]];
            // intersection over union
            float inter_area = intersectionArea(a, b);
            float union_area = areas[i] + areas[picked[j]] - inter_area;
            // float IoU = inter_area / union_area
            if (inter_area / union_area > nms_threshold)
                keep = 0;
        }
        if (keep)
        {
            picked.push_back(i);
        }
    }
}

int targetDetection(cv::Mat& cv_src, ncnn::Net& yolov5_net, std::vector<Object>& objects, int target_size,
                    float prob_threshold, float nms_threshold)
{
    int w = cv_src.cols, h = cv_src.rows;
    float scale = 1.0f;
    if (w > h)
    {
        scale = (float)target_size / (float)w;
        w = target_size;
        h = h * scale;
    }
    else
    {
        scale = (float)target_size / (float)h;
        h = target_size;
        w = w * scale;
    }
    ncnn::Mat ncnn_in = ncnn::Mat::from_pixels_resize(cv_src.data, ncnn::Mat::PIXEL_BGR2RGB, cv_src.cols, cv_src.rows, w, h);
    // pad the borders up to the detection size
    // the reference is the letterbox method in yolov5/utils/datasets.py
    int wpad = (w + 31) / 32 * 32 - w;
    int hpad = (h + 31) / 32 * 32 - h;
    ncnn::Mat in_pad;
    ncnn::copy_make_border(ncnn_in, in_pad, hpad / 2, hpad - hpad / 2, wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 114.f);
    const float norm_vals[3] = { 1 / 255.f, 1 / 255.f, 1 / 255.f };
    in_pad.substract_mean_normalize(0, norm_vals);
    // create an extractor
    ncnn::Extractor ex = yolov5_net.create_extractor();
    ex.input("images", in_pad);
    std::vector<Object> proposals;
    // stride 8
    {
        ncnn::Mat out;
        ex.extract("750", out);
        ncnn::Mat anchors(6);
        anchors[0] = 10.f;
        anchors[1] = 13.f;
        anchors[2] = 16.f;
        anchors[3] = 30.f;
        anchors[4] = 33.f;
        anchors[5] = 23.f;
        std::vector<Object> objects8;
        generateProposals(anchors, 8, in_pad, out, prob_threshold, objects8);
        proposals.insert(proposals.end(), objects8.begin(), objects8.end());
    }
    // stride 16
    {
        ncnn::Mat out;
        ex.extract("771", out);
        ncnn::Mat anchors(6);
        anchors[0] = 30.f;
        anchors[1] = 61.f;
        anchors[2] = 62.f;
        anchors[3] = 45.f;
        anchors[4] = 59.f;
        anchors[5] = 119.f;
        std::vector<Object> objects16;
        generateProposals(anchors, 16, in_pad, out, prob_threshold, objects16);
        proposals.insert(proposals.end(), objects16.begin(), objects16.end());
    }
    // stride 32
    {
        ncnn::Mat out;
        ex.extract("791", out);
        ncnn::Mat anchors(6);
        anchors[0] = 116.f;
        anchors[1] = 90.f;
        anchors[2] = 156.f;
        anchors[3] = 198.f;
        anchors[4] = 373.f;
        anchors[5] = 326.f;
        std::vector<Object> objects32;
        generateProposals(anchors, 32, in_pad, out, prob_threshold, objects32);
        proposals.insert(proposals.end(), objects32.begin(), objects32.end());
    }
    // sort all proposals by score from highest to lowest
    qsortDescentInplace(proposals);
    // apply nms with nms_threshold
    std::vector<int> picked;
    nmsSortedBboxes(proposals, picked, nms_threshold);
    int count = picked.size();
    objects.resize(count);
    for (int i = 0; i < count; i++)
    {
        objects[i] = proposals[picked[i]];
        // adjust offset to original unpadded
        float x0 = (objects[i].rect.x - (wpad / 2)) / scale;
        float y0 = (objects[i].rect.y - (hpad / 2)) / scale;
        float x1 = (objects[i].rect.x + objects[i].rect.width - (wpad / 2)) / scale;
        float y1 = (objects[i].rect.y + objects[i].rect.height - (hpad / 2)) / scale;
        // clip
        x0 = std::max(std::min(x0, (float)(cv_src.cols - 1)), 0.f);
        y0 = std::max(std::min(y0, (float)(cv_src.rows - 1)), 0.f);
        x1 = std::max(std::min(x1, (float)(cv_src.cols - 1)), 0.f);
        y1 = std::max(std::min(y1, (float)(cv_src.rows - 1)), 0.f);
        objects[i].rect.x = x0;
        objects[i].rect.y = y0;
        objects[i].rect.width = x1 - x0;
        objects[i].rect.height = y1 - y0;
    }
    return 0;
}

void drawObjects(const cv::Mat& cv_src, const std::vector<Object>& objects, std::vector<std::string>& class_names)
{
    cv::Mat cv_detect = cv_src.clone();
    for (size_t i = 0; i < objects.size(); i++)
    {
        const Object& obj = objects[i];
        std::cout << "Object label:" << obj.label << " Object prob:" << obj.prob
                  << " Object rect" << obj.rect << std::endl;
        cv::rectangle(cv_detect, obj.rect, cv::Scalar(255, 0, 0));
        std::string text = class_names[obj.label] + " " + std::to_string(int(obj.prob * 100)) + "%";
        int baseLine = 0;
        cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
        int x = obj.rect.x;
        int y = obj.rect.y - label_size.height - baseLine;
        if (y < 0)
            y = 0;
        if (x + label_size.width > cv_detect.cols)
            x = cv_detect.cols - label_size.width;
        cv::rectangle(cv_detect, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
                      cv::Scalar(255, 255, 255), -1);
        cv::putText(cv_detect, text, cv::Point(x, y + label_size.height),
                    cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 0));
    }
    cv::imshow("image", cv_detect);
}

int main(void)
{
    std::string param_path = "models/ODIDF16.param";
    std::string bin_path = "models/ODIDF16.bin";
    ncnn::Net yolov5_net;
    initYolov5Net(param_path, bin_path, yolov5_net, true);
    std::vector<std::string> class_names{ "ida", "idb", "idback", "idhead" };
    std::string path = "images";
    std::vector<std::string> filenames;
    cv::glob(path, filenames, false);
    for (auto name : filenames)
    {
        cv::Mat cv_src = cv::imread(name);
        if (cv_src.empty())
        {
            continue;
        }
        std::vector<Object> objects;
        double start = static_cast<double>(cv::getTickCount());
        targetDetection(cv_src, yolov5_net, objects);
        double time = ((double)cv::getTickCount() - start) / cv::getTickFrequency();
        std::cout << name << " Detection time:" << time << "(second) " << std::endl;
        drawObjects(cv_src, objects, class_names);
        cv::waitKey();
    }
    return 0;
}
V. Building NCNN
1. Dependencies:
- protobuf-3.4.0
Download: https://github.com/google/protobuf/archive/v3.4.0.zip
Open the VS2017 or VS2019 native tools command prompt and change to the source directory
- Vulkan
https://vulkan.lunarg.com/sdk/home
Version: VulkanSDK-1.2.141.2
Run the installer, then verify the installation by running C:\VulkanSDK\1.2.141.2\Bin\vkcube.exe. If the vkcube window appears, the installation succeeded.
- glfw
https://www.glfw.org/
Copy glfw-3.3.2.bin.WIN64 to VulkanSDK\1.2.141.2\Third-Party
- GLM
https://github.com/g-truc/glm/
Copy GLM to VulkanSDK\1.2.141.2\Third-Party
- Add to the system path
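2. Adding a custom layer to NCNN
When the custom layer is registered in code, a model converted with ncnn2mem reports a model-loading error during inference on mobile. After ncnn2mem, the model is read entirely into memory as .h files, and registering a custom layer for in-memory loading requires the TYPEINDEX enum; see ncnn's guide on adding a custom layer. The ncnn libraries used so far were all downloaded prebuilt binaries, so adding a custom layer means fetching the source with git and rebuilding.
2.1 Add the custom layer.
- Fetch the source with git
- Add a .h file to the ncnn source: src/layer/YoloV5Focus.h
YoloV5Focus.h
- Add a .cpp file to the ncnn source: src/layer/YoloV5Focus.cpp
YoloV5Focus.cpp
- Edit src/CMakeLists.txt to register layer/YoloV5Focus
- On Windows, OP names are case-insensitive, but on other systems or on mobile, pay attention to the case of the layer name.
- Build ncnn
Open the VS2017 or VS2019 native tools command prompt and change to the source directory
2.2 When using an NCNN library built with the custom layer, the dynamic layer registration part of the inference code above is no longer needed: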
class YoloV5Focus : public ncnn::Layer
{
public:
    YoloV5Focus()
    {
        one_blob_only = true;
    }

    virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob, const ncnn::Option& opt) const
    {
        int w = bottom_blob.w;
        int h = bottom_blob.h;
        int channels = bottom_blob.c;
        int outw = w / 2;
        int outh = h / 2;
        int outc = channels * 4;
        top_blob.create(outw, outh, outc, 4u, 1, opt.blob_allocator);
        if (top_blob.empty())
            return -100;

        #pragma omp parallel for num_threads(opt.num_threads)
        for (int p = 0; p < outc; p++)
        {
            const float* ptr = bottom_blob.channel(p % channels).row((p / channels) % 2) + ((p / channels) / 2);
            float* outptr = top_blob.channel(p);
            for (int i = 0; i < outh; i++)
            {
                for (int j = 0; j < outw; j++)
                {
                    *outptr = *ptr;
                    outptr += 1;
                    ptr += 2;
                }
                ptr += w;
            }
        }
        return 0;
    }
};

DEFINE_LAYER_CREATOR(YoloV5Focus)

// register the custom layer dynamically
yolov5_net.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator);
VI. NCNN Int8 Model Quantization
1. Optimize the model
./ncnnoptimize yolov5s.param yolov5s.bin yolov5s-opt.param yolov5s-opt.bin 0
2. Generate the calibration table
./ncnn2table yolov5s-opt.param yolov5s-opt.bin imagelist.txt yolov5s-opt.table mean=[0,0,0] norm=[0.0039215,0.0039215,0.0039215] shape=[416,416,3] pixel=BGR thread=8 method=kl
3. Quantize the model to int8
./ncnn2int8 yolov5s-opt.param yolov5s-opt.bin yolov5s-int8.param yolov5s-int8.bin yolov5s-opt.table
4. The int8-quantized model is noticeably faster on mobile and on some edge devices, at the cost of a slight drop in accuracy.