SSD in Deep Learning: An Illustrated Guide to the SSD Algorithm, Covering Its Introduction (Paper Overview), Architecture Details, and Case Studies
Contents
Introduction to the SSD algorithm (paper overview)
0. SSD experimental results
1. Architecture diagrams
2. SSD vs. YOLO
SSD architecture in detail
SSD case studies
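Related articles
SSD in Deep Learning: An Illustrated Guide to the SSD Algorithm, Covering Its Introduction (Paper Overview), Architecture Details, and Case Studies
SSD in Deep Learning: The SSD Algorithm's Architecture in Detail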
Introduction to the SSD algorithm (paper overview)
SSD stands for Single Shot MultiBox Detector.
Abstract
We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component.
Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For 300 × 300 input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS on a Nvidia Titan X and for 512 × 512 input, SSD achieves 76.9% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Compared to other single stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at: https://github.com/weiliu89/caffe/tree/ssd .
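The abstract's idea of "default boxes over different aspect ratios and scales per feature map location" can be made concrete with a small sketch. It follows the scale rule described in the SSD paper (scales spaced linearly between `s_min = 0.2` and `s_max = 0.9`, box width `s * sqrt(a)` and height `s / sqrt(a)` for aspect ratio `a`, plus one extra square box per location). The function names and the choice of a 5 × 5 feature map are illustrative assumptions, not code from the authors' Caffe implementation.

```python
import math
from itertools import product


def layer_scale(k, m, s_min=0.2, s_max=0.9):
    """Scale for the k-th of m feature maps (k = 1..m), spaced linearly."""
    if m == 1:
        return s_min
    return s_min + (s_max - s_min) * (k - 1) / (m - 1)


def default_boxes(fmap_size, scale, next_scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return (cx, cy, w, h) default boxes in [0, 1] image coordinates.

    One box per aspect ratio is placed at the centre of every cell of a
    square fmap_size x fmap_size feature map, plus one extra square box
    whose scale is the geometric mean of this layer's and the next layer's.
    """
    boxes = []
    for row, col in product(range(fmap_size), repeat=2):
        cx = (col + 0.5) / fmap_size
        cy = (row + 0.5) / fmap_size
        for ratio in aspect_ratios:
            boxes.append((cx, cy, scale * math.sqrt(ratio), scale / math.sqrt(ratio)))
        extra = math.sqrt(scale * next_scale)
        boxes.append((cx, cy, extra, extra))
    return boxes


# Hypothetical example: the 4th of 6 feature maps, with a 5 x 5 grid.
boxes = default_boxes(5, layer_scale(4, 6), layer_scale(5, 6))
print(len(boxes))  # 100 = 5 * 5 locations x 4 boxes per location
```

At prediction time the network would output, for each of these boxes, one score per category and four offsets that adjust `(cx, cy, w, h)`; that part is not shown here.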
Conclusions
This paper introduces SSD, a fast single-shot object detector for multiple categories. A key feature of our model is the use of multi-scale convolutional bounding box outputs attached to multiple feature maps at the top of the network. This representation allows us to efficiently model the space of possible box shapes. We experimentally validate that given appropriate training strategies, a larger number of carefully chosen default bounding boxes results in improved performance. We build SSD models with at least an order of magnitude more box predictions sampling location, scale, and aspect ratio, than existing methods [5,7]. We demonstrate that given the same VGG-16 base architecture, SSD compares favorably to its state-of-the-art object detector counterparts in terms of both accuracy and speed. Our SSD512 model significantly outperforms the state-of-the-art Faster R-CNN [2] in terms of accuracy on PASCAL VOC and COCO, while being 3× faster. Our real time SSD300 model runs at 59 FPS, which is faster than the current real time YOLO [5] alternative, while producing markedly superior detection accuracy.
Apart from its standalone utility, we believe that our monolithic and relatively simple SSD model provides a useful building block for larger systems that employ an object detection component. A promising future direction is to explore its use as part of a system using recurrent neural networks to detect and track objects in video simultaneously.
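The claim of "at least an order of magnitude more box predictions" can be checked with simple arithmetic. The sketch below assumes the SSD300 layer layout described in the paper (six feature maps with 4 or 6 default boxes per location) and YOLOv1's 7 × 7 grid with 2 boxes per cell; neither configuration is spelled out in this post, so treat both as assumptions taken from the respective papers.

```python
# (feature map side length, default boxes per location); assumed SSD300 layout.
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def total_default_boxes(layers):
    """Sum side * side * boxes_per_location over all feature maps."""
    return sum(side * side * per_location for side, per_location in layers)


ssd300_boxes = total_default_boxes(SSD300_LAYERS)
yolo_boxes = 7 * 7 * 2  # assumed YOLOv1 setup: 7 x 7 grid, 2 boxes per cell

print(ssd300_boxes)  # 8732
print(yolo_boxes)    # 98
print(ssd300_boxes >= 10 * yolo_boxes)  # True
```

Most of the boxes come from the largest feature map (38 × 38 × 4 = 5776), which is the one responsible for small objects.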
Paper
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016
https://arxiv.org/abs/1512.02325
Paper PDF: https://arxiv.org/pdf/1512.02325v5.pdf
0. SSD experimental results
Training: VOC2007 trainval and VOC2012 trainval (16551 images)
Testing: VOC2007 test (4952 images)
1. Single-stage vs. two-stage detectors on VOC2007
The two models, SSD300 and SSD512, reach about 77% mAP at 46 FPS and about 80% mAP at 19 FPS, respectively.
SSD beats YOLOv1 in both speed and accuracy, and it also outperforms two-stage models such as Faster R-CNN.
2. SSD models: PASCAL VOC2007 test detection results
Here is the accuracy comparison for different methods. For SSD, the input image size is 300 × 300 or 512 × 512.
The model is trained using SGD with an initial learning rate of 0.001, 0.9 momentum, 0.0005 weight decay, and batch size 32.
On VOC2007 test with a Nvidia Titan X, SSD achieves 74.3% mAP at 59 FPS, vs. 73.2% mAP at 7 FPS for Faster R-CNN and 63.4% mAP at 45 FPS for YOLO.
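To show how the quoted hyperparameters interact, here is a minimal sketch of one SGD-with-momentum update on a single scalar weight. The update rule is the common textbook form with L2 weight decay added to the gradient; it is an assumption that this matches the solver used by the authors in every detail, and `SolverConfig` and `sgd_step` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolverConfig:
    """Hyperparameters quoted in the text above."""
    base_lr: float = 0.001
    momentum: float = 0.9
    weight_decay: float = 0.0005
    batch_size: int = 32


def sgd_step(weight, grad, velocity, cfg):
    """One momentum-SGD update for a scalar weight (assumed update rule)."""
    # L2 weight decay contributes weight_decay * weight to the gradient.
    decayed_grad = grad + cfg.weight_decay * weight
    # The velocity accumulates a decaying sum of past scaled gradients.
    velocity = cfg.momentum * velocity - cfg.base_lr * decayed_grad
    return weight + velocity, velocity


cfg = SolverConfig()
weight, velocity = sgd_step(weight=1.0, grad=0.5, velocity=0.0, cfg=cfg)
print(weight, velocity)
```

With a zero initial velocity, the first step moves the weight by `-base_lr * (grad + weight_decay * weight)`; later steps also carry over 90% of the previous velocity.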
Both Fast R-CNN and Faster R-CNN use input images whose minimum dimension is 600. The two SSD models have exactly the same settings except for their input sizes (300 × 300 vs. 512 × 512). A larger input size clearly gives better results, and more training data also helps in these experiments.
As the table shows, training on the "07+12" combined dataset gives 76.8 mAP, while the "07+12+COCO" combination performs best at 81.6 mAP.
Notes:
Data: "07": VOC2007 trainval
"07+12": union of VOC2007 and VOC2012 trainval
"07+12+COCO": first train on COCO trainval35k, then fine-tune on 07+12
3. Detection speed (frames per second)
This is a recap of the speed performance in frames per second.
Results on Pascal VOC2007 test. SSD300 is the only real-time detection method that achieves above 70% mAP. By using a larger input image, SSD512 outperforms all other methods on accuracy while maintaining close to real-time speed.
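The "only real-time method above 70% mAP" statement can be reproduced from the three VOC2007 results quoted earlier in this post (SSD at 59 FPS, Faster R-CNN at 7 FPS, YOLO at 45 FPS). The sketch below is a simple filter over those numbers; the 30 FPS cut-off for "real time" is an assumption made here for illustration, since the post does not state a threshold.

```python
# (method, FPS, mAP in %) on VOC2007 test, as quoted in this post.
RESULTS = [
    ("SSD300", 59, 74.3),
    ("Faster R-CNN", 7, 73.2),
    ("YOLO", 45, 63.4),
]

REAL_TIME_FPS = 30  # assumed threshold for "real time"


def real_time_and_accurate(results, min_fps=REAL_TIME_FPS, min_map=70.0):
    """Return the methods that are both real time and above min_map."""
    return [name for name, fps, m_ap in results if fps >= min_fps and m_ap > min_map]


print(real_time_and_accurate(RESULTS))  # ['SSD300']
```

Faster R-CNN passes the accuracy bar but not the speed bar, and YOLO does the opposite, which is why only SSD300 remains.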
4. SSD512 model: detection examples on COCO test-dev
Detection examples on COCO test-dev with SSD512 model
1. Architecture diagrams
2. SSD vs. YOLO
SSD architecture in detail
To be updated...
SSD in Deep Learning: The SSD Algorithm's Architecture in Detail
SSD case studies
To be updated...