

A Brief Overview of YOLOv1 to YOLOv3: A Comprehensive Guide to Object Detection with the YOLO Framework, Part 1

Published: 2023/11/29


Table of Contents:

  • Introduction
  • Why YOLO?
  • How does it work?
  • Intersection over Union (IoU)
  • Non-max suppression
  • Network Architecture
  • Training
  • Limitations of YOLO
  • Conclusion

Introduction:

You Only Look Once (YOLO) is a new and faster approach to object detection. Traditional systems repurpose classifiers to perform detection. Basically, to detect any object, the system takes a classifier for that object and then classifies its presence at various locations in the image. Other systems generate potential bounding boxes in an image using region proposal methods and then run a classifier on these potential boxes; this results in a somewhat more efficient method. After classification, post-processing is used to refine the bounding boxes, eliminate duplicate detections, and so on. Due to these complexities, the system becomes slow and hard to optimize, because each component has to be trained separately.

Object Detection with Confidence Score

Why YOLO?

The base model can process images in real time at 45 frames per second. A smaller version of the network, Fast YOLO, can process images at 155 frames per second while achieving double the mAP of other real-time detectors. It outperforms other detection methods, including DPM (Deformable Parts Models) and R-CNN.

How Does It Work?

YOLO reframes object detection as a single regression problem instead of a classification problem. This system only looks at the image once to detect what objects are present and where they are, hence the name YOLO.

The system divides the image into an S x S grid. Each of these grid cells predicts B bounding boxes and confidence scores for those boxes. The confidence score indicates how sure the model is that the box contains an object, and also how accurate it thinks the predicted box is. The confidence score can be calculated using the formula:

C = Pr(object) * IoU

IoU: Intersection over Union between the predicted box and the ground truth.

If no object exists in a cell, its confidence score should be zero.

Bounding Box Predictions (Source: Author)

Each bounding box consists of five predictions: x, y, w, h, and confidence where,

(x,y): Coordinates representing the center of the box. These coordinates are calculated with respect to the bounds of the grid cells.

w: Width of the bounding box.

h: Height of the bounding box.

Each grid cell also predicts C conditional class probabilities Pr(Class_i | Object). It only predicts one set of class probabilities per grid cell, regardless of the number of boxes B. During testing, these conditional class probabilities are multiplied by the individual box confidence predictions, which gives class-specific confidence scores for each box. These scores reflect both the probability of that class and how well the box fits the object.

Pr(Class_i | Object) * Pr(Object) * IoU = Pr(Class_i) * IoU

The final predictions are encoded as an S x S x (B*5 + C) tensor.
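As a quick sanity check on this shape bookkeeping, the split of the output tensor can be sketched in a few lines of NumPy. The channel ordering and variable names here are my own assumptions for illustration, and a random tensor stands in for real network output:

```python
import numpy as np

# Pascal VOC settings from the paper: S = 7, B = 2, C = 20.
S, B, C = 7, 2, 20
pred = np.random.rand(S, S, B * 5 + C)  # stand-in for the network output

# Assumed layout: first B*5 channels hold (x, y, w, h, confidence) per box.
boxes = pred[..., :B * 5].reshape(S, S, B, 5)
box_conf = boxes[..., 4]          # Pr(object) * IoU, shape (S, S, B)
class_probs = pred[..., B * 5:]   # Pr(class_i | object), shape (S, S, C)

# Class-specific confidence score for every box:
# Pr(class_i | object) * Pr(object) * IoU = Pr(class_i) * IoU
class_scores = box_conf[..., None] * class_probs[:, :, None, :]
print(class_scores.shape)  # (7, 7, 2, 20)
```

With these settings, this yields one 20-way score vector for each of the 2 boxes in each of the 49 cells.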

Intersection over Union (IoU):

IoU is used to evaluate the object detection algorithm. It is the overlap between the ground truth and the predicted bounding box, i.e., it calculates how similar the predicted box is to the ground truth.

Demonstration of IoU (Edited by Author)

Usually, the threshold for IoU is kept greater than 0.5, although many researchers apply a much more stringent threshold such as 0.6 or 0.7. If a bounding box has an IoU less than the specified threshold, that bounding box is not taken into consideration.
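As a minimal sketch, IoU can be computed directly from box coordinates. I assume axis-aligned boxes in (x1, y1, x2, y2) corner format here; the helper name and box format are illustrative, not from the paper:

```python
def iou(a, b):
    # Width and height of the intersection rectangle (0 if disjoint).
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    # Union = sum of both areas minus the double-counted intersection.
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 3))  # intersection 1, union 7 -> 0.143
```

An IoU of 1.0 means the boxes coincide exactly; 0.0 means they do not overlap at all.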

Non-Max Suppression:

The algorithm may find multiple detections of the same object. Non-max suppression is a technique by which the algorithm detects the object only once. Consider an example where the algorithm detected three bounding boxes for the same object. The boxes with respective probabilities are shown in the image below.

Multiple Bounding Boxes of the Same Object (Edited by Author)

The probabilities of the boxes are 0.7, 0.9, and 0.6 respectively. To remove the duplicates, we are first going to select the box with the highest probability and output that as a prediction. Then eliminate any bounding box with IoU > 0.5 (or any threshold value) with the predicted output. The result will be:

Bounding Box Selected After Non-Max Suppression (Edited by Author)
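The procedure above can be sketched as a greedy loop. This is a generic plain-Python version under my own assumptions (corner-format boxes and a standalone IoU helper), not the paper's reference implementation:

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    # Greedy NMS: keep the highest-scoring box, drop any remaining box
    # that overlaps it above the threshold, and repeat with what is left.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[i], boxes[best]) <= iou_threshold]
    return keep

# Three overlapping detections of the same object, as in the example above.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (2, 0, 12, 10)]
scores = [0.7, 0.9, 0.6]
print(non_max_suppression(boxes, scores))  # [1]: only the 0.9 box survives
```

Returning kept indices rather than boxes makes it easy to also filter the matching scores and class labels.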

Network Architecture:

The base model has 24 convolutional layers followed by 2 fully connected layers. It uses 1 x 1 reduction layers followed by a 3 x 3 convolutional layer. Fast YOLO uses a neural network with 9 convolutional layers and fewer filters in those layers. The complete network is shown in the figure.

Note:

  • The architecture was designed for use with the Pascal VOC dataset, where S = 7, B = 2, and C = 20. This is why the final feature maps are 7 x 7 and the output tensor is of shape (7 x 7 x (2*5 + 20)). To use this network with a different number of classes or a different grid size, you might have to tune the layer dimensions.

  • The final layer uses a linear activation function. The rest uses a leaky ReLU.

    最后一層使用線性激活函數(shù)。 其余使用泄漏的ReLU。

Training:

  • Pre-train the first 20 convolutional layers on the ImageNet 1000-class competition dataset, followed by an average-pooling layer and a fully connected layer.

  • Since detection requires better visual information, increase the input resolution from 224 x 224 to 448 x 448.

    由于檢測(cè)需要更好的視覺(jué)信息,因此將輸入分辨率從224 x 224增加到448 x 448。
  • Train the network for 135 epochs. Throughout the training, use a batch size of 64, a momentum of 0.9, and a decay of 0.0005.

  • Learning Rate: For the first epochs, slowly raise the learning rate from 10⁻³ to 10⁻²; otherwise the model diverges due to unstable gradients. Continue training with 10⁻² for 75 epochs, then 10⁻³ for 30 epochs, and finally 10⁻⁴ for 30 epochs.

  • To avoid overfitting, use dropout and data augmentation.

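The schedule above can be sketched as a simple piecewise function. The length of the initial warm-up is an assumption here, since the text only says the rate is raised slowly over the first epochs:

```python
# Piecewise learning-rate schedule for the 135-epoch training run.
# warmup_epochs is an assumed value; the source only says the rate is
# raised slowly from 1e-3 to 1e-2 at the start of training.
def learning_rate(epoch, warmup_epochs=5):
    if epoch < warmup_epochs:
        # Linear warm-up from 1e-3 toward 1e-2.
        return 1e-3 + (1e-2 - 1e-3) * epoch / warmup_epochs
    if epoch < warmup_epochs + 75:
        return 1e-2   # 75 epochs at 1e-2
    if epoch < warmup_epochs + 75 + 30:
        return 1e-3   # then 30 epochs at 1e-3
    return 1e-4       # final 30 epochs at 1e-4
```

Stepping the rate down like this lets the optimizer settle into progressively finer minima late in training.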
Limitations of YOLO:

  • Spatial constraints on bounding box predictions, since each grid cell predicts only two boxes and can have only one class.

  • It is difficult to detect small objects that appear in groups.

  • It struggles to generalize objects in new or unusual aspect ratios as the model learns to predict bounding boxes from data itself.

Conclusion:

This was a brief explanation of the research paper as well as details obtained from various other sources. I hope I made this concept easier for you to understand.

If you really want to check your understanding, though, the best way is to implement the algorithm. In the next part, we will do exactly that. Many details cannot be explained via text and can only be understood while implementing them.

Thank you for reading. Click here to go to the next part.

Translated from: https://towardsdatascience.com/object-detection-part1-4dbe5147ad0a
