Paper 9: Fast R-CNN


Code is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.

Abstract:

Fast R-CNN improves both training and test speed over R-CNN while also improving accuracy: it trains the very deep VGG16 network 9× faster than R-CNN, is 213× faster at test time, and achieves a higher mAP on PASCAL VOC 2012. Compared with SPPnet, Fast R-CNN trains VGG16 3× faster, tests 10× faster, and is more accurate.

The Fast R-CNN method has several advantages:

1. Higher detection quality (mAP) than R-CNN, SPPnet

2. Training is single-stage, using a multi-task loss

3. Training can update all network layers

4. No disk storage is required for feature caching

mAP: mean Average Precision, i.e., an average of averages. Within each class, the Average Precision (AP) is computed first; the mean of these per-class APs over all classes gives the mean Average Precision (mAP).

  • mAP: mean Average Precision, the mean of the per-class AP values
  • AP: the area under the PR curve (a minimal computation sketch follows this list)
  • PR curve: Precision-Recall curve
  • Precision: TP / (TP + FP)
  • Recall: TP / (TP + FN)
  • TP: number of detections with IoU > 0.5 (each ground-truth box is counted at most once)
  • FP: number of detections with IoU <= 0.5, plus redundant detections of an already-matched ground-truth box
  • FN: number of ground-truth boxes that are not detected
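To make the AP definition above concrete, here is a minimal NumPy sketch (not the official PASCAL VOC evaluation code) that computes AP as the area under the precision-recall curve from per-detection TP/FP flags sorted by descending confidence; the function name and the all-point interpolation are my own choices:

```python
import numpy as np

def average_precision(tp_flags, num_gt):
    """AP = area under the precision-recall curve for one class.

    tp_flags: 1/0 per detection, sorted by descending confidence;
              1 means the detection matched a ground-truth box
              (IoU > 0.5, each GT matched at most once).
    num_gt:   total number of ground-truth boxes for this class.
    """
    tp_flags = np.asarray(tp_flags, dtype=float)
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1.0 - tp_flags)
    recall = tp / num_gt                      # TP / (TP + FN)
    precision = tp / np.maximum(tp + fp, 1)   # TP / (TP + FP)

    # Pad, make precision monotonically non-increasing, then integrate
    # over recall (all-point interpolation; VOC07 averaged 11 recall points).
    prec = np.concatenate(([0.0], precision, [0.0]))
    rec = np.concatenate(([0.0], recall, [1.0]))
    for i in range(len(prec) - 2, -1, -1):
        prec[i] = max(prec[i], prec[i + 1])
    idx = np.where(rec[1:] != rec[:-1])[0]
    return float(np.sum((rec[idx + 1] - rec[idx]) * prec[idx + 1]))

# Example: 5 detections of one class, 3 ground-truth boxes in total.
print(average_precision([1, 1, 0, 1, 0], num_gt=3))   # ~0.92
# mAP is simply the mean of this value over all classes.
```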

Challenges: First, numerous candidate object locations (often called "proposals") must be processed. Second, these candidates provide only rough localization that must be refined to achieve precise localization.

1. Introduction

Methods: In this paper, we streamline the training process for state-of-the-art ConvNet-based object detectors [9, 11]. We propose a single-stage training algorithm that jointly learns to classify object proposals and refine their spatial locations.

Result: The resulting method can train a very deep detection network (VGG16 [20]) 9× faster than R-CNN [9] and 3× faster than SPPnet [11]. At runtime, the detection network processes images in 0.3s (excluding object proposal time) while achieving top accuracy on PASCAL VOC 2012 [7] with a mAP of 66% (vs. 62% for R-CNN).

The paper first points out the drawbacks of R-CNN: The Region-based Convolutional Network method (R-CNN) [9] achieves excellent object detection accuracy by using a deep ConvNet to classify object proposals. R-CNN, however, has notable drawbacks:

1. Training is a multi-stage pipeline.

R-CNN first fine-tunes a ConvNet on object proposals using log loss. Then, it fits SVMs to ConvNet features. These SVMs act as object detectors, replacing the softmax classifier learnt by fine-tuning. In the third training stage, bounding-box regressors are learned.

2. Training is expensive in space and time.

For SVM and bounding-box regressor training, features are extracted from each object proposal in each image and written to disk. With very deep networks, such as VGG16, this process takes 2.5 GPU-days for the 5k images of the VOC07 trainval set. These features require hundreds of gigabytes of storage.

3. Object detection is slow.

At test-time, features are extracted from each object proposal in each test image. Detection with VGG16 takes 47s / image (on a GPU).

Discussion of R-CNN versus SPPnet (see also spatial pyramid pooling [15] and the fine-tuning algorithm of [11]):

R-CNN is slow because it performs a ConvNet forward pass for every object proposal, without sharing computation. Spatial pyramid pooling networks (SPPnets) [11] were proposed to speed up R-CNN by sharing computation.

The SPPnet method computes a convolutional feature map for the entire input image and then classifies each object proposal using a feature vector extracted from the shared feature map. Features are extracted for a proposal by max pooling the portion of the feature map inside the proposal into a fixed-size output (e.g., 6 × 6). Multiple output sizes are pooled and then concatenated as in spatial pyramid pooling [15]. SPPnet accelerates R-CNN by 10 to 100× at test time. Training time is also reduced by 3× due to faster proposal feature extraction.


Drawbacks of SPPnet:

SPPnet also has notable drawbacks. Like R-CNN, training is a multi-stage pipeline that involves extracting features, fine-tuning a network with log loss, training SVMs, and finally fitting bounding-box regressors. Features are also written to disk. But unlike R-CNN, the fine-tuning algorithm proposed in [11] cannot update the convolutional layers that precede the spatial pyramid pooling. Unsurprisingly, this limitation (fixed convolutional layers) limits the accuracy of very deep networks.


2. Fast R-CNN architecture and training

A Fast R-CNN network takes an entire image and a set of object proposals as input. First, the network processes the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map. Then, for each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is fed into a sequence of fully connected layers that finally branch into two sibling output layers: one produces softmax probability estimates over the K object classes plus a catch-all "background" class, and the other outputs four real-valued numbers for each of the K object classes, with each set of 4 values encoding a refined bounding-box position for one of the K classes.
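To make the two sibling output layers concrete, here is a minimal PyTorch sketch of the per-RoI head (my own illustration using VGG16-style sizes; the class name, layer dimensions, and K = 20 are assumptions, not the paper's Caffe implementation):

```python
import torch
import torch.nn as nn

class FastRCNNHead(nn.Module):
    """Per-RoI head: fully connected layers -> (class scores, box regression)."""

    def __init__(self, roi_feat_dim=512 * 7 * 7, num_classes=20):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(roi_feat_dim, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
        )
        # softmax head: K object classes + 1 "background" class
        self.cls_score = nn.Linear(4096, num_classes + 1)
        # regression head: 4 box values for each of the K object classes
        self.bbox_pred = nn.Linear(4096, num_classes * 4)

    def forward(self, roi_features):
        # roi_features: (num_rois, C, H, W) fixed-size outputs of RoI pooling
        x = self.fc(roi_features.flatten(start_dim=1))
        return self.cls_score(x), self.bbox_pred(x)

head = FastRCNNHead()
scores, boxes = head(torch.randn(128, 512, 7, 7))  # 128 RoIs
print(scores.shape, boxes.shape)                    # (128, 21) and (128, 80)
```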

2.1 The RoI pooling layer

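The RoI pooling layer is the single-level special case of the SPP layer quoted above: a proposal, projected onto the conv feature map, is divided into a fixed H × W grid of bins (e.g., 7 × 7), and each bin is max-pooled independently in every channel, giving a fixed-length feature vector regardless of the proposal's size. Below is a minimal NumPy sketch of the idea, assuming the RoI is already given in feature-map coordinates and ignoring the image-to-feature-map spatial scale and the backward pass:

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool one RoI on a conv feature map into a fixed out_h x out_w grid.

    feature_map: (C, H, W) array.
    roi: (x1, y1, x2, y2) in feature-map coordinates.
    """
    channels = feature_map.shape[0]
    x1, y1, x2, y2 = roi
    out = np.zeros((channels, out_h, out_w), dtype=feature_map.dtype)
    bin_h = (y2 - y1 + 1) / out_h
    bin_w = (x2 - x1 + 1) / out_w
    for i in range(out_h):
        for j in range(out_w):
            ys = y1 + int(np.floor(i * bin_h))
            ye = y1 + int(np.ceil((i + 1) * bin_h))
            xs = x1 + int(np.floor(j * bin_w))
            xe = x1 + int(np.ceil((j + 1) * bin_w))
            out[:, i, j] = feature_map[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out  # flattened, this is the fixed-length RoI feature vector

fmap = np.random.rand(512, 40, 60)           # conv feature map of one image
pooled = roi_max_pool(fmap, (5, 3, 34, 27))  # one proposal
print(pooled.shape)                          # (512, 7, 7)
```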

2.2 Initializing from pre-trained networks


2.3 Fine-tuning for detection

為什么SPPnet不能更新空間金字塔池化層以下的權重?其根本原因是,當每個訓練樣本(即RoI)來自不同的圖像時,通過SPP層的反向傳播效率非常低,這正是R-CNN和SPPnet網絡的訓練方式。效率低的原因是每個RoI可能有一個非常大的接受野,通常跨越整個輸入圖像。因此前向傳播必須處理整個感受野,所以訓練輸入量很大(通常是整個圖像)。

Fast R-CNN training advantages (compared with R-CNN and SPPnet):

We propose a more efficient training method that takes advantage of feature sharing during training. In Fast R-CNN training, stochastic gradient descent (SGD) mini-batches are sampled hierarchically, first by sampling N images and then by sampling R/N RoIs from each image. Critically, RoIs from the same image share computation and memory in the forward and backward passes. Making N small decreases mini-batch computation. For example, when using N = 2 and R = 128, the proposed training scheme is roughly 64× faster than sampling one RoI from 128 different images (i.e., the R-CNN and SPPnet strategy).

Q: R/N RoIs are sampled from each image — what are R and N? N is the number of images per mini-batch (here N = 2), and R is the mini-batch size in RoIs (R = 128), so R/N = 64 RoIs are drawn from each image (see the sketch below).
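Here is a minimal sketch of this hierarchical sampling (my own illustration; the dataset format and field names are made up, and the paper's actual sampler additionally biases the RoIs drawn from each image toward a fixed fraction of foreground examples):

```python
import random

def sample_minibatch(dataset, N=2, R=128):
    """Hierarchical sampling: pick N images, then R/N RoIs from each image.

    dataset: list of dicts like {"image": ..., "rois": [...]} (hypothetical).
    Returns R (image, roi) pairs; RoIs from the same image can share one
    conv forward/backward pass, which is the source of the speed-up.
    """
    rois_per_image = R // N                     # 128 / 2 = 64
    minibatch = []
    for entry in random.sample(dataset, N):
        for roi in random.sample(entry["rois"], rois_per_image):
            minibatch.append((entry["image"], roi))
    return minibatch

# Toy usage: 10 images with 200 proposals each.
toy_dataset = [{"image": f"img_{k}", "rois": list(range(200))} for k in range(10)]
batch = sample_minibatch(toy_dataset)
print(len(batch), "RoIs from images", {img for img, _ in batch})
```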

IoU (Intersection over Union)

Object detection must localize an object's bounding box: for a car, for example, we not only need to localize the car's bounding box, we also need to recognize that the object inside that bounding box is a car. Two kinds of boxes are involved:


  • ground-truth bounding boxes: the approximate extent of the object to be detected, annotated by hand in the training images
  • the boxes output by our detection algorithm

For bounding-box localization accuracy there is an important concept: since the algorithm's output will never match the human annotations exactly, we need a metric that evaluates localization accuracy: IoU. It measures the degree of overlap between two bounding boxes.


Computing IoU:

IoU is the area of the overlap between two regions divided by the area of their union; the computed value is then compared against a preset threshold. In other words, it is the ratio of the overlapping area of rectangles A and B to the area of the union of A and B.

For example: the green box is the ground truth and the red box is the prediction.

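A minimal sketch of the IoU computation for two axis-aligned boxes given as (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# Ground truth ("green" box) vs. prediction ("red" box):
print(iou((50, 50, 150, 150), (80, 60, 180, 170)))   # ~0.43
```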
