
Exploring YOLO v3 Implementation Details - Part 6: Prediction (Final)

Published: 2025/3/21


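YOLO, short for You Only Look Once, is an object detection algorithm based on convolutional neural networks (CNNs). YOLO v3 is the third version of YOLO (after YOLO and YOLO 9000), and its detection is more accurate and more robust.

For more details on YOLO v3, see the official YOLO website.

YOLO is also an American saying, You Only Live Once: life is short, so enjoy it while you can.

This series shares how to implement the details of the YOLO v3 algorithm with the Keras framework. This is Part 6: detecting objects in an image with the trained model, and selecting the best detection boxes by the product of the box confidence and the class confidence. The series has 6 parts and is now complete; together they form the full version :)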

GitHub source code for this article: github.com/SpikeKing/k…

Already published:

Part 1, Training: mp.weixin.qq.com/s/T9LshbXoe…
Part 2, Model: mp.weixin.qq.com/s/N79S9Qf1O…
Part 3, Network: mp.weixin.qq.com/s/hC4P7iRGv…
Part 4, Ground truth: mp.weixin.qq.com/s/5Sj7QadfV…
Part 5, Loss: mp.weixin.qq.com/s/4L9E4WGSh…

Follow the WeChat official account 深度算法 (ID: DeepAlgorithm) to learn more about deep learning techniques!

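1. Detection function

Detect the objects in an image with the trained YOLO v3 model:

Create an instance yolo of the YOLO class;
Load the image image with Image.open();
Call yolo.detect_image() to detect the objects in image;
Close yolo's session;
Show the image with the detections, r_image;

Implementation: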

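```python
from PIL import Image
# YOLO is the detector class defined in this project's source code (see the GitHub link above)


def detect_img_for_test():
    yolo = YOLO()  # create the detector: loads the model, anchors and classes
    img_path = './dataset/img.jpg'
    image = Image.open(img_path)  # load the image
    r_image = yolo.detect_image(image)  # detect objects and draw the boxes
    yolo.close_session()  # release the TF session
    r_image.show()  # display the result
```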

Output:

[Figure: detection result]

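2. YOLO parameters

Initialization parameters of the YOLO class:

anchors_path: the anchor box configuration file, with 9 width-height pairs;
model_path: the trained model file; retrained models are also supported;
classes_path: the class file, which must match the model file;
score: the confidence threshold; candidate boxes below it are deleted;
iou: the IoU threshold for candidate boxes; boxes of the same class whose IoU exceeds it are deleted;
class_names: the list of classes, read from classes_path;
anchors: the list of anchor boxes, read from anchors_path;
model_image_size: the input size the model detects on; every input image is padded to this size;
colors: a set of random colors generated in the HSV color space, one per class in class_names;
boxes, scores, classes: the core detection outputs, generated by generate(), which wrap the model output.

Implementation: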

```python
self.anchors_path = 'configs/yolo_anchors.txt'  # anchors
self.model_path = 'model_data/yolo_weights.h5'  # model file
self.classes_path = 'configs/coco_classes.txt'  # class file
self.score = 0.20
self.iou = 0.20
self.class_names = self._get_class()  # read the classes
self.anchors = self._get_anchors()  # read the anchors
self.sess = K.get_session()
self.model_image_size = (416, 416)  # fixed size or (None, None), hw
self.colors = self.__get_colors(self.class_names)
self.boxes, self.scores, self.classes = self.generate()
```

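In __get_colors():

Divide the hue H (index 0 of HSV) evenly over [0, 1) by the number of classes, with S and V both set to 1, producing a list of HSV tuples;
Call hsv_to_rgb to convert each HSV color to RGB;
Multiply the 0~1 RGB values by 255 to get full color values (0~255);
Shuffle the color list randomly;

Implementation: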

```python
@staticmethod
def __get_colors(names):
    # a different color for each class of box
    hsv_tuples = [(float(x) / len(names), 1., 1.)
                  for x in range(len(names))]  # evenly spaced hues
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),
                      colors))  # RGB
    np.random.seed(10101)  # fixed seed: same colors on every run
    np.random.shuffle(colors)
    np.random.seed(None)  # restore random seeding
    return colors
```

HSV is used instead of RGB because spacing the hue evenly spreads the colors far apart, so the drawn boxes are easier to tell apart.
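The color recipe above can be tried outside the class. The following is a minimal standalone sketch, assuming only the standard library: it follows the same steps (evenly spaced hues with S = V = 1, scaled to 0~255, shuffled with the fixed seed 10101) but swaps np.random for a local random.Random, so the shuffled order will differ from the original and the global random state is left untouched. make_class_colors is a hypothetical name.

```python
import colorsys
import random


def make_class_colors(names, seed=10101):
    """One RGB color per class: evenly spaced hues, full saturation and value."""
    n = len(names)
    hsv_tuples = [(i / n, 1.0, 1.0) for i in range(n)]  # hue in [0, 1), S = V = 1
    rgb = [colorsys.hsv_to_rgb(*hsv) for hsv in hsv_tuples]  # floats in 0..1
    colors = [(int(r * 255), int(g * 255), int(b * 255)) for r, g, b in rgb]
    random.Random(seed).shuffle(colors)  # deterministic shuffle, local RNG only
    return colors
```

With three classes the hues are 0, 1/3 and 2/3, i.e. pure red, green and blue in some shuffled order; neighboring classes get unrelated colors after the shuffle.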

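3. Output wrapping

boxes, scores and classes wrap the model output further; they are produced by generate(), where:

boxes: the four coordinates of each box, (top, left, bottom, right);
scores: the class confidence of each box, combining the box confidence and the class confidence;
classes: the class of each box;

In generate(), the following are set:

num_anchors: the total number of anchor boxes, usually 9;
num_classes: the total number of classes, e.g. 80 for COCO;
yolo_model: the model created by yolo_body, whose weights are loaded with load_weights;

Implementation: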

```python
num_anchors = len(self.anchors)  # number of anchors
num_classes = len(self.class_names)  # number of classes
# 3 anchors per output layer (num_anchors // 3)
self.yolo_model = yolo_body(Input(shape=(416, 416, 3)), 3, num_classes)
self.yolo_model.load_weights(self.model_path)  # load the trained weights
```

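Next, input_image_shape is created as a placeholder, i.e. a TF input that is fed at run time. yolo_eval() is called with:

the output of yolo_model, which it wraps further;
anchors, the list of anchor boxes;
the number of classes, len(class_names);
input_image_shape, the (height, width) of the original input image, fed at run time so that boxes can be mapped back onto it;
score_threshold, the overall box confidence threshold score;
iou_threshold, the IoU threshold iou for boxes of the same class;
and it returns the box coordinates boxes, the box class confidences scores, and the box classes classes;

Implementation: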

```python
self.input_image_shape = K.placeholder(shape=(2,))
boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors,
                                   len(self.class_names), self.input_image_shape,
                                   score_threshold=self.score, iou_threshold=self.iou)
return boxes, scores, classes
```

Every output score is at least score_threshold; lower-scoring boxes have already been removed inside yolo_eval().

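4. YOLO evaluation

The function yolo_eval() wraps the prediction logic. Its inputs:

yolo_outputs: the outputs of the YOLO model, a list of 3 scales (13, 26 and 52). The last dimension holds the predictions, 255 = 3x(5+80): 3 is the number of anchors per layer, 5 is the 4 box values xywh plus 1 confidence that the box contains an object, and 80 is the number of COCO classes;
anchors: the 9 anchor boxes;
num_classes: the number of classes, 80 for COCO;
image_shape: a TF placeholder holding the (height, width) of the original image;
max_boxes: the maximum number of detected boxes per class in an image, 20;
score_threshold: the box confidence threshold; boxes below it are deleted. Lower it to get more boxes, raise it to get fewer;
iou_threshold: the IoU threshold for boxes of the same class; overlapping boxes above it are deleted. Raise it when many objects overlap, lower it when few do;

The format of yolo_outputs: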

```
[(?, 13, 13, 255), (?, 26, 26, 255), (?, 52, 52, 255)]
```
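These shapes follow from the strides of the three output layers. A small sketch, assuming the standard YOLO v3 strides of 32, 16 and 8 (the helper name yolo_output_shapes is hypothetical):

```python
def yolo_output_shapes(input_size=416, anchors_per_layer=3, num_classes=80):
    """(grid, grid, channels) of each YOLO v3 output layer, ignoring the batch axis.

    Each grid cell predicts anchors_per_layer boxes, and each box carries
    4 box values (x, y, w, h) + 1 objectness score + num_classes class scores.
    """
    assert input_size % 32 == 0, 'Multiples of 32 required'
    channels = anchors_per_layer * (5 + num_classes)  # 3 * (5 + 80) = 255
    return [(input_size // stride, input_size // stride, channels)
            for stride in (32, 16, 8)]
```

For a 416 input this gives 416 / 32 = 13, 416 / 16 = 26 and 416 / 8 = 52; a 608 input would give 19, 38 and 76.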

The anchors list:

```
[(10,13), (16,30), (33,23), (30,61), (62,45), (59,119), (116,90), (156,198), (373,326)]
```

Implementation:

```python
# call site, in generate()
boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors,
                                   len(self.class_names), self.input_image_shape,
                                   score_threshold=self.score, iou_threshold=self.iou)

# signature of yolo_eval(); the defaults are overridden by score=0.20 and iou=0.20 above
def yolo_eval(yolo_outputs, anchors, num_classes, image_shape,
              max_boxes=20, score_threshold=.6, iou_threshold=.5):
```

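Next, the parameters are processed:

num_layers: the number of output feature maps, 3;
anchor_mask: splits the anchors into 3 groups: the 1st layer (13x13) uses anchors 6, 7, 8, the 2nd layer (26x26) uses 3, 4, 5, and the 3rd layer (52x52) uses 0, 1, 2;
input_shape: the input image size, i.e. the size of the first feature map times 32, 13x32=416; the factor 32 is the total downsampling stride of the Darknet network at that layer.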

```python
num_layers = len(yolo_outputs)
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]]  # default setting
input_shape = K.shape(yolo_outputs[0])[1:3] * 32  # (13, 13) * 32 = (416, 416)
```

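The larger the feature map (13 -> 52), the smaller the objects it detects and the smaller the anchors it needs, so the anchors are assigned in reverse order.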
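A sketch of this assignment, using the anchors list and anchor_mask shown above. The helper name anchors_per_layer is hypothetical; yolo_eval() itself indexes the anchors with a list, anchors[anchor_mask[l]], as NumPy arrays allow.

```python
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]
ANCHOR_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]  # same mask as in yolo_eval()
GRIDS = [13, 26, 52]  # output grid size of each layer, coarse to fine


def anchors_per_layer(anchors=ANCHORS, anchor_mask=ANCHOR_MASK, grids=GRIDS):
    """Map each output grid size to the (width, height) anchors it uses."""
    return {grid: [anchors[i] for i in mask]
            for grid, mask in zip(grids, anchor_mask)}
```

The coarse 13x13 grid gets the three largest anchors, (116,90), (156,198) and (373,326), while the fine 52x52 grid gets the three smallest.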

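Next, for each output layer l in yolo_outputs, yolo_boxes_and_scores() extracts the boxes _boxes and the box scores _box_scores. The results of the 3 layers are collected into the lists boxes and box_scores and then concatenated into one flat list, so the output contains all boxes and their scores.

The output formats of boxes and box_scores:

```
boxes: (?, 4)        # ? is the number of boxes
box_scores: (?, 80)
```

Implementation: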

```python
boxes = []
box_scores = []
for l in range(num_layers):
    _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l], anchors[anchor_mask[l]],
                                                num_classes, input_shape, image_shape)
    boxes.append(_boxes)
    box_scores.append(_box_scores)
boxes = K.concatenate(boxes, axis=0)  # (?, 4)
box_scores = K.concatenate(box_scores, axis=0)  # (?, num_classes)
```

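concatenate flattens the data of the layers into a single list: the boxes have already been converted to real image coordinates, so boxes from different scales are no longer distinguishable.

In the function yolo_boxes_and_scores():

yolo_head outputs box_xy, the box center as relative (0~1) coordinates; box_wh, the box width and height as relative (0~1) values; box_confidence, the confidence that the box contains an object; and box_class_probs, the class confidences;
yolo_correct_boxes converts the relative (0~1) box_xy and box_wh into real coordinates; the output boxes are (y_min, x_min, y_max, x_max);
reshape flattens the values of all grid cells into a list of boxes, i.e. (?,13,13,3,4) -> (?,4);
box_scores is the product of the box confidence and the class confidence, also flattened with reshape to (?,80);
the function returns the boxes boxes and the box scores box_scores.

Implementation: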
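A quick NumPy check of the flattening, assuming a batch of one image and using zero-filled stand-ins for the per-layer box tensors. Each layer contributes grid x grid x 3 boxes, so a 416x416 input yields 13²x3 + 26²x3 + 52²x3 = 507 + 2028 + 8112 = 10647 candidate boxes before any filtering.

```python
import numpy as np

layers = [np.zeros((1, g, g, 3, 4)) for g in (13, 26, 52)]  # dummy per-layer box tensors
flat = [layer.reshape(-1, 4) for layer in layers]  # (?, 4) per layer, as in yolo_boxes_and_scores()
boxes = np.concatenate(flat, axis=0)  # one list of boxes over all scales
print(boxes.shape)  # (10647, 4)
```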

```python
def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape):
    '''Process Conv layer output'''
    box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats, anchors, num_classes, input_shape)
    boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape)  # real coordinates
    boxes = K.reshape(boxes, [-1, 4])  # (?, 4)
    box_scores = box_confidence * box_class_probs  # box confidence x class confidence
    box_scores = K.reshape(box_scores, [-1, num_classes])  # (?, num_classes)
    return boxes, box_scores
```
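yolo_correct_boxes() is not listed in this article. Below is a hedged NumPy sketch of what it is described as doing, assuming the input image was letterboxed as in section 5: remove the normalized padding offset, undo the resize scale, convert center/size to corners in (y_min, x_min, y_max, x_max) order, and multiply by the original image's height and width. It also shows the score product from yolo_boxes_and_scores(). correct_boxes and box_scores_for_layer are hypothetical names, and all shapes are (height, width).

```python
import numpy as np


def correct_boxes(box_xy, box_wh, input_shape, image_shape):
    """Map letterboxed, normalized boxes back to original-image pixels.

    box_xy, box_wh: (N, 2) arrays in (x, y) / (w, h) order, values in 0..1
    input_shape, image_shape: (height, width)
    returns an (N, 4) array of (y_min, x_min, y_max, x_max)
    """
    box_yx = box_xy[:, ::-1]  # switch to (y, x)
    box_hw = box_wh[:, ::-1]  # switch to (h, w)
    input_shape = np.asarray(input_shape, dtype=np.float64)
    image_shape = np.asarray(image_shape, dtype=np.float64)
    # size of the resized image inside the letterbox canvas
    new_shape = np.round(image_shape * np.min(input_shape / image_shape))
    offset = (input_shape - new_shape) / 2.0 / input_shape  # normalized padding
    scale = input_shape / new_shape
    box_yx = (box_yx - offset) * scale
    box_hw = box_hw * scale
    box_mins = box_yx - box_hw / 2.0
    box_maxes = box_yx + box_hw / 2.0
    boxes = np.concatenate([box_mins, box_maxes], axis=-1)  # (y_min, x_min, y_max, x_max)
    return boxes * np.concatenate([image_shape, image_shape])  # scale by (h, w, h, w)


def box_scores_for_layer(box_confidence, box_class_probs, num_classes):
    """Objectness times class confidence, flattened to one row per box."""
    return (box_confidence * box_class_probs).reshape(-1, num_classes)
```

For example, for an image 416 wide and 208 tall letterboxed into 416x416, a box centered in the canvas with relative size 0.25 maps back to (52, 156, 156, 260) in the original image. In box_scores_for_layer, a (1, 13, 13, 3, 1) confidence broadcasts against (1, 13, 13, 3, 80) class scores, giving 507 rows of 80 scores.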

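Next:

mask: a boolean mask that drops boxes below the confidence threshold and keeps only those at or above it;
max_boxes_tensor: the maximum number of detected boxes per class in an image; max_boxes is 20;

Implementation: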

```python
mask = box_scores >= score_threshold  # (?, num_classes) boolean mask
max_boxes_tensor = K.constant(max_boxes, dtype='int32')
```

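Next:

Use the mask and the class c to select the boxes class_boxes and their scores class_box_scores;
Apply NMS (non-maximum suppression) to get the indices nms_index of the boxes to keep;
Use gather with these indices to select the output boxes class_boxes and scores class_box_scores, then build the class labels classes;
Combine the data of all classes into the final detections and return them.

Implementation: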

```python
boxes_ = []
scores_ = []
classes_ = []
for c in range(num_classes):
    # keep only the boxes whose class-c score passes the threshold
    class_boxes = tf.boolean_mask(boxes, mask[:, c])
    class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])
    # non-maximum suppression within class c
    nms_index = tf.image.non_max_suppression(
        class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)
    class_boxes = K.gather(class_boxes, nms_index)
    class_box_scores = K.gather(class_box_scores, nms_index)
    classes = K.ones_like(class_box_scores, 'int32') * c  # class id of every kept box
    boxes_.append(class_boxes)
    scores_.append(class_box_scores)
    classes_.append(classes)
boxes_ = K.concatenate(boxes_, axis=0)
scores_ = K.concatenate(scores_, axis=0)
classes_ = K.concatenate(classes_, axis=0)
```

Output format:

```
boxes_: (?, 4)
scores_: (?,)
classes_: (?,)
```
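To follow the threshold filtering and the per-class NMS loop without a TF session, here is a NumPy sketch under stated assumptions: a simple greedy NMS stands in for tf.image.non_max_suppression (keep the highest-scoring box, drop same-class boxes whose IoU with it exceeds iou_threshold, repeat), and the helper names iou, nms and filter_detections are hypothetical. Boxes use the (y_min, x_min, y_max, x_max) order produced by yolo_correct_boxes().

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all (y_min, x_min, y_max, x_max)."""
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(y2 - y1, 0, None) * np.clip(x2 - x1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, max_boxes, iou_threshold):
    """Greedy NMS: returns indices of kept boxes, best score first."""
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0 and len(keep) < max_boxes:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]  # drop heavy overlaps
    return np.array(keep, dtype=np.int64)


def filter_detections(boxes, box_scores, score_threshold=0.6, iou_threshold=0.5, max_boxes=20):
    """Threshold mask + per-class NMS, mirroring the loop in yolo_eval()."""
    mask = box_scores >= score_threshold
    out_boxes, out_scores, out_classes = [], [], []
    for c in range(box_scores.shape[1]):
        class_boxes = boxes[mask[:, c]]
        class_scores = box_scores[mask[:, c], c]
        keep = nms(class_boxes, class_scores, max_boxes, iou_threshold)
        out_boxes.append(class_boxes[keep])
        out_scores.append(class_scores[keep])
        out_classes.append(np.full(len(keep), c, dtype=np.int32))
    return (np.concatenate(out_boxes, axis=0),
            np.concatenate(out_scores, axis=0),
            np.concatenate(out_classes, axis=0))
```

Because NMS runs separately for each class, two heavily overlapping boxes survive if they belong to different classes; only same-class duplicates are suppressed.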

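5. Detection method

Step 1, image preprocessing:

Resize the image to the detection size while keeping its aspect ratio, padding the surrounding area; the detection size must be a multiple of 32;

Add one dimension (the batch dimension) to the image to match the model's input format;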

```python
if self.model_image_size != (None, None):  # 416x416, 416=32*13; must be a multiple of 32, the coarsest scale divides by 32
    assert self.model_image_size[0] % 32 == 0, 'Multiples of 32 required'
    assert self.model_image_size[1] % 32 == 0, 'Multiples of 32 required'
    boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))  # pad the image
else:
    new_image_size = (image.width - (image.width % 32),
                      image.height - (image.height % 32))
    boxed_image = letterbox_image(image, new_image_size)
image_data = np.array(boxed_image, dtype='float32')
print('detector size {}'.format(image_data.shape))
image_data /= 255.  # scale to 0~1
image_data = np.expand_dims(image_data, 0)  # add the batch dimension
```
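letterbox_image() itself is not listed in this article. As a hedged illustration, the NumPy sketch below reproduces the usual letterbox idea (scale the image to fit while keeping the aspect ratio, then center it on a gray canvas), followed by the 0~1 scaling and the batch dimension from the code above. The names letterbox_geometry, letterbox_array and preprocess are hypothetical; the sketch uses nearest-neighbor resizing to stay self-contained (a PIL-based letterbox_image would normally use a smoother resample filter), and the gray fill value 128 is an assumption.

```python
import numpy as np


def letterbox_geometry(image_w, image_h, target_w, target_h):
    """Resized size and top-left paste offset that fit the image inside the target."""
    scale = min(target_w / image_w, target_h / image_h)  # keep the aspect ratio
    new_w, new_h = int(image_w * scale), int(image_h * scale)
    dx, dy = (target_w - new_w) // 2, (target_h - new_h) // 2  # center the image
    return new_w, new_h, dx, dy


def letterbox_array(image, target_w, target_h, fill=128):
    """Letterbox an (H, W, 3) uint8 array onto a gray canvas (nearest-neighbor resize)."""
    image_h, image_w = image.shape[:2]
    new_w, new_h, dx, dy = letterbox_geometry(image_w, image_h, target_w, target_h)
    rows = np.arange(new_h) * image_h // new_h  # nearest source row for each output row
    cols = np.arange(new_w) * image_w // new_w  # nearest source column for each output column
    resized = image[rows[:, None], cols[None, :]]
    canvas = np.full((target_h, target_w, 3), fill, dtype=image.dtype)
    canvas[dy:dy + new_h, dx:dx + new_w] = resized
    return canvas


def preprocess(image, size=(416, 416)):
    """size is (width, height): letterbox, scale to 0~1, add the batch dimension."""
    assert size[0] % 32 == 0 and size[1] % 32 == 0, 'Multiples of 32 required'
    boxed = letterbox_array(image, size[0], size[1]).astype('float32')
    boxed /= 255.
    return np.expand_dims(boxed, 0)
```

For an 832x416 (width x height) image, the scale is 0.5, so the content becomes 416x208 and is pasted 104 pixels from the top, leaving gray bands above and below.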

Step 2, feed the data: the preprocessed image and the original image size (height, width);

```python
out_boxes, out_scores, out_classes = self.sess.run(
    [self.boxes, self.scores, self.classes],
    feed_dict={
        self.yolo_model.input: image_data,  # preprocessed image batch
        self.input_image_shape: [image.size[1], image.size[0]],  # original (height, width)
        K.learning_phase(): 0  # inference mode
    })
```

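Step 3, draw the results with the Pillow library: the border thickness is set automatically from the image size, then each box and its class label are drawn.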

```python
font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
                          size=int(np.floor(3e-2 * image.size[1] + 0.5)))  # font size scales with image height
thickness = max(1, (image.size[0] + image.size[1]) // 512)  # border thickness, at least 1 pixel
for i, c in reversed(list(enumerate(out_classes))):
    predicted_class = self.class_names[c]  # class name
    box = out_boxes[i]  # box
    score = out_scores[i]  # confidence
    label = '{} {:.2f}'.format(predicted_class, score)  # label text
    draw = ImageDraw.Draw(image)  # drawing context
    # label text size; textbbox replaces textsize, which was removed in Pillow 10
    text_left, text_top, text_right, text_bottom = draw.textbbox((0, 0), label, font=font)
    label_size = np.array([text_right - text_left, text_bottom - text_top])

    top, left, bottom, right = box
    top = max(0, np.floor(top + 0.5).astype('int32'))  # round and clip to the image
    left = max(0, np.floor(left + 0.5).astype('int32'))
    bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
    right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
    print(label, (left, top), (right, bottom))  # box corners

    if top - label_size[1] >= 0:  # put the label above the box if it fits
        text_origin = np.array([left, top - label_size[1]])
    else:  # otherwise just inside the top edge
        text_origin = np.array([left, top + 1])

    # My kingdom for a good redistributable image drawing library.
    for j in range(thickness):  # draw the box as nested 1-pixel rectangles
        draw.rectangle([left + j, top + j, right - j, bottom - j], outline=self.colors[c])
    draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)],
                   fill=self.colors[c])  # label background
    draw.text(tuple(text_origin), label, fill=(0, 0, 0), font=font)  # label text
    del draw
```

Supplements

1. concatenate

concatenate joins tensors of the same rank along a given axis; the sizes of all other axes must match.

Implementation:

```python
from keras import backend as K

sess = K.get_session()
a = K.constant([[2, 4], [1, 2]])
b = K.constant([[3, 2], [5, 6]])
c = [a, b]
c = K.concatenate(c, axis=0)  # stack along rows
print(sess.run(c))
"""
[[2. 4.]
 [1. 2.]
 [3. 2.]
 [5. 6.]]
"""
```

2. gather

gather selects elements (rows along axis 0) by index.

Implementation:

```python
from keras import backend as K

sess = K.get_session()
a = K.constant([[2, 4], [1, 2], [5, 6]])
b = K.gather(a, [1, 2])  # rows 1 and 2
print(sess.run(b))
"""
[[1. 2.]
 [5. 6.]]
"""
```
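The Keras snippets above need a TF1-style session (K.get_session()). For a quick check without TensorFlow, NumPy's np.concatenate and np.take(..., axis=0) behave the same way on these inputs; the axis=1 line is added only to contrast with axis=0.

```python
import numpy as np

a = np.array([[2, 4], [1, 2]], dtype=np.float32)
b = np.array([[3, 2], [5, 6]], dtype=np.float32)
rows = np.concatenate([a, b], axis=0)  # stacked vertically -> shape (4, 2)
cols = np.concatenate([a, b], axis=1)  # side by side -> shape (2, 4)

c = np.array([[2, 4], [1, 2], [5, 6]], dtype=np.float32)
picked = np.take(c, [1, 2], axis=0)  # like K.gather: rows 1 and 2
print(picked)
```

In yolo_eval(), axis=0 is the one that matters: it appends boxes from different layers or classes as new rows, while each row keeps its 4 coordinates.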

OK, that's all! Enjoy it!