Implementing YOLOv3 in PyTorch (4): Non-Maximum Suppression (NMS)

Published: 2024/7/23


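In the previous post we implemented the forward function and obtained prediction. At this point the network has predicted a very large number of boxes along with their class probabilities, and we now need to filter them down to the final predicted boxes.

Once you understand the format of the YOLOv3 output and what each position means, the source code is not hard to follow. My main difficulty while reading it was unfamiliarity with PyTorch, so for the PyTorch functions used here I have highlighted their usage and included links and test code.

obj score threshold

We set an obj score threshold; only boxes whose obj score exceeds it are treated as valid.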

conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)

prediction = prediction*conf_mask

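prediction has shape 1*boxnum*boxattr.

prediction[:,:,4] has shape 1*boxnum; each element is the value at index 4 of boxattr, i.e. the obj score.

Tensor indexing in torch works much like NumPy; see the output of the following code.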

import torch

x = torch.Tensor(1,3,10) # Create an uninitialized Tensor of size 1x3x10 (values are arbitrary)
print(x)
print(x.shape) # Print the shape of the Tensor
y = x[:,:,4]
print(y)
print(y.shape)
z = x[:,:,4:6]
print(z)
print(z.shape)
print((y>0.5).float().unsqueeze(2))

#### Output (the values differ from run to run because x is uninitialized)

tensor([[[2.5226e-18, 1.6898e-04, 1.0413e-11, 7.7198e-10, 1.0549e-08,
          4.0516e-11, 1.0681e-05, 2.9575e-18, 6.7333e+22, 1.7591e+22],
         [1.7184e+25, 4.3222e+27, 6.1972e-04, 7.2443e+22, 1.7728e+28,
          7.0367e+22, 5.9018e-10, 2.6540e-09, 1.2972e-11, 5.3370e-08],
         [2.7001e-06, 2.6801e-09, 4.1292e-05, 2.1511e+23, 3.2770e-09,
          2.5125e-18, 7.7052e+31, 1.9447e+31, 5.0207e+28, 1.1492e-38]]])
torch.Size([1, 3, 10])
tensor([[1.0549e-08, 1.7728e+28, 3.2770e-09]])
torch.Size([1, 3])
tensor([[[1.0549e-08, 4.0516e-11],
         [1.7728e+28, 7.0367e+22],
         [3.2770e-09, 2.5125e-18]]])
torch.Size([1, 3, 2])
tensor([[[0.],
         [1.],
         [0.]]])

squeeze and unsqueeze remove and add a dimension, respectively.

t = torch.ones(2,1,2,1) # Size 2x1x2x1
r = torch.squeeze(t) # Size 2x2
r = torch.squeeze(t, 1) # Squeeze dimension 1: Size 2x2x1

# Un-squeeze a dimension
x = torch.Tensor([1, 2, 3])
r = torch.unsqueeze(x, 0) # Size: 1x3, adds a dimension of size 1 at position 0
r = torch.unsqueeze(x, 1) # Size: 3x1, adds a dimension of size 1 at position 1

This way, every row of prediction whose obj score does not exceed confidence is set to 0.
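To see the masking on concrete numbers, here is a small sketch. The tensor values and the 0.5 threshold are made up for illustration; only the two masking lines come from the code above.

```python
import torch

# Hypothetical prediction: 1 image, 3 boxes, 7 attributes
# (x, y, w, h, obj score, 2 class scores). Values are invented.
prediction = torch.tensor([[
    [10., 10., 4., 4., 0.9, 0.8, 0.1],
    [20., 20., 6., 6., 0.3, 0.2, 0.7],
    [30., 30., 8., 8., 0.6, 0.4, 0.5],
]])
confidence = 0.5  # assumed threshold

# (1, 3) comparison result -> float -> (1, 3, 1), so that it
# broadcasts across the attribute axis in the multiplication.
conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2)
masked = prediction * conf_mask

print(conf_mask.squeeze(2).tolist())  # [[1.0, 0.0, 1.0]]
print(masked[0, 1].tolist())          # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The second box has an obj score of 0.3, so its whole row becomes 0 while the other two rows are unchanged.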

nms

#Get the box corner coordinates (top-left corner x, top-left corner y, right-bottom corner x, right-bottom corner y)
box_corner = prediction.new(prediction.shape)
box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2)
box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
prediction[:,:,:4] = box_corner[:,:,:4]

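In the original prediction, boxattr stores x,y,w,h,... (center coordinates, width and height), which is inconvenient to work with, so we convert it to (top-left corner x, top-left corner y, right-bottom corner x, right-bottom corner y).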
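As a worked example of this conversion on a single box, here is a plain-Python sketch. The function name center_to_corners and the sample numbers are only for illustration; the arithmetic is the same as in the four box_corner lines.

```python
def center_to_corners(cx, cy, w, h):
    """Convert a (center x, center y, width, height) box to corner form."""
    x1 = cx - w / 2  # top-left corner x
    y1 = cy - h / 2  # top-left corner y
    x2 = cx + w / 2  # right-bottom corner x
    y2 = cy + h / 2  # right-bottom corner y
    return x1, y1, x2, y2

# A box centered at (50, 40) that is 20 wide and 10 high.
corners = center_to_corners(50.0, 40.0, 20.0, 10.0)
print(corners)  # (40.0, 35.0, 60.0, 45.0)
```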

Next we process the feature map (the rows of prediction) belonging to each image, one image at a time.

batch_size = prediction.size(0)
write = False

for ind in range(batch_size):
    #image_pred.shape = boxnum*boxattr
    image_pred = prediction[ind] #image Tensor, box_num*box_attr
    #confidence thresholding
    #NMS

    #torch.max returns the maximum of each row and the column index where it occurs
    max_conf_score, max_conf = torch.max(image_pred[:,5:5+ num_classes], 1)
    #add a dimension so both can be concatenated with image_pred
    max_conf = max_conf.float().unsqueeze(1)
    max_conf_score = max_conf_score.float().unsqueeze(1)
    seq = (image_pred[:,:5], max_conf_score, max_conf)
    #concatenate along the columns; image_pred is now boxnum*7
    image_pred = torch.cat(seq, 1)

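This uses torch.max; for its usage see https://blog.csdn.net/Z_lbj/article/details/79766690

torch.max(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor)

Returns the maximum values along dimension dim. One way to remember it: the comparison runs along dimension dim, so torch.max(a, 0) compares down the rows and yields the maximum of each column.

Assume input is a 2-D matrix of rows*columns, where rows are dimension 0 and columns are dimension 1.

torch.max(a,0) returns the largest element of each column together with its index (the row index of that element within the column).

torch.max(a,1) returns the largest element of each row together with its index (the column index of that element within the row).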

c=torch.Tensor([[1,2,3],[6,5,4]])
print(c)
a,b=torch.max(c,1)
print(a)
print(b)

##Output:
tensor([[1., 2., 3.],
        [6., 5., 4.]])
tensor([3., 6.])
tensor([2, 0])

torch.cat(tensors, dim=0, out=None) → Tensor

>>> x = torch.randn(2, 3)
>>> x
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 0)
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 1)
tensor([[ 0.6580, -1.0969, -0.4614,  0.6580, -1.0969, -0.4614,  0.6580,
         -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497, -0.1034, -0.5790,  0.1497, -0.1034,
         -0.5790,  0.1497]])

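Next we only process the rows whose obj_score is non-zero (rows whose obj_score did not exceed confidence were already set to 0 above).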

non_zero_ind = (torch.nonzero(image_pred[:,4]))
try:
    image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
except:
    continue

#For PyTorch 0.4 compatibility
#Since the above code will not raise an exception for no detection
#as scalars are supported in PyTorch 0.4
if image_pred_.shape[0] == 0:
    continue
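The .view(-1,7) at the end of the indexing line is easy to overlook. Here is a minimal sketch of why it is there, using a made-up image_pred in which only one row survives the threshold (the numbers are invented; the calls are the ones used above):

```python
import torch

# Hypothetical 3x7 image_pred where only the middle row has a non-zero obj score.
image_pred = torch.tensor([
    [0., 0., 0., 0., 0.0, 0.0, 0.],
    [1., 2., 3., 4., 0.9, 0.8, 2.],
    [0., 0., 0., 0., 0.0, 0.0, 0.],
])

non_zero_ind = torch.nonzero(image_pred[:, 4])  # one index per row: shape (1, 1)
idx = non_zero_ind.squeeze()                    # all size-1 dims removed: 0-dim tensor
row = image_pred[idx, :]                        # a 0-dim index drops the row axis
rows = row.view(-1, 7)                          # restore the (num_boxes, 7) layout

print(non_zero_ind.shape)  # torch.Size([1, 1])
print(row.shape)           # torch.Size([7])
print(rows.shape)          # torch.Size([1, 7])
```

Without the view, a single surviving detection would be a 1-D tensor, and later column indexing such as image_pred_[:,-1] would fail.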

OK, next we run NMS for each class.

First we find out which classes are present.

#Get the various classes detected in the image

img_classes = unique(image_pred_[:,-1]) # -1 index holds the class index
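The unique helper is not shown in this post. A minimal stand-in, assuming torch.unique is available in your PyTorch version, could look like the sketch below. This is an assumption for illustration only; the original helper may be written differently.

```python
import torch

def unique(tensor):
    """Hypothetical stand-in for the unique helper used above.

    Returns the distinct values of a 1-D tensor in ascending order.
    """
    return torch.unique(tensor)

# Made-up class-index column, as it might appear in image_pred_[:, -1].
class_column = torch.tensor([2., 0., 2., 5., 0.])
img_classes = unique(class_column)
print(img_classes.tolist())  # [0.0, 2.0, 5.0]
```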

Then we process each class in turn.

for cls in img_classes:
    #perform NMS
    #get the detections with one particular class
    #keep the rows whose class index equals cls and whose class prob != 0
    cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
    class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
    image_pred_class = image_pred_[class_mask_ind].view(-1,7)

    #sort the detections such that the entry with the maximum objectness
    #confidence is at the top
    conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
    image_pred_class = image_pred_class[conf_sort_index]
    idx = image_pred_class.size(0) #Number of detections

    for i in range(idx):
        #Get the IOUs of all boxes that come after the one we are looking at
        #in the loop
        try:
            #IoU between row i and every row after it
            ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
        except ValueError:
            break
        except IndexError:
            break

        #Zero out all the detections that have IoU > threshold:
        #boxes whose IoU with row i exceeds nms_conf are treated as the same object
        iou_mask = (ious < nms_conf).float().unsqueeze(1)
        image_pred_class[i+1:] *= iou_mask

        #Remove the rows that were just zeroed out
        non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
        image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

    #Repeat the batch_id for as many detections of the class cls in the image
    batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)
    seq = batch_ind, image_pred_class

The code that computes IoU is shown below without much further explanation. IoU = intersection area / union area.

def bbox_iou(box1, box2):
    """
    Returns the IoU of two bounding boxes
    """
    #Get the coordinates of bounding boxes
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]

    #Get the coordinates of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)

    #Intersection area
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)

    #Union Area
    b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)

    iou = inter_area / (b1_area + b2_area - inter_area)

    return iou
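For a quick sanity check of the formula on plain numbers, the sketch below restates bbox_iou for a single pair of boxes in plain Python. The name iou_xyxy and the sample boxes are only for illustration; the +1 pixel convention is kept the same as above.

```python
def iou_xyxy(box1, box2):
    """IoU of two (x1, y1, x2, y2) boxes, with the same +1 convention as bbox_iou."""
    # Corners of the intersection rectangle.
    ix1 = max(box1[0], box2[0])
    iy1 = max(box1[1], box2[1])
    ix2 = min(box1[2], box2[2])
    iy2 = min(box1[3], box2[3])

    # Clamp at 0 so that non-overlapping boxes give an empty intersection.
    inter = max(ix2 - ix1 + 1, 0) * max(iy2 - iy1 + 1, 0)

    area1 = (box1[2] - box1[0] + 1) * (box1[3] - box1[1] + 1)
    area2 = (box2[2] - box2[0] + 1) * (box2[3] - box2[1] + 1)
    return inter / (area1 + area2 - inter)

a = (0, 0, 9, 9)      # 10x10 box
b = (5, 5, 14, 14)    # 10x10 box shifted by 5 in both directions
c = (20, 20, 29, 29)  # far away from a

iou_ab = iou_xyxy(a, b)  # intersection 5*5 = 25, union 100 + 100 - 25 = 175
iou_ac = iou_xyxy(a, c)
print(iou_ac)  # 0.0
```

Here iou_ab is 25/175, roughly 0.143, which is below the default nms_conf of 0.4, so these two boxes would both survive NMS.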

Tensor indexing works as follows:

image_pred_ = torch.Tensor([[1,2,3,4,9],[5,6,7,8,9]])
#print(image_pred_[:,-1] == 9)
has_9 = (image_pred_[:,-1] == 9)
print(has_9)

###Evaluation order: (image_pred_[:,-1] == 9).float().unsqueeze(1) is computed first, then the tensor multiplication
cls_mask = image_pred_*(image_pred_[:,-1] == 9).float().unsqueeze(1)
print(cls_mask)
class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
image_pred_class = image_pred_[class_mask_ind]

Output (from an older PyTorch release; recent versions print a bool tensor, tensor([True, True]), for the first line):

tensor([1, 1], dtype=torch.uint8)
tensor([[1., 2., 3., 4., 9.],
        [5., 6., 7., 8., 9.]])

torch.sort works as follows:

d=torch.Tensor([[1,2,3],[6,5,4]])
e=d[:,2]
print(e)
print(torch.sort(e))

Output:

tensor([3., 4.])
torch.return_types.sort(
values=tensor([3., 4.]),
indices=tensor([0, 1]))
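In the NMS code the sort is descending, and only the returned indices (element [1] of the result) are used to reorder whole rows. A small sketch with made-up detections, where column 4 plays the role of the obj score:

```python
import torch

# Hypothetical detections: 4 coordinates followed by an obj score.
dets = torch.tensor([
    [0., 0., 1., 1., 0.3],
    [0., 0., 2., 2., 0.9],
    [0., 0., 3., 3., 0.6],
])

# torch.sort returns (values, indices); [1] picks the indices.
order = torch.sort(dets[:, 4], descending=True)[1]
sorted_dets = dets[order]  # reorder the rows, highest obj score first

print(order.tolist())              # [1, 2, 0]
print(sorted_dets[:, 2].tolist())  # [2.0, 3.0, 1.0]
```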

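To summarize the NMS procedure:

For each image, the network predicts N detections, each with 4+1+C values (4 coordinates, 1 obj score and C class probabilities).

First, filter out the rows with obj_score < confidence.

For each row, keep only the highest class probability and take its class as the predicted class.

Loop over the detected classes and run NMS on each one:

Sort the predictions of the current class by obj_score from high to low.

Compare the first box with every box after it and delete those with IoU > threshold, i.e. remove all boxes similar to it.

Compare the next remaining box with every box after it and delete all boxes similar to that box.

Repeat until no similar boxes remain.

At this point, every remaining box of the class being processed is unique.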
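The per-class steps above can be sketched in plain Python on a short list of boxes. This is an illustrative restatement, not the tensor code from this post: the names iou and greedy_nms, the tuple layout (x1, y1, x2, y2, score) and the sample boxes are all assumptions, and boxes are dropped from a list instead of being zeroed out in place.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2, ...) boxes, same +1 convention as bbox_iou."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]) + 1, 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]) + 1, 0)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)

def greedy_nms(dets, iou_threshold=0.4):
    """Greedy NMS over the detections of ONE class."""
    # Highest score first.
    remaining = sorted(dets, key=lambda d: d[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # the best box still standing is kept
        kept.append(best)
        # Keep only the boxes that do not overlap it too much.
        remaining = [d for d in remaining if iou(best, d) < iou_threshold]
    return kept

dets = [
    (1, 1, 10, 10, 0.8),    # overlaps heavily with the next box
    (0, 0, 9, 9, 0.9),
    (50, 50, 59, 59, 0.7),  # a separate object
]
kept = greedy_nms(dets)
print(kept)  # [(0, 0, 9, 9, 0.9), (50, 50, 59, 59, 0.7)]
```

The first two boxes have an IoU of 81/119 (about 0.68), so the lower-scoring one is suppressed, while the distant third box is kept.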

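write_results finally returns an n*8 tensor, where the 8 columns are (batch_index, 4 coordinates, 1 obj score, 1 class prob, 1 class index). If nothing is detected in the whole batch, it returns 0 instead.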

import torch

def write_results(prediction, confidence, num_classes, nms_conf = 0.4):
    print("prediction.shape=",prediction.shape)

    #zero out the rows whose obj_score does not exceed confidence
    conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
    prediction = prediction*conf_mask

    #get the box corner coordinates (top-left corner x, top-left corner y, right-bottom corner x, right-bottom corner y)
    box_corner = prediction.new(prediction.shape)
    box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
    box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
    box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2)
    box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
    #overwrite the first four columns along the third dimension of prediction
    prediction[:,:,:4] = box_corner[:,:,:4]

    batch_size = prediction.size(0)
    write = False

    for ind in range(batch_size):
        #image_pred.shape = boxnum*boxattr
        image_pred = prediction[ind] #image Tensor
        #confidence thresholding
        #NMS

        #take the largest class score of each row and its class index
        max_conf_score,max_conf = torch.max(image_pred[:,5:5+ num_classes], 1)
        max_conf = max_conf.float().unsqueeze(1)
        max_conf_score = max_conf_score.float().unsqueeze(1)
        seq = (image_pred[:,:5], max_conf_score, max_conf)
        #now 7 columns: top-left x, top-left y, right-bottom x, right-bottom y, obj score, max class probability, class index
        image_pred = torch.cat(seq, 1)
        print(image_pred.shape)

        non_zero_ind = (torch.nonzero(image_pred[:,4]))
        try:
            image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
        except:
            continue

        #For PyTorch 0.4 compatibility
        #Since the above code will not raise an exception for no detection
        #as scalars are supported in PyTorch 0.4
        if image_pred_.shape[0] == 0:
            continue

        #Get the various classes detected in the image
        img_classes = unique(image_pred_[:,-1]) # -1 index holds the class index

        for cls in img_classes:
            #perform NMS
            #get the detections with one particular class
            #keep the rows whose class index equals cls and whose class prob != 0
            cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
            class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
            image_pred_class = image_pred_[class_mask_ind].view(-1,7)

            #sort the detections such that the entry with the maximum objectness
            #confidence is at the top
            conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
            image_pred_class = image_pred_class[conf_sort_index]
            idx = image_pred_class.size(0) #Number of detections

            for i in range(idx):
                #Get the IOUs of all boxes that come after the one we are looking at
                #in the loop
                try:
                    #IoU between row i and every row after it
                    ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
                except ValueError:
                    break
                except IndexError:
                    break

                #Zero out all the detections that have IoU > threshold:
                #boxes whose IoU with row i exceeds nms_conf are treated as the same object
                iou_mask = (ious < nms_conf).float().unsqueeze(1)
                image_pred_class[i+1:] *= iou_mask

                #Remove the rows that were just zeroed out
                non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
                image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

            #Repeat the batch_id for as many detections of the class cls in the image
            batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)
            seq = batch_ind, image_pred_class

            if not write:
                output = torch.cat(seq,1) #concatenate along the columns, shape k*8 for the k boxes kept
                write = True
            else:
                out = torch.cat(seq,1)
                output = torch.cat((output,out)) #concatenate along the rows, shape n*8

    #output is never assigned if nothing was detected in the whole batch
    try:
        return output
    except:
        return 0
