Getting Started with Object Detection: VOC2012
Table of Contents
- Preface
- I. The VOC2012 Dataset
- 1. Data Preprocessing
- 2. Building the Dataset
- 3. Building the DataLoader
- II. Network Structure
- III. Loss Function
- IV. Training and Prediction
- 1. Training
- 2. Prediction
- Summary
Preface
The network presented in this post can be understood as a simplified version of the SSD object detector, which makes it a good entry point to the topic. Some of the helper code used in this article comes from the companion post 目標檢測詳解 (Object Detection Explained).
I. The VOC2012 Dataset
1. Data Preprocessing
We use the VOC2012 dataset.
After unpacking it, the annotations need preprocessing: each image's XML file is converted into an object-detection label file (.txt) containing the object class, the bounding box's center coordinates, and the box's width and height, with all four quantities normalized by the image size. (The center format is what the `box_center_to_corner` call in the dataset below expects.)
Running the make_label_txt function produces the labels folder, as sketched below.
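The original `make_label_txt` lives in the companion 目標檢測詳解 code and is not reproduced in this post. Below is a minimal sketch of what such a converter might look like; the `VOC_CLASSES` ordering, the directory layout, and `DATASET_PATH` are assumptions, not the author's exact code:

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical reconstruction: class order and paths are assumptions.
VOC_CLASSES = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
               'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
               'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
DATASET_PATH = 'VOCdevkit/VOC2012/'

def make_label_txt():
    ann_dir = DATASET_PATH + 'Annotations/'
    out_dir = DATASET_PATH + 'labels/'
    os.makedirs(out_dir, exist_ok=True)
    for xml_file in os.listdir(ann_dir):
        root = ET.parse(ann_dir + xml_file).getroot()
        img_w = float(root.find('size/width').text)
        img_h = float(root.find('size/height').text)
        lines = []
        for obj in root.iter('object'):
            cls = VOC_CLASSES.index(obj.find('name').text)
            box = obj.find('bndbox')
            xmin, ymin = float(box.find('xmin').text), float(box.find('ymin').text)
            xmax, ymax = float(box.find('xmax').text), float(box.find('ymax').text)
            # Normalized center coordinates plus width/height, as described above.
            cx = (xmin + xmax) / 2 / img_w
            cy = (ymin + ymax) / 2 / img_h
            w = (xmax - xmin) / img_w
            h = (ymax - ymin) / img_h
            lines.append(f'{cls} {cx} {cy} {w} {h}')
        with open(out_dir + xml_file.replace('.xml', '.txt'), 'w') as f:
            f.write('\n'.join(lines))
```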
2. Building the Dataset
```python
import cv2
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

# DATASET_PATH and box_center_to_corner come from the companion helper code.

class VOC2012(Dataset):
    def __init__(self, is_train=True, is_aug=True):
        if is_train:
            self.filenames = list(pd.read_csv(DATASET_PATH + 'ImageSets/Main/train.txt',
                                              names=['filenames']).values.reshape(-1))
        else:
            self.filenames = list(pd.read_csv(DATASET_PATH + 'ImageSets/Main/val.txt',
                                              names=['filenames']).values.reshape(-1))
        self.image_path = DATASET_PATH + 'JPEGImages/'
        self.label_path = DATASET_PATH + 'labels/'
        self.is_aug = is_aug

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, item):
        image = cv2.imread(self.image_path + self.filenames[item] + '.jpg')
        h, w = image.shape[0:2]
        image = cv2.resize(image, (224, 224))
        if self.is_aug:
            aug = transforms.Compose([transforms.ToTensor()])
            image = aug(image)
        bbox = pd.read_csv(self.label_path + self.filenames[item] + '.txt',
                           names=['labels', 'x', 'y', 'w', 'h'], sep=' ').values
        if bbox.dtype == 'float64':  # boxes from empty label files never become float64
            bbox = torch.tensor(bbox, dtype=torch.float64)
            label = bbox[:, 0].reshape(-1, 1)
            bbox = box_center_to_corner(bbox[:, 1:])  # (cx, cy, w, h) -> corner form
            bbox = torch.cat((label, bbox), dim=1)
        return image, bbox

    def collate_fn(self, batch):
        images = list()
        boxes = list()
        for b in batch:
            if b[1].dtype == torch.float64:  # drop samples whose label file was empty
                images.append(b[0])
                boxes.append(b[1])
        images = torch.stack(images, dim=0)
        return images, boxes
```

Note: `collate_fn(self, batch)` batches the samples and also applies a simple filter, because some label files in the dataset are empty.
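The `box_center_to_corner` helper used in `__getitem__` converts (center-x, center-y, width, height) boxes into (x1, y1, x2, y2) corner form. A minimal version in the d2l style that the rest of the code follows:

```python
import torch

def box_center_to_corner(boxes):
    # (cx, cy, w, h) -> (x1, y1, x2, y2); all values stay normalized to [0, 1].
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    x1 = cx - 0.5 * w
    y1 = cy - 0.5 * h
    x2 = cx + 0.5 * w
    y2 = cy + 0.5 * h
    return torch.stack((x1, y1, x2, y2), dim=-1)
```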
3. Building the DataLoader
```python
train_data = VOC2012(True)
train_loader = DataLoader(train_data, batch_size=64, shuffle=True, num_workers=2,
                          collate_fn=train_data.collate_fn)  # note the collate_fn argument
```

II. Network Structure
```python
import torch.nn as nn
from torchvision import models

# multibox_prior (d2l-style anchor generation) comes from the companion helper code.

class VGGBase(nn.Module):
    def __init__(self):
        super(VGGBase, self).__init__()
        model_conv = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        model_conv = nn.Sequential(*list(model_conv.children())[:-2])
        self.cnn = model_conv

    def forward(self, img):
        return self.cnn(img)


class PredictionConvolutions(nn.Module):
    def __init__(self, n_classes):
        super(PredictionConvolutions, self).__init__()
        self.n_classes = n_classes
        n_boxes = 5  # anchors per feature-map pixel
        self.loc_conv = nn.Conv2d(512, n_boxes * 4, kernel_size=3, padding=1)
        self.cl_conv = nn.Conv2d(512, n_boxes * n_classes, kernel_size=3, padding=1)
        self.init_conv2d()

    def init_conv2d(self):
        for c in self.children():
            if isinstance(c, nn.Conv2d):
                nn.init.xavier_uniform_(c.weight)
                nn.init.constant_(c.bias, 0.)

    def forward(self, pool5_feats):
        batch_size = pool5_feats.size(0)
        l_conv = self.loc_conv(pool5_feats)
        l_conv = l_conv.permute(0, 2, 3, 1).contiguous()
        locs = l_conv.view(batch_size, -1, 4)
        c_conv = self.cl_conv(pool5_feats)
        c_conv = c_conv.permute(0, 2, 3, 1).contiguous()
        classes_scores = c_conv.view(batch_size, -1, self.n_classes)
        return locs, classes_scores


class SSD(nn.Module):
    def __init__(self, num_classes):
        super(SSD, self).__init__()
        self.num_classes = num_classes
        self.base = VGGBase()
        self.pred_convs = PredictionConvolutions(num_classes)
        self.sizes = [0.75, 0.5, 0.25]
        self.ratios = [1, 2, 0.5]

    def forward(self, image):
        image = self.base(image)
        anchors = multibox_prior(image, self.sizes, self.ratios)
        locs, classes_scores = self.pred_convs(image)
        locs = locs.reshape(locs.shape[0], -1)
        return anchors, locs, classes_scores
```

The network takes a 224×224 image and extracts features with VGG-16, producing a 7×7 feature map. At every pixel of that feature map we place anchor boxes (prior boxes) with scales sizes = [0.75, 0.5, 0.25] and aspect ratios ratios = [1, 2, 0.5]. To keep the priors simple, we only use the combinations involving sizes[0] or ratios[0], which gives 3 + 3 − 1 = 5 anchors per pixel. For each anchor we predict two kinds of information: the anchor's class and the object's bounding-box offsets. The class prediction consists of scores over 21 classes (the 20 VOC classes plus one background class), and the model assigns each box the class with the highest predicted score. The box prediction is a set of offsets relative to the prior box, which fine-tune the anchor so the final bbox fits the object accurately. Attaching two 3×3 convolutional layers to the 7×7 feature map is enough to produce the classification and regression outputs, respectively.
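A quick shape check makes the anchor bookkeeping concrete. This assumes `multibox_prior` follows the d2l convention of generating len(sizes) + len(ratios) − 1 anchors per feature-map pixel, which is what the 5-anchor count above implies:

```python
import torch

x = torch.zeros(1, 3, 224, 224)  # one dummy 224x224 RGB image
net = SSD(21)
anchors, locs, classes_scores = net(x)
print(anchors.shape)         # torch.Size([1, 245, 4])  -> 7 * 7 * 5 anchors
print(locs.shape)            # torch.Size([1, 980])     -> 245 anchors * 4 offsets
print(classes_scores.shape)  # torch.Size([1, 245, 21]) -> 21 class scores per anchor
```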
III. Loss Function
To keep things simple, we use cross-entropy loss for classification and L1 loss for box regression, wrapped together in a single function.
```python
cls_loss = nn.CrossEntropyLoss(reduction='none')
bbox_loss = nn.L1Loss(reduction='none')

def calc_loss(cls_preds, cls_labels, bbox_preds, bbox_labels, bbox_masks):
    batch_size, num_classes = cls_preds.shape[0], cls_preds.shape[2]
    cls = cls_loss(cls_preds.reshape(-1, num_classes),
                   cls_labels.reshape(-1)).reshape(batch_size, -1).mean(dim=1)
    bbox = bbox_loss(bbox_preds * bbox_masks, bbox_labels * bbox_masks).mean(dim=1)
    return cls + bbox * 1000
```

Note: because the box targets are normalized, the bbox term is multiplied by 1000 here to make the loss easier to observe.
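A toy invocation with dummy tensors shows the expected shapes; 245 matches the 7×7×5 anchor grid above, and the zero tensors are placeholders rather than real targets:

```python
import torch

cls_preds = torch.randn(2, 245, 21)                 # class scores for a batch of 2
cls_labels = torch.zeros(2, 245, dtype=torch.long)  # assigned class per anchor
bbox_preds = torch.randn(2, 245 * 4)                # flattened offset predictions
bbox_labels = torch.zeros(2, 245 * 4)               # flattened offset targets
bbox_masks = torch.zeros(2, 245 * 4)                # 0 for background, 1 for matched anchors
loss = calc_loss(cls_preds, cls_labels, bbox_preds, bbox_labels, bbox_masks)
print(loss.shape)  # torch.Size([2]) -> one loss value per image
```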
IV. Training and Prediction
Training an object-detection network roughly follows this flow:
- set the hyperparameters
- define the data loading module (dataloader)
- define the network (model)
- define the loss function (loss)
- define the optimizer (optimizer)
- iterate over the training data: predict, compute the loss, backpropagate
1. Training
```python
from torch import optim

# multibox_target (anchor-to-ground-truth matching) comes from the companion helper code.

def train(train_loader, model, criterion, optimizer, epoch):
    model.train()
    losses = 0.0
    for i, (images, boxes) in enumerate(train_loader):
        images = images.cuda()
        anchors, predicted_locs, predicted_scores = model(images)
        bbox_labels, bbox_masks, cls_labels = multibox_target(anchors, boxes)
        optimizer.zero_grad()
        l = calc_loss(predicted_scores, cls_labels,
                      predicted_locs, bbox_labels, bbox_masks).mean()
        l.backward()
        optimizer.step()
        if i % 10 == 0:
            print(f'epoch: {epoch} loss: {l.item()}')
        losses += l.item()
    return losses / len(train_loader)

model = SSD(21)
model = model.cuda()
optimizer = optim.Adam(model.parameters(), lr=1e-4)
train_loss = []
for epoch in range(1):
    loss = train(train_loader, model, calc_loss, optimizer, epoch)
    train_loss.append(loss)
print(train_loss)
torch.save(model.state_dict(), './model.pth')
```

We trained for 100 epochs; the listing above uses range(1) only to keep the demo short, so change it to range(100) to reproduce that run.
2. Prediction
```python
import torch.nn.functional as F
import matplotlib.pyplot as plt

# multibox_detection, show_bboxes and the `classes` name list come from the companion helper code.

# Prediction
model_predict = SSD(21)
model_predict.load_state_dict(torch.load('./model.pth'))
model_predict = model_predict.cuda()

def predict(image, model):
    model.eval()
    anchors, bbox_preds, cls_preds = model(image.cuda())
    cls_probs = F.softmax(cls_preds, dim=2).permute(0, 2, 1)
    output = multibox_detection(cls_probs, bbox_preds, anchors)
    idx = [i for i, row in enumerate(output[0]) if row[0] != -1]  # drop suppressed boxes
    return output[0, idx]

def display(image, output, threshold):
    fig = plt.imshow(image.permute(1, 2, 0).numpy()[:, :, ::-1])
    for row in output:
        score = float(row[1])
        predict_label = int(row[0])
        score_class = classes[predict_label] + ':' + str(score)
        if score < threshold:
            continue
        bbox = [row[2:6] * torch.tensor((224, 224, 224, 224), device=row.device)]
        print(bbox)
        show_bboxes(fig.axes, bbox, score_class, 'w')

image, label = next(iter(train_loader))
output = predict(image[0].unsqueeze(0), model_predict)
display(image[0], output.cpu(), threshold=0.9)
```
Print the ground-truth labels for comparison.
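One possible way to draw those ground-truth boxes, reusing the `image`/`label` batch fetched above and the same `classes` list and `show_bboxes` helper as `display` (a sketch, not the author's exact code):

```python
gt = label[0]  # rows of (class, x1, y1, x2, y2), normalized to [0, 1]
fig = plt.imshow(image[0].permute(1, 2, 0).numpy()[:, :, ::-1])
for row in gt:
    bbox = [row[1:5] * torch.tensor((224, 224, 224, 224))]
    show_bboxes(fig.axes, bbox, classes[int(row[0])], 'g')  # 'g' = green for ground truth
```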
Summary
Our network's predictions are quite decent, but it has clear problems detecting small objects and densely packed objects; the full SSD addresses these with multi-scale feature maps. We also fixed the input size at 224×224, and a larger input resolution could be used.