Single Shot Multibox Detection (SSD) in Practice (Part 2)
2. Training
We now explain, step by step, how to train the SSD model for object detection.
2.1. Data Reading and Initialization
We read the Pikachu dataset created earlier.
batch_size = 32
train_iter, _ = d2l.load_data_pikachu(batch_size)
The Pikachu dataset has one category. After defining the TinySSD model, we initialize its parameters and define the optimization algorithm.
ctx, net = d2l.try_gpu(), TinySSD(num_classes=1)
net.initialize(init=init.Xavier(), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.2, 'wd': 5e-4})
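The `'sgd'` optimizer with `'wd'` folds an L2 weight-decay term into the gradient. As a rough illustration, here is one parameter update in plain NumPy, assuming the standard formulation w <- w - lr * (g / B + wd * w). The division by the batch size B mirrors how `trainer.step(batch_size)` normalizes the gradient that `l.backward()` sums over the batch. `sgd_step` is a hypothetical helper, not Gluon's code.

```python
import numpy as np

def sgd_step(weight, grad_sum, lr, wd, batch_size):
    """One SGD update with weight decay (hypothetical helper, not Gluon's code).

    grad_sum is the gradient summed over the batch; dividing by batch_size
    turns it into a mean, as trainer.step(batch_size) does.
    """
    grad = grad_sum / batch_size
    # Weight decay adds wd * weight to the gradient (an L2 penalty of wd/2 * ||w||^2)
    return weight - lr * (grad + wd * weight)

w = np.array([1.0, -2.0])
g = np.array([3.2, 6.4])  # gradient summed over a batch of 32 examples
w_new = sgd_step(w, g, lr=0.2, wd=5e-4, batch_size=32)
print(w_new)
```

With `wd=0` this reduces to plain minibatch SGD, and a nonzero `wd` also pulls every weight slightly toward zero at each step.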
2.2. Defining Loss and Evaluation Functions
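Object detection involves two kinds of loss. The first is the anchor box category loss, for which we can simply reuse the cross-entropy loss function used in image classification. The second is the offset loss of the positive anchor boxes. Offset prediction is a regression problem. Here, however, we do not use the squared loss introduced earlier. Instead, we use the L1 norm loss, i.e., the absolute value of the difference between the predicted value and the ground-truth value. The mask variable bbox_masks removes negative anchor boxes and padding anchor boxes from the loss calculation. Finally, we add the anchor box category loss and the offset loss to obtain the model's final loss function.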
cls_loss = gluon.loss.SoftmaxCrossEntropyLoss()
bbox_loss = gluon.loss.L1Loss()

def calc_loss(cls_preds, cls_labels, bbox_preds, bbox_labels, bbox_masks):
    # Category loss over all anchor boxes
    cls = cls_loss(cls_preds, cls_labels)
    # Offset loss; the mask zeroes out negative and padding anchor boxes
    bbox = bbox_loss(bbox_preds * bbox_masks, bbox_labels * bbox_masks)
    return cls + bbox
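To make the role of `bbox_masks` concrete, the following plain-NumPy sketch computes the same combined loss for a single image. It assumes per-anchor class scores of shape `(num_anchors, num_classes + 1)` and four offsets per anchor flattened into one vector. Each term is averaged over its elements, which we assume matches the per-example averaging of Gluon's `SoftmaxCrossEntropyLoss` and `L1Loss`. `softmax_ce` and `calc_loss_np` are hypothetical helpers.

```python
import numpy as np

def softmax_ce(logits, labels):
    """Mean cross-entropy over anchors; logits: (num_anchors, num_classes + 1)."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def calc_loss_np(cls_preds, cls_labels, bbox_preds, bbox_labels, bbox_masks):
    """Hypothetical NumPy counterpart of calc_loss for one image."""
    cls = softmax_ce(cls_preds, cls_labels)
    # Masked entries become |0 - 0| = 0, so negative/padding anchors add nothing
    bbox = np.abs(bbox_preds * bbox_masks - bbox_labels * bbox_masks).mean()
    return cls + bbox

# Two anchors, classes: 0 = background, 1 = object; anchor 0 is positive
cls_preds = np.array([[0.0, 2.0], [3.0, 0.0]])
cls_labels = np.array([1, 0])
bbox_preds = np.array([0.1, 0.2, -0.1, 0.0, 5.0, 5.0, 5.0, 5.0])
bbox_labels = np.array([0.0, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0])
bbox_masks = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
print(calc_loss_np(cls_preds, cls_labels, bbox_preds, bbox_labels, bbox_masks))
```

The second anchor's offset predictions of 5.0 are far off, but they do not affect the loss because that anchor is a negative and its mask entries are zero.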
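We can use accuracy to evaluate the classification results. Because we use the L1 norm loss, we use the mean absolute error to evaluate the bounding box predictions.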
def cls_eval(cls_preds, cls_labels):
    # Because the category prediction results are placed in the final
    # dimension, argmax must specify this dimension
    return float((cls_preds.argmax(axis=-1) == cls_labels).sum())

def bbox_eval(bbox_preds, bbox_labels, bbox_masks):
    return float((np.abs((bbox_labels - bbox_preds) * bbox_masks)).sum())
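The training loop below turns these sums into the reported metrics. The class error is `1 - correct / cls_labels.size`, and the bbox MAE is `abs_error_sum / bbox_labels.size`. Note that the MAE denominator counts every offset entry, including masked ones. The following NumPy sketch reproduces that arithmetic on made-up arrays for one image with three anchors, of which only the first is positive:

```python
import numpy as np

def cls_eval(cls_preds, cls_labels):
    # Number of anchors whose highest-scoring class matches the label
    return float((cls_preds.argmax(axis=-1) == cls_labels).sum())

def bbox_eval(bbox_preds, bbox_labels, bbox_masks):
    # Sum of absolute offset errors over unmasked (positive) anchors only
    return float(np.abs((bbox_labels - bbox_preds) * bbox_masks).sum())

cls_preds = np.array([[[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]])  # (batch, anchors, classes)
cls_labels = np.array([[1, 0, 0]])
bbox_preds = np.array([[0.5, 0.5, 0.5, 0.5] + [9.0] * 8])     # (batch, anchors * 4)
bbox_labels = np.array([[0.0, 0.5, 1.0, 0.5] + [0.0] * 8])
bbox_masks = np.array([[1.0] * 4 + [0.0] * 8])

cls_err = 1 - cls_eval(cls_preds, cls_labels) / cls_labels.size
bbox_mae = bbox_eval(bbox_preds, bbox_labels, bbox_masks) / bbox_labels.size
print(cls_err, bbox_mae)
```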
2.3. Training the Model
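During model training, we generate multiscale anchor boxes (anchors) in the model's forward computation and predict the category (cls_preds) and offset (bbox_preds) of each anchor box. Then, based on the label information Y, we label the category (cls_labels) and offset (bbox_labels) of each anchor box. Finally, we compute the loss function from the predicted and labeled category and offset values. For simplicity, we do not evaluate the model on the test dataset here; the reported metrics are accumulated over the training batches.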
num_epochs, timer = 20, d2l.Timer()
animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                        legend=['class error', 'bbox mae'])
for epoch in range(num_epochs):
    # accuracy_sum, mae_sum, num_examples, num_labels
    metric = d2l.Accumulator(4)
    train_iter.reset()  # Read data from the start.
    for batch in train_iter:
        timer.start()
        X = batch.data[0].as_in_ctx(ctx)
        Y = batch.label[0].as_in_ctx(ctx)
        with autograd.record():
            # Generate multiscale anchor boxes and predict the category and
            # offset of each
            anchors, cls_preds, bbox_preds = net(X)
            # Label the category and offset of each anchor box
            bbox_labels, bbox_masks, cls_labels = npx.multibox_target(
                anchors, Y, cls_preds.transpose(0, 2, 1))
            # Calculate the loss function using the predicted and labeled
            # category and offset values
            l = calc_loss(cls_preds, cls_labels, bbox_preds, bbox_labels,
                          bbox_masks)
        l.backward()
        trainer.step(batch_size)
        metric.add(cls_eval(cls_preds, cls_labels), cls_labels.size,
                   bbox_eval(bbox_preds, bbox_labels, bbox_masks),
                   bbox_labels.size)
    cls_err, bbox_mae = 1 - metric[0] / metric[1], metric[2] / metric[3]
    animator.add(epoch + 1, (cls_err, bbox_mae))
print('class err %.2e, bbox mae %.2e' % (cls_err, bbox_mae))
print('%.1f examples/sec on %s' % (train_iter.num_image / timer.stop(), ctx))
class err 2.35e-03, bbox mae 2.68e-03
4315.5 examples/sec on gpu(0)
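In the loop above, `npx.multibox_target` performs the labeling. The NumPy sketch below simplifies the idea and is not MXNet's implementation. Each anchor is matched to the ground-truth box with the highest IoU. The anchor is kept as positive if that IoU reaches a threshold (0.5 here, an assumption). Otherwise it is labeled background (class 0). Offsets use the center/size parameterization with scaling factors 0.1 and 0.2, which we assume match the defaults used in this series. MXNet also guarantees each ground-truth box its best-matching anchor and supports negative mining; both are omitted here. Boxes are `(xmin, ymin, xmax, ymax)` in normalized coordinates. `box_iou`, `to_center` and `label_anchors` are hypothetical helpers.

```python
import numpy as np

def box_iou(boxes1, boxes2):
    """Pairwise IoU between (n, 4) and (m, 4) corner-format boxes."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = np.maximum(boxes1[:, None, :2], boxes2[None, :, :2])  # overlap top-left
    rb = np.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])  # overlap bottom-right
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)

def to_center(boxes):
    """Corner format -> (cx, cy, w, h)."""
    wh = boxes[:, 2:] - boxes[:, :2]
    return np.concatenate([boxes[:, :2] + wh / 2, wh], axis=1)

def label_anchors(anchors, gt_boxes, gt_classes, iou_threshold=0.5):
    """Return (offsets, masks, class_labels) for one image (simplified sketch)."""
    iou = box_iou(anchors, gt_boxes)  # (num_anchors, num_gt)
    best_gt = iou.argmax(axis=1)
    positive = iou.max(axis=1) >= iou_threshold
    # Class 0 is background, so object classes are shifted by +1
    class_labels = np.where(positive, gt_classes[best_gt] + 1, 0)
    a, g = to_center(anchors), to_center(gt_boxes[best_gt])
    offsets = np.concatenate([
        (g[:, :2] - a[:, :2]) / a[:, 2:] / 0.1,  # center offsets (assumed scale 0.1)
        np.log(g[:, 2:] / a[:, 2:]) / 0.2,       # size offsets (assumed scale 0.2)
    ], axis=1)
    masks = np.repeat(positive[:, None], 4, axis=1).astype(float)
    # Offsets of negative anchors are zeroed, like the masked labels above
    return (offsets * masks).ravel(), masks.ravel(), class_labels

anchors = np.array([[0.0, 0.0, 0.5, 0.5],
                    [0.1, 0.1, 0.6, 0.6],
                    [0.5, 0.5, 1.0, 1.0]])
gt_boxes = np.array([[0.1, 0.1, 0.6, 0.6]])
gt_classes = np.array([0])  # the single Pikachu class
offsets, masks, labels = label_anchors(anchors, gt_boxes, gt_classes)
print(labels)
```

Only the second anchor overlaps the ground truth enough to be positive. Because it coincides with the ground-truth box, its offsets are all zero.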
3. Prediction
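In the prediction stage, we want to detect all objects of interest in the image. Below, we read the test image and resize it. Then we convert it to the four-dimensional format required by the convolutional layers.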
img = image.imread('…/img/pikachu.jpg')
feature = image.imresize(img, 256, 256).astype('float32')
X = np.expand_dims(feature.transpose(2, 0, 1), axis=0)
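The image is read as a `(height, width, channels)` array, but the network expects `(batch, channels, height, width)`. The same two steps in plain NumPy, with a zero array standing in for the resized Pikachu image:

```python
import numpy as np

feature = np.zeros((256, 256, 3), dtype=np.float32)  # stand-in for the resized image (H, W, C)
feature[10, 20, 2] = 7.0                             # mark one pixel in channel 2
chw = feature.transpose(2, 0, 1)                     # channels first: (C, H, W)
X = np.expand_dims(chw, axis=0)                      # add a batch axis: (1, C, H, W)
print(X.shape, X[0, 2, 10, 20])
```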
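Using the MultiBoxDetection function, we predict bounding boxes from the anchor boxes and their predicted offsets. We then use non-maximum suppression to remove similar bounding boxes.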
def predict(X):
    anchors, cls_preds, bbox_preds = net(X.as_in_ctx(ctx))
    cls_probs = npx.softmax(cls_preds).transpose(0, 2, 1)
    output = npx.multibox_detection(cls_probs, bbox_preds, anchors)
    # Rows with class index -1 are background or were removed by NMS
    idx = [i for i, row in enumerate(output[0]) if row[0] != -1]
    return output[0, idx]
output = predict(X)
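Non-maximum suppression keeps the highest-scoring box. It then discards the remaining boxes whose IoU with that box exceeds a threshold, and repeats on what is left. Below is a greedy, single-class NumPy sketch of that idea. The 0.5 threshold and the box coordinates are illustrative assumptions. `multibox_detection` runs its own internal version with a configurable threshold. `box_area`, `iou_one_to_many` and `nms` are hypothetical helpers.

```python
import numpy as np

def box_area(b):
    """Area of corner-format boxes; works for a single box or an (n, 4) array."""
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])

def iou_one_to_many(box, boxes):
    """IoU between one corner-format box and an (n, 4) array of boxes."""
    lt = np.maximum(box[:2], boxes[:, :2])
    rb = np.minimum(box[2:], boxes[:, 2:])
    inter = np.prod(np.clip(rb - lt, 0, None), axis=1)
    return inter / (box_area(box) + box_area(boxes) - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Keep only boxes that do not overlap the chosen box too much
        order = rest[iou_one_to_many(boxes[i], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[0.10, 0.08, 0.52, 0.92],
                  [0.08, 0.20, 0.56, 0.95],
                  [0.15, 0.30, 0.62, 0.91],
                  [0.55, 0.20, 0.90, 0.88]])
scores = np.array([0.9, 0.8, 0.7, 0.85])
print(nms(boxes, scores))
```

Boxes 1 and 2 overlap box 0 heavily and are suppressed. Box 3 covers a different region and survives.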
Finally, we keep all bounding boxes with a confidence of at least 0.3 and display them as the final output.
def display(img, output, threshold):
    d2l.set_figsize((5, 5))
    fig = d2l.plt.imshow(img.asnumpy())
    for row in output:
        score = float(row[1])
        if score < threshold:
            continue
        h, w = img.shape[0:2]
        # Scale the normalized box coordinates back to pixel coordinates
        bbox = [row[2:6] * np.array((w, h, w, h), ctx=row.ctx)]
        d2l.show_bboxes(fig.axes, bbox, '%.2f' % score, 'w')

display(img, output, threshold=0.3)
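Each row returned by `predict` has the layout `[class_id, score, xmin, ymin, xmax, ymax]` with normalized corner coordinates. That is why `display` reads the score from `row[1]` and multiplies `row[2:6]` by `(w, h, w, h)`. The same filtering and scaling, vectorized in NumPy over two made-up detection rows:

```python
import numpy as np

# Hypothetical detections: [class_id, score, xmin, ymin, xmax, ymax]
output = np.array([[0, 0.95, 0.20, 0.10, 0.60, 0.70],
                   [0, 0.25, 0.00, 0.00, 0.10, 0.10]])
h, w = 256, 256  # image height and width in pixels
threshold = 0.3

kept = output[output[:, 1] >= threshold]             # keep rows scoring at least 0.3
pixel_boxes = kept[:, 2:6] * np.array([w, h, w, h])  # normalized -> pixel coordinates
print(pixel_boxes)
```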
4. Loss Function
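Due to space limitations, we have omitted some implementation details of the SSD model in this experiment. Can you further improve the model in the following respects?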
For the predicted offsets, replace the L1 norm loss with the smooth L1 loss. This loss function uses a squared function around zero for greater smoothness, and the size of this smoothed region is controlled by the hyperparameter σ:

$$f(x) = \begin{cases} (\sigma x)^2/2, & \text{if } |x| < 1/\sigma^2,\\ |x| - 0.5/\sigma^2, & \text{otherwise}. \end{cases}$$

When σ is large, this loss is similar to the L1 norm loss. When σ is small, the loss function is smoother.
sigmas = [10, 1, 0.5]
lines = ['-', '--', '-.']
x = np.arange(-2, 2, 0.1)
d2l.set_figsize()
for l, s in zip(lines, sigmas):
    y = npx.smooth_l1(x, scalar=s)
    d2l.plt.plot(x.asnumpy(), y.asnumpy(), l, label='sigma=%.1f' % s)
d2l.plt.legend();
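The smooth L1 function is quadratic, 0.5(σx)^2, when |x| < 1/σ^2, and linear, |x| - 0.5/σ^2, elsewhere. The NumPy sketch below reimplements that piecewise formula for illustration; it is not MXNet's code. The two pieces meet at |x| = 1/σ^2, where both equal 0.5/σ^2.

```python
import numpy as np

def smooth_l1(x, sigma):
    """Smooth L1 (our reimplementation): quadratic near zero, L1-like elsewhere."""
    abs_x = np.abs(x)
    cutoff = 1.0 / sigma ** 2
    return np.where(abs_x < cutoff,
                    0.5 * (sigma * x) ** 2,  # smooth region around zero
                    abs_x - 0.5 * cutoff)    # shifted L1 region

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for sigma in [10, 1, 0.5]:
    print(sigma, smooth_l1(x, sigma))
```

With σ = 0.5 the cutoff is 4, so every point in this range falls in the quadratic region. This is why small σ gives the smoothest curve in the plot.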
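For category prediction we used the cross-entropy loss. If p_j is the predicted probability of the true category j, this loss is -log p_j. We can also use the focal loss -α(1 - p_j)^γ log p_j. Increasing γ reduces the loss for examples whose correct category is already predicted with high probability. The code below plots it with α = 1 for several values of γ:

def focal_loss(gamma, x):
    return -(1 - x) ** gamma * np.log(x)

x = np.arange(0.01, 1, 0.01)
for l, gamma in zip(lines, [0, 1, 5]):
    d2l.plt.plot(x.asnumpy(), focal_loss(gamma, x).asnumpy(), l,
                 label='gamma=%.1f' % gamma)
d2l.plt.legend();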
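The following NumPy sketch evaluates the α-weighted focal loss -α(1 - p)^γ log p on the predicted probability p of the true class. With γ = 0 and α = 1 it reduces to the ordinary cross-entropy -log p. The values α = 0.25 and γ = 2 in the usage line are illustrative choices, not values from this text.

```python
import numpy as np

def focal_loss(p, gamma, alpha=1.0):
    """Focal loss for the predicted probability p of the true class."""
    # (1 - p) ** gamma shrinks the loss of well-classified examples (p near 1)
    return -alpha * (1 - p) ** gamma * np.log(p)

p = np.array([0.1, 0.5, 0.9])
print(focal_loss(p, gamma=0))              # plain cross-entropy
print(focal_loss(p, gamma=2, alpha=0.25))  # down-weights the easy example p = 0.9
```

At p = 0.9 with γ = 2, the modulating factor (1 - p)^γ is 0.01, so that easy example contributes about 1% of its cross-entropy loss before α is applied.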
Training and Prediction
When an object is relatively small compared to the image, the model normally adopts a larger input image size.
This generally produces a large number of negative anchor boxes when labeling anchor box categories. We can sample the negative anchor boxes to better balance the data categories. To do this, we can set the MultiBoxTarget function’s negative_mining_ratio parameter.
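One common way to sample negatives is hard negative mining, as described in the SSD paper. All positive anchors are kept, plus only the negatives with the highest classification loss, up to `ratio` negatives per positive (the SSD paper uses 3:1). The NumPy sketch below follows that assumption. It is not MXNet's `negative_mining_ratio` implementation, whose exact selection rule may differ. `mine_hard_negatives` is a hypothetical helper.

```python
import numpy as np

def mine_hard_negatives(cls_labels, neg_losses, ratio=3):
    """Return a boolean mask of anchors that contribute to the class loss.

    cls_labels: (num_anchors,) with 0 = background.
    neg_losses: (num_anchors,) per-anchor classification loss (hypothetical input).
    """
    positive = cls_labels > 0
    num_neg = min(int(ratio * positive.sum()), int((~positive).sum()))
    # Rank negatives by loss; positives get -inf so they are never selected here
    masked = np.where(positive, -np.inf, neg_losses)
    hardest = np.argsort(masked)[::-1][:num_neg]
    keep = positive.copy()
    keep[hardest] = True
    return keep

cls_labels = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
neg_losses = np.array([0.0, 0.1, 2.0, 0.3, 1.5, 0.05, 0.9, 0.2, 0.4, 0.01])
print(np.flatnonzero(mine_hard_negatives(cls_labels, neg_losses)))
```

With one positive anchor, only the three hardest negatives (indices 2, 4 and 6) are kept, so the class loss is no longer dominated by easy background anchors.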
In the loss function, assign different weight hyperparameters to the anchor box category loss and the positive anchor box offset loss.
Refer to the SSD paper. What methods can be used to evaluate the precision of object detection models?
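The SSD paper reports mean average precision (mAP). For a single class, average precision sorts detections by score and marks each one as a true or false positive, for example by IoU of at least 0.5 with an unmatched ground-truth box. It then measures the area under the resulting precision-recall curve. The NumPy sketch below uses all-point interpolation, which is only one of several conventions (PASCAL VOC 2007 used 11-point interpolation). mAP is then the mean of this value over classes. `average_precision` is a hypothetical helper.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class.

    scores: confidence of each detection; is_tp: 1 if the detection is a
    true positive, else 0; num_gt: number of ground-truth objects.
    """
    order = np.argsort(scores)[::-1]
    tp = np.cumsum(is_tp[order])
    fp = np.cumsum(1 - is_tp[order])
    recall = tp / num_gt
    precision = tp / (tp + fp)
    # Pad the curve and make precision monotonically non-increasing in recall
    r = np.concatenate([[0.0], recall, [1.0]])
    p = np.concatenate([[0.0], precision, [0.0]])
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Sum rectangle areas at the points where recall changes
    idx = np.flatnonzero(r[1:] != r[:-1])
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

scores = np.array([0.9, 0.8, 0.7, 0.6])
is_tp = np.array([1, 0, 1, 1])
print(average_precision(scores, is_tp, num_gt=4))
```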
5. Summary
SSD is a multiscale object detection model. This model generates different numbers of anchor boxes of different sizes based on the base network block and each multiscale feature block and predicts the categories and offsets of the anchor boxes to detect objects of different sizes.
During SSD model training, the loss function is calculated using the predicted and labeled category and offset values.