AVOD Code Walkthrough (Part 4)
I spent the past while preparing my thesis proposal and am now back to updating this series. It has been delayed long enough, and I don't want to abandon it halfway. There are still a few small tricks in the AVOD code that I don't fully understand; I will point out the parts I cannot yet explain, and if any reader understands them, I'd be glad to discuss. A colleague has been urging me to work through those tricks with him carefully; I'll get to that once I have sorted out the ideas for my own research topic.
RPN -> NMS
In the previous post we covered the fully connected layers that perform the object/background classification and produce the six bbox regression values. Next comes this part of the overall network structure (everything except the final stage). The code block is as follows:
# Not used in the computation; this is just visualization,
# which you can enable or disable as you like
with tf.variable_scope('histograms_feature_extractor'):
    with tf.variable_scope('bev_vgg'):
        for end_point in self.bev_end_points:
            tf.summary.histogram(end_point, self.bev_end_points[end_point])
    with tf.variable_scope('img_vgg'):
        for end_point in self.img_end_points:
            tf.summary.histogram(end_point, self.img_end_points[end_point])

with tf.variable_scope('histograms_rpn'):
    with tf.variable_scope('anchor_predictor'):
        fc_layers = [cls_fc6, cls_fc7, cls_fc8, objectness,
                     reg_fc6, reg_fc7, reg_fc8, offsets]
        for fc_layer in fc_layers:
            # fix the name to avoid tf warnings
            tf.summary.histogram(fc_layer.name.replace(':', '_'),
                                 fc_layer)

# Return the proposals
with tf.variable_scope('proposals'):
    # The anchors come in from outside the graph through a placeholder
    anchors = self.placeholders[self.PL_ANCHORS]

    # Decode anchor regression offsets
    with tf.variable_scope('decoding'):
        # Apply the regressed offsets to the input anchors to obtain
        # the predicted (x, y, z, dim_x, dim_y, dim_z)
        regressed_anchors = anchor_encoder.offset_to_anchor(
            anchors, offsets)

    with tf.variable_scope('bev_projection'):
        # bev_extents = [[-40, 40], [0, 70]]
        # Returns bev_box_corners and bev_box_corners_norm
        _, bev_proposal_boxes_norm = anchor_projector.project_to_bev(
            regressed_anchors, self._bev_extents)

    with tf.variable_scope('softmax'):
        objectness_softmax = tf.nn.softmax(objectness)

    with tf.variable_scope('nms'):
        objectness_scores = objectness_softmax[:, 1]

        # Do NMS on regressed anchors.
        # boxes are the box coordinates, scores are the predicted
        # scores, max_output_size is the maximum number of boxes to
        # keep, and iou_threshold suppresses any box whose IoU with an
        # already-selected, higher-scoring box exceeds the threshold.
        # Here _nms_size = 1024 and _nms_iou_thresh = 0.8.
        # top_indices are the indices of the boxes that survive
        top_indices = tf.image.non_max_suppression(
            bev_proposal_boxes_norm, objectness_scores,
            max_output_size=self._nms_size,
            iou_threshold=self._nms_iou_thresh)

        # Select the anchors and objectness scores kept by NMS
        top_anchors = tf.gather(regressed_anchors, top_indices)
        top_objectness_softmax = tf.gather(objectness_scores,
                                           top_indices)
        # top_offsets = tf.gather(offsets, top_indices)
        # top_objectness = tf.gather(objectness, top_indices)
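The original comment above hedges on what iou_threshold actually does, so to pin it down: tf.image.non_max_suppression greedily keeps the highest-scoring box, then suppresses every remaining box whose IoU with an already-kept box exceeds the threshold, and repeats. Here is a minimal standalone sketch with hand-made boxes and scores (not AVOD outputs), run eagerly under TF 2.x; the AVOD codebase itself is TF 1.x graph code:

import tensorflow as tf

# Three hand-made normalized boxes. NMS only needs a consistent corner
# order, so AVOD's [x1, z1, x2, z2] BEV boxes behave the same way.
# Boxes 0 and 1 overlap heavily (IoU ~ 0.77); box 2 is disjoint.
boxes = tf.constant([[0.10, 0.10, 0.40, 0.40],
                     [0.12, 0.12, 0.42, 0.42],
                     [0.60, 0.60, 0.90, 0.90]])
scores = tf.constant([0.9, 0.8, 0.7])

# Keep box 0 (top score), drop box 1 (IoU with box 0 exceeds 0.5),
# keep box 2 (no overlap with anything kept so far)
keep = tf.image.non_max_suppression(
    boxes, scores, max_output_size=2, iou_threshold=0.5)

print(keep.numpy())                    # [0 2]
print(tf.gather(boxes, keep).numpy())  # the surviving boxes

With AVOD's looser threshold of 0.8, only near-duplicate proposals get suppressed, so up to _nms_size = 1024 proposals survive for the next stage.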
In the block above, regressed_anchors = anchor_encoder.offset_to_anchor(anchors, offsets) performs the decoding. Its implementation:

def offset_to_anchor(anchors, offsets):
    """Decodes the anchor regression predictions with the anchor.

    The main work here is computing the regressed anchor parameters
    [x, y, z, dim_x, dim_y, dim_z] from the formulas below.

    Args:
        anchors: A numpy array or a tensor of shape [N, 6]
            representing the generated anchors.
        offsets: A numpy array or a tensor of shape [N, 6] containing
            the predicted offsets in the anchor format
            [x, y, z, dim_x, dim_y, dim_z].

    Returns:
        anchors: A numpy array of shape [N, 6]
            representing the predicted anchor boxes.
    """
    # Make sure anchors and offsets have shape N x 6
    fc.check_anchor_format(anchors)
    fc.check_anchor_format(offsets)

    # x = dx * dim_x + x_anch
    x_pred = (offsets[:, 0] * anchors[:, 3]) + anchors[:, 0]
    # y = dy * dim_y + y_anch
    y_pred = (offsets[:, 1] * anchors[:, 4]) + anchors[:, 1]
    # z = dz * dim_z + z_anch
    z_pred = (offsets[:, 2] * anchors[:, 5]) + anchors[:, 2]

    tensor_format = isinstance(anchors, tf.Tensor)
    if tensor_format:
        # dim_x = exp(log(dim_x) + dx)
        dx_pred = tf.exp(tf.log(anchors[:, 3]) + offsets[:, 3])
        # dim_y = exp(log(dim_y) + dy)
        dy_pred = tf.exp(tf.log(anchors[:, 4]) + offsets[:, 4])
        # dim_z = exp(log(dim_z) + dz)
        dz_pred = tf.exp(tf.log(anchors[:, 5]) + offsets[:, 5])

        anchors = tf.stack((x_pred,
                            y_pred,
                            z_pred,
                            dx_pred,
                            dy_pred,
                            dz_pred), axis=1)
    else:
        dx_pred = np.exp(np.log(anchors[:, 3]) + offsets[:, 3])
        dy_pred = np.exp(np.log(anchors[:, 4]) + offsets[:, 4])
        dz_pred = np.exp(np.log(anchors[:, 5]) + offsets[:, 5])

        anchors = np.stack((x_pred,
                            y_pred,
                            z_pred,
                            dx_pred,
                            dy_pred,
                            dz_pred), axis=1)

    return anchors
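To see these formulas in action, here is a NumPy-only sanity check with one made-up anchor and offset (the numbers are illustrative, not network outputs). Note that the log-space update exp(log(dim) + d) is simply dim * exp(d), which guarantees the decoded dimensions stay positive:

import numpy as np

# One made-up anchor [x, y, z, dim_x, dim_y, dim_z] and its offsets
anchor = np.array([[10.0, 1.0, 20.0, 2.0, 1.5, 4.0]])
offset = np.array([[0.5, 0.0, -0.25, 0.1, 0.0, -0.2]])

# Centers: the shift is scaled by the anchor dimension on that axis
x = offset[:, 0] * anchor[:, 3] + anchor[:, 0]   # 0.5 * 2.0 + 10.0 = 11.0
y = offset[:, 1] * anchor[:, 4] + anchor[:, 1]   # 0.0 * 1.5 + 1.0  = 1.0
z = offset[:, 2] * anchor[:, 5] + anchor[:, 2]   # -0.25 * 4.0 + 20.0 = 19.0

# Dimensions: decoded dim is the anchor dim times exp(offset)
dx = np.exp(np.log(anchor[:, 3]) + offset[:, 3])  # 2.0 * e^0.1  ≈ 2.21
dy = np.exp(np.log(anchor[:, 4]) + offset[:, 4])  # 1.5 * e^0.0  = 1.5
dz = np.exp(np.log(anchor[:, 5]) + offset[:, 5])  # 4.0 * e^-0.2 ≈ 3.27

print(np.stack((x, y, z, dx, dy, dz), axis=1))
# [[11.   1.  19.   2.21  1.5  3.27]]  (rounded)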
The earlier call _, bev_proposal_boxes_norm = anchor_projector.project_to_bev(regressed_anchors, self._bev_extents) is explained as follows:

def project_to_bev(anchors, bev_extents):
    """Projects an array of 3D anchors into bird's eye view.

    Looking at the KITTI dataset, the sensor coordinate systems are:
        Camera:   x = right,   y = down, z = forward
        Velodyne: x = forward, y = left, z = up
        GPS/IMU:  x = forward, y = left, z = up
    So everything here is done in camera coordinates, which means the
    bird's eye view is actually the x-z plane. It is worth a closer
    look later at how the point-cloud BEV projection itself is done.

    Args:
        anchors: list of anchors in anchor format (N x 6):
            N x [x, y, z, dim_x, dim_y, dim_z],
            can be a numpy array or tensor
        bev_extents: xz extents of the 3d area
            [[min_x, max_x], [min_z, max_z]]

    Returns:
        box_corners: corners in BEV map coordinates
        box_corners_norm: corners as a percentage of the map size, in
            the format N x [x1, y1, x2, y2]. Origin is the top left
            corner.
    """
    # bev_extents = [[-40, 40], [0, 70]]
    tensor_format = isinstance(anchors, tf.Tensor)
    if not tensor_format:
        anchors = np.asarray(anchors)

    # The BEV coordinates here are x and z.
    # (x, y, z) is the box center; dim_x, dim_y, dim_z are the width,
    # height and length (from the viewpoint of someone in the car).
    x = anchors[:, 0]
    z = anchors[:, 2]
    half_dim_x = anchors[:, 3] / 2.0
    half_dim_z = anchors[:, 5] / 2.0

    # Calculate extent ranges
    # [[-40, 40], [0, 70]]. The z axis points toward the front of the
    # car, so z is non-negative: KITTI only labels objects in front of
    # the camera, whereas a production on-board perception system
    # covers the full surroundings apart from blind spots.
    bev_x_extents_min = bev_extents[0][0]
    bev_z_extents_min = bev_extents[1][0]
    bev_x_extents_max = bev_extents[0][1]
    bev_z_extents_max = bev_extents[1][1]

    # 80
    bev_x_extents_range = bev_x_extents_max - bev_x_extents_min
    # 70
    bev_z_extents_range = bev_z_extents_max - bev_z_extents_min

    # 2D corners (top left, bottom right)
    x1 = x - half_dim_x
    x2 = x + half_dim_x
    # Flip z co-ordinates (origin changes from bottom left to top
    # left). In case this is hard to picture, see the worked example
    # after this function.
    z1 = bev_z_extents_max - (z + half_dim_z)
    z2 = bev_z_extents_max - (z - half_dim_z)

    # Stack into (N x 4)
    if tensor_format:
        bev_box_corners = tf.stack([x1, z1, x2, z2], axis=1)
    else:
        bev_box_corners = np.stack([x1, z1, x2, z2], axis=1)

    # Convert from original xz into bev xz, origin moves to top left
    # [-40, 0, -40, 0]
    bev_extents_min_tiled = [bev_x_extents_min, bev_z_extents_min,
                             bev_x_extents_min, bev_z_extents_min]
    bev_box_corners = bev_box_corners - bev_extents_min_tiled

    # Calculate normalized box corners for ROI pooling
    extents_tiled = [bev_x_extents_range, bev_z_extents_range,
                     bev_x_extents_range, bev_z_extents_range]
    # Normalize to [0, 1]
    bev_box_corners_norm = bev_box_corners / extents_tiled

    # [x1, z1, x2, z2]
    return bev_box_corners, bev_box_corners_norm

Anchor projection calculation:
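The z-axis flip is the only subtle step here, so the projection is worth unrolling numerically. Below is the same computation in plain NumPy for one made-up anchor, using the bev_extents = [[-40, 40], [0, 70]] quoted above:

import numpy as np

# One made-up anchor: center (x=0, y=1, z=30), dims (dim_x=2, dim_z=4)
x, z = 0.0, 30.0
half_dim_x, half_dim_z = 2.0 / 2.0, 4.0 / 2.0

bev_x_min, bev_x_max = -40.0, 40.0   # x extent, range 80
bev_z_min, bev_z_max = 0.0, 70.0     # z extent, range 70

# Corners in camera xz: x grows to the right, z grows away from the car
x1, x2 = x - half_dim_x, x + half_dim_x   # -1.0, 1.0
# Flip z: in image convention the origin is the top-left corner and the
# row coordinate grows downward, so far-away boxes (large z) end up
# near the top of the BEV map (small row value)
z1 = bev_z_max - (z + half_dim_z)         # 70 - 32 = 38.0
z2 = bev_z_max - (z - half_dim_z)         # 70 - 28 = 42.0

# Shift so the map origin is at (0, 0), then normalize by the ranges
corners = np.array([x1, z1, x2, z2]) - np.array([bev_x_min, bev_z_min,
                                                 bev_x_min, bev_z_min])
corners_norm = corners / np.array([80.0, 70.0, 80.0, 70.0])

print(corners)       # [39. 38. 41. 42.]
print(corners_norm)  # ≈ [0.4875 0.5429 0.5125 0.6]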
總結
以上是生活随笔為你收集整理的AVOD-代码理解系列(四)的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: BM匹配算法
- 下一篇: 经典字符串匹配算法——KMP算法