
Encoder Proposals in Deformable-DETR (two-stage version)

Published: 2023/12/14 · Programming Q&A · 豆豆

Deformable-DETR variants: Two-stage Deformable DETR

Preface

  • Two-stage Deformable DETR

The figure above is the part of the paper that covers the two-stage variant; the paper says little about it. DETR and its variants come in one-stage and two-stage forms. In the one-stage form, the decoder queries are initialized as content queries (initially set to zero and unlearnable) plus position embeddings (randomly initialized and learnable). The two-stage form works like R-CNN: the memory output by the encoder serves as a shared feature map for ROI proposals, and those proposals are then used to initialize the decoder queries. This speeds up decoder convergence and improves its stability.
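To make the contrast concrete, here is a minimal sketch of the two initialization schemes just described; the tensor sizes and the random score/box tensors are made-up stand-ins, not the mmdetection API:

```python
import torch

embed_dim, num_queries, bs = 256, 4, 2

# One-stage: content queries start as zeros (unlearnable); positional
# embeddings are randomly initialized learnable parameters.
content_queries = torch.zeros(num_queries, embed_dim)
pos_embed = torch.nn.Parameter(torch.randn(num_queries, embed_dim))

# Two-stage: the encoder output is scored, the top-k proposals are kept,
# and their boxes (after sigmoid) seed the decoder reference points.
num_key = 100
enc_scores = torch.randn(bs, num_key)            # stand-in class scores
enc_boxes_unact = torch.randn(bs, num_key, 4)    # stand-in unactivated boxes
topk_idx = torch.topk(enc_scores, num_queries, dim=1)[1]
reference_points = torch.gather(
    enc_boxes_unact, 1, topk_idx.unsqueeze(-1).repeat(1, 1, 4)).sigmoid()
```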

  • DINO groups the existing query-initialization methods into three categories:
  • the static anchors introduced by DETR;
  • the dynamic anchors and contents introduced by Deformable DETR;
  • the dynamic anchors and static contents proposed by the DINO authors.
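The three categories can be sketched as follows; this is purely illustrative (toy sizes, and top-k selection is faked by slicing), not any library's actual implementation:

```python
import torch
import torch.nn as nn

bs, num_queries, embed_dim = 2, 4, 256
memory = torch.randn(bs, 100, embed_dim)  # toy encoder output

# 1) DETR-style static anchors: queries are image-independent learned params.
static_query = nn.Embedding(num_queries, embed_dim).weight

# 2) Deformable-DETR two-stage: both anchors and contents come from the
#    encoder output, so they are dynamic (image-dependent).
dynamic_anchor_and_content = memory[:, :num_queries]  # stand-in for top-k

# 3) DINO-style: anchors are dynamic (from the encoder), but the content
#    part stays a learned, image-independent embedding.
dino_content = nn.Embedding(num_queries, embed_dim).weight
```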
  • Source code

    • gen_encoder_output_proposals(mmdetection\mmdet\models\utils\transformer.py)
    # get proposals
    def gen_encoder_output_proposals(self, memory, memory_padding_mask,
                                     spatial_shapes):
        """Generate proposals from encoded memory.

        Args:
            memory (Tensor): The output of encoder, has shape
                (bs, num_key, embed_dim). num_key is equal the number of
                points on feature map from all level.
            memory_padding_mask (Tensor): Padding mask for memory.
                has shape (bs, num_key).
            spatial_shapes (Tensor): The shape of all feature maps.
                has shape (num_level, 2).

        Returns:
            tuple: A tuple of feature map and bbox prediction.

                - output_memory (Tensor): The input of decoder, has shape
                  (bs, num_key, embed_dim). num_key is equal the number of
                  points on feature map from all levels.
                - output_proposals (Tensor): The normalized proposal
                  after a inverse sigmoid, has shape (bs, num_keys, 4).
        """
        N, S, C = memory.shape
        proposals = []
        _cur = 0
        for lvl, (H, W) in enumerate(spatial_shapes):
            mask_flatten_ = memory_padding_mask[:, _cur:(_cur + H * W)].view(
                N, H, W, 1)
            valid_H = torch.sum(~mask_flatten_[:, :, 0, 0], 1)
            valid_W = torch.sum(~mask_flatten_[:, 0, :, 0], 1)

            grid_y, grid_x = torch.meshgrid(
                torch.linspace(0, H - 1, H, dtype=torch.float32,
                               device=memory.device),
                torch.linspace(0, W - 1, W, dtype=torch.float32,
                               device=memory.device))
            grid = torch.cat([grid_x.unsqueeze(-1), grid_y.unsqueeze(-1)], -1)

            scale = torch.cat([valid_W.unsqueeze(-1),
                               valid_H.unsqueeze(-1)], 1).view(N, 1, 1, 2)
            grid = (grid.unsqueeze(0).expand(N, -1, -1, -1) + 0.5) / scale
            wh = torch.ones_like(grid) * 0.05 * (2.0**lvl)
            proposal = torch.cat((grid, wh), -1).view(N, -1, 4)
            proposals.append(proposal)
            _cur += (H * W)
        output_proposals = torch.cat(proposals, 1)
        output_proposals_valid = ((output_proposals > 0.01) &
                                  (output_proposals < 0.99)).all(
                                      -1, keepdim=True)
        output_proposals = torch.log(output_proposals /
                                     (1 - output_proposals))
        output_proposals = output_proposals.masked_fill(
            memory_padding_mask.unsqueeze(-1), float('inf'))
        output_proposals = output_proposals.masked_fill(
            ~output_proposals_valid, float('inf'))

        output_memory = memory
        output_memory = output_memory.masked_fill(
            memory_padding_mask.unsqueeze(-1), float(0))
        output_memory = output_memory.masked_fill(~output_proposals_valid,
                                                  float(0))
        output_memory = self.enc_output_norm(self.enc_output(output_memory))
        return output_memory, output_proposals
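Detached from the class, the per-level proposal grid can be checked on a toy input: every feature-map location contributes an anchor centered on its normalized cell center, with width and height 0.05 · 2^lvl. The sizes below (one image, a 2×3 level-0 map, no padding) are made up for illustration:

```python
import torch

N, H, W, lvl = 1, 2, 3, 0  # toy: one image, a 2x3 level-0 feature map
grid_y, grid_x = torch.meshgrid(
    torch.linspace(0, H - 1, H), torch.linspace(0, W - 1, W))
grid = torch.cat([grid_x.unsqueeze(-1), grid_y.unsqueeze(-1)], -1)
# with no padding, valid_W == W and valid_H == H
scale = torch.tensor([W, H], dtype=torch.float32).view(1, 1, 1, 2)
grid = (grid.unsqueeze(0) + 0.5) / scale          # normalized cell centers
wh = torch.ones_like(grid) * 0.05 * (2.0 ** lvl)  # level-dependent box size
proposal = torch.cat((grid, wh), -1).view(N, -1, 4)
# first proposal: center ((0+0.5)/3, (0+0.5)/2), size (0.05, 0.05)
```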
    • class DeformableDetrTransformer(Transformer):
    def forward(self,
                mlvl_feats,
                mlvl_masks,
                query_embed,
                mlvl_pos_embeds,
                reg_branches=None,
                cls_branches=None,
                **kwargs):
        assert self.as_two_stage or query_embed is not None

        feat_flatten = []
        mask_flatten = []
        lvl_pos_embed_flatten = []
        spatial_shapes = []
        for lvl, (feat, mask, pos_embed) in enumerate(
                zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)):
            bs, c, h, w = feat.shape
            spatial_shape = (h, w)
            spatial_shapes.append(spatial_shape)
            feat = feat.flatten(2).transpose(1, 2)
            mask = mask.flatten(1)
            pos_embed = pos_embed.flatten(2).transpose(1, 2)
            lvl_pos_embed = pos_embed + self.level_embeds[lvl].view(1, 1, -1)
            lvl_pos_embed_flatten.append(lvl_pos_embed)
            feat_flatten.append(feat)
            mask_flatten.append(mask)
        feat_flatten = torch.cat(feat_flatten, 1)
        mask_flatten = torch.cat(mask_flatten, 1)
        lvl_pos_embed_flatten = torch.cat(lvl_pos_embed_flatten, 1)
        spatial_shapes = torch.as_tensor(
            spatial_shapes, dtype=torch.long, device=feat_flatten.device)
        level_start_index = torch.cat((spatial_shapes.new_zeros(
            (1, )), spatial_shapes.prod(1).cumsum(0)[:-1]))
        valid_ratios = torch.stack(
            [self.get_valid_ratio(m) for m in mlvl_masks], 1)

        reference_points = \
            self.get_reference_points(spatial_shapes,
                                      valid_ratios,
                                      device=feat.device)

        feat_flatten = feat_flatten.permute(1, 0, 2)  # (H*W, bs, embed_dims)
        lvl_pos_embed_flatten = lvl_pos_embed_flatten.permute(
            1, 0, 2)  # (H*W, bs, embed_dims)
        memory = self.encoder(
            query=feat_flatten,
            key=None,
            value=None,
            query_pos=lvl_pos_embed_flatten,
            query_key_padding_mask=mask_flatten,
            spatial_shapes=spatial_shapes,
            reference_points=reference_points,
            level_start_index=level_start_index,
            valid_ratios=valid_ratios,
            **kwargs)

        memory = memory.permute(1, 0, 2)
        bs, _, c = memory.shape
        if self.as_two_stage:
            output_memory, output_proposals = \
                self.gen_encoder_output_proposals(memory, mask_flatten,
                                                  spatial_shapes)
            enc_outputs_class = cls_branches[self.decoder.num_layers](
                output_memory)
            enc_outputs_coord_unact = \
                reg_branches[self.decoder.num_layers](
                    output_memory) + output_proposals

            topk = self.two_stage_num_proposals
            topk_proposals = torch.topk(
                enc_outputs_class[..., 0], topk, dim=1)[1]
            topk_coords_unact = torch.gather(
                enc_outputs_coord_unact, 1,
                topk_proposals.unsqueeze(-1).repeat(1, 1, 4))
            topk_coords_unact = topk_coords_unact.detach()
            reference_points = topk_coords_unact.sigmoid()
            init_reference_out = reference_points
            pos_trans_out = self.pos_trans_norm(
                self.pos_trans(self.get_proposal_pos_embed(topk_coords_unact)))
            query_pos, query = torch.split(pos_trans_out, c, dim=2)
        else:
            query_pos, query = torch.split(query_embed, c, dim=1)
            query_pos = query_pos.unsqueeze(0).expand(bs, -1, -1)
            query = query.unsqueeze(0).expand(bs, -1, -1)
            reference_points = self.reference_points(query_pos).sigmoid()
            init_reference_out = reference_points

        # decoder
        query = query.permute(1, 0, 2)
        memory = memory.permute(1, 0, 2)
        query_pos = query_pos.permute(1, 0, 2)
        inter_states, inter_references = self.decoder(
            query=query,
            key=None,
            value=memory,
            query_pos=query_pos,
            key_padding_mask=mask_flatten,
            reference_points=reference_points,
            spatial_shapes=spatial_shapes,
            level_start_index=level_start_index,
            valid_ratios=valid_ratios,
            reg_branches=reg_branches,
            **kwargs)

        inter_references_out = inter_references
        if self.as_two_stage:
            return inter_states, init_reference_out,\
                inter_references_out, enc_outputs_class,\
                enc_outputs_coord_unact
        return inter_states, init_reference_out, \
            inter_references_out, None, None
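The top-k selection in the two-stage branch can be exercised in isolation. The tensors below are random stand-ins for enc_outputs_class / enc_outputs_coord_unact (toy sizes, assumed for illustration):

```python
import torch

bs, num_key, topk = 2, 10, 3
enc_outputs_class = torch.randn(bs, num_key, 91)       # per-point class logits
enc_outputs_coord_unact = torch.randn(bs, num_key, 4)  # unactivated boxes

# indices of the k highest scores (first class channel) per image
topk_proposals = torch.topk(enc_outputs_class[..., 0], topk, dim=1)[1]
# gather the corresponding boxes and turn them into reference points
topk_coords_unact = torch.gather(
    enc_outputs_coord_unact, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, 4))
reference_points = topk_coords_unact.detach().sigmoid()
```

Note the `.detach()`: gradients are not propagated back through the proposal coordinates, so the encoder's box head is trained only by its own auxiliary loss.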

