
Encoder Proposal in Deformable-DETR (two-stage version)


Deformable-DETR variants: Two-stage Deformable DETR

Introduction

  • Two-stage Deformable DETR

The paper itself describes the two-stage variant only briefly. DETR and its variants come in one-stage and two-stage forms. In the one-stage form, the decoder's queries are initialized from content queries (initially set to zero and not learned) plus positional embeddings (randomly initialized and learnable). The two-stage form resembles R-CNN: the memory produced by the encoder serves as a shared feature map for generating region (ROI) proposals, and those proposals are then used to initialize the decoder's queries. This speeds up convergence and improves the stability of the decoder.
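A minimal sketch of the one-stage initialization described above, assuming the DETR-style scheme the text refers to (zero, non-learned content queries plus learned positional queries); names and sizes here are illustrative, not taken from the mmdetection code:

import torch
import torch.nn as nn

embed_dims, num_queries, bs = 256, 300, 2

# Content queries: start at zero and carry no learned parameters.
query = torch.zeros(num_queries, embed_dims)
# Positional queries: randomly initialized and learnable.
query_pos = nn.Embedding(num_queries, embed_dims).weight

# Broadcast both across the batch before feeding the decoder.
query = query.unsqueeze(0).expand(bs, -1, -1)
query_pos = query_pos.unsqueeze(0).expand(bs, -1, -1)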

  • DINO groups the current initialization methods into three classes (a sketch of the third follows this list):
  • the first, represented by DETR: static anchors
  • the second, represented by Deformable DETR: dynamic anchors and contents
  • the third, proposed by the DINO authors: dynamic anchors and static contents
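A rough sketch of how the third scheme differs from the second, under the assumption that "dynamic anchors" means top-k encoder proposals (as in the two-stage code below) while the content queries stay a learned embedding; every tensor here is a stand-in, not DINO's actual implementation:

import torch
import torch.nn as nn

embed_dims, num_queries, bs, num_tokens = 256, 300, 2, 1000

# Dynamic anchors: select top-k encoder proposals as reference boxes.
enc_scores = torch.rand(bs, num_tokens)            # stand-in class scores
enc_coords_unact = torch.randn(bs, num_tokens, 4)  # stand-in pre-sigmoid boxes
topk_idx = torch.topk(enc_scores, num_queries, dim=1)[1]
reference_points = torch.gather(
    enc_coords_unact, 1, topk_idx.unsqueeze(-1).repeat(1, 1, 4)).sigmoid()

# Static contents: content queries remain a learned embedding rather than
# being derived from the encoder output.
content_query = nn.Embedding(num_queries, embed_dims).weight
content_query = content_query.unsqueeze(0).expand(bs, -1, -1)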
  • Source code

    • gen_encoder_output_proposals(mmdetection\mmdet\models\utils\transformer.py)
    # get proposals
    def gen_encoder_output_proposals(self, memory, memory_padding_mask,
                                     spatial_shapes):
        """Generate proposals from encoded memory.

        Args:
            memory (Tensor): The output of encoder, has shape
                (bs, num_key, embed_dim). num_key is equal to the number
                of points on feature maps from all levels.
            memory_padding_mask (Tensor): Padding mask for memory, has
                shape (bs, num_key).
            spatial_shapes (Tensor): The shape of all feature maps, has
                shape (num_level, 2).

        Returns:
            tuple: A tuple of feature map and bbox prediction.

                - output_memory (Tensor): The input of decoder, has shape
                  (bs, num_key, embed_dim).
                - output_proposals (Tensor): The normalized proposals
                  after an inverse sigmoid, has shape (bs, num_key, 4).
        """
        N, S, C = memory.shape
        proposals = []
        _cur = 0
        for lvl, (H, W) in enumerate(spatial_shapes):
            # Recover this level's padding mask in its 2D layout and count
            # the valid (non-padded) rows and columns per image.
            mask_flatten_ = memory_padding_mask[:, _cur:(_cur + H * W)].view(
                N, H, W, 1)
            valid_H = torch.sum(~mask_flatten_[:, :, 0, 0], 1)
            valid_W = torch.sum(~mask_flatten_[:, 0, :, 0], 1)

            # One anchor center per feature-map location, normalized by the
            # valid region of each image.
            grid_y, grid_x = torch.meshgrid(
                torch.linspace(
                    0, H - 1, H, dtype=torch.float32, device=memory.device),
                torch.linspace(
                    0, W - 1, W, dtype=torch.float32, device=memory.device))
            grid = torch.cat([grid_x.unsqueeze(-1), grid_y.unsqueeze(-1)], -1)
            scale = torch.cat([valid_W.unsqueeze(-1),
                               valid_H.unsqueeze(-1)], 1).view(N, 1, 1, 2)
            grid = (grid.unsqueeze(0).expand(N, -1, -1, -1) + 0.5) / scale
            # Anchor width/height: 0.05 at level 0, doubling at each level.
            wh = torch.ones_like(grid) * 0.05 * (2.0**lvl)
            proposal = torch.cat((grid, wh), -1).view(N, -1, 4)
            proposals.append(proposal)
            _cur += (H * W)
        output_proposals = torch.cat(proposals, 1)
        # Keep only proposals whose normalized coordinates lie well
        # inside (0, 1).
        output_proposals_valid = ((output_proposals > 0.01) &
                                  (output_proposals < 0.99)).all(
                                      -1, keepdim=True)
        # Inverse sigmoid; padded and invalid positions are filled with inf
        # so a later sigmoid saturates them.
        output_proposals = torch.log(output_proposals /
                                     (1 - output_proposals))
        output_proposals = output_proposals.masked_fill(
            memory_padding_mask.unsqueeze(-1), float('inf'))
        output_proposals = output_proposals.masked_fill(
            ~output_proposals_valid, float('inf'))

        output_memory = memory
        output_memory = output_memory.masked_fill(
            memory_padding_mask.unsqueeze(-1), float(0))
        output_memory = output_memory.masked_fill(~output_proposals_valid,
                                                  float(0))
        output_memory = self.enc_output_norm(self.enc_output(output_memory))
        return output_memory, output_proposals
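To sanity-check the anchor construction above outside the class, the same grid and width/height logic can be reproduced for a single unpadded feature level (sizes here are illustrative):

import torch

N, spatial_shapes = 1, [(2, 3)]  # one image, one 2x3 feature level
proposals = []
for lvl, (H, W) in enumerate(spatial_shapes):
    grid_y, grid_x = torch.meshgrid(
        torch.linspace(0, H - 1, H), torch.linspace(0, W - 1, W))
    grid = torch.stack([grid_x, grid_y], -1)
    # With no padding the valid size is simply (W, H).
    grid = (grid.unsqueeze(0) + 0.5) / torch.tensor([W, H],
                                                    dtype=torch.float32)
    wh = torch.ones_like(grid) * 0.05 * (2.0**lvl)  # 0.05 at level 0
    proposals.append(torch.cat((grid, wh), -1).view(N, -1, 4))

# Centers land at (1/6, 1/4), (1/2, 1/4), (5/6, 1/4), ...; wh is 0.05
# everywhere at level 0.
print(torch.cat(proposals, 1))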
    • class DeformableDetrTransformer(Transformer):
    def forward(self,
                mlvl_feats,
                mlvl_masks,
                query_embed,
                mlvl_pos_embeds,
                reg_branches=None,
                cls_branches=None,
                **kwargs):
        assert self.as_two_stage or query_embed is not None

        # Flatten the multi-level features, masks and positional encodings
        # into a single sequence, adding a learned per-level embedding.
        feat_flatten = []
        mask_flatten = []
        lvl_pos_embed_flatten = []
        spatial_shapes = []
        for lvl, (feat, mask, pos_embed) in enumerate(
                zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)):
            bs, c, h, w = feat.shape
            spatial_shape = (h, w)
            spatial_shapes.append(spatial_shape)
            feat = feat.flatten(2).transpose(1, 2)
            mask = mask.flatten(1)
            pos_embed = pos_embed.flatten(2).transpose(1, 2)
            lvl_pos_embed = pos_embed + self.level_embeds[lvl].view(1, 1, -1)
            lvl_pos_embed_flatten.append(lvl_pos_embed)
            feat_flatten.append(feat)
            mask_flatten.append(mask)
        feat_flatten = torch.cat(feat_flatten, 1)
        mask_flatten = torch.cat(mask_flatten, 1)
        lvl_pos_embed_flatten = torch.cat(lvl_pos_embed_flatten, 1)
        spatial_shapes = torch.as_tensor(
            spatial_shapes, dtype=torch.long, device=feat_flatten.device)
        level_start_index = torch.cat((spatial_shapes.new_zeros(
            (1, )), spatial_shapes.prod(1).cumsum(0)[:-1]))
        valid_ratios = torch.stack(
            [self.get_valid_ratio(m) for m in mlvl_masks], 1)

        reference_points = \
            self.get_reference_points(spatial_shapes,
                                      valid_ratios,
                                      device=feat.device)

        feat_flatten = feat_flatten.permute(1, 0, 2)  # (H*W, bs, embed_dims)
        lvl_pos_embed_flatten = lvl_pos_embed_flatten.permute(
            1, 0, 2)  # (H*W, bs, embed_dims)
        memory = self.encoder(
            query=feat_flatten,
            key=None,
            value=None,
            query_pos=lvl_pos_embed_flatten,
            query_key_padding_mask=mask_flatten,
            spatial_shapes=spatial_shapes,
            reference_points=reference_points,
            level_start_index=level_start_index,
            valid_ratios=valid_ratios,
            **kwargs)

        memory = memory.permute(1, 0, 2)
        bs, _, c = memory.shape
        if self.as_two_stage:
            # Turn every encoder token into a proposal, score it with the
            # extra class/reg heads, and keep the top-k as decoder queries.
            output_memory, output_proposals = \
                self.gen_encoder_output_proposals(memory, mask_flatten,
                                                  spatial_shapes)
            enc_outputs_class = cls_branches[self.decoder.num_layers](
                output_memory)
            enc_outputs_coord_unact = \
                reg_branches[self.decoder.num_layers](
                    output_memory) + output_proposals

            topk = self.two_stage_num_proposals
            topk_proposals = torch.topk(
                enc_outputs_class[..., 0], topk, dim=1)[1]
            topk_coords_unact = torch.gather(
                enc_outputs_coord_unact, 1,
                topk_proposals.unsqueeze(-1).repeat(1, 1, 4))
            # Detach: the proposals initialize the decoder but receive no
            # gradient from it through the coordinates.
            topk_coords_unact = topk_coords_unact.detach()
            reference_points = topk_coords_unact.sigmoid()
            init_reference_out = reference_points
            pos_trans_out = self.pos_trans_norm(
                self.pos_trans(
                    self.get_proposal_pos_embed(topk_coords_unact)))
            query_pos, query = torch.split(pos_trans_out, c, dim=2)
        else:
            # One-stage: split the learned query embedding into positional
            # and content parts and predict initial reference points.
            query_pos, query = torch.split(query_embed, c, dim=1)
            query_pos = query_pos.unsqueeze(0).expand(bs, -1, -1)
            query = query.unsqueeze(0).expand(bs, -1, -1)
            reference_points = self.reference_points(query_pos).sigmoid()
            init_reference_out = reference_points

        # decoder
        query = query.permute(1, 0, 2)
        memory = memory.permute(1, 0, 2)
        query_pos = query_pos.permute(1, 0, 2)
        inter_states, inter_references = self.decoder(
            query=query,
            key=None,
            value=memory,
            query_pos=query_pos,
            key_padding_mask=mask_flatten,
            reference_points=reference_points,
            spatial_shapes=spatial_shapes,
            level_start_index=level_start_index,
            valid_ratios=valid_ratios,
            reg_branches=reg_branches,
            **kwargs)

        inter_references_out = inter_references
        if self.as_two_stage:
            return inter_states, init_reference_out,\
                inter_references_out, enc_outputs_class,\
                enc_outputs_coord_unact
        return inter_states, init_reference_out, \
            inter_references_out, None, None
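To make the two-stage selection step concrete, the same topk/gather pattern can be run on dummy tensors (shapes are illustrative):

import torch

bs, num_tokens, topk = 2, 100, 10
enc_outputs_class = torch.randn(bs, num_tokens, 91)       # per-token class logits
enc_outputs_coord_unact = torch.randn(bs, num_tokens, 4)  # per-token pre-sigmoid boxes

# Rank tokens by the first class logit and keep the best `topk` per image.
topk_proposals = torch.topk(enc_outputs_class[..., 0], topk, dim=1)[1]
topk_coords_unact = torch.gather(
    enc_outputs_coord_unact, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, 4))

# detach() keeps the boxes as decoder initialization only: decoder
# gradients do not flow back into the proposal head through them.
reference_points = topk_coords_unact.detach().sigmoid()  # (bs, topk, 4)

Note that in the mmdetection code the heads used for scoring are the extra branches indexed by self.decoder.num_layers, and only the first logit, enc_outputs_class[..., 0], serves as the proposal score.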

Summary

In short, the two-stage variant turns every valid encoder token into an anchor-like proposal (a grid center plus a level-dependent width and height, stored in inverse-sigmoid space), scores the encoder memory with extra classification and regression heads, and takes the top-k proposals, detached, as the decoder's initial reference points and queries.
