Attention: A Survey, Some Thoughts, and Experiments

Published: 2024/1/17

Reposted from: https://zhuanlan.zhihu.com/p/38281113

(1) Intuitions in Deep Learning

Using 3 x 1 and 1 x 3 in place of 3 x 3

Gate design in LSTM

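The attention mechanism is inspired by human visual attention. When people perceive a scene, they generally do not look at everything from beginning to end every time; instead, they attend to a specific part depending on what they need. Moreover, when people find that the thing they want to observe often appears in a certain part of a scene, they learn to direct their attention to that part when a similar scene appears again.

More attention is focused on the useful parts; in essence, attention is weighting. Note, however, that for the same image, the distribution of attention weights should differ depending on the task the person is performing.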

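Based on the intuition above, attention can be used for:

1. Learning a weight distribution:

The weighting can keep all components and weight each of them (soft attention), or it can select a subset of components from the distribution with some sampling strategy (hard attention), which is usually trained with RL;
The weighting can be applied to the original image or to feature maps;
The weighting can be applied along the time dimension, the spatial dimension, the mapping dimension, or the feature dimension.

2. Task focusing and decoupling (via an attention mask)

In a multi-task model, attention can redistribute the weights over features so that each task focuses on its own key features.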

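(2) History

The attention mechanism was first proposed in computer vision. The idea probably dates back to the 1990s, but it really took off with the 2014 paper "Recurrent Models of Visual Attention" from the Google DeepMind team, which used attention in an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate", used an attention-like mechanism to perform translation and alignment jointly on a machine translation task; their work is regarded as the first to apply attention to NLP. Attention was then widely adopted in NLP tasks built on RNN, CNN, and other neural network models. In 2017, the Google machine translation team published "Attention Is All You Need", which made heavy use of self-attention to learn text representations. Self-attention has since become a research hotspot and is being explored on a wide range of NLP tasks.

[Figure: rough trend of attention research progress]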

(3) Attention Design

3.1 Definition

$Attention(Q,K,V)=softmax\left(\frac{QK^T}{\sqrt{d_k}}\right)V$

In its 2017 paper "Attention Is All You Need", Google gave an abstract definition of attention:

An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key.
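In other words, attention is a method for mapping a query and a set of key-value pairs to an output. Q, K, and V are all vectors; the output is a weighted sum over V, where the weights are the similarity between Q and K.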

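Computing the attention-weighted value takes three steps:

1. Compute the similarity scores between Q and K

2. Normalize the scores (this gives the attention weights)

3. Weight V by the normalized scores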
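The three steps map directly onto the scaled dot-product formula in 3.1. Below is a minimal NumPy sketch, not a reference implementation; the function names and the random inputs are assumptions made for illustration, and batching and masking are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max before exponentiating for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v).

    Returns the output (n_q, d_v) and the attention weights (n_q, n_k).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # step 1: similarity scores between Q and K
    weights = softmax(scores, axis=-1)  # step 2: normalize the scores
    output = weights @ V                # step 3: weighted sum of V
    return output, weights

# Illustrative random inputs (assumed shapes, not taken from the article).
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 5))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.sum(axis=-1))
```

Each row of `w` is a distribution over the keys, so each output row is a convex combination of the rows of `V`.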

3.2 Categories

3.2.1 By output

Soft attention
Hard attention

Soft attention outputs a probability distribution of attention weights; hard attention outputs a one-hot vector.
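The difference is easiest to see on a single score vector. The sketch below is illustrative only: the helper names are hypothetical, and the two selection rules for hard attention (argmax, or sampling from the soft distribution) are assumptions standing in for whatever sampling strategy a real model would use.

```python
import numpy as np

def soft_attention_weights(scores):
    # Softmax: every position keeps a non-zero weight.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def hard_attention_weights(scores, rng=None):
    # One-hot weights: exactly one position is selected.
    probs = soft_attention_weights(scores)
    if rng is None:
        idx = int(np.argmax(probs))                  # deterministic choice
    else:
        idx = int(rng.choice(len(probs), p=probs))   # stochastic choice
    one_hot = np.zeros_like(probs)
    one_hot[idx] = 1.0
    return one_hot

scores = np.array([2.0, 0.5, -1.0])   # made-up scores
soft = soft_attention_weights(scores)
hard = hard_attention_weights(scores)
sampled = hard_attention_weights(scores, rng=np.random.default_rng(0))
print(soft, hard, sampled)
```

The sampling step is not differentiable, which is why hard attention is usually trained with RL-style methods, as noted in section (1).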

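3.2.2 By scope of attention

Paper: Effective Approaches to Attention-based Neural Machine Translation

Global attention

Global attention, as the name suggests, applies attention weighting over the entire feature mapping.

Local attention

There are two kinds of local attention. The first uses a hard global attention to lock onto a position and then applies attention weighting within a local window around that position.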
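As a rough sketch of this first kind: pick one position with a hard selection over all scores, then run a softmax only inside a window around it. Choosing the center by argmax and using a fixed `half_width` are assumptions made for illustration; the cited paper predicts the window position differently.

```python
import numpy as np

def local_attention(query, keys, values, half_width=1):
    """query: (d,), keys: (n, d), values: (n, d_v)."""
    scores = keys @ query                       # dot-product score per position
    center = int(np.argmax(scores))             # hard selection over all positions
    lo = max(0, center - half_width)
    hi = min(len(keys), center + half_width + 1)
    window = scores[lo:hi]
    w = np.exp(window - window.max())
    w = w / w.sum()                             # softmax restricted to the window
    weights = np.zeros(len(keys))
    weights[lo:hi] = w                          # positions outside the window get 0
    return weights @ values, weights

# Toy inputs: position 3 has the highest score, so the window is [2, 4].
keys = np.eye(5)
values = np.arange(10, dtype=float).reshape(5, 2)
query = np.array([0.0, 0.0, 0.0, 3.0, 0.0])
context, weights = local_attention(query, keys, values, half_width=1)
print(weights)
```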

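The second kind arises in certain application scenarios. For example, in the question "Where is the football?", the words "where" and "football" summarize the sentence. This kind of attention depends only on each word in the sentence itself. "Location-based" means that the attention has no additional object to attend to: the attention vector is q itself, i.e. Q = K, and the attention score is:

$score(Q,K)=activation(W^TQ+b)$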
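A small sketch of this location-based score follows. It assumes `tanh` as the activation and a softmax over words to turn scores into weights; `W`, `b`, and the word vectors are random placeholders, not trained parameters.

```python
import numpy as np

def location_based_attention(X, W, b):
    """X: (n_words, d) word vectors, W: (d,), b: scalar."""
    # score(Q, K) = activation(W^T Q + b), with Q = K = each word vector.
    scores = np.tanh(X @ W + b)
    e = np.exp(scores - scores.max())
    weights = e / e.sum()            # normalize the scores over the words
    return weights @ X, weights      # weighted sum gives a sentence summary

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 words, 3-dimensional vectors (made up)
W = rng.normal(size=3)
summary, weights = location_based_attention(X, W, b=0.0)
print(summary.shape, weights)
```

Since the score of a word depends only on that word, there is no separate query; after training, words such as "where" and "football" would be expected to receive the largest weights.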

3.2.3 By score function

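(4) Business Applications

Chatbot intent classification

Approach: Self-attention + dot-product score

Results:

Observations:

Attention automatically masked the <PAD> tokens;

Keywords that matter more for classification received higher attention weights.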
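To make the setup concrete, here is a minimal sketch of self-attention with a dot-product score over one padded sentence. It is not the model used above: the tokens are hypothetical, the embeddings are random stand-ins, and mean pooling into a sentence vector is an assumption. With untrained embeddings the printed weights carry no meaning; the point is only to show where the per-token weights that one would inspect come from.

```python
import numpy as np

def self_attention(X):
    """X: (n_tokens, d) token embeddings, used as Q, K, and V at once."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                        # dot-product score
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ X, weights

tokens = ["book", "a", "flight", "<PAD>", "<PAD>"]       # hypothetical input
rng = np.random.default_rng(0)
X = rng.normal(size=(len(tokens), 8))                    # stand-in embeddings
context, weights = self_attention(X)

sentence_vector = context.mean(axis=0)     # would feed an intent classifier
attention_received = weights.mean(axis=0)  # average weight each token receives
for tok, a in zip(tokens, attention_received):
    print(f"{tok:>6s}  {a:.3f}")
```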

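(5) Thoughts

Multi-step load forecasting

In a multi-task, multi-output model, each prediction step should focus on different features, so one can learn a mask attention over the feature mapping.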
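One possible reading of this idea, sketched under stated assumptions: each prediction step owns a soft mask over the shared features (a sigmoid of learnable logits) and a linear head. All names, shapes, and the choice of sigmoid are hypothetical, and the parameters are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_step_forecast(features, mask_logits, head_weights):
    """features: (d,), mask_logits: (n_steps, d), head_weights: (n_steps, d)."""
    masks = sigmoid(mask_logits)      # one soft mask per step, values in (0, 1)
    attended = masks * features       # broadcast to (n_steps, d)
    preds = np.sum(attended * head_weights, axis=-1)  # one linear head per step
    return preds, masks

rng = np.random.default_rng(0)
features = rng.normal(size=6)             # shared feature vector (made up)
mask_logits = rng.normal(size=(3, 6))     # would be learned in practice
head_weights = rng.normal(size=(3, 6))    # would be learned in practice
preds, masks = multi_step_forecast(features, mask_logits, head_weights)
print(preds.shape, masks.round(2))
```

After training, comparing the rows of `masks` would show which features each step relies on.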

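Load forecasting with anomalous data masked

Append an attention layer after the original feature mapping to automatically mask anomalous inputs and improve the model's robustness.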

(6) References

Papers

Hierarchical Attention Networks for Document Classification

Attention Is All You Need

Neural Machine Translation by Jointly Learning to Align and Translate

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Fully Convolutional Network with Task Partitioning for Inshore Ship Detection in Optical Remote Sensing Images

Effective Approaches to Attention-based Neural Machine Translation

GitHub

pytorch-attention

seq2seq

PyTorch-Batch-Attention-Seq2seq

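Blogs

Understanding "Attention Is All You Need" in One Article | With Code Implementation

The Patterns of the Attention Model (Mechanism)

[Computer Vision] Understanding the Attention Mechanism in Depth

Self-Attention in Natural Language Processing

Encoder-Decoder Models and Attention Models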
