
Paper Notes: Graph Attention Networks

Published: 2025/7/14


Graph Attention Networks

2018-02-06 16:52:49

Abstract：

　　This paper proposes a novel architecture, graph attention networks (GATs), which operates on graph-structured data and uses masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.

　　Target: graph-structured data.

　　Method: masked self-attentional layers.

　　Goal: to address the shortcomings of prior methods based on graph convolutions or their approximations.

　　Approach: by stacking layers in which nodes are able to attend over their neighborhoods' features, the model can assign different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation or depending on knowing the graph structure upfront.

Introduction：

　　Background: CNNs have been widely applied to grid-structured data and perform well on many tasks, such as object detection, semantic segmentation, and machine translation. However, some data does not have this grid-like structure, for example 3D meshes, social networks, telecommunication networks, biological networks, and brain connectomes.

　　There have been several attempts to combine RNNs with graph-structured data in order to learn representations.

　　Currently, there are two common ways of applying convolution to the graph domain:

　　1. spectral approaches

　　2. non-spectral approaches (spatial based methods)

　　The paper briefly introduces both families of methods and reviews some recent related work.

　　It then turns to attention mechanisms, which are now widely used across many settings. One of their benefits is that they allow for dealing with variable-sized inputs, focusing on the most relevant parts of the input to make decisions. When attention is used to compute a representation of a single sequence, it is commonly called self-attention or intra-attention. Combined with RNNs or convolutions, self-attention has proven useful for tasks such as machine reading and learning sentence representations.

　　Inspired by this recent work, the authors propose an attention-based architecture to perform node classification of graph-structured data. The idea is to compute the hidden representation of each node in the graph by attending over its neighbors, following a self-attention strategy. This attention mechanism has several interesting properties:

　　1. The operation is efficient, since it is parallelizable across node-neighbor pairs.

　　2. It can be applied to graph nodes of different degrees, by specifying different weights for their neighbors.

　　3. The model is directly applicable to inductive learning problems, including tasks where the model has to generalize to completely unseen graphs.

　　Our approach of sharing a neural network computation across edges is reminiscent of the formulation of relational networks (Santoro et al., 2017), wherein relations between objects (regional features from an image extracted by a convolutional neural network) are aggregated across all object pairs, by employing a shared mechanism.

　　The authors run experiments on three datasets and reach state-of-the-art results, showing the potential of attention-based models for handling arbitrarily structured graphs.

GAT Architecture：

1. Graph Attentional Layer

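　　The input to the proposed attentional layer is a set of node features, $\mathbf{h} = \{\vec{h}_1, \vec{h}_2, \dots, \vec{h}_N\}$, $\vec{h}_i \in \mathbb{R}^{F}$, where $N$ is the number of nodes and $F$ is the number of features per node. The layer produces a new set of node features (of potentially different dimension $F'$) as its output: $\mathbf{h}' = \{\vec{h}'_1, \vec{h}'_2, \dots, \vec{h}'_N\}$, $\vec{h}'_i \in \mathbb{R}^{F'}$.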

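　　To obtain enough expressive power to transform the input features into higher-level features, at least one learnable linear transformation is required. To that end, as an initial step, a shared linear transformation, parameterized by a weight matrix $\mathbf{W} \in \mathbb{R}^{F' \times F}$, is applied to every node. Self-attention is then performed on the nodes: a shared attentional mechanism $a : \mathbb{R}^{F'} \times \mathbb{R}^{F'} \to \mathbb{R}$ computes the attention coefficients

$$e_{ij} = a(\mathbf{W}\vec{h}_i, \mathbf{W}\vec{h}_j) \tag{1}$$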
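　　The shared linear transformation can be sketched in a few lines of plain Python. This is only an illustration: the helper `matvec`, the toy feature values, and the entries of `W` are all made up here, and a real implementation would use a tensor library rather than nested lists.

```python
def matvec(W, h):
    """Multiply an F' x F matrix (given as a list of rows) by a length-F vector."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

# Toy input: N = 3 nodes, F = 3 features per node (made-up values).
H = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
    [3.0, 1.0, 0.0],
]

# Shared weight matrix with shape F' x F = 2 x 3 (made-up values).
W = [
    [1.0, 0.0, 1.0],
    [0.0, 2.0, -1.0],
]

# The same W is applied to every node, giving N vectors of length F'.
WH = [matvec(W, h) for h in H]
print(WH)
```

　　Because `W` is shared, the number of parameters does not depend on the number of nodes, only on $F$ and $F'$.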

　　The coefficient $e_{ij}$ indicates the importance of node $j$'s features to node $i$. In its most general form, the model allows every node to attend on every other node, dropping all structural information. The graph structure is injected into the mechanism by performing masked attention: $e_{ij}$ is computed only for nodes $j \in N_{i}$, where $N_{i}$ is some neighborhood of node $i$ in the graph. In the authors' experiments, these are exactly the first-order neighbors of $i$ (including $i$ itself).

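　　To make the coefficients easily comparable across different nodes, they are normalized across all choices of $j$ using the softmax function:

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N_{i}} \exp(e_{ik})} \tag{2}$$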
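　　A minimal sketch of this neighborhood-restricted softmax follows. The function name `neighborhood_softmax` and the raw scores are hypothetical, and subtracting the maximum before exponentiating is a common numerical-stability choice assumed here, not something the notes specify.

```python
import math

def neighborhood_softmax(scores):
    """Turn {neighbor j: e_ij} into {neighbor j: alpha_ij} for one node i."""
    m = max(scores.values())  # subtract the max so exp() cannot overflow
    exps = {j: math.exp(e - m) for j, e in scores.items()}
    total = sum(exps.values())
    return {j: v / total for j, v in exps.items()}

# Raw scores e_0j for node i = 0, defined only for j in its neighborhood.
# Node 3 is not a neighbor, so it has no entry: this is the "mask".
e_0 = {0: 1.0, 1: 2.0, 2: 0.5}
alpha_0 = neighborhood_softmax(e_0)
print(alpha_0)
```

　　The resulting weights are positive and sum to 1 over the neighborhood, regardless of how many neighbors the node has.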

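　　In the authors' experiments, the attention mechanism $a$ is a single-layer feedforward neural network, parameterized by a weight vector $\vec{\mathbf{a}} \in \mathbb{R}^{2F'}$ and followed by a LeakyReLU nonlinearity (with negative input slope 0.2). Fully expanded, the coefficients computed by the attention mechanism can be written as:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{\mathbf{a}}^{T}[\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_j]\right)\right)}{\sum_{k \in N_{i}} \exp\left(\mathrm{LeakyReLU}\left(\vec{\mathbf{a}}^{T}[\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_k]\right)\right)} \tag{3}$$

　　where $\cdot^{T}$ denotes transposition and $\|$ denotes the concatenation operation.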
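　　The expanded coefficient can be sketched as follows, assuming the features have already been multiplied by the shared weight matrix. The names `attention_coefficients`, `WH`, and `a`, as well as all numeric values, are illustrative assumptions; list concatenation with `+` plays the role of $\|$.

```python
import math

def leaky_relu(x, slope=0.2):
    """LeakyReLU with negative input slope 0.2."""
    return x if x > 0 else slope * x

def attention_coefficients(i, neighbors, WH, a):
    """Return {j: alpha_ij} for j in neighbors of node i."""
    scores = {}
    for j in neighbors:
        concat = WH[i] + WH[j]  # [W h_i || W h_j], length 2F'
        scores[j] = leaky_relu(sum(p * q for p, q in zip(a, concat)))
    m = max(scores.values())  # stability shift, an implementation choice
    exps = {j: math.exp(e - m) for j, e in scores.items()}
    total = sum(exps.values())
    return {j: v / total for j, v in exps.items()}

# Already-transformed features W h_j for 3 nodes, F' = 2 (made-up values).
WH = [[3.0, -2.0], [1.0, 1.0], [3.0, 2.0]]
# Attention weight vector of length 2F' = 4 (made-up values).
a = [0.5, -0.5, 1.0, 0.25]

alpha = attention_coefficients(0, [0, 1, 2], WH, a)
print(alpha)
```

　　Note that the score is not symmetric in general: swapping $i$ and $j$ changes which half of the weight vector each node's features meet.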

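　　Once obtained, the normalized attention coefficients are used to compute a linear combination of the corresponding features, which (after potentially applying a nonlinearity $\sigma$) gives the final output feature vector for every node:

$$\vec{h}'_i = \sigma\left(\sum_{j \in N_{i}} \alpha_{ij} \mathbf{W}\vec{h}_j\right) \tag{4}$$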
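　　The weighted aggregation step, taken on its own, might look like the sketch below. The attention weights are given directly rather than computed, the function name `aggregate` is hypothetical, and ELU is chosen as the nonlinearity purely as an assumption for the example.

```python
import math

def elu(x):
    """ELU nonlinearity, used here as an example choice of sigma."""
    return x if x > 0 else math.exp(x) - 1.0

def aggregate(alpha_i, WH, activation):
    """Output features of node i: activation(sum_j alpha_ij * W h_j)."""
    dim = len(WH[0])
    summed = [
        sum(weight * WH[j][d] for j, weight in alpha_i.items())
        for d in range(dim)
    ]
    return [activation(s) for s in summed]

# Already-transformed features W h_j (made-up values), F' = 2.
WH = [[3.0, -2.0], [1.0, 1.0], [3.0, 2.0]]
# Normalized attention weights of node 0 over its neighborhood (made-up).
alpha_0 = {0: 0.5, 1: 0.25, 2: 0.25}

h0_new = aggregate(alpha_0, WH, elu)
print(h0_new)
```

　　Since the weights sum to 1, the pre-activation vector is a convex combination of the neighbors' transformed features.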

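　　To stabilize the learning process of self-attention, the authors found it beneficial to extend the mechanism to multi-head attention, similar to "Attention Is All You Need". Specifically, $K$ independent attention mechanisms execute the transformation of Equation (4), and their features are then concatenated, giving the following output features:

$$\vec{h}'_i = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in N_{i}} \alpha_{ij}^{k} \mathbf{W}^{k}\vec{h}_j\right) \tag{5}$$

　　where $\alpha_{ij}^{k}$ are the normalized attention coefficients computed by the $k$-th attention mechanism and $\mathbf{W}^{k}$ is the corresponding weight matrix. Each output $\vec{h}'_i$ therefore has $KF'$ features rather than $F'$.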
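　　Putting the pieces together, a multi-head layer with concatenation can be sketched end to end in plain Python. Everything named here (`attention_head`, `multi_head_concat`, the adjacency lists in `adj`, the parameter values, and ELU as the nonlinearity) is an assumption made for illustration, not the authors' implementation.

```python
import math

def matvec(W, h):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def elu(x):
    return x if x > 0 else math.exp(x) - 1.0

def attention_head(H, adj, W, a):
    """One head: ELU(sum_j alpha_ij W h_j) for every node i."""
    WH = [matvec(W, h) for h in H]
    out = []
    for i, nbrs in enumerate(adj):
        # Raw scores only over the neighborhood of i (masked attention).
        e = [leaky_relu(sum(p * q for p, q in zip(a, WH[i] + WH[j])))
             for j in nbrs]
        m = max(e)
        exps = [math.exp(v - m) for v in e]
        total = sum(exps)
        alphas = [v / total for v in exps]
        summed = [sum(al * WH[j][d] for al, j in zip(alphas, nbrs))
                  for d in range(len(W))]
        out.append([elu(s) for s in summed])
    return out

def multi_head_concat(H, adj, heads):
    """Concatenate, per node, the outputs of K independent heads."""
    per_head = [attention_head(H, adj, W, a) for W, a in heads]
    out = []
    for i in range(len(H)):
        row = []
        for head_out in per_head:
            row.extend(head_out[i])
        out.append(row)
    return out

# Toy graph: 3 nodes on a path, neighborhoods include the node itself.
H = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, 1.0, 0.0]]
adj = [[0, 1], [0, 1, 2], [1, 2]]
# K = 2 heads, each with its own (W, a); F = 3, F' = 2 (made-up values).
heads = [
    ([[1.0, 0.0, 1.0], [0.0, 2.0, -1.0]], [0.5, -0.5, 1.0, 0.25]),
    ([[0.5, 0.5, 0.0], [1.0, -1.0, 0.5]], [-1.0, 0.3, 0.2, 0.7]),
]

out = multi_head_concat(H, adj, heads)
print(len(out), len(out[0]))  # number of nodes, then K * F' features per node
```

　　The heads share nothing but the input, so they can be evaluated independently; only the final concatenation couples them.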

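　　In particular, if multi-head attention is performed on the final (prediction) layer of the network, concatenation is no longer sensible. Instead, the heads are averaged, and the final nonlinearity (usually a softmax or logistic sigmoid for classification problems) is delayed until after the averaging:

$$\vec{h}'_i = \sigma\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N_{i}} \alpha_{ij}^{k} \mathbf{W}^{k}\vec{h}_j\right) \tag{6}$$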
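　　The averaging variant for the output layer can be illustrated with the per-head sums taken as given. The function `average_heads`, the choice of softmax as the final nonlinearity, and the numbers are assumptions for the example only.

```python
import math

def softmax(xs):
    """Softmax over a list of logits, shifted by the max for stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]

def average_heads(head_sums, activation):
    """Average K pre-activation vectors, then apply the nonlinearity once."""
    K = len(head_sums)
    mean = [sum(vals) / K for vals in zip(*head_sums)]
    return activation(mean)

# Pre-activation outputs of K = 3 heads for one node, i.e. the inner sums
# over the neighborhood, with 2 classes (made-up values).
head_sums = [[2.0, 0.0], [1.0, 1.0], [3.0, -1.0]]

probs = average_heads(head_sums, softmax)
print(probs)
```

　　Averaging keeps the output dimension equal to the number of classes, whereas concatenating $K$ heads would multiply it by $K$.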

　　[Figure: schematic of the proposed attention weighting mechanism (image not preserved).]
