
Simple Scalable Graph Neural Networks


Deep learning on giant graphs

TL;DR: One of the challenges that have so far precluded the wide adoption of graph neural networks in industrial applications is the difficulty of scaling them to large graphs such as the Twitter follow graph. The interdependence between nodes makes the decomposition of the loss function into individual nodes’ contributions challenging. In this post, we describe a simple graph neural network architecture developed at Twitter that can work on very large graphs.


This post was co-authored with Fabrizio Frasca and Emanuele Rossi.


Graph Neural Networks (GNNs) are a class of ML models that have emerged in recent years for learning on graph-structured data. GNNs have been successfully applied to model systems of relations and interactions in a variety of different domains, including social science, computer vision and graphics, particle physics, chemistry, and medicine. Until recently, most of the research in the field has focused on developing new GNN models and testing them on small graphs (with Cora, a citation network containing only about 5K nodes, still being widely used [1]); relatively little effort has been invested in dealing with large-scale applications. On the other hand, industrial problems often deal with giant graphs, such as the Twitter or Facebook social networks, containing hundreds of millions of nodes and billions of edges. A large part of the methods described in the literature are unsuitable for these settings.


In a nutshell, graph neural networks operate by aggregating the features from local neighbour nodes. Arranging the d-dimensional node features into an n×d matrix X (here n denotes the number of nodes), the simplest convolution-like operation on graphs implemented in the popular GCN model [2] combines node-wise transformations with feature diffusion across adjacent nodes:


Y = ReLU(AXW).


Here W is a learnable matrix shared across all nodes and A is a linear diffusion operator amounting to a weighted average of features in a neighbourhood [3]. Multiple layers of this form can be applied in sequence like in traditional CNNs. Graph neural networks can be designed to make predictions at the level of nodes (e.g. for applications such as detecting malicious users in a social network), edges (e.g. for link prediction, a typical scenario in recommender systems), or entire graphs (e.g. predicting chemical properties of molecular graphs). The node-wise classification task can be carried out, for instance, by a two-layer GCN of the form


Y = softmax(A ReLU(AXW)W’).

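For concreteness, here is a minimal dense-matrix sketch of this two-layer GCN in PyTorch (the dense n×n adjacency tensor A and the tiny module are assumptions of this sketch; practical implementations use sparse operations):

```python
import torch
import torch.nn.functional as F

class TwoLayerGCN(torch.nn.Module):
    """Dense sketch of Y = softmax(A ReLU(A X W) W')."""

    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.W = torch.nn.Parameter(torch.randn(in_dim, hidden_dim) * 0.01)        # W
        self.W2 = torch.nn.Parameter(torch.randn(hidden_dim, num_classes) * 0.01)  # W'

    def forward(self, A, X):
        # First layer: node-wise transform XW, then diffusion over neighbours by A
        H = F.relu(A @ X @ self.W)
        # Second layer: diffuse again and map to class probabilities
        return torch.softmax(A @ H @ self.W2, dim=-1)
```

Note that a full-batch forward pass of this model needs the whole A and X at once, which is precisely the scalability issue discussed next.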

Why is scaling graph neural networks challenging? In the aforementioned node-wise prediction problem, the nodes play the role of samples on which the GNN is trained. In traditional machine learning settings, it is typically assumed that the samples are drawn from some distribution in a statistically independent manner. This, in turn, allows the loss function to be decomposed into individual sample contributions, so that stochastic optimisation techniques working with small subsets (mini-batches) of the training data at a time can be employed. Virtually every deep neural network architecture is nowadays trained using mini-batches.


In graphs, on the other hand, the fact that the nodes are inter-related via edges creates statistical dependence between samples in the training set. Moreover, because of the statistical dependence between nodes, sampling can introduce bias (for instance, it can make some nodes or edges appear more frequently than others in the training set), and this ‘side-effect’ needs proper handling. Last but not least, one has to guarantee that the sampled subgraph maintains a meaningful structure that the GNN can exploit.


In many early works on graph neural networks, these problems were swept under the carpet: architectures such as GCN and ChebNet [2], MoNet [4] and GAT [5] were trained using full-batch gradient descent. This makes it necessary to hold the whole adjacency matrix of the graph and all the node features in memory. As a result, for example, an L-layer GCN model has time complexity 𝒪(Lnd²) and memory complexity 𝒪(Lnd + Ld²) [7], prohibitive even for modestly-sized graphs.


The first work to tackle the problem of scalability was GraphSAGE [8], a seminal paper of Will Hamilton and co-authors. GraphSAGE used neighbourhood sampling combined with mini-batch training to train GNNs on large graphs (the acronym SAGE, standing for “sample and aggregate”, is a reference to this scheme). The main idea is that in order to compute the training loss on a single node with an L-layer GCN, only the L-hop neighbours of that node are necessary, as nodes further away in the graph are not involved in the computation. The problem is that, for graphs of the “small-world” type, such as social networks, the 2-hop neighbourhood of some nodes may already contain millions of nodes, making it too big to be stored in memory [9]. GraphSAGE tackles this problem by sampling the neighbours up to the L-th hop: starting from the training node, it samples uniformly with replacement [10] a fixed number k of 1-hop neighbours, then for each of these neighbours it again samples k neighbours, and so on for L times. In this way, for every node we are guaranteed to have a bounded L-hop sampled neighbourhood of 𝒪(kᴸ) nodes. If we then construct a batch with b training nodes, each with its own independent L-hop neighbourhood, we get to a memory complexity of 𝒪(bkᴸ), independent of the graph size n. The computational complexity of one batch of GraphSAGE is 𝒪(bLd²kᴸ).

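To make the sampling scheme concrete, here is a rough sketch of the recursive neighbour sampling (the adjacency-list format and function name are hypothetical, not the GraphSAGE reference code):

```python
import random

def sample_neighbourhood(adj, seed_nodes, num_hops, k):
    """Recursively sample k neighbours per node for num_hops hops.

    adj: dict mapping each node id to a list of its neighbours (hypothetical format).
    Returns one list of sampled nodes per hop; each seed contributes O(k**num_hops) nodes.
    """
    layers = [list(seed_nodes)]
    frontier = list(seed_nodes)
    for _ in range(num_hops):
        sampled = []
        for node in frontier:
            # Uniform sampling *with replacement*, so a neighbour may be drawn several times
            sampled.extend(random.choices(adj[node], k=k))
        layers.append(sampled)
        frontier = sampled
    return layers
```

A mini-batch is then formed by running this procedure independently for each of the b training nodes and evaluating the L-layer GCN only on the sampled trees.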

Neighbourhood sampling procedure of GraphSAGE. A batch of b nodes is subsampled from the full graph (in this example, b=2 and the red and light yellow nodes are used for training). On the right, the 2-hop neighbourhood graphs sampled with k=2, which are independently used to compute the embedding and therefore the loss for the red and light yellow nodes.

A notable drawback of GraphSAGE is that sampled nodes might appear multiple times, thus potentially introducing a lot of redundant computation. For instance, in the figure above the dark green node appears in the L-hop neighbourhoods of both training nodes, and therefore its embedding is computed twice in the batch. With the increase of the batch size b and the number of samples k, the amount of redundant computation grows as well. Moreover, despite having 𝒪(bkᴸ) nodes in memory for each batch, the loss is computed on only b of them, and therefore the computation for the other nodes is in some sense wasted.


Multiple follow-up works focused on improving the sampling of mini-batches in order to remove the redundant computation of GraphSAGE and make each batch more efficient. The most recent works in this direction are ClusterGCN [11] and GraphSAINT [12], which take the approach of graph sampling (as opposed to the neighbourhood sampling of GraphSAGE). In graph-sampling approaches, for each batch, a subgraph of the original graph is sampled, and a full GCN-like model is run on the entire subgraph. The challenge is to make sure that these subgraphs preserve most of the original edges and still present a meaningful topological structure.


ClusterGCN achieves this by first clustering the graph. Then, in each batch, the model is trained on one cluster. This allows the nodes in each batch to be as tightly connected as possible.

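Schematically (this is an illustration, not the authors' implementation), the resulting training loop is ordinary mini-batch SGD in which each batch is the subgraph induced by one precomputed cluster:

```python
def train_one_epoch(model, optimiser, clusters, loss_fn):
    """Cluster-wise training.

    clusters: list of (A_sub, X_sub, y_sub) tuples, e.g. obtained from a METIS-style
    partition of the full graph (the partitioning step itself is outside this sketch).
    """
    for A_sub, X_sub, y_sub in clusters:
        optimiser.zero_grad()
        out = model(A_sub, X_sub)      # run the full GCN-like model on the small subgraph only
        loss = loss_fn(out, y_sub)
        loss.backward()
        optimiser.step()
```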

GraphSAINT proposes instead a general probabilistic graph sampler constructing training batches by sampling subgraphs of the original graph. The graph sampler can be designed according to different schemes: for example, it can perform uniform node sampling, uniform edge sampling, or “importance sampling” by using random walks to compute the importance of nodes and use it as the probability distribution for sampling.

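For instance, the simplest of these schemes, uniform node sampling, could be sketched roughly as follows (an illustration only; GraphSAINT additionally applies normalisation coefficients to keep the estimators unbiased):

```python
import numpy as np

def sample_node_induced_subgraph(adj, batch_size, probs=None):
    """Sample nodes (uniformly, or according to an importance distribution) and return
    the induced subgraph on which a full GCN-like model is then trained.

    adj is assumed to be an n x n adjacency matrix supporting row/column indexing.
    """
    n = adj.shape[0]
    if probs is None:
        probs = np.full(n, 1.0 / n)               # uniform node sampling
    nodes = np.random.choice(n, size=batch_size, replace=False, p=probs)
    sub_adj = adj[nodes][:, nodes]                # keep only edges between sampled nodes
    return nodes, sub_adj
```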

It is also important to note that one of the advantages of sampling is that during training it acts as a sort of edge-wise dropout, which regularises the model and can help performance [13]. However, edge dropout would still require seeing all the edges at inference time, which is not feasible here. Another effect graph sampling might have is reducing the bottleneck [14] and the resulting “over-squashing” phenomenon that stems from the exponential expansion of the neighbourhood.


In our recent paper with Ben Chamberlain, Davide Eynard, and Federico Monti [15], we investigated the extent to which it is possible to design simple, sampling-free architectures for node-wise classification problems. You may wonder why one would prefer to abandon sampling strategies in light of the indirect benefits we have just highlighted above. There are a few reasons for that. First, instances of node classification problems may significantly vary from one another and, to the best of our knowledge, no work so far has systematically studied when sampling actually provides positive effects other than just alleviating computational complexity. Second, the implementation of sampling schemes introduces additional complexity and we believe a simple, strong, sampling-free, scalable baseline architecture is appealing.


Our approach is motivated by several recent empirical findings. First, simple fixed aggregators (such as GCN) were shown to often outperform more complex ones, such as GAT or MPNN, in many cases [16]. Second, while deep learning success was built on models with many layers, in graph deep learning it is still an open question whether depth is needed. In particular, Wu and coauthors [17] argue that a GCN model with a single multi-hop diffusion layer can perform on par with models with multiple layers.


By combining different, fixed neighbourhood aggregators within a single convolutional layer, it is possible to obtain an extremely scalable model without resorting to graph sampling [18]. In other words, all the graph-related (fixed) operations are in the first layer of the architecture and can therefore be precomputed; the pre-aggregated information can then be fed as inputs to the rest of the model which, due to the lack of neighbourhood aggregation, boils down to a multi-layer perceptron (MLP). Importantly, the expressivity in the graph filtering operations can still be retained even with such a shallow convolutional scheme by employing several, possibly specialised and more complex, diffusion operators. As an example, it is possible to design operators to include local substructure counting [19] or graph motifs [20].


The SIGN architecture comprises one GCN-like layer with multiple linear diffusion operators possibly acting on multi-hop neighbourhoods, followed by an MLP applied node-wise. The key to its efficiency is the pre-computation of the diffused features (marked in red).


The proposed scalable architecture, which we call Scalable Inception-like Graph Network (SIGN), has the following form for node-wise classification tasks:


Y = softmax(ReLU(XW₀ | A₁XW₁ | A₂XW₂ | … | AᵣXWᵣ) W’)


Here Aᵢ are linear diffusion matrices (such as a normalised adjacency matrix, its powers, or a motif matrix) and Wᵢ and W’ are learnable parameters. As depicted in the figure above, the network can be made deeper with additional node-wise layers,


Y = softmax(ReLU(…ReLU(XW₀ | A₁XW₁ | … | AᵣXWᵣ) W’)… W’’)

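A minimal PyTorch sketch of this model, assuming the diffused feature matrices A₁X, …, AᵣX have already been precomputed and are passed in alongside X (an illustration, not the reference implementation available in PyTorch Geometric):

```python
import torch
import torch.nn.functional as F

class SIGN(torch.nn.Module):
    """Concatenate per-operator linear transforms of precomputed diffused features, then an MLP."""

    def __init__(self, in_dim, hidden_dim, num_classes, num_operators):
        super().__init__()
        # One learnable matrix W_i for X itself and for each precomputed A_i X
        self.thetas = torch.nn.ModuleList(
            [torch.nn.Linear(in_dim, hidden_dim) for _ in range(num_operators + 1)]
        )
        self.out = torch.nn.Linear(hidden_dim * (num_operators + 1), num_classes)  # W'

    def forward(self, xs):
        # xs = [X, A_1 X, ..., A_r X]; everything below is purely node-wise
        h = torch.cat([theta(x) for theta, x in zip(self.thetas, xs)], dim=-1)
        return torch.log_softmax(self.out(F.relu(h)), dim=-1)
```

Since the forward pass touches only node-wise tensors, the rows of the precomputed matrices can be grouped into mini-batches of any size, which is what makes standard mini-batch gradient descent applicable.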

Finally, when employing different powers of the same diffusion operator (e.g. A₁ = B¹, A₂ = B², etc.), the graph operations effectively aggregate information from neighbours in further and further hops, akin to having convolutional filters of different receptive fields within the same network layer. This analogy to the popular inception module in classical CNNs explains the name of the proposed architecture [21].


As already mentioned, the matrix products A₁X, …, AᵣX in the above equations do not depend on the learnable model parameters and can thus be pre-computed. In particular, for very large graphs this pre-computation can be scaled efficiently using distributed computing infrastructures such as Apache Spark. This effectively reduces the computational complexity of the overall model to that of an MLP. Moreover, by moving the diffusion to the pre-computation step, we can aggregate information from all the neighbours, avoiding sampling and the possible loss of information and bias that comes with it [22].

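As a small single-machine illustration of that precomputation step (at Twitter scale the same matrix products would be computed with a distributed engine such as Spark), one simple choice of operators is the successive powers of the symmetrically normalised adjacency matrix:

```python
import numpy as np
import scipy.sparse as sp

def precompute_diffused_features(adj, X, r):
    """Return [X, AX, A^2 X, ..., A^r X] with A = D^(-1/2) (adj + I) D^(-1/2).

    adj: scipy.sparse adjacency matrix; X: n x d numpy array of node features.
    The choice of operators here is an assumption of this sketch; motif-based or
    other linear operators can be plugged in in exactly the same way.
    """
    A = adj + sp.eye(adj.shape[0])                      # add self-loops
    deg = np.asarray(A.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
    A = d_inv_sqrt @ A @ d_inv_sqrt                     # symmetric normalisation
    xs, cur = [X], X
    for _ in range(r):
        cur = np.asarray(A @ cur)                       # one more hop of diffusion
        xs.append(cur)
    return xs
```

The returned list is exactly the input expected by the SIGN sketch above, so training then proceeds by sampling rows (nodes) into mini-batches as with any MLP.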

The main advantage of SIGN is its scalability and efficiency, as it can be trained using standard mini-batch gradient descent. We found our model to be up to two orders of magnitude faster than ClusterGCN and GraphSAINT at inference time, while also being significantly faster at training time, all while maintaining accuracy very close to that of the state-of-the-art GraphSAINT.


Convergence of different methods on the OGBN-Products dataset. Variants of SIGN converge faster and to a higher validation F1 score than GraphSAINT and ClusterGCN.

Preprocessing, training and inference time (in seconds) of different methods on the OGBN-Products dataset. While having slower pre-processing, SIGN is faster at training and nearly two orders of magnitude faster at inference time than other methods.

Moreover, our model supports any diffusion operators. For different types of graphs, different diffusion operators may be necessary, and we found some tasks to benefit from having motif-based operators such as triangle counts.

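As an example of such an operator (a sketch assuming an undirected, unweighted graph), a triangle-motif matrix can be built by counting, for every existing edge, the number of triangles it participates in; after normalisation it can be used as one more diffusion operator Aᵢ:

```python
def triangle_motif_matrix(adj):
    """Weighted motif adjacency: entry (i, j) is the number of triangles through edge (i, j).

    adj: binary, symmetric scipy.sparse adjacency matrix of an undirected graph.
    """
    common = adj @ adj             # entry (i, j): number of common neighbours of i and j
    return common.multiply(adj)    # keep the counts only where an edge actually exists
```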

Performance of SIGN and other scalable methods on node-wise classification tasks on some popular datasets. Diffusion operators based on triangular motifs gave an interesting performance gain on Flickr and provided some improvements on PPI and Yelp.

Despite the limitation of having only a single graph convolutional layer and linear diffusion operators, SIGN performs very well in practice, achieving results on par with or even better than much more complex models. Given its speed and simplicity of implementation, we envision SIGN to be a simple baseline graph learning method for large-scale applications. Perhaps more importantly, the success of such a simple model leads to a more fundamental question: do we really need deep graph neural networks? We conjecture that in many problems of learning on social networks and “small world” graphs, we should use richer local structures rather than resort to brute-force deep architectures. Interestingly, traditional CNN architectures evolved according to an opposite trend (deeper networks with smaller filters) because of computational advantages and the ability to compose complex features out of simpler ones. We are not sure if the same approach is right for graphs, where compositionality is much more complex (e.g. certain structures cannot be computed by message passing, no matter how deep the network is). For sure, more elaborate experiments are still needed to test this conjecture.


[1] The recently introduced Open Graph Benchmark now offers large-scale graphs with millions of nodes. It will probably take some time for the community to switch to it.


[2] T. Kipf and M. Welling, Semi-supervised classification with graph convolutional networks (2017). Proc. ICLR. This paper introduced the popular GCN architecture, which was derived as a simplification of the ChebNet model proposed by M. Defferrard et al., Convolutional neural networks on graphs with fast localized spectral filtering (2016). Proc. NIPS.


[3] As the diffusion operator, Kipf and Welling used the graph adjacency matrix with self-loops (i.e., the node itself contributes to its feature update), but other choices are possible as well. The diffusion operation can be made feature-dependent of the form A(X)X (i.e., it is still a linear combination of the node features, but the weights depend on the features themselves) like in MoNet [4] or GAT [5] models, or completely nonlinear, 𝒜(X), like in message-passing neural networks (MPNN) [6]. For simplicity, we focus the discussion on the GCN model applied to node-wise classification.


[4] F. Monti et al., Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs (2017). In Proc. CVPR.


[5] P. Veličković et al., Graph Attention Networks (2018). In Proc. ICLR.


[6] J. Gilmer et al., Neural message passing for quantum chemistry (2017). In Proc. ICML.


[7] Here we assume for simplicity that the graph is sparse with the number of edges |ℰ| = 𝒪(n).


[8] W. Hamilton et al., Inductive Representation Learning on Large Graphs (2017). In Proc. NeurIPS.


[9] The number of neighbours in such graphs tends to grow exponentially with the neighbourhood expansion.


[10] Sampling with replacement means that some neighbour nodes can appear more than once, in particular if the number of neighbours is smaller than k.


[11] W.-L. Chiang et al., Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks (2019). In Proc. KDD.


[12] H. Zeng et al., GraphSAINT: Graph Sampling Based Inductive Learning Method (2020). In Proc. ICLR.


[13] Y. Rong et al. DropEdge: Towards deep graph convolutional networks on node classification (2020). In Proc. ICLR. An idea similar to DropOut where a random subset of edges is used during training.


[14] U. Alon and E. Yahav, On the bottleneck of graph neural networks and its practical implications (2020). arXiv:2006.05205. Identified the over-squashing phenomenon in graph neural networks, which is similar to one observed in sequential recurrent models.


[15] Frasca et al., SIGN: Scalable Inception Graph Neural Networks (2020). ICML workshop on Graph Representation Learning and Beyond.


[16] O. Shchur et al. Pitfalls of graph neural network evaluation (2018). Workshop on Relational Representation Learning. Shows that simple GNN models perform on par with more complex ones.


[17] F. Wu et al., Simplifying graph neural networks (2019). In Proc. ICML.


[18] While we stress that SIGN does not need sampling for computational efficiency, there are other reasons why graph subsampling is useful. J. Klicpera et al. Diffusion improves graph learning (2020). Proc. NeurIPS show that sampled diffusion matrices improve performance of graph neural networks. We observed the same phenomenon in early SIGN experiments.


[19] G. Bouritsas et al. Improving graph neural network expressivity via subgraph isomorphism counting (2020). arXiv:2006.09252. Shows how provably powerful GNNs can be obtained by structural node encoding.


[20] F. Monti, K. Otness, M. M. Bronstein, MotifNet: a motif-based graph convolutional network for directed graphs (2018). arXiv:1802.01572. Uses motif-based diffusion operators.


[21] C. Szegedy et al., Going deeper with convolutions (2015). Proc. CVPR proposed the inception module in the already classical GoogLeNet architecture. To be fair, we were not the first to think of graph inception modules. Our collaborator Anees Kazi from TU Munich, who was a visiting student at Imperial College last year, introduced them first.


[22] Note that reaching higher-order neighbours is normally achieved by depth-wise stacking graph convolutional layers operating with direct neighbours; in our architecture this is directly achieved in the first layer by powers of graph operators.


A SIGN implementation is available in PyTorch Geometric. Interested in Graph Deep Learning? See other posts on Medium.


Translated from: https://medium.com/@michael.bronstein/simple-scalable-graph-neural-networks-7eb04f366d07
