Traffic Flow Prediction with GNNs

Introduction

With the development of graph networks, their applications to non-Euclidean data have expanded, and traffic forecasting is one of their main application areas. Traffic forecasting means predicting future traffic conditions, including but not limited to speed, flow, and congestion, from historical traffic data and the topology of the traffic network. Traffic flow prediction underpins higher-level tasks such as route planning and is a task of great interest to industry. This post applies three classic graph network models (GCN, ChebNet, and GAT) to graph-structured traffic flow prediction. The whole project is implemented in PyTorch; the full source code is available at the GitHub link at the end of the post.
Data Preparation

This post uses the PEMS datasets (PEMS04 and PEMS08) released with ASTGCN, a paper on graph-network-based traffic forecasting. PEMS04 contains flow data from 307 detectors collected over 59 consecutive days starting January 1, 2018, sampled every 5 minutes, so the raw flow data data.npz has shape (307, 16992, 3) after loading, where the 3 features are flow, occupancy, and speed. The raw adjacency data is a distance.csv file in from,to,distance format; for simplicity, this post sets every distance (the edge weight on the graph) to 1 whenever two nodes are connected. Similarly, PEMS08 contains flow data from 170 nodes collected over 62 consecutive days starting July 1, 2016, with shape (170, 17856, 3).

Intuitively, each node of the constructed graph is the flow data from one detector, and each edge carries the weight of the connection between two nodes. The figure below visualizes the three measured quantities for node 224 of PEMS04 over one day; only the flow data fluctuates noticeably, so this post uses only the flow data for prediction.

A Baidu Cloud link to the PEMS datasets is also provided (code: zczc).

I won't go into the data-loading details here; if you are unfamiliar with reading, preprocessing, and splitting this kind of time-series data, see the GitHub link at the end of the post.
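For reference, here is a minimal sketch of the two preprocessing steps described above: building the binary adjacency matrix from the edge list, and slicing the flow series into sliding-window samples. The function names and the window sizes are illustrative, not the project's actual API.

```python
import numpy as np

def build_adjacency(edges, num_nodes):
    """Binary adjacency: weight 1 for every connected pair, as in this post."""
    A = np.zeros((num_nodes, num_nodes), dtype=np.float32)
    for i, j in edges:  # edges come from the from,to columns of distance.csv
        A[i, j] = A[j, i] = 1.0
    return A

def make_windows(flow, history, horizon):
    """Slice a [N, T] flow series into supervised (input, target) pairs.

    Returns inputs  of shape [num_samples, N, history]
    and     targets of shape [num_samples, N, horizon].
    """
    N, T = flow.shape
    xs, ys = [], []
    for t in range(T - history - horizon + 1):
        xs.append(flow[:, t:t + history])
        ys.append(flow[:, t + history:t + history + horizon])
    return np.stack(xs), np.stack(ys)
```

With 5-minute sampling, `history=6` corresponds to the last half hour of flow, matching the `H = 6` input length used by the models below.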
Model Construction

First is the ChebNet model, which replaces the spectral graph convolution kernel of SCNN with Chebyshev polynomials, as in the equation below. Only K+1 parameters need to be learned, greatly reducing the huge parameter cost of SCNN (which requires eigendecomposing the Laplacian matrix for its eigenvectors).
$$x \star_{G} g_{\theta} = U g_{\theta} U^{T} x = \sum_{k=0}^{K} \beta_{k} T_{k}(\hat{L})\, x$$
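To make the equation concrete, here is a standalone NumPy sketch of the normalized Laplacian and of the Chebyshev basis $T_k$, built with the recurrence $T_k(L) = 2 L\, T_{k-1}(L) - T_{k-2}(L)$ that the `ChebConv` code below also uses. Note that this post's code (like the sketch) feeds the normalized Laplacian $L$ directly into the recurrence, whereas the ChebNet paper uses the rescaled $\hat{L} = 2L/\lambda_{\max} - I$.

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency without self-loops."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    mask = d > 0                     # guard isolated nodes against division by zero
    d_inv_sqrt[mask] = d[mask] ** -0.5
    D = np.diag(d_inv_sqrt)
    return np.eye(A.shape[0]) - D @ A @ D

def cheb_basis(L, K):
    """Return [T_0(L), ..., T_K(L)] via T_k = 2 L T_{k-1} - T_{k-2}."""
    N = L.shape[0]
    Ts = [np.eye(N), L.copy()]       # T_0 = I, T_1 = L
    for _ in range(2, K + 1):
        Ts.append(2 * L @ Ts[-1] - Ts[-2])
    return Ts[:K + 1]
```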
The ChebNet implementation is below. Although the parameter count drops, the kernel now has strict spatial locality: K is the kernel's "receptive-field radius", i.e. nodes within K hops of the central node form its neighbourhood (K = 1 is roughly analogous to an ordinary 3×3 convolution, with a 1-hop neighbourhood).
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import init


class ChebConv(nn.Module):
    def __init__(self, in_c, out_c, K, bias=True, normalize=True):
        """ChebNet conv
        :param in_c: input channels
        :param out_c: output channels
        :param K: the order of Chebyshev Polynomial
        :param bias: if use bias
        :param normalize: if use the normalized laplacian
        """
        super(ChebConv, self).__init__()
        self.normalize = normalize
        self.weight = nn.Parameter(torch.Tensor(K + 1, 1, in_c, out_c))  # [K+1, 1, in_c, out_c]
        init.xavier_normal_(self.weight)
        if bias:
            self.bias = nn.Parameter(torch.Tensor(1, 1, out_c))
            init.zeros_(self.bias)
        else:
            self.register_parameter("bias", None)
        self.K = K + 1

    def forward(self, inputs, graph):
        """
        :param inputs: the input data, [B, N, C]
        :param graph: the graph structure, [N, N]
        :return: convolution result, [B, N, D]
        """
        L = ChebConv.get_laplacian(graph, self.normalize)  # [N, N]
        mul_L = self.cheb_polynomial(L).unsqueeze(1)       # [K, 1, N, N]
        result = torch.matmul(mul_L, inputs)               # [K, B, N, C]
        result = torch.matmul(result, self.weight)         # [K, B, N, D]
        result = torch.sum(result, dim=0)                  # [B, N, D]
        if self.bias is not None:                          # skip when bias=False
            result = result + self.bias
        return result

    def cheb_polynomial(self, laplacian):
        """
        Compute the Chebyshev Polynomial, according to the graph laplacian
        :param laplacian: the graph laplacian, [N, N]
        :return: the multi order Chebyshev laplacian, [K, N, N]
        """
        N = laplacian.size(0)
        multi_order_laplacian = torch.zeros([self.K, N, N], device=laplacian.device,
                                            dtype=torch.float)  # [K, N, N]
        multi_order_laplacian[0] = torch.eye(N, device=laplacian.device, dtype=torch.float)
        if self.K == 1:
            return multi_order_laplacian
        multi_order_laplacian[1] = laplacian
        for k in range(2, self.K):  # T_k = 2 L T_{k-1} - T_{k-2}
            multi_order_laplacian[k] = 2 * torch.mm(laplacian, multi_order_laplacian[k - 1]) - \
                                       multi_order_laplacian[k - 2]
        return multi_order_laplacian

    @staticmethod
    def get_laplacian(graph, normalize):
        """
        Compute the laplacian of the graph
        :param graph: the graph structure without self loop, [N, N]
        :param normalize: whether to use the normalized laplacian
        :return: graph laplacian, [N, N]
        """
        if normalize:
            D = torch.diag(torch.sum(graph, dim=-1) ** (-1 / 2))
            L = torch.eye(graph.size(0), device=graph.device, dtype=graph.dtype) - \
                torch.mm(torch.mm(D, graph), D)
        else:
            D = torch.diag(torch.sum(graph, dim=-1))
            L = D - graph
        return L


class ChebNet(nn.Module):
    def __init__(self, in_c, hid_c, out_c, K):
        """
        :param in_c: int, number of input channels.
        :param hid_c: int, number of hidden channels.
        :param out_c: int, number of output channels.
        :param K: int, order of the Chebyshev polynomial.
        """
        super(ChebNet, self).__init__()
        self.conv1 = ChebConv(in_c=in_c, out_c=hid_c, K=K)
        self.conv2 = ChebConv(in_c=hid_c, out_c=out_c, K=K)
        self.act = nn.ReLU()

    def forward(self, data, device):
        graph_data = data["graph"].to(device)[0]  # [N, N]
        flow_x = data["flow_x"].to(device)        # [B, N, H, D]
        B, N = flow_x.size(0), flow_x.size(1)
        flow_x = flow_x.view(B, N, -1)            # [B, N, H*D]
        output_1 = self.act(self.conv1(flow_x, graph_data))
        output_2 = self.act(self.conv2(output_1, graph_data))
        return output_2.unsqueeze(2)
```

Next is the GCN model. Following the formula below, we first compute the normalized Laplacian, then multiply it with $W$ and $X$ in turn to obtain the output. GCN is a well-known spectral graph convolution method that further simplifies ChebNet by keeping only the first-order Chebyshev polynomial, so each kernel has a single parameter $\theta$ to learn. Although the kernel shrinks, stacking multiple layers gives capabilities similar to a convolutional neural network. GCN is therefore often seen as a transition from the spectral domain to the spatial domain.
$$x \star_{G} g_{\theta} = \theta\left(\tilde{D}^{-1/2} \tilde{W} \tilde{D}^{-1/2}\right) x$$
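One detail worth noting: the formula uses the symmetric normalization $\tilde{D}^{-1/2}\tilde{W}\tilde{D}^{-1/2}$, while the GCN code in this post actually applies the simpler random-walk normalization $\tilde{D}^{-1}\tilde{A}$ in `process_graph`. A small NumPy check of both variants (function names are illustrative):

```python
import numpy as np

def sym_norm(A):
    """Symmetric normalization D~^{-1/2} A~ D~^{-1/2}, with A~ = A + I (the formula)."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1) ** -0.5   # degrees are >= 1 thanks to the self-loops
    return A_tilde * d[:, None] * d[None, :]

def rw_norm(A):
    """Random-walk normalization D~^{-1} A~, the variant this post's code uses."""
    A_tilde = A + np.eye(A.shape[0])
    return A_tilde / A_tilde.sum(axis=1, keepdims=True)
```

On a regular graph the two coincide; on an irregular graph they differ, but both keep the propagation matrix's spectrum bounded, so either works as a GCN propagation rule.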
The PyTorch implementation of GCN is below.
```python
class GCN(nn.Module):
    def __init__(self, in_c, hid_c, out_c):
        """GCN
        :param in_c: input channels
        :param hid_c: hidden channels
        :param out_c: output channels
        """
        super(GCN, self).__init__()
        self.linear_1 = nn.Linear(in_c, hid_c)
        self.linear_2 = nn.Linear(hid_c, out_c)
        self.act = nn.ReLU()

    def forward(self, data, device):
        graph_data = data["graph"].to(device)[0]            # [N, N]
        graph_data = self.process_graph(graph_data)
        flow_x = data["flow_x"].to(device)                  # [B, N, H, D]
        B, N = flow_x.size(0), flow_x.size(1)
        flow_x = flow_x.view(B, N, -1)                      # [B, N, H*D]  H = 6, D = 1
        output_1 = self.linear_1(flow_x)                    # [B, N, hid_c]
        output_1 = self.act(torch.matmul(graph_data, output_1))  # [N, N] @ [B, N, hid_c]
        output_2 = self.linear_2(output_1)
        output_2 = self.act(torch.matmul(graph_data, output_2))  # [B, N, out_c]
        return output_2.unsqueeze(2)                        # [B, N, 1, out_c]

    @staticmethod
    def process_graph(graph_data):
        N = graph_data.size(0)
        matrix_i = torch.eye(N, dtype=graph_data.dtype, device=graph_data.device)
        graph_data = graph_data + matrix_i  # A~ = A + I (out of place, so the cached graph is untouched)
        degree_matrix = torch.sum(graph_data, dim=-1, keepdim=False)  # [N]
        degree_matrix = degree_matrix.pow(-1)
        degree_matrix[degree_matrix == float("inf")] = 0.   # guard isolated nodes
        degree_matrix = torch.diag(degree_matrix)           # [N, N]
        return torch.mm(degree_matrix, graph_data)          # D~^(-1) * A~ = A^
```

Finally, the Graph Attention Network (GAT), a classic spatial-domain model, introduces the attention mechanism into graph convolution. The authors argue that sharing one kernel parameter across all nodes in a neighbourhood limits model capacity, because each neighbour's relevance to the central node is different; convolution should therefore treat neighbours differently. They use attention to model the relevance between the central node and its neighbours. The concrete steps are as follows (see the original paper for the full algorithm):
$$e_{ij} = a\left(\mathbf{W} \vec{h}_{i}, \mathbf{W} \vec{h}_{j}\right) = \vec{a}^{T}\left[\mathbf{W} \vec{h}_{i} \,\|\, \mathbf{W} \vec{h}_{j}\right]$$

$$\alpha_{ij} = \operatorname{softmax}_{j}\left(e_{ij}\right) = \frac{\exp\left(e_{ij}\right)}{\sum_{k \in \mathcal{N}_{i}} \exp\left(e_{ik}\right)}$$

$$\vec{h}_{i}^{\prime} = f\left(\sum_{j \in \mathcal{N}_{i}} \alpha_{ij} \mathbf{W} \vec{h}_{j}\right)$$
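A toy NumPy walkthrough of the three equations above may help: compute raw scores $e_{ij}$ with a LeakyReLU (as the GAT paper and the code below both do), mask out non-neighbours, softmax over each neighbourhood, then aggregate. The feature values and weight vector here are made up for illustration.

```python
import numpy as np

def gat_attention(h, W, a, adj, leaky_slope=0.2):
    """Single-head GAT attention on a toy graph.
    h: [N, F] node features, W: [F, F'] projection, a: [2F'] attention vector,
    adj: [N, N] adjacency that already contains self-loops."""
    Wh = h @ W                                   # project features: [N, F']
    N = Wh.shape[0]
    e = np.empty((N, N))
    for i in range(N):                           # e_ij = LeakyReLU(a^T [Wh_i || Wh_j])
        for j in range(N):
            z = a @ np.concatenate([Wh[i], Wh[j]])
            e[i, j] = z if z > 0 else leaky_slope * z
    e = np.where(adj > 0, e, -1e12)              # mask non-neighbours before softmax
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)    # softmax_j over each neighbourhood
    return alpha, alpha @ Wh                     # h'_i = sum_j alpha_ij Wh_j
```

The masking-with-a-large-negative-constant trick is exactly what `torch.where(adj > 0, e, zero_vec)` does in the layer below.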
The GAT implementation is below; I use multi-head attention here.
```python
class GraphAttentionLayer(nn.Module):
    def __init__(self, in_c, out_c, alpha=0.2):
        """Graph attention layer
        :param in_c: input channels
        :param out_c: output channels
        :param alpha: negative slope of LeakyReLU
        """
        super(GraphAttentionLayer, self).__init__()
        self.in_c = in_c
        self.out_c = out_c
        self.alpha = alpha
        self.W = nn.Parameter(torch.empty(size=(in_c, out_c)))
        nn.init.xavier_uniform_(self.W.data, gain=1.414)
        self.a = nn.Parameter(torch.empty(size=(2 * out_c, 1)))
        nn.init.xavier_uniform_(self.a.data, gain=1.414)
        self.leakyrelu = nn.LeakyReLU(self.alpha)

    def forward(self, features, adj):
        B, N = features.size(0), features.size(1)
        adj = adj + torch.eye(N, dtype=adj.dtype, device=adj.device)  # A + I
        h = torch.matmul(features, self.W)                            # [B, N, out_c]
        # build every (i, j) pair of concatenated features: [B, N, N, 2*out_c]
        a_input = torch.cat([h.repeat(1, 1, N).view(B, N * N, -1), h.repeat(1, N, 1)],
                            dim=2).view(B, N, -1, 2 * self.out_c)
        e = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(3))  # [B, N, N, 1] => [B, N, N]
        zero_vec = -1e12 * torch.ones_like(e)
        attention = torch.where(adj > 0, e, zero_vec)                 # mask non-neighbours, [B, N, N]
        attention = F.softmax(attention, dim=2)                       # softmax over each neighbourhood
        # attention = F.dropout(attention, 0.5)
        h_prime = torch.matmul(attention, h)  # [B, N, N] @ [B, N, out_c] => [B, N, out_c]
        return h_prime

    def __repr__(self):
        return self.__class__.__name__ + ' (' + str(self.in_c) + ' -> ' + str(self.out_c) + ')'


class GAT(nn.Module):
    def __init__(self, in_c, hid_c, out_c, n_heads=8):
        """
        :param in_c: int, number of input channels.
        :param hid_c: int, number of hidden channels.
        :param out_c: int, number of output channels.
        :param n_heads: int, number of attention heads.
        """
        super(GAT, self).__init__()
        self.attentions = nn.ModuleList([GraphAttentionLayer(in_c, hid_c) for _ in range(n_heads)])
        self.conv2 = GraphAttentionLayer(hid_c * n_heads, out_c)
        self.act = nn.ReLU()

    def forward(self, data):
        adj = data["graph"][0]   # [N, N]
        x = data["flow_x"]       # [B, N, H, D]
        B, N = x.size(0), x.size(1)
        x = x.view(B, N, -1)     # [B, N, H*D]
        # concatenate the heads along the feature dimension
        outputs = torch.cat([attention(x, adj) for attention in self.attentions], dim=-1)
        outputs = self.act(outputs)
        output_2 = self.act(self.conv2(outputs, adj))
        return output_2.unsqueeze(2)  # [B, N, 1, out_c]
```

Training Experiments
I treat this as a regression task, using a plain MSE loss and an Adam + SGD training strategy. ChebNet, GCN, and GAT are trained and tested under the same configuration, and the results are visualized in the figure below. The training code is rather long, so it is not pasted here; if you are interested, see the GitHub link at the end of the post.

As the figure shows, ChebNet, with its higher model capacity and solid theoretical grounding, achieves good results on this task.

To wrap up: this post only runs simple experiments with a few graph convolution models. Note that all three models contain code like flow_x = flow_x.view(B, N, -1), which concatenates the time steps together and thus effectively throws away temporal information. For a time-series task like this one, temporal information is crucial; methods such as STGCN, ASTGCN, and DCRNN achieve strong results precisely because they model temporal features.
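As a reference, the core of such a training loop can be sketched as below. This is a minimal Adam-only version (the Adam + SGD switching schedule is not shown), and the `"flow_y"` target key is an assumed name mirroring the `"flow_x"` / `"graph"` batch format used by the models above.

```python
import torch
import torch.nn as nn

def train(model, loader, device, epochs=10, lr=1e-3):
    """Minimal MSE regression loop.
    `loader` yields dicts with "flow_x" ([B, N, H, D] history) and
    "flow_y" ([B, N, 1, D] target); key names are assumptions.
    Returns the mean training loss per epoch."""
    model = model.to(device)
    model.train()
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    for _ in range(epochs):
        total = 0.0
        for batch in loader:
            optimizer.zero_grad()
            prediction = model(batch, device)  # [B, N, 1, D]
            loss = criterion(prediction, batch["flow_y"].to(device))
            loss.backward()
            optimizer.step()
            total += loss.item()
        losses.append(total / len(loader))
    return losses
```

GAT's `forward` takes only `data` in the code above, so it would need a thin wrapper (or the same `(data, device)` signature) to plug into this loop.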
Additional Notes

This post performs traffic flow prediction on the PEMS datasets. While working on the project and writing it up, I drew on another blog post and built further improvements on top of it. The complete code is open-sourced on GitHub; stars and forks are welcome.