How to compute the Hessian matrix of a neural network trained with momentum (a literature survey)
According to [4] ("Though results on the Hessian of individual layers were not included in this study"), it seems that each layer has its own corresponding Hessian matrix.
According to [5], the Hessian of the last layer is easy to compute, but for the layers before it the computation becomes much harder.
The following theoretical treatments of the Hessian matrix may be helpful, so I record them here first:
------------------------------------------
[6] explains very clearly how transposing the denominator affects the shape of the derivative, as follows:

For $x=\left(x_{1}, \dots, x_{N}\right)^{T}$:

$$\frac{\partial f(x)}{\partial x}=\begin{pmatrix}\frac{\partial f(x)}{\partial x_{1}}\\ \frac{\partial f(x)}{\partial x_{2}}\\ \vdots\\ \frac{\partial f(x)}{\partial x_{N}}\end{pmatrix}$$

$$\left(\frac{\partial f(x)}{\partial x}\right)^{T}=\frac{\partial f(x)}{\partial x^{T}}=\begin{pmatrix}\frac{\partial f(x)}{\partial x_{1}} & \frac{\partial f(x)}{\partial x_{2}} & \cdots & \frac{\partial f(x)}{\partial x_{N}}\end{pmatrix}$$

$$\frac{\partial^{2} f(x)}{\partial x\,\partial x^{T}}=\begin{pmatrix}\frac{\partial^{2} f(x)}{\partial x_{1}^{2}} & \frac{\partial^{2} f(x)}{\partial x_{1}\partial x_{2}} & \cdots & \frac{\partial^{2} f(x)}{\partial x_{1}\partial x_{N}}\\ \frac{\partial^{2} f(x)}{\partial x_{2}\partial x_{1}} & \frac{\partial^{2} f(x)}{\partial x_{2}^{2}} & & \vdots\\ \vdots & & \ddots & \vdots\\ \frac{\partial^{2} f(x)}{\partial x_{N}\partial x_{1}} & \cdots & \cdots & \frac{\partial^{2} f(x)}{\partial x_{N}^{2}}\end{pmatrix}$$
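As a concrete check of the layout convention above, here is a minimal pure-Python sketch (the function names and the toy $f$ are my own illustration, not from [6]) that builds the $N \times N$ matrix with entry $(i,j)$ equal to $\partial^{2} f/\partial x_{i}\partial x_{j}$ by central differences:

```python
def numerical_hessian(f, x, eps=1e-4):
    """N x N matrix H with H[i][j] ~= d^2 f / (dx_i dx_j):
    rows follow x, columns follow x^T, matching d^2 f / (dx dx^T)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # four-point central-difference stencil for the mixed partial
            xpp = list(x); xpm = list(x); xmp = list(x); xmm = list(x)
            xpp[i] += eps; xpp[j] += eps
            xpm[i] += eps; xpm[j] -= eps
            xmp[i] -= eps; xmp[j] += eps
            xmm[i] -= eps; xmm[j] -= eps
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * eps * eps)
    return H

# f(x) = x1^2 * x2 has Hessian [[2*x2, 2*x1], [2*x1, 0]]
f = lambda x: x[0] ** 2 * x[1]
H = numerical_hessian(f, [1.0, 2.0])
# expected approximately [[4, 2], [2, 0]]; note H[0][1] == H[1][0] (symmetry)
```

For a smooth $f$ the result is symmetric, which is a quick sanity check on the convention.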
------------------------------------------
Snipping tool used: Mathpix Snipping Tool. This is the first time I have seen it convert a screenshot inaccurately, sigh…
------------------------------------------
Equations (2.16)~(2.18) in [1] could not be verified.
In (3.1)~(3.3) a strange symbol $\delta$ appears without any explanation of its meaning.
The definition of $b_{ni}$ in (2.8) is odd. Comparing (2.15) with (2.12) shows that [1] is discussing a neural network for binary classification. The author of [1] could not be reached, so I eventually gave up reading it.
[3] uses a spring oscillator to model the repeated oscillation of a neural network during training, and argues from both the differential-equation and the difference-equation viewpoints why the momentum optimizer can accelerate convergence.
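To make the difference-equation viewpoint concrete, here is a small self-contained sketch of the heavy-ball updates $v_{t+1}=\mu v_{t}-\eta\nabla f(x_{t})$, $x_{t+1}=x_{t}+v_{t+1}$ on an ill-conditioned quadratic (the learning rate, $\mu$, and the test function are illustrative choices of mine, not taken from [3]):

```python
def momentum_descent(grad, x0, lr=0.04, mu=0.9, steps=300):
    # heavy-ball update: the velocity accumulates past gradients,
    # which damps oscillation along the steep axis and speeds up the flat one
    x = list(x0)
    v = [0.0] * len(x0)
    for _ in range(steps):
        g = grad(x)
        v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x

# ill-conditioned quadratic f(x) = 0.5*(x1^2 + 25*x2^2), grad = (x1, 25*x2)
grad = lambda x: [x[0], 25.0 * x[1]]
x = momentum_descent(grad, [1.0, 1.0])
# converges close to the minimizer (0, 0)
```

The oscillatory decay of the iterates (complex eigenvalues of the two-step recurrence) is exactly the spring-oscillator behavior [3] analyzes.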
I contacted the author of [4]; the reply was that reproducing the results requires a large amount of Google's hardware plus dedicated scripts, that it cannot be done at home, and that even he no longer has the code.
------------------------------------------
"Hessian-free" means computing the product $Hv$ rather than computing $H$ directly, which avoids the enormous cost of forming $H$.
The goal of computing $H^{-1}v$ is to use it in the update term of the second-order Newton method when training a neural network.
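The $Hv$ idea can be sketched without any framework: two gradient evaluations give $Hv \approx (\nabla f(x+\epsilon v)-\nabla f(x-\epsilon v))/(2\epsilon)$, so the full $H$ is never materialized. A minimal pure-Python illustration (my own toy example, not code from any of the cited repos):

```python
def grad(f, x, eps=1e-5):
    # central-difference gradient of a scalar function f at point x
    g = []
    for i in range(len(x)):
        xp = list(x); xm = list(x)
        xp[i] += eps; xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

def hessian_vector_product(f, x, v, eps=1e-4):
    # Hv ~= (grad(x + eps*v) - grad(x - eps*v)) / (2*eps); H itself never formed
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    gp, gm = grad(f, xp), grad(f, xm)
    return [(a - b) / (2 * eps) for a, b in zip(gp, gm)]

# toy quadratic f(x) = 0.5 x^T A x with A = [[2,1],[1,3]], so Hv = A v exactly
A = [[2.0, 1.0], [1.0, 3.0]]
def f(x):
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

hv = hessian_vector_product(f, [0.3, -0.7], [1.0, 2.0])
# exact answer is A v = [4.0, 7.0]
```

In practice the libraries below use exact autodiff (Pearlmutter's R-operator) instead of finite differences, but the cost structure is the same: each $Hv$ costs only a constant multiple of one gradient.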
------------------------------------------
################## The following GitHub links relate to hessian-free ####################################
The author of [7] stopped replying; abandoned.
https://github.com/drasmuss/hessianfree
The code here mainly implements conjugate gradient, and simply drops the Jacobian- and Hessian-related operations.
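For context, conjugate gradient is the natural partner of the $Hv$ trick: it solves $Hd=b$ using only matrix-vector products, never $H$ itself. A minimal sketch of my own (not this repo's code), assuming $H$ is symmetric positive definite:

```python
def conjugate_gradient(hvp, b, iters=50, tol=1e-10):
    # solve H x = b given only the product oracle hvp(v) = H v
    x = [0.0] * len(b)
    r = list(b)          # residual b - H x (x starts at 0)
    d = list(r)          # search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(iters):
        hd = hvp(d)
        alpha = rs / sum(di * hi for di, hi in zip(d, hd))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * hi for ri, hi in zip(r, hd)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        d = [ri + (rs_new / rs) * di for ri, di in zip(r, d)]
        rs = rs_new
    return x

# H = [[2,1],[1,3]] (SPD); solve H x = [1, 2]
hvp = lambda v: [2 * v[0] + v[1], v[0] + 3 * v[1]]
x = conjugate_gradient(hvp, [1.0, 2.0])
# exact solution is H^{-1} b = [0.2, 0.6]
```

This is why a hessian-free optimizer only needs an $Hv$ (or $Gv$) routine plus CG to get the Newton-style step $H^{-1}g$.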
https://github.com/NithinTangellamudi/HessianFreeImplementation
The code is full of syntax errors; abandoned.
☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆
The following are still under investigation:
#------------------------------------------------------------------------------------------
The code in [8] accompanies the paper [9].
The hessian-free part of the code is as follows:
I emailed the author asking for the reasoning behind it, but got no reply.
#------------------------------------------------------------------------------------------
The code in [10] is part of the paper [11] below.
The hessian-free part of the code is as follows:
def gauss_newton_product(cost, p, v, s):
    # this computes the product Gv = J'HJv (G is the Gauss-Newton matrix)
    Jv = T.Rop(s, p, v)
    HJv = T.grad(T.sum(T.grad(cost, s) * Jv), s,
                 consider_constant=[Jv], disconnected_inputs='ignore')
    Gv = T.grad(T.sum(HJv * s), p,
                consider_constant=[HJv, Jv], disconnected_inputs='ignore')
    Gv = map(T.as_tensor_variable, Gv)  # for CudaNdarray
    return Gv

I emailed the author asking for the reasoning behind it, but got no reply.
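What this Theano snippet computes can be checked by hand in the plain least-squares case, where the output-space Hessian $H$ is the identity and therefore $Gv=J^{T}(Jv)$. A pure-Python sketch (the toy Jacobian is my own, only illustrating that $G=J^{T}J$ is never formed):

```python
def gauss_newton_vector_product(J, v):
    # Gv = J^T (J v): two matvecs; the matrix G = J^T J is never built
    Jv = [sum(J[i][k] * v[k] for k in range(len(v))) for i in range(len(J))]
    return [sum(J[i][j] * Jv[i] for i in range(len(J))) for j in range(len(v))]

# toy Jacobian of a 3-output, 2-parameter model
J = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]]
gv = gauss_newton_vector_product(J, [1.0, -1.0])
# J^T J = [[10, 5], [5, 6]], so Gv = [10 - 5, 5 - 6] = [5, -1]
```

In the snippet above, `T.Rop` plays the role of `Jv` and the two `T.grad` calls with `consider_constant` implement the transposed products, with a general $H$ sandwiched in between.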
#------------------------------------------------------------------------------------------
[12] involves meta-learning.
References:
[1]Exact Calculation of the Hessian Matrix for the Multilayer Perceptron
[2]A fast procedure for re-training the multilayer perceptron
[3]On the Momentum Term in Gradient Descent Learning Algorithms
[4]Negative eigen values of the hessian in deep neural networks
[5]Most efficient way to calculate hessian of cost function in neural network
[6]https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470173862.app4
[7]https://github.com/moonl1ght/HessianFreeOptimization/issues/1
[8]https://github.com/doomie/HessianFree
[9]Improved Preconditioner for Hessian Free Optimization
[10]https://github.com/boulanni/theano-hf
[11]Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription
[12]https://github.com/ozzzp/MLHF
總結
以上是生活随笔為你收集整理的如何计算一个神经网络在使用momentum时的hessian矩阵(论文调研)的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 矩阵行列式的几何意义验证
- 下一篇: 牛顿法中为何出现hessian矩阵