
How to Compute the Hessian Matrix of a Neural Network Trained with Momentum (Literature Survey)

Published: 2023/12/20 · 生活随笔

According to [4] ("Though results on the Hessian of individual layers were not included in this study"), it seems that each layer has its own corresponding Hessian matrix.

According to [5], the Hessian of the last layer is easy to compute, but going one layer down the computation becomes much harder.

The following theoretical treatments of the Hessian may be helpful; I record them here first:
------------------------------------------
[7] gives a clear account of how transposing the denominator changes the layout of the derivative, as follows:
For $x=\left(x_{1}, \dots, x_{N}\right)^{T}$:

$$\frac{\partial f(x)}{\partial x}=\begin{pmatrix}\frac{\partial f(x)}{\partial x_{1}} \\ \frac{\partial f(x)}{\partial x_{2}} \\ \vdots \\ \frac{\partial f(x)}{\partial x_{N}}\end{pmatrix}$$

$$\left(\frac{\partial f(x)}{\partial x}\right)^{T}=\frac{\partial f(x)}{\partial x^{T}}=\begin{pmatrix}\frac{\partial f(x)}{\partial x_{1}} & \frac{\partial f(x)}{\partial x_{2}} & \cdots & \frac{\partial f(x)}{\partial x_{N}}\end{pmatrix}$$

$$\frac{\partial^{2} f(x)}{\partial x\,\partial x^{T}}=\begin{pmatrix}\frac{\partial^{2} f(x)}{\partial x_{1}^{2}} & \frac{\partial^{2} f(x)}{\partial x_{1}\partial x_{2}} & \cdots & \frac{\partial^{2} f(x)}{\partial x_{1}\partial x_{N}} \\ \frac{\partial^{2} f(x)}{\partial x_{2}\partial x_{1}} & \frac{\partial^{2} f(x)}{\partial x_{2}^{2}} & & \vdots \\ \vdots & & \ddots & \vdots \\ \frac{\partial^{2} f(x)}{\partial x_{N}\partial x_{1}} & \cdots & \cdots & \frac{\partial^{2} f(x)}{\partial x_{N}^{2}}\end{pmatrix}$$
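As a concrete check of these layout conventions, here is a small NumPy sketch (my own illustration, not taken from [7]) for the quadratic $f(x)=x^{T}Ax$, whose gradient in the column convention above is $(A+A^{T})x$ and whose Hessian $\partial^{2}f/\partial x\,\partial x^{T}$ is the symmetric matrix $A+A^{T}$:

```python
import numpy as np

# f(x) = x^T A x with a fixed, arbitrary 3x3 matrix A (hypothetical example).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])

def f(x):
    return x @ A @ x

def grad_f(x):
    # Analytic gradient, laid out as a column vector in the convention above.
    return (A + A.T) @ x

H = A + A.T  # the Hessian of a quadratic is constant: A + A^T

# Verify the analytic gradient with central finite differences.
x0 = np.array([1.0, -2.0, 0.5])
eps = 1e-6
num_grad = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                     for e in np.eye(3)])

print(np.allclose(num_grad, grad_f(x0), atol=1e-4))  # True
print(np.allclose(H, H.T))  # True: the Hessian is symmetric
```

This also makes the transpose point concrete: `grad_f(x0)` is the column layout $\partial f/\partial x$, and its transpose is the row layout $\partial f/\partial x^{T}$.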

------------------------------------------

The screenshots were converted with the Mathpix Snipping Tool; this is the first time I have seen it mis-convert a screenshot, sigh…
------------------------------------------
Equations (2.16)–(2.18) in [1] could not be verified,
and (3.1)–(3.3) introduce a symbol δ whose meaning is never explained.
The definition of $b_{ni}$ in (2.8) is also strange. Comparing (2.15) with (2.12) shows that [1] is discussing a neural network for binary classification. The author could not be reached, so I eventually gave up on reading the paper.

[3] uses a spring oscillator as an analogy for the continual oscillation of a neural network during training, and argues from both the differential-equation and the difference-equation viewpoints why the momentum optimizer accelerates convergence.

I contacted the author of [4]; the reply was that reproducing the results requires Google's large-scale hardware and dedicated scripts, that it cannot be done at home, and that even the author no longer has the code.
------------------------------------------

"Hessian-free" means computing the product $Hv$ rather than the full matrix $H$, which avoids the enormous cost of forming $H$ explicitly.
The goal of computing $H^{-1}v$ is that it appears in the update term of the second-order Newton iteration when training a neural network.
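The idea can be sketched in a few lines: $Hv$ can be approximated from two gradient evaluations via central differences, $Hv \approx \left(\nabla f(x+\varepsilon v)-\nabla f(x-\varepsilon v)\right)/2\varepsilon$, without ever forming $H$. This is a generic illustration (the objective and matrices below are made up, not from any of the cited papers):

```python
import numpy as np

# Toy objective f(x) = 0.5 * x^T A x with known symmetric Hessian A,
# so the matrix-free product can be verified against A @ v.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def grad(x):
    return A @ x  # gradient of 0.5 * x^T A x

def hessian_vector_product(grad_fn, x, v, eps=1e-6):
    """Approximate Hv without forming H:
    Hv ~= (grad(x + eps*v) - grad(x - eps*v)) / (2*eps)."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

x0 = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
Hv = hessian_vector_product(grad, x0, v)
print(np.allclose(Hv, A @ v, atol=1e-4))  # True
```

In a Hessian-free Newton step one then solves $Hd=\nabla f$ with conjugate gradient, which only ever needs such $Hv$ products — which is exactly why the method is called Hessian-free.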
------------------------------------------

################## The following GitHub links relate to Hessian-free ####################################
[7]: the author stopped replying, so I gave up on this one.

https://github.com/drasmuss/hessianfree
The code here mainly implements conjugate gradient and simply discards the Jacobian- and Hessian-related operations.

https://github.com/NithinTangellamudi/HessianFreeImplementation
The code is full of syntax errors; abandoned.

☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆
The following are still under investigation:
#------------------------------------------------------------------------------------------

The code in [8] accompanies paper [9].
The Hessian-free part of the code is as follows:

def gauss_vect_mult(v):
    """Multiply a vector by the Gauss-Newton matrix JHJ'
    where J is the Jacobian between output and params and
    H is the Hessian between costs and output.
    H should be diagonal and positive.
    Also add the ridge."""
    Jv = T.Rop(output, params, v)
    HJv = T.Rop(T.grad(opt_cost, output), output, Jv)
    JHJv = T.Lop(output, params, HJv)
    if not isinstance(JHJv, list):
        JHJv = [JHJv]
    JHJv = [a + ridge * b for a, b in zip(JHJv, v)]
    return JHJv
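To make the structure of the product $(J^{T}HJ+\lambda I)v$ concrete outside Theano, here is a NumPy sketch with an explicit (tiny) Jacobian, a diagonal positive output-space Hessian, and a ridge term; all the matrices below are made up for illustration and do not come from [8]:

```python
import numpy as np

# Hypothetical small problem: 2 outputs, 3 parameters.
J = np.array([[1.0, 0.0, 2.0],    # Jacobian of outputs w.r.t. params
              [0.0, 1.0, -1.0]])
H = np.diag([2.0, 3.0])           # Hessian of cost w.r.t. outputs (diagonal, positive)
ridge = 0.1                       # damping term, as in the snippet above

def gauss_vect_mult(v):
    """Compute (J^T H J + ridge * I) v without forming J^T H J."""
    Jv = J @ v        # forward (R-op): directional derivative of the outputs
    HJv = H @ Jv      # multiply by the output-space Hessian
    JHJv = J.T @ HJv  # backward (L-op): pull back to parameter space
    return JHJv + ridge * v

v = np.array([1.0, 2.0, 3.0])
explicit = (J.T @ H @ J + ridge * np.eye(3)) @ v
print(np.allclose(gauss_vect_mult(v), explicit))  # True
```

The three-step structure (forward product, output-space Hessian, backward product) mirrors the `Rop`/`Rop`/`Lop` sequence in the Theano code, except that here the Jacobian is explicit rather than implicit in the computation graph.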

I emailed the author asking for the reasoning behind this, but received no reply.
#------------------------------------------------------------------------------------------
The code in [10] is part of paper [11] below.

The Hessian-free part of the code is as follows:

def gauss_newton_product(cost, p, v, s):
    # this computes the product Gv = J'HJv (G is the Gauss-Newton matrix)
    Jv = T.Rop(s, p, v)
    HJv = T.grad(T.sum(T.grad(cost, s) * Jv), s,
                 consider_constant=[Jv], disconnected_inputs='ignore')
    Gv = T.grad(T.sum(HJv * s), p,
                consider_constant=[HJv, Jv], disconnected_inputs='ignore')
    Gv = map(T.as_tensor_variable, Gv)  # for CudaNdarray
    return Gv

I emailed the author asking for the reasoning behind this, but received no reply.
#------------------------------------------------------------------------------------------
[12] involves meta-learning.

References:
[1]Exact Calculation of the Hessian Matrix for the Multilayer Perceptron
[2]A fast procedure for re-training the multilayer perceptron
[3]On the Momentum Term in Gradient Descent Learning Algorithms
[4]Negative eigen values of the hessian in deep neural networks
[5]Most efficient way to calculate hessian of cost function in neural network
[6]https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470173862.app4
[7]https://github.com/moonl1ght/HessianFreeOptimization/issues/1
[8]https://github.com/doomie/HessianFree
[9]Improved Preconditioner for Hessian Free Optimization
[10]https://github.com/boulanni/theano-hf
[11]Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription
[12]https://github.com/ozzzp/MLHF
