
深度学习优化算法实现(Momentum, Adam)


Contents

  • Momentum
    • Initialization
    • Updating parameters
  • Adam
    • Initialization
    • Updating parameters

Besides plain gradient descent, there are several general-purpose optimization algorithms that usually outperform it. This post only records the Momentum and Adam algorithms encountered while completing the optimization assignment in Andrew Ng's deep learning course, and only gives the essential code. For the underlying theory, see 深度学习优化算法解析(Momentum, RMSProp, Adam), which explains in more detail the three optimization algorithms covered in Ng's course.
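For quick reference, these are the update rules the code below implements, written in the notation of Ng's assignment (for each layer l, with learning rate \alpha and time step t; the same updates apply to the bias terms b^{[l]}):

\begin{aligned}
\text{Momentum:}\quad & v_{dW^{[l]}} = \beta\, v_{dW^{[l]}} + (1-\beta)\, dW^{[l]}, \qquad W^{[l]} \leftarrow W^{[l]} - \alpha\, v_{dW^{[l]}} \\
\text{Adam:}\quad & v_{dW^{[l]}} = \beta_1\, v_{dW^{[l]}} + (1-\beta_1)\, dW^{[l]}, \qquad v^{\text{corrected}}_{dW^{[l]}} = \frac{v_{dW^{[l]}}}{1-\beta_1^{\,t}} \\
& s_{dW^{[l]}} = \beta_2\, s_{dW^{[l]}} + (1-\beta_2)\, \big(dW^{[l]}\big)^2, \qquad s^{\text{corrected}}_{dW^{[l]}} = \frac{s_{dW^{[l]}}}{1-\beta_2^{\,t}} \\
& W^{[l]} \leftarrow W^{[l]} - \alpha\, \frac{v^{\text{corrected}}_{dW^{[l]}}}{\sqrt{s^{\text{corrected}}_{dW^{[l]}}} + \epsilon}
\end{aligned}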

Momentum

Initialization

import numpy as np


def initialize_velocity(parameters):
    """
    Initializes the velocity as a python dictionary with:
        - keys: "dW1", "db1", ..., "dWL", "dbL"
        - values: numpy arrays of zeros of the same shape as the
          corresponding gradients/parameters.

    Arguments:
    parameters -- python dictionary containing your parameters.
        parameters['W' + str(l)] = Wl
        parameters['b' + str(l)] = bl

    Returns:
    v -- python dictionary containing the current velocity.
        v['dW' + str(l)] = velocity of dWl
        v['db' + str(l)] = velocity of dbl
    """
    L = len(parameters) // 2  # number of layers in the neural network
    v = {}

    # Initialize each velocity with zeros of the same shape as its parameter
    for l in range(L):
        v['dW' + str(l + 1)] = np.zeros(np.shape(parameters['W' + str(l + 1)]))
        v['db' + str(l + 1)] = np.zeros(np.shape(parameters['b' + str(l + 1)]))

    return v

Updating parameters

def update_parameters_with_momentum(parameters, grads, v, beta, learning_rate):
    """
    Update parameters using Momentum.

    Arguments:
    parameters -- python dictionary containing your parameters:
        parameters['W' + str(l)] = Wl
        parameters['b' + str(l)] = bl
    grads -- python dictionary containing your gradients for each parameter:
        grads['dW' + str(l)] = dWl
        grads['db' + str(l)] = dbl
    v -- python dictionary containing the current velocity:
        v['dW' + str(l)] = ...
        v['db' + str(l)] = ...
    beta -- the momentum hyperparameter, scalar
    learning_rate -- the learning rate, scalar

    Returns:
    parameters -- python dictionary containing your updated parameters
    v -- python dictionary containing your updated velocities
    """
    L = len(parameters) // 2  # number of layers in the neural network

    # Momentum update for each parameter
    for l in range(L):
        # compute velocities
        v['dW' + str(l + 1)] = beta * v['dW' + str(l + 1)] + (1 - beta) * grads['dW' + str(l + 1)]
        v['db' + str(l + 1)] = beta * v['db' + str(l + 1)] + (1 - beta) * grads['db' + str(l + 1)]
        # update parameters
        parameters['W' + str(l + 1)] += -learning_rate * v['dW' + str(l + 1)]
        parameters['b' + str(l + 1)] += -learning_rate * v['db' + str(l + 1)]

    return parameters, v
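As a quick sanity check, here is a minimal usage sketch of the two Momentum functions above. The two-layer shapes and random values are made up purely for illustration, and the sketch assumes the functions above are already defined in the session.

import numpy as np

# Hypothetical two-layer network: shapes chosen only for illustration
parameters = {'W1': np.random.randn(3, 4), 'b1': np.zeros((3, 1)),
              'W2': np.random.randn(1, 3), 'b2': np.zeros((1, 1))}
grads = {'dW1': np.random.randn(3, 4), 'db1': np.random.randn(3, 1),
         'dW2': np.random.randn(1, 3), 'db2': np.random.randn(1, 1)}

v = initialize_velocity(parameters)   # velocities start at zero
parameters, v = update_parameters_with_momentum(parameters, grads, v,
                                                beta=0.9, learning_rate=0.01)
print(parameters['W1'].shape, v['dW1'].shape)   # shapes are preserved by the update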

Adam

Initialization

def initialize_adam(parameters):
    """
    Initializes v and s as two python dictionaries with:
        - keys: "dW1", "db1", ..., "dWL", "dbL"
        - values: numpy arrays of zeros of the same shape as the
          corresponding gradients/parameters.

    Arguments:
    parameters -- python dictionary containing your parameters.
        parameters["W" + str(l)] = Wl
        parameters["b" + str(l)] = bl

    Returns:
    v -- python dictionary that will contain the exponentially weighted
         average of the gradient.
        v["dW" + str(l)] = ...
        v["db" + str(l)] = ...
    s -- python dictionary that will contain the exponentially weighted
         average of the squared gradient.
        s["dW" + str(l)] = ...
        s["db" + str(l)] = ...
    """
    L = len(parameters) // 2  # number of layers in the neural network
    v = {}
    s = {}

    # Initialize v (first moment) and s (second moment) with zeros
    for l in range(L):
        v['dW' + str(l + 1)] = np.zeros(np.shape(parameters['W' + str(l + 1)]))
        v['db' + str(l + 1)] = np.zeros(np.shape(parameters['b' + str(l + 1)]))
        s['dW' + str(l + 1)] = np.zeros(np.shape(parameters['W' + str(l + 1)]))
        s['db' + str(l + 1)] = np.zeros(np.shape(parameters['b' + str(l + 1)]))

    return v, s

Updating parameters

def update_parameters_with_adam(parameters, grads, v, s, t, learning_rate=0.01,
                                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """
    Update parameters using Adam.

    Arguments:
    parameters -- python dictionary containing your parameters:
        parameters['W' + str(l)] = Wl
        parameters['b' + str(l)] = bl
    grads -- python dictionary containing your gradients for each parameter:
        grads['dW' + str(l)] = dWl
        grads['db' + str(l)] = dbl
    v -- Adam variable, moving average of the first gradient, python dictionary
    s -- Adam variable, moving average of the squared gradient, python dictionary
    t -- current update step (starting from 1), used for bias correction
    learning_rate -- the learning rate, scalar
    beta1 -- exponential decay hyperparameter for the first moment estimates
    beta2 -- exponential decay hyperparameter for the second moment estimates
    epsilon -- hyperparameter preventing division by zero in Adam updates

    Returns:
    parameters -- python dictionary containing your updated parameters
    v -- Adam variable, moving average of the first gradient, python dictionary
    s -- Adam variable, moving average of the squared gradient, python dictionary
    """
    L = len(parameters) // 2  # number of layers in the neural network
    v_corrected = {}          # bias-corrected first moment estimate
    s_corrected = {}          # bias-corrected second moment estimate

    # Perform Adam update on all parameters
    for l in range(L):
        # Moving average of the gradients (first moment)
        v['dW' + str(l + 1)] = beta1 * v['dW' + str(l + 1)] + (1 - beta1) * grads['dW' + str(l + 1)]
        v['db' + str(l + 1)] = beta1 * v['db' + str(l + 1)] + (1 - beta1) * grads['db' + str(l + 1)]

        # Bias-corrected first moment estimate
        v_corrected['dW' + str(l + 1)] = v['dW' + str(l + 1)] / (1 - beta1 ** t)
        v_corrected['db' + str(l + 1)] = v['db' + str(l + 1)] / (1 - beta1 ** t)

        # Moving average of the squared gradients (second moment)
        s['dW' + str(l + 1)] = beta2 * s['dW' + str(l + 1)] + (1 - beta2) * np.square(grads['dW' + str(l + 1)])
        s['db' + str(l + 1)] = beta2 * s['db' + str(l + 1)] + (1 - beta2) * np.square(grads['db' + str(l + 1)])

        # Bias-corrected second moment estimate
        s_corrected['dW' + str(l + 1)] = s['dW' + str(l + 1)] / (1 - beta2 ** t)
        s_corrected['db' + str(l + 1)] = s['db' + str(l + 1)] / (1 - beta2 ** t)

        # Parameter update; note the bias-corrected second moment in the denominator
        parameters['W' + str(l + 1)] += -learning_rate * v_corrected['dW' + str(l + 1)] / (np.sqrt(s_corrected['dW' + str(l + 1)]) + epsilon)
        parameters['b' + str(l + 1)] += -learning_rate * v_corrected['db' + str(l + 1)] / (np.sqrt(s_corrected['db' + str(l + 1)]) + epsilon)

    return parameters, v, s
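To see the Adam helpers working end to end, here is a small self-contained sketch that minimizes a toy quadratic objective. The objective, shapes, learning rate and step count are made up for illustration and are not part of the assignment; the sketch assumes the functions above are already defined.

import numpy as np

# Toy problem: drive W1 and b1 towards zero by minimizing ||W1||^2 + ||b1||^2,
# whose gradients are simply 2*W1 and 2*b1.
parameters = {'W1': np.random.randn(2, 2), 'b1': np.random.randn(2, 1)}
v, s = initialize_adam(parameters)

for t in range(1, 201):                      # t starts at 1 for the bias correction
    grads = {'dW1': 2 * parameters['W1'],    # gradient of ||W1||^2
             'db1': 2 * parameters['b1']}    # gradient of ||b1||^2
    parameters, v, s = update_parameters_with_adam(parameters, grads, v, s, t,
                                                   learning_rate=0.1)

print(np.abs(parameters['W1']).max())        # should be close to 0 after 200 steps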
