PyTorch Adam Optimizer: Principles, Formulas, and Applications


Concept: Adam is a first-order optimization algorithm that can replace the classical stochastic gradient descent procedure; it iteratively updates neural-network weights based on the training data. Adam was first proposed by Diederik Kingma of OpenAI and Jimmy Ba of the University of Toronto in the paper "Adam: A Method for Stochastic Optimization", presented at ICLR 2015. The name "Adam" is neither an acronym nor a person's name; it derives from adaptive moment estimation.

Adam (Adaptive Moment Estimation) is essentially RMSprop with a momentum term: it uses first- and second-moment estimates of the gradient to adapt the learning rate of each parameter individually. Its main advantage is that, after bias correction, the effective step size of every iteration stays within a determinate range, which keeps the parameter updates stable. The formulas are as follows:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$$
$$v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$
$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}$$
$$\hat{v}_t = \frac{v_t}{1-\beta_2^t}$$
$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t}+\epsilon}\, \hat{m}_t$$

Here the first two formulas are the first- and second-moment estimates of the gradient, which can be viewed as estimates of the expectations E[g_t] and E[g_t^2]. Formulas 3 and 4 apply a bias correction to the first- and second-moment estimates so that they approximate unbiased estimates of those expectations. As can be seen, estimating the moments directly from the gradient requires no extra memory, and the estimates adapt dynamically as the gradients change. The factor in front of the last term imposes a dynamic constraint on the learning rate η, and that constraint has an explicit range.
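To make the formulas concrete, below is a minimal, self-contained sketch of a single Adam update written in plain NumPy. The function adam_update and its variable names are purely illustrative and are not part of PyTorch.

import numpy as np

def adam_update(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for parameter theta given gradient grad at iteration t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad               # first-moment estimate m_t
    v = beta2 * v + (1 - beta2) * grad ** 2          # second-moment estimate v_t
    m_hat = m / (1 - beta1 ** t)                     # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                     # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # parameter update
    return theta, m, v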

Advantages:

1. Combines Adagrad's strength at handling sparse gradients with RMSprop's strength at handling non-stationary objectives;
2. Has low memory requirements;
3. Computes an individual adaptive learning rate for each parameter;
4. Works well for most non-convex optimization problems, as well as for large datasets and high-dimensional parameter spaces.

Application and source code:

Constructor signature:

class torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)

Parameter meanings:

params (iterable): an iterable of parameters to optimize, or dicts defining parameter groups.

lr (float, optional): learning rate (default: 1e-3).

betas (Tuple[float, float], optional): coefficients used for computing running averages of the gradient and its square (default: (0.9, 0.999)).

eps (float, optional): term added to the denominator to improve numerical stability (default: 1e-8).

weight_decay (float, optional): weight decay (L2 penalty) (default: 0).
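As an illustration, the optimizer can be constructed with every argument written out explicitly. The model below is just a stand-in (any nn.Module works), and the values shown are simply the defaults listed above.

import torch

model = torch.nn.Linear(10, 1)  # placeholder model; use your own nn.Module here
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-3,
                             betas=(0.9, 0.999),
                             eps=1e-8,
                             weight_decay=0)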

torch.optim.adam source code:

import math
from .optimizer import Optimizer

class Adam(Optimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super(Adam, self).__init__(params, defaults)

    def step(self, closure=None):
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data
                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state['step'] = 0
                    # Exponential moving average of gradient values
                    state['exp_avg'] = grad.new().resize_as_(grad).zero_()
                    # Exponential moving average of squared gradient values
                    state['exp_avg_sq'] = grad.new().resize_as_(grad).zero_()

                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                beta1, beta2 = group['betas']

                state['step'] += 1

                if group['weight_decay'] != 0:
                    grad = grad.add(group['weight_decay'], p.data)

                # Decay the first and second moment running average coefficient
                exp_avg.mul_(beta1).add_(1 - beta1, grad)
                exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)

                denom = exp_avg_sq.sqrt().add_(group['eps'])

                bias_correction1 = 1 - beta1 ** state['step']
                bias_correction2 = 1 - beta2 ** state['step']
                step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1

                p.data.addcdiv_(-step_size, exp_avg, denom)

        return loss
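Note that step() accepts an optional closure that re-evaluates the model and returns the loss. Adam itself does not require it, but passing one is supported. A minimal sketch, assuming model, loss_fn, x, y and optimizer have already been defined (as in the usage example below):

def closure():
    optimizer.zero_grad()        # clear any previously accumulated gradients
    y_pred = model(x)            # forward pass
    loss = loss_fn(y_pred, y)    # compute the loss
    loss.backward()              # populate .grad for every parameter
    return loss

loss = optimizer.step(closure)   # step() calls closure() and returns its loss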

Usage example:

import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

At this point, this should cover the vast majority of applications, and my goal here is essentially accomplished. The next step is to deepen the understanding through practical use.

  

References:

1. https://blog.csdn.net/kgzhang/article/details/77479737

2. https://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_optim.html

