
PyTorch custom activation functions: forward and backward passes for the ReLU family, with advantages and disadvantages

Published: 2023/12/10


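Table of contents

- ReLU
  - Formula
  - Derivative
  - Advantages:
  - Disadvantages:
  - Custom ReLU
  - Comparison with the Torch implementation
  - Visualization
- Leaky ReLU and PReLU
  - Formula
  - Derivative
  - Advantages:
  - Disadvantages:
  - Custom Leaky ReLU
  - Comparison with the Torch implementation
  - Visualization
  - Custom PReLU
- ELU
  - Formula
  - Derivative
  - Advantages
  - Disadvantages
  - Custom ELU
  - Comparison with the Torch implementation
  - Visualization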

    import matplotlib
    import matplotlib.pyplot as plt
    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Jupyter-only magic; remove this line when running as a plain script.
    %matplotlib inline

    plt.rcParams['figure.figsize'] = (7, 3.5)
    plt.rcParams['figure.dpi'] = 150
    plt.rcParams['axes.unicode_minus'] = False  # render minus signs on the axes correctly

ReLU

Rectified linear unit (ReLU)

Formula

$\text{relu}(x) = \max(0, x) = \begin{cases} x, & x > 0 \\ 0, & x \leq 0 \end{cases}$

Derivative

$f(x) = \text{relu}(x)$ is continuous.

$f'(0) = \lim_{h \to 0} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0} \frac{\max(0, h) - 0}{h}$
$\lim_{h \to 0^-} \frac{0}{h} = 0$
$\lim_{h \to 0^+} \frac{h}{h} = 1$
The one-sided limits differ, so $f'(0)$ does not exist: $f$ is not differentiable at 0.

Away from 0, $f'(x) = \begin{cases} 1, & x > 0 \\ 0, & x < 0 \end{cases}$
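The two one-sided limits can be checked numerically. The following is a minimal sketch in plain Python; the helper names `relu` and `difference_quotient` are introduced here for illustration only and are not part of the article's code.

```python
def relu(x: float) -> float:
    """Scalar ReLU, max(0, x)."""
    return max(0.0, x)


def difference_quotient(f, x: float, h: float) -> float:
    """Finite-step slope (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


steps = (1e-1, 1e-3, 1e-6)
# h < 0 approaches 0 from the left, h > 0 approaches it from the right.
left_slopes = [difference_quotient(relu, 0.0, -h) for h in steps]
right_slopes = [difference_quotient(relu, 0.0, h) for h in steps]

print("left :", left_slopes)
print("right:", right_slopes)
```

The left slopes stay at 0 and the right slopes stay at 1 no matter how small the step is, which is why no single derivative value exists at 0.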

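Advantages:

ReLU is a simple computation: if the input is greater than 0, it returns the input unchanged; if the input is 0 or less, it returns 0.

- Compared with sigmoid and tanh, ReLU does not saturate for positive inputs, which mitigates the vanishing-gradient problem and makes deep networks trainable.
- ReLU sets the output of some neurons to 0. This makes the network sparse, reduces the interdependence between parameters, and alleviates overfitting to some extent.
- It is very cheap to compute.
- It converges much faster than sigmoid and tanh.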

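Disadvantages:

- The output is not zero-centered.
- It suffers from the dead ReLU problem: some neurons may never be activated, so the corresponding parameters are never updated. The main causes are poor parameter initialization and a learning rate that is set too high.
- ReLU does not compress the magnitude of the data, so the magnitude can keep growing as the number of layers increases. For positive inputs the derivative is 1, so the chain rule produces no vanishing gradient, but the size of the gradient then depends entirely on the product of the weights, which can lead to exploding gradients.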
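The dead ReLU problem described above can be reproduced in a few lines. This is a minimal sketch under an assumed bad state: the weight value 0.1 and the bias value -100 are chosen by hand purely to force every pre-activation below zero, and are not taken from the article.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(4, 1)
with torch.no_grad():
    layer.weight.fill_(0.1)
    layer.bias.fill_(-100.0)  # assumed bad state, e.g. after an oversized update

x = torch.randn(16, 4)
out = torch.relu(layer(x))  # every pre-activation is far below 0
out.sum().backward()

print(out.abs().max())                # all outputs are 0
print(layer.weight.grad.abs().max())  # so all weight gradients are 0
print(layer.bias.grad.abs().max())    # and the bias gradient is 0 too
```

Because every gradient is exactly 0, gradient descent cannot move the parameters out of this state, whatever the learning rate.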

Custom ReLU

    class SelfDefinedRelu(torch.autograd.Function):
        """ReLU with a hand-written backward pass."""

        @staticmethod
        def forward(ctx, inp):
            # Save the input; backward needs its sign.
            ctx.save_for_backward(inp)
            return torch.where(inp < 0., torch.zeros_like(inp), inp)

        @staticmethod
        def backward(ctx, grad_output):
            inp, = ctx.saved_tensors
            # Local derivative: 0 for inp < 0, 1 otherwise.
            # Note: this returns 1 at inp == 0, whereas torch.relu returns 0 there.
            return grad_output * torch.where(inp < 0., torch.zeros_like(inp),
                                             torch.ones_like(inp))


    class Relu(nn.Module):
        def __init__(self):
            super().__init__()

        def forward(self, x):
            return SelfDefinedRelu.apply(x)

Comparison with the Torch implementation

    # self defined
    torch.manual_seed(0)

    relu = Relu()  # SelfDefinedRelu
    inp = torch.randn(5, requires_grad=True)
    out = relu((inp).pow(3))

    print(f'Out is\n{out}')

    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nFirst call\n{inp.grad}")

    # Gradients accumulate in inp.grad, so the second call doubles them.
    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nSecond call\n{inp.grad}")

    inp.grad.zero_()
    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nCall after zeroing gradients\n{inp.grad}")

Output:

    Out is
    tensor([3.6594, 0.0000, 0.0000, 0.1837, 0.0000],
           grad_fn=<SelfDefinedReluBackward>)

    First call
    tensor([7.1240, 0.0000, 0.0000, 0.9693, 0.0000])

    Second call
    tensor([14.2480, 0.0000, 0.0000, 1.9387, 0.0000])

    Call after zeroing gradients
    tensor([7.1240, 0.0000, 0.0000, 0.9693, 0.0000])

The same experiment with the built-in function:

    # torch defined
    torch.manual_seed(0)

    inp = torch.randn(5, requires_grad=True)
    out = torch.relu((inp).pow(3))

    print(f'Out is\n{out}')

    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nFirst call\n{inp.grad}")

    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nSecond call\n{inp.grad}")

    inp.grad.zero_()
    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nCall after zeroing gradients\n{inp.grad}")

Output:

    Out is
    tensor([3.6594, 0.0000, 0.0000, 0.1837, 0.0000], grad_fn=<ReluBackward0>)

    First call
    tensor([7.1240, 0.0000, 0.0000, 0.9693, 0.0000])

    Second call
    tensor([14.2480, 0.0000, 0.0000, 1.9387, 0.0000])

    Call after zeroing gradients
    tensor([7.1240, 0.0000, 0.0000, 0.9693, 0.0000])

Visualization

    # visualization
    inp = torch.arange(-8, 8, 0.05, requires_grad=True)
    out = relu(inp)
    out.sum().backward()

    inp_grad = inp.grad

    plt.plot(inp.detach().numpy(), out.detach().numpy(), label=r"$relu(x)$", alpha=0.7)
    plt.plot(inp.detach().numpy(), inp_grad.numpy(), label=r"$relu'(x)$", alpha=0.5)
    # Hollow red marker at the origin, where the derivative is undefined.
    plt.scatter(0, 0, color='None', marker='o', edgecolors='r', s=50)
    plt.grid()
    plt.legend()
    plt.show()

Leaky ReLU and PReLU

Formula

$\text{leaky\_relu}(x) = \max(\alpha x, x) = \begin{cases} x, & x \ge 0 \\ \alpha x, & x < 0 \end{cases} \quad, \alpha \in [0, 1)$

When $\alpha = 0$, $\text{leaky\_relu} = \text{relu}$.

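Derivative

$f'(x) = \begin{cases} 1, & x \ge 0 \\ \alpha, & x < 0 \end{cases}$

For $\alpha \neq 1$ the one-sided derivatives at 0 are $\alpha$ and 1, so the function is not differentiable there; the value 1 is used at $x = 0$ by convention.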
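The piecewise form and its derivative can be compared against PyTorch's built-in `F.leaky_relu`. This is a small sketch; the slope `alpha = 0.1` and the sample points are arbitrary illustrative choices, and the points deliberately avoid 0.

```python
import torch
import torch.nn.functional as F

alpha = 0.1  # illustrative slope, not a recommended value
x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)

y = F.leaky_relu(x, negative_slope=alpha)
y.sum().backward()  # d(sum y)/dx_i is the elementwise derivative

x_values = x.detach()
# Forward pass written as max(alpha * x, x), valid because alpha < 1.
expected_y = torch.maximum(alpha * x_values, x_values)
# Derivative: alpha for negative inputs, 1 for positive inputs.
expected_grad = torch.where(x_values < 0,
                            torch.full_like(x_values, alpha),
                            torch.ones_like(x_values))

print(torch.allclose(y.detach(), expected_y))
print(torch.allclose(x.grad, expected_grad))
```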

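Advantages:

- It avoids the vanishing-gradient problem.
- It is simple to compute.
- To address the dead ReLU problem, Leaky ReLU gives negative inputs a small slope. This removes the zero gradient for negative inputs and largely alleviates the dead ReLU problem.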

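Disadvantages:

- The output is not zero-centered.
- Like ReLU, it does not compress the magnitude of the data, so the magnitude can keep growing as the number of layers increases.
- In theory it should work better than ReLU, but in practice its results have proved unstable, so it is not widely used.
- Because different functions are applied on different intervals, the results are inconsistent, and it cannot provide a consistent relationship for positive and negative inputs.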

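The value of the hyperparameter $\alpha$ has been studied in many experiments. One approach samples $\alpha$ at random during training; in Randomized Leaky ReLU (RReLU), $\alpha$ is drawn from a uniform distribution over a fixed interval. The original paper reports that RReLU gives better results than Leaky ReLU, and it also gives an empirical value of 1/5.5 for $\alpha$ (better than 0.01). One explanation for why RReLU performs better is that the random slope for negative inputs injects randomness into the optimization, and this noise can help the parameters escape local optima and saddle points; a full treatment would need an article of its own. Because the choice of $\alpha$ matters so much, researchers went beyond random sampling: another paper treats $\alpha$ as a parameter to be learned, which gives PReLU (Parametric ReLU).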
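PyTorch provides both variants as `nn.RReLU` and `nn.PReLU`. The sketch below only illustrates how each module treats the negative slope; the input values are arbitrary, and the `lower` and `upper` bounds are passed explicitly so that nothing depends on defaults.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.tensor([-2.0, -1.0, 1.0, 2.0])

# PReLU: the slope is a learnable parameter (a single value here).
prelu = nn.PReLU(num_parameters=1, init=0.25)
y_prelu = prelu(x)
print(prelu.weight)       # registered as a Parameter, so optimizers update it
print(y_prelu.detach())   # negative inputs are scaled by 0.25

# RReLU: the slope is sampled from U(lower, upper) while training ...
lower, upper = 1 / 8, 1 / 3
rrelu = nn.RReLU(lower=lower, upper=upper)
rrelu.train()
y_train = rrelu(x)
print(y_train)

# ... and fixed to the midpoint (lower + upper) / 2 in evaluation mode.
rrelu.eval()
y_eval = rrelu(x)
print(y_eval)
```

In training mode the negative outputs change from call to call, while in evaluation mode they are deterministic.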

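Custom Leaky ReLU

    class SelfDefinedLeakyRelu(torch.autograd.Function):
        @staticmethod
        def forward(ctx, inp, alpha):
            ctx.constant = alpha  # plain Python number, kept for backward
            ctx.save_for_backward(inp)
            return torch.where(inp < 0., alpha * inp, inp)

        @staticmethod
        def backward(ctx, grad_output):
            inp, = ctx.saved_tensors
            ones_like_inp = torch.ones_like(inp)
            # Local slope: alpha for inp < 0, 1 otherwise.
            local_grad = torch.where(inp < 0., ones_like_inp * ctx.constant,
                                     ones_like_inp)
            # Chain rule: scale the incoming gradient by the local slope.
            # alpha is a fixed constant here, so it receives no gradient (None).
            return grad_output * local_grad, None


    class LeakyRelu(nn.Module):
        # Default changed from 1 to 0.01 (the nn.LeakyReLU default);
        # alpha=1 would make the layer the identity function.
        def __init__(self, alpha=0.01):
            super().__init__()
            self.alpha = alpha

        def forward(self, x):
            return SelfDefinedLeakyRelu.apply(x, self.alpha)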

Comparison with the Torch implementation

    # self defined
    torch.manual_seed(0)

    alpha = 0.1  # larger than usual, for a clearer visualization
    leaky_relu = LeakyRelu(alpha=alpha)  # SelfDefinedLeakyRelu
    inp = torch.randn(5, requires_grad=True)
    out = leaky_relu((inp).pow(3))

    print(f'Out is\n{out}')

    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nFirst call\n{inp.grad}")

    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nSecond call\n{inp.grad}")

    inp.grad.zero_()
    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nCall after zeroing gradients\n{inp.grad}")

Output:

    Out is
    tensor([ 3.6594e+00, -2.5264e-03, -1.0343e+00, 1.8367e-01, -1.2756e-01],
           grad_fn=<SelfDefinedLeakyReluBackward>)

    First call
    tensor([7.1240, 0.0258, 1.4241, 0.9693, 0.3529])

    Second call
    tensor([14.2480, 0.0517, 2.8483, 1.9387, 0.7057])

    Call after zeroing gradients
    tensor([7.1240, 0.0258, 1.4241, 0.9693, 0.3529])

The same experiment with the built-in function:

    # torch defined
    torch.manual_seed(0)

    inp = torch.randn(5, requires_grad=True)
    out = F.leaky_relu((inp).pow(3), negative_slope=alpha)

    print(f'Out is\n{out}')

    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nFirst call\n{inp.grad}")

    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nSecond call\n{inp.grad}")

    inp.grad.zero_()
    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nCall after zeroing gradients\n{inp.grad}")

Output:

    Out is
    tensor([ 3.6594e+00, -2.5264e-03, -1.0343e+00, 1.8367e-01, -1.2756e-01],
           grad_fn=<LeakyReluBackward0>)

    First call
    tensor([7.1240, 0.0258, 1.4241, 0.9693, 0.3529])

    Second call
    tensor([14.2480, 0.0517, 2.8483, 1.9387, 0.7057])

    Call after zeroing gradients
    tensor([7.1240, 0.0258, 1.4241, 0.9693, 0.3529])

Visualization

    # visualization
    inp = torch.arange(-8, 8, 0.05, requires_grad=True)
    out = leaky_relu(inp)
    out.sum().backward()

    inp_grad = inp.grad

    plt.plot(inp.detach().numpy(), out.detach().numpy(), label=r"$leakyrelu(x)$", alpha=0.7)
    plt.plot(inp.detach().numpy(), inp_grad.numpy(), label=r"$leakyrelu'(x)$", alpha=0.5)
    plt.scatter(0, 0, color='None', marker='o', edgecolors='r', s=50)
    plt.grid()
    plt.legend()
    plt.show()

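Custom PReLU

    class SelfDefinedPRelu(torch.autograd.Function):
        @staticmethod
        def forward(ctx, inp, alpha):
            # alpha is a tensor here, so save it together with the input.
            ctx.save_for_backward(inp, alpha)
            return torch.where(inp < 0., alpha * inp, inp)

        @staticmethod
        def backward(ctx, grad_output):
            inp, alpha = ctx.saved_tensors
            ones_like_inp = torch.ones_like(inp)
            # Gradient w.r.t. the input: alpha for inp < 0, 1 otherwise.
            grad_inp = grad_output * torch.where(inp < 0., ones_like_inp * alpha,
                                                 ones_like_inp)
            # Gradient w.r.t. alpha: d(out)/d(alpha) is inp for inp < 0 and 0 otherwise,
            # summed because one alpha is shared by all elements.
            grad_alpha = (grad_output * torch.where(inp < 0., inp,
                                                    torch.zeros_like(inp))).sum()
            return grad_inp, grad_alpha.reshape(alpha.shape)


    class PRelu(nn.Module):
        def __init__(self):
            super().__init__()
            # Register alpha as a Parameter so that optimizers can update it.
            self.alpha = nn.Parameter(torch.randn(1, dtype=torch.float32))

        def forward(self, x):
            return SelfDefinedPRelu.apply(x, self.alpha)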
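What distinguishes PReLU from Leaky ReLU is that $\alpha$ itself receives a gradient. For a single shared $\alpha$ and a summed output, that gradient is the sum of the negative inputs. The sketch below checks this against the built-in `F.prelu`; the input values and the starting slope 0.25 are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -1.0, 0.5, 3.0])
alpha = torch.tensor([0.25], requires_grad=True)  # one shared learnable slope

out = F.prelu(x, alpha)
out.sum().backward()

# d(out_i)/d(alpha) is x_i for x_i < 0 and 0 otherwise,
# so the gradient of the sum is the sum of the negative inputs.
manual_grad = x[x < 0].sum().reshape(1)

print(alpha.grad)    # gradient computed by autograd
print(manual_grad)   # gradient computed by hand
```

A backward pass that returns `None` for $\alpha$ would leave the slope frozen at its initial value, which reduces PReLU to Leaky ReLU with a random slope.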

ELU

Exponential linear unit (ELU)

Formula

$\text{elu}(x) = \begin{cases} x, & x \ge 0 \\ \alpha(e^x - 1), & x < 0 \end{cases}$

Derivative

$f'(0) = \lim_{h \to 0} \frac{f(0 + h) - f(0)}{h}$
$\lim_{h \to 0^-} \frac{\alpha(e^h - 1) - 0}{h} = \alpha$
$\lim_{h \to 0^+} \frac{h}{h} = 1$
So $f'(0)$ exists only when $\alpha = 1$; for any other $\alpha$ the function is not differentiable at 0.
Elsewhere, taking the value 1 at $x = 0$ by convention, $f'(x) = \begin{cases} 1, & x \ge 0 \\ \alpha e^x, & x < 0 \end{cases}$
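The one-sided slopes of ELU at 0 can be checked numerically: since $(e^h - 1)/h \to 1$ as $h \to 0$, the left slope tends to $\alpha$ while the right slope is 1. This is a minimal sketch in plain Python; the helper names and the step size `1e-6` are illustrative assumptions.

```python
import math


def elu(x: float, alpha: float) -> float:
    """Scalar ELU: x for x >= 0, alpha * (exp(x) - 1) otherwise."""
    return x if x >= 0 else alpha * math.expm1(x)


def one_sided_slopes(alpha: float, h: float = 1e-6):
    """Left and right difference quotients of ELU at x = 0."""
    left = (elu(-h, alpha) - elu(0.0, alpha)) / (-h)
    right = (elu(h, alpha) - elu(0.0, alpha)) / h
    return left, right


for alpha in (0.5, 1.0, 2.0):
    left, right = one_sided_slopes(alpha)
    print(f"alpha={alpha}: left={left:.6f}, right={right:.6f}")
```

Only for `alpha = 1.0` do the two slopes agree, so that is the only case in which ELU is differentiable at 0.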

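An ideal activation function should satisfy two conditions:

1. Its output distribution is zero-mean, which speeds up training.

2. It saturates on one side, which helps convergence.

LeakyReLU and PReLU satisfy the first condition but not the second, while ReLU satisfies the second but not the first. ELU (Exponential Linear Unit) satisfies both. Strictly speaking, ELU's output is not zero-mean either, but within a small range centered on 0 its mean tends toward 0, and how close it gets also depends on the value of $\alpha$.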

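Advantages

- ELU has most of the advantages of ReLU, does not suffer from the dead ReLU problem, and has an output mean close to 0;

- By reducing the bias-shift effect, it brings the normal gradient closer to the unit natural gradient, and pushing the mean toward 0 speeds up learning;

- It saturates in the negative domain, which makes it somewhat robust to noise;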

Disadvantages

- It is more expensive to compute because it involves an exponential;

- In practice it has not shown clearly better results than ReLU, so it is not widely used;

Custom ELU

    class SelfDefinedElu(torch.autograd.Function):
        @staticmethod
        def forward(ctx, inp, alpha):
            ctx.alpha = alpha  # plain Python number, kept for backward
            ctx.save_for_backward(inp)
            return torch.where(inp < 0., alpha * inp.exp() - alpha, inp)

        @staticmethod
        def backward(ctx, grad_output):
            inp, = ctx.saved_tensors
            ones_like_inp = torch.ones_like(inp)
            # Local derivative: alpha * exp(inp) for inp < 0, 1 otherwise.
            local_grad = torch.where(inp < 0., ctx.alpha * inp.exp(), ones_like_inp)
            # Chain rule: scale the incoming gradient; alpha gets no gradient (None).
            return grad_output * local_grad, None


    class Elu(nn.Module):
        def __init__(self, alpha=1):
            super().__init__()
            self.alpha = alpha

        def forward(self, x):
            return SelfDefinedElu.apply(x, self.alpha)
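A hand-written backward pass can be validated with `torch.autograd.gradcheck`, which compares it against finite differences. The check also catches a backward that forgets to multiply by `grad_output`, a mistake that stays hidden when the upstream gradient is all ones. The class `EluSketch` below is a stand-alone illustration written for this check, not the article's class, and the test points are chosen away from 0, where ELU may not be differentiable.

```python
import torch
from torch.autograd import gradcheck


class EluSketch(torch.autograd.Function):
    """Hypothetical stand-alone ELU used only for this check."""

    @staticmethod
    def forward(ctx, inp, alpha):
        ctx.alpha = alpha
        ctx.save_for_backward(inp)
        return torch.where(inp < 0, alpha * (inp.exp() - 1), inp)

    @staticmethod
    def backward(ctx, grad_output):
        (inp,) = ctx.saved_tensors
        local_grad = torch.where(inp < 0, ctx.alpha * inp.exp(),
                                 torch.ones_like(inp))
        # Chain rule: the upstream gradient must scale the local derivative.
        return grad_output * local_grad, None


# gradcheck needs double precision inputs that require gradients.
x = torch.tensor([-2.0, -0.5, 0.7, 1.5], dtype=torch.double, requires_grad=True)
ok = gradcheck(lambda t: EluSketch.apply(t, 0.5), (x,), eps=1e-6, atol=1e-4)
print(ok)
```

`gradcheck` raises an error describing the mismatch if the analytical and numerical Jacobians disagree, and returns `True` otherwise.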

Comparison with the Torch implementation

    # self defined
    torch.manual_seed(0)

    alpha = 0.5  # larger than usual, for a clearer visualization
    elu = Elu(alpha=alpha)  # SelfDefinedElu
    inp = torch.randn(5, requires_grad=True)
    out = elu((inp + 1).pow(3))

    print(f'Out is\n{out}')

    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nFirst call\n{inp.grad}")

    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nSecond call\n{inp.grad}")

    inp.grad.zero_()
    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nCall after zeroing gradients\n{inp.grad}")

Output:

    Out is
    tensor([ 1.6406e+01, 3.5275e-01, -4.0281e-01, 3.8583e+00, -3.0184e-04],
           grad_fn=<SelfDefinedEluBackward>)

    First call
    tensor([1.9370e+01, 1.4977e+00, 4.0513e-01, 7.3799e+00, 1.0710e-02])

    Second call
    tensor([3.8740e+01, 2.9955e+00, 8.1027e-01, 1.4760e+01, 2.1419e-02])

    Call after zeroing gradients
    tensor([1.9370e+01, 1.4977e+00, 4.0513e-01, 7.3799e+00, 1.0710e-02])

The same experiment with the built-in function:

    # torch defined
    torch.manual_seed(0)

    inp = torch.randn(5, requires_grad=True)
    out = F.elu((inp + 1).pow(3), alpha=alpha)

    print(f'Out is\n{out}')

    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nFirst call\n{inp.grad}")

    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nSecond call\n{inp.grad}")

    inp.grad.zero_()
    out.backward(torch.ones_like(inp), retain_graph=True)
    print(f"\nCall after zeroing gradients\n{inp.grad}")

Output:

    Out is
    tensor([ 1.6406e+01, 3.5275e-01, -4.0281e-01, 3.8583e+00, -3.0184e-04],
           grad_fn=<EluBackward>)

    First call
    tensor([1.9370e+01, 1.4977e+00, 4.0513e-01, 7.3799e+00, 1.0710e-02])

    Second call
    tensor([3.8740e+01, 2.9955e+00, 8.1027e-01, 1.4760e+01, 2.1419e-02])

    Call after zeroing gradients
    tensor([1.9370e+01, 1.4977e+00, 4.0513e-01, 7.3799e+00, 1.0710e-02])

Visualization

    # ELU on inputs in [-1, 1): the output mean is close to 0
    inp = torch.arange(-1, 1, 0.05, requires_grad=True)
    out = F.elu(inp, alpha=1.2)
    # out = F.relu(inp)
    out.mean(), out.std()

Output:

    (tensor(0.0074, grad_fn=<MeanBackward0>),
     tensor(0.5384, grad_fn=<StdBackward0>))

    # ReLU on the same inputs: the output mean is clearly positive
    inp = torch.arange(-1, 1, 0.05, requires_grad=True)
    # out = F.elu(inp, alpha=1)
    out = F.relu(inp)
    out.mean(), out.std()

Output:

    (tensor(0.2375, grad_fn=<MeanBackward0>),
     tensor(0.3170, grad_fn=<StdBackward0>))

    # visualization
    inp = torch.arange(-8, 8, 0.05, requires_grad=True)
    out = elu(inp)
    out.sum().backward()

    inp_grad = inp.grad

    plt.plot(inp.detach().numpy(), out.detach().numpy(), label=r"$elu(x)$", alpha=0.7)
    plt.plot(inp.detach().numpy(), inp_grad.numpy(), label=r"$elu'(x)$", alpha=0.5)
    plt.scatter(0, 0, color='None', marker='o', edgecolors='r', s=50)
    plt.grid()
    plt.legend()
    plt.show()
