41 Activation functions and GPU acceleration
Sigmoid and tanh suffer from vanishing gradients: in their saturated regions the derivative is essentially 0, so nothing propagates back. ReLU's derivative is 0 for x < 0 and a constant 1 for x > 0 (it is not differentiable exactly at x = 0), so gradients pass through unchanged layer after layer, which largely avoids both exploding and vanishing gradients.
Because ReLU still has zero gradient for x < 0, LeakyReLU replaces that part with y = a*x so there is a small slope a instead of 0 (the notes mention roughly 0.02; PyTorch's nn.LeakyReLU uses negative_slope=0.01 by default). SELU combines the linear part with an exponential branch so the curve is smooth around x = 0, and softplus is a smoothed version of ReLU that is differentiable near x = 0. A small comparison is sketched below.
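A minimal sketch comparing these activations on a few sample points; only the function names come from the notes above, the sample values are made up for illustration:

import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)                    # a few sample points around 0

print(torch.sigmoid(x))                         # saturates toward 0/1 -> tiny gradients at the ends
print(torch.tanh(x))                            # saturates toward -1/1
print(F.relu(x))                                # 0 for x < 0, identity for x > 0
print(F.leaky_relu(x, negative_slope=0.01))     # small slope 0.01 instead of 0 for x < 0
print(F.selu(x))                                # scaled exponential branch for x < 0, smooth near 0
print(F.softplus(x))                            # log(1 + e^x), a smooth approximation of ReLU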
GPU acceleration uses the .to(device) method. Note that for a tensor, data and data.to(device) are different objects (a CPU copy and a GPU copy), whereas calling .to(device) on an nn.Module moves the module's parameters in place and returns the same module.
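A minimal sketch of that difference (the cuda-availability guard is an addition, not from the original notes):

import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

data = torch.randn(4, 784)
data_gpu = data.to(device)            # on a GPU machine this is a new tensor on cuda:0, while `data` stays on the CPU
print(data.device, data_gpu.device)   # e.g. cpu cuda:0

net = nn.Linear(784, 10)
net2 = net.to(device)                 # moves the module's parameters in place and returns the same object
print(net is net2)                    # True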
Changing the activation to LeakyReLU and moving the model onto the GPU raised the test accuracy from 83% (the original ReLU version) to 94%. The full script:
import torch
import torch.nn as nn
from torch import optim
from torchvision import datasets, transforms

# Hyperparameters
batch_size = 200
learning_rate = 0.01
epochs = 10

# Training data: train=True selects the training split; transform does the preprocessing
# (ToTensor converts to a Tensor, Normalize subtracts the mean and divides by the std)
train_db = datasets.MNIST('../data', train=True, download=True,
                          transform=transforms.Compose([
                              transforms.ToTensor(),
                              transforms.Normalize((0.1307,), (0.3081,))
                          ]))
# DataLoader splits the training data into batches and yields one batch at a time,
# until all of them have been served; essentially the data initialisation step
train_loader = torch.utils.data.DataLoader(train_db, batch_size=batch_size, shuffle=True)

# Test data
test_db = datasets.MNIST('../data', train=False,
                         transform=transforms.Compose([
                             transforms.ToTensor(),
                             transforms.Normalize((0.1307,), (0.3081,))
                         ]))
test_loader = torch.utils.data.DataLoader(test_db, batch_size=batch_size, shuffle=True)


class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(    # Sequential chains the layers together
            nn.Linear(784, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 10),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x


# Train
device = torch.device('cuda:0')
net = MLP().to(device)    # the network; its computation is defined by forward()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)    # nn.Module replaces the manual list [w1, b1, w2, b2, ...]
criteon = nn.CrossEntropyLoss().to(device)

for epoch in range(epochs):
    for batch_ind, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)    # equivalent to .cuda()
        logits = net(data)    # no softmax here; CrossEntropyLoss takes the raw logits as the prediction
        loss = criteon(logits, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch_ind % 100 == 0:
            print('Train Epoch:{} [{}/{} ({:.0f}%)]\t Loss:{:.6f}'.format(
                epoch, batch_ind * len(data), len(train_loader.dataset),
                100. * batch_ind / len(train_loader), loss.item()))

# Evaluate once after training
test_loss = 0
correct = 0
for data, target in test_loader:
    data = data.view(-1, 28 * 28)    # -1 keeps the first (batch) dimension unchanged
    data, target = data.to(device), target.to(device)
    logits = net(data)
    test_loss += criteon(logits, target).item()
    pred = logits.data.max(1)[1]    # index of the max logit over dim=1 is the predicted class
    correct += pred.eq(target.data).sum()
test_loss /= len(train_loader.dataset)    # divided by the train-set size as in the original; len(test_loader.dataset) would give the usual per-sample test loss
print('\n test set:average loss:{:.4f},Accuracy:{}/{} ({:.0f}%)\n'.format(
    test_loss, correct, len(test_loader.dataset),
    100. * correct / len(test_loader.dataset)))
'''
F:\anaconda\envs\pytorch\python.exe F:/pythonProject1/pythonProject3/ll.py
Train Epoch:0 [0/60000 (0%)]     Loss:2.315502
Train Epoch:0 [20000/60000 (33%)]     Loss:2.117644
Train Epoch:0 [40000/60000 (67%)]     Loss:1.659186
Train Epoch:1 [0/60000 (0%)]     Loss:1.290930
Train Epoch:1 [20000/60000 (33%)]     Loss:1.049087
Train Epoch:1 [40000/60000 (67%)]     Loss:0.872082
Train Epoch:2 [0/60000 (0%)]     Loss:0.528612
Train Epoch:2 [20000/60000 (33%)]     Loss:0.402818
Train Epoch:2 [40000/60000 (67%)]     Loss:0.400452
Train Epoch:3 [0/60000 (0%)]     Loss:0.318432
Train Epoch:3 [20000/60000 (33%)]     Loss:0.344411
Train Epoch:3 [40000/60000 (67%)]     Loss:0.443066
Train Epoch:4 [0/60000 (0%)]     Loss:0.310835
Train Epoch:4 [20000/60000 (33%)]     Loss:0.263893
Train Epoch:4 [40000/60000 (67%)]     Loss:0.292117
Train Epoch:5 [0/60000 (0%)]     Loss:0.331171
Train Epoch:5 [20000/60000 (33%)]     Loss:0.192741
Train Epoch:5 [40000/60000 (67%)]     Loss:0.396357
Train Epoch:6 [0/60000 (0%)]     Loss:0.363707
Train Epoch:6 [20000/60000 (33%)]     Loss:0.225204
Train Epoch:6 [40000/60000 (67%)]     Loss:0.218652
Train Epoch:7 [0/60000 (0%)]     Loss:0.209941
Train Epoch:7 [20000/60000 (33%)]     Loss:0.210056
Train Epoch:7 [40000/60000 (67%)]     Loss:0.296629
Train Epoch:8 [0/60000 (0%)]     Loss:0.361880
Train Epoch:8 [20000/60000 (33%)]     Loss:0.213277
Train Epoch:8 [40000/60000 (67%)]     Loss:0.170169
Train Epoch:9 [0/60000 (0%)]     Loss:0.301176
Train Epoch:9 [20000/60000 (33%)]     Loss:0.175931
Train Epoch:9 [40000/60000 (67%)]     Loss:0.214820

 test set:average loss:0.0002,Accuracy:9370/10000 (94%)

Process finished with exit code 0
'''
42 Testing methods
If you keep training on the training set, the loss can get very low and the accuracy very high, yet the model may only be memorising surface patterns rather than learning the underlying structure, i.e. over-fitting. Evaluating on a validation set exposes this: in the later stages the validation accuracy becomes unstable or even drops, and the loss becomes unstable or rises. So more training is not automatically better; the amount of data and the architecture are what matter most.
import torch
from torch.nn import functional as F

logits = torch.rand(4, 10)       # 4 images, each a 10-dim vector of scores for classes 0-9
pred = F.softmax(logits, dim=1)  # softmax over dim=1, i.e. over each image's class scores;
                                 # softmax over dim=0 would also give a [4, 10] tensor, but a different one
pred_label = pred.argmax(dim=1)
logits.argmax(dim=1)
# Both argmax calls return a tensor of size [b], and they agree: softmax is monotonic,
# so each image's most likely class is the same whether you take argmax of pred or of logits.
label = torch.randint(0, 10, [4])       # ground-truth labels (random here, just for the demo)
correct = torch.eq(pred_label, label)   # element-wise check of which predictions are right
correct.sum().float().item() / 4        # accuracy; correct is a tensor, so .item() extracts the Python scalar
Testing more often gives finer control over where the accuracy peaks; spending most of the time training means testing less often. In the script below the test loop runs once per epoch, so the accuracy can be watched climbing:
# -*- coding: utf-8 -*-
# @Time : 2021/5/14 21:06
# @Author : sueong
# @File : ll.py
# @Software : PyCharm
import torch
import torch.nn as nn
from torch import optim
from torchvision import datasets, transforms

# Hyperparameters
batch_size = 200
learning_rate = 0.01
epochs = 10

# Training data: train=True selects the training split; transform does the preprocessing
# (ToTensor converts to a Tensor, Normalize subtracts the mean and divides by the std)
train_db = datasets.MNIST('../data', train=True, download=True,
                          transform=transforms.Compose([
                              transforms.ToTensor(),
                              transforms.Normalize((0.1307,), (0.3081,))
                          ]))
# DataLoader splits the training data into batches and yields one batch at a time
train_loader = torch.utils.data.DataLoader(train_db, batch_size=batch_size, shuffle=True)

# Test data
test_db = datasets.MNIST('../data', train=False,
                         transform=transforms.Compose([
                             transforms.ToTensor(),
                             transforms.Normalize((0.1307,), (0.3081,))
                         ]))
test_loader = torch.utils.data.DataLoader(test_db, batch_size=batch_size, shuffle=True)


class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(    # Sequential chains the layers together
            nn.Linear(784, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 10),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x


# Train
device = torch.device('cuda:0')
net = MLP().to(device)    # the network; its computation is defined by forward()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)    # nn.Module replaces the manual list [w1, b1, w2, b2, ...]
criteon = nn.CrossEntropyLoss().to(device)

for epoch in range(epochs):
    for batch_ind, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)    # equivalent to .cuda()
        logits = net(data)    # no softmax here; CrossEntropyLoss takes the raw logits as the prediction
        loss = criteon(logits, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch_ind % 100 == 0:
            print('Train Epoch:{} [{}/{} ({:.0f}%)]\t Loss:{:.6f}'.format(
                epoch, batch_ind * len(data), len(train_loader.dataset),
                100. * batch_ind / len(train_loader), loss.item()))

    # Test once per epoch; the accuracy can be seen increasing epoch by epoch
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)    # -1 keeps the first (batch) dimension unchanged
        data, target = data.to(device), target.to(device)
        logits = net(data)
        test_loss += criteon(logits, target).item()
        pred = logits.data.max(1)[1]    # index of the max logit over dim=1 is the predicted class
        correct += pred.eq(target.data).sum()
    test_loss /= len(train_loader.dataset)    # divided by the train-set size as in the original; len(test_loader.dataset) would give the usual per-sample test loss
    print('\n test set:average loss:{:.4f},Accuracy:{}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
'''
F:\anaconda\envs\pytorch\python.exe F:/pythonProject1/pythonProject3/ll.py
Train Epoch:0 [0/60000 (0%)]     Loss:2.308717
Train Epoch:0 [20000/60000 (33%)]     Loss:2.017611
Train Epoch:0 [40000/60000 (67%)]     Loss:1.563952

 test set:average loss:0.0011,Accuracy:6175/10000 (62%)

Train Epoch:1 [0/60000 (0%)]     Loss:1.301144
Train Epoch:1 [20000/60000 (33%)]     Loss:1.313298
Train Epoch:1 [40000/60000 (67%)]     Loss:1.184744

 test set:average loss:0.0008,Accuracy:7102/10000 (71%)

Train Epoch:2 [0/60000 (0%)]     Loss:0.946402
Train Epoch:2 [20000/60000 (33%)]     Loss:0.762401
Train Epoch:2 [40000/60000 (67%)]     Loss:0.697880

 test set:average loss:0.0004,Accuracy:8841/10000 (88%)

Train Epoch:3 [0/60000 (0%)]     Loss:0.579781
Train Epoch:3 [20000/60000 (33%)]     Loss:0.480412
Train Epoch:3 [40000/60000 (67%)]     Loss:0.347749

 test set:average loss:0.0003,Accuracy:9047/10000 (90%)

Train Epoch:4 [0/60000 (0%)]     Loss:0.363675
Train Epoch:4 [20000/60000 (33%)]     Loss:0.304079
Train Epoch:4 [40000/60000 (67%)]     Loss:0.401550

 test set:average loss:0.0003,Accuracy:9118/10000 (91%)

Train Epoch:5 [0/60000 (0%)]     Loss:0.324268
Train Epoch:5 [20000/60000 (33%)]     Loss:0.269142
Train Epoch:5 [40000/60000 (67%)]     Loss:0.284855

 test set:average loss:0.0002,Accuracy:9195/10000 (92%)

Train Epoch:6 [0/60000 (0%)]     Loss:0.181122
Train Epoch:6 [20000/60000 (33%)]     Loss:0.214253
Train Epoch:6 [40000/60000 (67%)]     Loss:0.310929

 test set:average loss:0.0002,Accuracy:9229/10000 (92%)

Train Epoch:7 [0/60000 (0%)]     Loss:0.233558
Train Epoch:7 [20000/60000 (33%)]     Loss:0.345559
Train Epoch:7 [40000/60000 (67%)]     Loss:0.240973

 test set:average loss:0.0002,Accuracy:9286/10000 (93%)

Train Epoch:8 [0/60000 (0%)]     Loss:0.197916
Train Epoch:8 [20000/60000 (33%)]     Loss:0.368038
Train Epoch:8 [40000/60000 (67%)]     Loss:0.367101

 test set:average loss:0.0002,Accuracy:9310/10000 (93%)

Train Epoch:9 [0/60000 (0%)]     Loss:0.221928
Train Epoch:9 [20000/60000 (33%)]     Loss:0.190280
Train Epoch:9 [40000/60000 (67%)]     Loss:0.183632

 test set:average loss:0.0002,Accuracy:9351/10000 (94%)

Process finished with exit code 0
'''
43 Visualization
Visualization tools: TensorBoard and Visdom.
Visdom setup: 1) install it (pip install visdom), 2) run the server daemon (python -m visdom.server) and keep it running while training.
In a multi-curve window, the legend option holds the labels of the curves (e.g. y1 and y2). A minimal example is sketched below, followed by the full training script instrumented with Visdom.
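A minimal sketch of plotting two curves in one window with a legend (assumes a Visdom server is already running locally; the curve values are made up for illustration):

import numpy as np
from visdom import Visdom

viz = Visdom()                          # connects to the local visdom server (default port 8097)

x = np.arange(10)
y1 = x ** 0.5                           # made-up curve 1
y2 = np.log1p(x)                        # made-up curve 2

# Y has one column per curve; legend labels the columns y1 and y2
viz.line(Y=np.column_stack([y1, y2]), X=x,
         win='demo', opts=dict(title='two curves', legend=['y1', 'y2']))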
# -*- coding: utf-8 -*-
# @Time : 2021/5/14 21:06
# @Author : sueong
# @File : ll.py
# @Software : PyCharm
import torch
import torch.nn as nn
from torch import optim
from torchvision import datasets, transforms
from visdom import Visdom

# Hyperparameters
batch_size = 200
learning_rate = 0.01
epochs = 10

# Training data (same preprocessing as before: ToTensor, then Normalize with mean/std)
train_db = datasets.MNIST('../data', train=True, download=True,
                          transform=transforms.Compose([
                              transforms.ToTensor(),
                              transforms.Normalize((0.1307,), (0.3081,))
                          ]))
train_loader = torch.utils.data.DataLoader(train_db, batch_size=batch_size, shuffle=True)

# Test data
test_db = datasets.MNIST('../data', train=False,
                         transform=transforms.Compose([
                             transforms.ToTensor(),
                             transforms.Normalize((0.1307,), (0.3081,))
                         ]))
test_loader = torch.utils.data.DataLoader(test_db, batch_size=batch_size, shuffle=True)


class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(    # Sequential chains the layers together
            nn.Linear(784, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 10),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x


# Train
device = torch.device('cuda:0')
net = MLP().to(device)
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss().to(device)

# Before the train-test loop, create the two plots. This is just a placeholder;
# points are appended during training so the curves grow dynamically.
'''
Visdom() can take env='xxx' to name the environment window; nothing is passed here, so the
default "main" environment is used. In viz.line the first two arguments are the Y and X
coordinates of the curve (Y first, then X). They are set to 0 here only as placeholders
(starting the loss curve at Y=0 looks a bit odd at the beginning, since the loss actually
starts high and decreases). Different win parameters put the curves in different windows.
The third call defines two curves (test loss and accuracy), so two initial Y values are
given at X=0.
'''
viz = Visdom()
viz.line([0.], [0.], win='train_loss', opts=dict(title='train loss'))
viz.line([[0.0, 0.0]], [0.], win='test', opts=dict(title='test loss&acc.', legend=['loss', 'acc.']))

# A global counter to keep track of how many batches have been trained
global_step = 0

for epoch in range(epochs):
    for batch_ind, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)    # equivalent to .cuda()
        logits = net(data)    # no softmax here; logits go straight into the loss
        loss = criteon(logits, target)
        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()
        # After each batch, append a point to the training curve so it grows in real time.
        # win selects which curve; update='append' adds the point (Y first, then X).
        global_step += 1
        viz.line([loss.item()], [global_step], win='train_loss', update='append')
        if batch_ind % 100 == 0:
            print('Train Epoch:{} [{}/{} ({:.0f}%)]\t Loss:{:.6f}'.format(
                epoch, batch_ind * len(data), len(train_loader.dataset),
                100. * batch_ind / len(train_loader), loss.item()))

    # Test once per epoch; the accuracy can be seen increasing
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)    # -1 keeps the first (batch) dimension unchanged
        data, target = data.to(device), target.to(device)
        logits = net(data)
        test_loss += criteon(logits, target).item()
        pred = logits.data.max(1)[1]    # max over dim=1 gives the predicted class
        correct += pred.eq(target).float().sum().item()
    # After each test pass, append a point to the test curves, and show the current batch of
    # images (.images) and the predicted labels as text (.text) in two more windows (win=...).
    viz.line([[test_loss, correct / len(test_loader.dataset)]], [global_step], win='test', update='append')
    viz.images(data.view(-1, 1, 28, 28), win='x')
    viz.text(str(pred.detach().cpu().numpy()), win='pred', opts=dict(title='pred'))
    # .detach() and moving the tensor to the CPU are needed before it can be displayed.
    test_loss /= len(train_loader.dataset)
    print('\n test set:average loss:{:.4f},Accuracy:{}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
44 Underfitting and overfitting
The real data follow some distribution we have intuitions about, but we do not know the true distribution or the parameters of the underlying function; the function is generally non-linear and the data may contain noise. The higher the polynomial degree, the larger and more jittery the curve and the more complex the shapes it can produce. Degree is one way to measure a model's learning ability: as it increases, the expressive power grows, more complex distributions and mappings can be represented, i.e. the model capacity increases.
Let "estimated" denote the capacity of the model we use and "ground-truth" the complexity of the true model. Case 1: Estimated < Ground-truth: under-fitting, the model's expressive power is not enough. The symptom of under-fitting is poor performance even on the training set; check whether increasing the model's complexity or number of layers improves it. Case 2: Ground-truth < Estimated: over-fitting, the model is too complex, so on a finite dataset it also fits the noise; generalisation suffers, with very good results on train but poor results on test. In practice over-fitting is the usual case. The toy sketch below illustrates both regimes.
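A toy sketch (not from the original notes) that fits polynomials of increasing degree to a small noisy sample of a sine-like function; the degree plays the role of model capacity:

import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 15)
x_test = np.linspace(-1, 1, 100)
true_f = lambda x: np.sin(3 * x)                                      # the unknown ground-truth function
y_train = true_f(x_train) + 0.1 * rng.standard_normal(x_train.shape) # noisy observations
y_test = true_f(x_test)

for deg in (1, 4, 14):                                    # capacity: too small, about right, too large
    coeffs = np.polyfit(x_train, y_train, deg)            # least-squares polynomial fit
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f'degree {deg:2d}: train MSE {train_err:.4f}, test MSE {test_err:.4f}')
# degree 1 tends to under-fit (both errors high); degree 14 tends to over-fit (train error tiny, test error larger)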
45 Cross-validation: 1. Train-Val-Test split
The purpose of evaluating during training is to see whether over-fitting has started and to pick the best parameters from before that point; the test_loader in the scripts above really plays the role of a validation set. In practice we pick the point with the highest validation accuracy, stop training there, and take that checkpoint as the final model, i.e. save the weights w and b that performed best before over-fitting set in.
Val set: used to select model parameters and to stop training before over-fitting. Test set: used for final evaluation, e.g. handed to the client for acceptance testing; it must be data the model has never seen, which prevents the validation set from effectively being trained on (if the client evaluated on the validation set, which has already been used to tune the model, the result would look artificially good, which is cheating).
Because the test set must stay unseen, tuning parameters based on test-set accuracy would make it play the same role as the validation set and would contaminate the evaluation.
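A minimal sketch of carving a validation split out of the training set with torch.utils.data.random_split and keeping the best checkpoint. It reuses train_db, batch_size, epochs, net, device and criteon from the scripts above; 'best.pth' is a made-up filename, and the training loop body is elided:

import torch

# Split the 60k MNIST training images into 50k train / 10k validation
train_part, val_part = torch.utils.data.random_split(train_db, [50000, 10000])
train_loader = torch.utils.data.DataLoader(train_part, batch_size=batch_size, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_part, batch_size=batch_size, shuffle=True)

best_acc = 0.0
for epoch in range(epochs):
    # ... run the usual training loop over train_loader here ...
    # then evaluate on the validation split
    correct = 0
    for data, target in val_loader:
        data = data.view(-1, 28 * 28).to(device)
        target = target.to(device)
        pred = net(data).argmax(dim=1)
        correct += pred.eq(target).sum().item()
    val_acc = correct / len(val_part)
    if val_acc > best_acc:                       # keep the best-so-far checkpoint
        best_acc = val_acc
        torch.save(net.state_dict(), 'best.pth')

# After training, restore the best checkpoint and evaluate once on the untouched test set
net.load_state_dict(torch.load('best.pth'))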