

Implementing gradient descent and stochastic gradient descent by hand in PyTorch, based on logistic Regression, and exploring the effect of mini-batches


Overview

This write-up grew out of a large project assignment for a convex optimization course.
It centers on handwritten digit recognition on MNIST using logistic Regression,
and uses that task to explore logistic regression, gradient descent, stochastic gradient descent, and the effect of mini-batches.

The core task is to implement gradient descent and stochastic gradient descent, but the supporting pieces also need to be in reasonably good shape.

Imports

import os
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision

Loading the data

EPOCH = 1              # train over the training data n times; to save time, we just train 1 epoch
BATCH_SIZE = 1
DOWNLOAD_MNIST = False
LR = 0.001

# Mnist digits dataset
if not (os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
    # not a mnist dir, or the mnist dir is empty
    DOWNLOAD_MNIST = True

train_data = torchvision.datasets.MNIST(
    root='./mnist/',
    train=True,                                    # this is the training data
    transform=torchvision.transforms.ToTensor(),
    download=DOWNLOAD_MNIST,
)

# Data Loader for easy mini-batch return in training; the image batch shape will be (BATCH_SIZE, 1, 28, 28)
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE)  # , shuffle=True)

The sigmoid function

The sigmoid function maps values from R onto the interval (0, 1):

\sigma(x) = \frac{1}{1 + e^{-x}}
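As a quick sanity check (my own minimal sketch, not part of the original assignment code), torch.sigmoid matches this formula and stays strictly inside (0, 1):

import torch

x = torch.linspace(-5.0, 5.0, steps=5)
manual = 1.0 / (1.0 + torch.exp(-x))                          # the formula above
print(torch.allclose(torch.sigmoid(x), manual))               # True
print(manual.min().item() > 0.0, manual.max().item() < 1.0)   # True True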

The softmax function

Softmax turns n values into a probability distribution according to their relative magnitudes:

\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}

  • In general, the maximum value is subtracted first to avoid numerical overflow (a short sketch follows this list).
  • Here, though, the inputs to softmax come from the sigmoid in our logistic regression, so they all lie in (0, 1); they never get large, and overflow is not a concern.
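A minimal sketch of the max-subtraction trick mentioned above (my own illustration, not taken from the assignment code); shifting each row by its maximum leaves softmax unchanged but keeps exp() from overflowing:

import torch

def stable_softmax(x: torch.Tensor) -> torch.Tensor:
    shifted = x - x.max(dim=1, keepdim=True).values   # subtract the per-row maximum
    e = torch.exp(shifted)
    return e / e.sum(dim=1, keepdim=True)

x = torch.tensor([[1000.0, 1001.0, 1002.0]])          # exp() would overflow without the shift
print(stable_softmax(x))
print(torch.softmax(x, dim=1))                        # PyTorch's built-in agrees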

The cross-entropy function

cross_Entropy here is just the cross entropy:

-\sum_i p_i \log(q_i)

Once the ground-truth label is given, we know the true distribution p: it puts probability 1 on a single element and 0 on all the others.

In other words, the loss reduces to

-\log(q_{\text{label}})

so the larger the probability assigned to the correct label, the smaller the loss.
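A small sketch of this collapse (my own illustration): with a one-hot target p, the full sum reduces to the single term for the true label:

import torch

q = torch.tensor([0.1, 0.7, 0.2])       # predicted distribution over 3 classes
label = 1                                # ground-truth class index
p = torch.zeros(3)
p[label] = 1.0                           # one-hot target distribution

full = -(p * q.log()).sum()              # -sum_i p_i * log(q_i)
shortcut = -q[label].log()               # -log(q_label)
print(full.item(), shortcut.item())      # both are about 0.3567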

Task description

\min_{A,b}\; CE\big(SM(SIG(Ax + b)),\ \text{label}\big)

  • SM: softmax
  • SIG: sigmoid
  • CE: cross entropy
  • label: the ground-truth label (a rough code sketch of the full objective follows this list)
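A rough sketch of this composed objective for a single flattened image (hypothetical shapes and values; the actual model class used in the experiments appears in the full code further down):

import torch
import torch.nn.functional as F

A = torch.randn(10, 28 * 28, requires_grad=True)   # weight matrix: 10 classes x 784 pixels
b = torch.zeros(10, requires_grad=True)            # bias
x = torch.rand(1, 28 * 28)                         # one flattened MNIST-sized image
label = torch.tensor([3])                          # its (hypothetical) class

z = torch.sigmoid(x @ A.t() + b)                   # SIG(Ax + b)
q = torch.softmax(z, dim=1)                        # SM(...)
loss = F.nll_loss(q.log(), label)                  # CE(q, label) = -log(q_label)
loss.backward()                                    # autograd fills A.grad and b.grad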

Solving it with SGD and GD

The implementation uses PyTorch, mainly to avoid computing gradients by hand: PyTorch's autograd mechanism does that for us.

A fixed step size is used throughout (a minimal toy sketch of this update pattern follows).
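A minimal sketch of the pattern on a toy one-dimensional problem (my own example, not the MNIST model): autograd supplies the gradient and the parameter moves a fixed step against it.

import torch

w = torch.tensor(0.0, requires_grad=True)
alpha = 0.1                                   # fixed step size
for _ in range(100):
    loss = (w - 3.0) ** 2                     # minimize (w - 3)^2
    if w.grad is not None:
        w.grad.zero_()                        # clear the accumulated gradient
    loss.backward()                           # autograd computes d(loss)/dw
    w.data = w.data - alpha * w.grad.data     # fixed-step descent update
print(w.item())                               # converges to roughly 3.0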

SGD

  • batch = 1

  • (the step size alpha is 0.001)

  • Final accuracy: 0.836

  • How the test accuracy evolves during training

  • How far A and b are from the optimum (measured with the matrix 2-norm)

  • Part of the code implementing SGD

From the logistic regression model we pull out the parameters A and b:

A, b = [i for i in logits.parameters()]
A.cuda()
b.cuda()

Following how PyTorch's built-in optimizers are implemented in the library source, the gradients are zeroed manually before each backward pass; otherwise they would accumulate across steps.

if A.grad is not None:
    A.grad.zero_()
    b.grad.zero_()
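For comparison, Optimizer.zero_grad() essentially performs the same reset for every registered parameter; a rough sketch of that loop (my paraphrase, not the actual library source):

for p in logits.parameters():
    if p.grad is not None:
        p.grad.zero_()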
  • Gradient-descent parameter update

A.data = A.data - alpha * A.grad.data
b.data = b.data - alpha * b.grad.data
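For reference, an equivalent way to write the same fixed-step update without touching .data is to wrap it in torch.no_grad() (just an alternative sketch; the experiments below use the .data form above):

with torch.no_grad():
    A -= alpha * A.grad    # in-place update, kept out of the autograd graph
    b -= alpha * b.grad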

Full code

import os

import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision
import matplotlib.pyplot as plt

EPOCH = 5              # train over the training data n times
BATCH_SIZE = 1
DOWNLOAD_MNIST = False
LR = 0.001

# Mnist digits dataset
if not (os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
    # not a mnist dir, or the mnist dir is empty
    DOWNLOAD_MNIST = True

train_data = torchvision.datasets.MNIST(
    root='./mnist/',
    train=True,                                    # this is the training data
    transform=torchvision.transforms.ToTensor(),
    download=DOWNLOAD_MNIST,
)

# Data Loader for easy mini-batch return in training
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)


class Logits(nn.Module):
    def __init__(self):
        super(Logits, self).__init__()
        self.linear = nn.Linear(28 * 28, 10)
        self.sigmoid = nn.Sigmoid()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.linear(x)
        x = self.sigmoid(x)
        x = self.softmax(x)
        return x


test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
# shape from (10000, 28, 28) to (10000, 1, 28, 28), values in range (0, 1)
test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor).cuda() / 255.
test_y = test_data.test_labels

alpha = 0.001

logits = Logits().cuda()
# optimizer = torch.optim.SGD(logits.parameters(), lr=LR)  # optimize all parameters
# optimizer.zero_grad()
loss_func = nn.CrossEntropyLoss()                 # the target label is not one-hotted

Accurate = []
Astore = []
bstore = []
A, b = [i for i in logits.parameters()]
A.cuda()
b.cuda()
for e in range(EPOCH):
    for step, (x, b_y) in enumerate(train_loader):   # gives batch data
        b_x = x.view(-1, 28 * 28).cuda()             # reshape x to (batch, 28 * 28)
        b_y = b_y.cuda()
        output = logits(b_x)                         # logits output
        loss = loss_func(output, b_y)                # cross entropy loss
        if A.grad is not None:
            A.grad.zero_()
            b.grad.zero_()
        loss.backward()                              # backpropagation, compute gradients
        A.data = A.data - alpha * A.grad.data
        b.data = b.data - alpha * b.grad.data
        if step % 1500 == 0:
            test_output = logits(test_x.view(-1, 28 * 28))
            pred_y = torch.max(test_output, 1)[1].cuda().data.squeeze()
            Accurate.append(sum(test_y.cpu().numpy() == pred_y.cpu().numpy()) / (1.0 * len(test_y.cpu().numpy())))
            print(Accurate[-1])
            Astore.append(A.detach())
            bstore.append(b.detach())

test_output = logits(test_x.view(-1, 28 * 28))
pred_y = torch.max(test_output, 1)[1].cuda().data.squeeze()

print(pred_y, 'prediction number')
print(test_y, 'real number')
Accurate.append(sum(test_y.cpu().numpy() == pred_y.cpu().numpy()) / (1.0 * len(test_y.cpu().numpy())))
print(Accurate[-1])

for i in range(len(Astore)):
    Astore[i] = (Astore[i] - Astore[-1]).norm()
    bstore[i] = (bstore[i] - bstore[-1]).norm()

plt.plot(Astore, label='A')
plt.plot(bstore, label='b')
plt.legend()
plt.show()
plt.cla()
plt.plot(Accurate)
plt.show()

GD

Setting BATCH_SIZE to 60000 (the size of the MNIST training set) turns this into full-batch gradient descent.

  • The step size should not be too small here, though (GD uses alpha = 0.05).

Everything else is essentially the same. Since the computation runs on the GPU and there is only a single batch, the full dataset is pulled out of the loader once, up front, instead of repeatedly reading MNIST through the loader and copying it to the GPU, which would waste time.

In addition, EPOCH is set to 5000.

  • On the GPU, the computation finishes quickly.

import os

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision

EPOCH = 5000           # number of full-batch gradient steps
BATCH_SIZE = 60000     # the whole MNIST training set in one batch
DOWNLOAD_MNIST = False

# Mnist digits dataset
if not (os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
    # not a mnist dir, or the mnist dir is empty
    DOWNLOAD_MNIST = True

train_data = torchvision.datasets.MNIST(
    root='./mnist/',
    train=True,                                    # this is the training data
    transform=torchvision.transforms.ToTensor(),
    download=DOWNLOAD_MNIST,
)

# Data Loader; with BATCH_SIZE = 60000 each "batch" is the whole training set
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)


class Logits(nn.Module):
    def __init__(self):
        super(Logits, self).__init__()
        self.linear = nn.Linear(28 * 28, 10)
        self.sigmoid = nn.Sigmoid()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.linear(x)
        x = self.sigmoid(x)
        x = self.softmax(x)
        return x


test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
# shape from (10000, 28, 28) to (10000, 1, 28, 28), values in range (0, 1)
test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor).cuda() / 255.
test_y = test_data.test_labels

alpha = 0.05

logits = Logits().cuda()
# optimizer = torch.optim.SGD(logits.parameters(), lr=LR)  # optimize all parameters
# optimizer.zero_grad()
loss_func = nn.CrossEntropyLoss()                 # the target label is not one-hotted

Accurate = []
Astore = []
bstore = []
A, b = [i for i in logits.parameters()]
A.cuda()
b.cuda()
# pull the single full batch out of the loader once and keep it on the GPU
x, b_y = [(i, j) for i, j in train_loader][0]
b_x = x.view(-1, 28 * 28).cuda()                  # reshape x to (batch, 28 * 28)
b_y = b_y.cuda()
for e in range(EPOCH):
    output = logits(b_x)                          # logits output
    loss = loss_func(output, b_y)                 # cross entropy loss
    if A.grad is not None:
        A.grad.zero_()
        b.grad.zero_()
    loss.backward()                               # backpropagation, compute gradients
    A.data = A.data - alpha * A.grad.data
    b.data = b.data - alpha * b.grad.data
    test_output = logits(test_x.view(-1, 28 * 28))
    # print(e)
    if e % 10 == 0:
        pred_y = torch.max(test_output, 1)[1].cuda().data.squeeze()
        Accurate.append(sum(test_y.cpu().numpy() == pred_y.cpu().numpy()) / (1.0 * len(test_y.cpu().numpy())))
        print(e, Accurate[-1])
        Astore.append(A.detach())
        bstore.append(b.detach())

test_output = logits(test_x.view(-1, 28 * 28))
pred_y = torch.max(test_output, 1)[1].cuda().data.squeeze()

print(pred_y, 'prediction number')
print(test_y, 'real number')
Accurate.append(sum(test_y.cpu().numpy() == pred_y.cpu().numpy()) / (1.0 * len(test_y.cpu().numpy())))
print(Accurate[-1])

for i in range(len(Astore)):
    Astore[i] = (Astore[i] - Astore[-1]).norm()
    bstore[i] = (bstore[i] - bstore[-1]).norm()

plt.plot(Astore, label='A')
plt.plot(bstore, label='b')
plt.legend()
plt.show()
plt.cla()
plt.plot(Accurate)
plt.show()

Exploring the batch size

Note that when the batch size is set fairly large (as in the GD run above), choosing a good step size becomes quite demanding (true hyperparameter tuning, heh).

  • With SGD at batch size 1, accuracy is already high by roughly the 25th plotted point; since checkpoints are taken every 1500 steps, that corresponds to about 37500 training samples.
  • With SGD at batch size 20 and checkpoints every 500 steps, comparable accuracy only shows up around the 100th plotted point, i.e. after roughly 1,000,000 samples (see the quick arithmetic sketch below).

    Combined with the earlier GD run, this suggests the mini-batch size should not be too large, so the batch size is reduced further in the experiments that follow; the corresponding code comes after the sketch below.
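The sample counts quoted above are just the product of plotted checkpoints, checkpoint interval, and batch size; a quick sketch of that arithmetic (numbers taken from the text above):

configs = {
    "SGD, batch=1":  dict(checkpoints=25,  step_interval=1500, batch=1),
    "SGD, batch=20": dict(checkpoints=100, step_interval=500,  batch=20),
}
for name, c in configs.items():
    seen = c["checkpoints"] * c["step_interval"] * c["batch"]
    print(name, "has seen about", seen, "training samples")   # 37500 vs 1000000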
import os

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision

EPOCH = 100
BATCH_SIZE = 20
DOWNLOAD_MNIST = False
LR = 0.001

# Mnist digits dataset
if not (os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
    # not a mnist dir, or the mnist dir is empty
    DOWNLOAD_MNIST = True

train_data = torchvision.datasets.MNIST(
    root='./mnist/',
    train=True,                                    # this is the training data
    transform=torchvision.transforms.ToTensor(),
    download=DOWNLOAD_MNIST,
)

# Data Loader for easy mini-batch return in training
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)


class Logits(nn.Module):
    def __init__(self):
        super(Logits, self).__init__()
        self.linear = nn.Linear(28 * 28, 10)
        self.sigmoid = nn.Sigmoid()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.linear(x)
        x = self.sigmoid(x)
        x = self.softmax(x)
        return x


test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
# shape from (10000, 28, 28) to (10000, 1, 28, 28), values in range (0, 1)
test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor).cuda() / 255.
test_y = test_data.test_labels

alpha = 0.001

logits = Logits().cuda()
# optimizer = torch.optim.SGD(logits.parameters(), lr=LR)  # optimize all parameters
# optimizer.zero_grad()
loss_func = nn.CrossEntropyLoss()                 # the target label is not one-hotted

Accurate = []
Astore = []
bstore = []
A, b = [i for i in logits.parameters()]
A.cuda()
b.cuda()
# materialize the batches once so they can be reused in every epoch
data = [(step, (x, b_y)) for step, (x, b_y) in enumerate(train_loader)]
for e in range(EPOCH):
    for step, (x, b_y) in data:                      # gives batch data
        b_x = x.view(-1, 28 * 28).cuda()             # reshape x to (batch, 28 * 28)
        b_y = b_y.cuda()
        output = logits(b_x)                         # logits output
        loss = loss_func(output, b_y)                # cross entropy loss
        if A.grad is not None:
            A.grad.zero_()
            b.grad.zero_()
        loss.backward()                              # backpropagation, compute gradients
        A.data = A.data - alpha * A.grad.data
        b.data = b.data - alpha * b.grad.data
        if step % 500 == 0:
            test_output = logits(test_x.view(-1, 28 * 28))
            pred_y = torch.max(test_output, 1)[1].cuda().data.squeeze()
            Accurate.append(sum(test_y.cpu().numpy() == pred_y.cpu().numpy()) / (1.0 * len(test_y.cpu().numpy())))
            print(Accurate[-1])
            Astore.append(A.detach())
            bstore.append(b.detach())

test_output = logits(test_x.view(-1, 28 * 28))
pred_y = torch.max(test_output, 1)[1].cuda().data.squeeze()

print(pred_y, 'prediction number')
print(test_y, 'real number')
Accurate.append(sum(test_y.cpu().numpy() == pred_y.cpu().numpy()) / (1.0 * len(test_y.cpu().numpy())))
print(Accurate[-1])

for i in range(len(Astore)):
    Astore[i] = (Astore[i] - Astore[-1]).norm()
    bstore[i] = (bstore[i] - bstore[-1]).norm()

plt.plot(Astore, label='A')
plt.plot(bstore, label='b')
plt.legend()
plt.show()
plt.cla()
plt.plot(Accurate)
plt.show()
  • Here the batch size is set to 8.
    • With batchsize=8 and a checkpoint interval of 2000 steps, comparable accuracy appears around the 20th plotted point, i.e. about 320000 samples, which is much better than batch size 20.
    • Similarly, convergence speeds up a bit more at batchsize=4 (mini-batches really should be mini, haha).

import os

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision

EPOCH = 20
BATCH_SIZE = 8
DOWNLOAD_MNIST = False
LR = 0.001

# Mnist digits dataset
if not (os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
    # not a mnist dir, or the mnist dir is empty
    DOWNLOAD_MNIST = True

train_data = torchvision.datasets.MNIST(
    root='./mnist/',
    train=True,                                    # this is the training data
    transform=torchvision.transforms.ToTensor(),
    download=DOWNLOAD_MNIST,
)

# Data Loader for easy mini-batch return in training
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)


class Logits(nn.Module):
    def __init__(self):
        super(Logits, self).__init__()
        self.linear = nn.Linear(28 * 28, 10)
        self.sigmoid = nn.Sigmoid()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.linear(x)
        x = self.sigmoid(x)
        x = self.softmax(x)
        return x


test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
# shape from (10000, 28, 28) to (10000, 1, 28, 28), values in range (0, 1)
test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor).cuda() / 255.
test_y = test_data.test_labels

alpha = 0.001

logits = Logits().cuda()
# optimizer = torch.optim.SGD(logits.parameters(), lr=LR)  # optimize all parameters
# optimizer.zero_grad()
loss_func = nn.CrossEntropyLoss()                 # the target label is not one-hotted

Accurate = []
Astore = []
bstore = []
A, b = [i for i in logits.parameters()]
A.cuda()
b.cuda()
# flatten the images once and keep the batches for reuse in every epoch
data = [(step, (x.view(-1, 28 * 28), b_y)) for step, (x, b_y) in enumerate(train_loader)]
for e in range(EPOCH):
    for step, (x, b_y) in data:                      # gives batch data
        b_x = x.cuda()                               # already flattened above, just move to the GPU
        b_y = b_y.cuda()
        output = logits(b_x)                         # logits output
        loss = loss_func(output, b_y)                # cross entropy loss
        if A.grad is not None:
            A.grad.zero_()
            b.grad.zero_()
        loss.backward()                              # backpropagation, compute gradients
        A.data = A.data - alpha * A.grad.data
        b.data = b.data - alpha * b.grad.data
        if step % 2000 == 0:
            test_output = logits(test_x.view(-1, 28 * 28))
            pred_y = torch.max(test_output, 1)[1].cuda().data.squeeze()
            Accurate.append(sum(test_y.cpu().numpy() == pred_y.cpu().numpy()) / (1.0 * len(test_y.cpu().numpy())))
            print(e, Accurate[-1])
            Astore.append(A.detach())
            bstore.append(b.detach())

test_output = logits(test_x.view(-1, 28 * 28))
pred_y = torch.max(test_output, 1)[1].cuda().data.squeeze()

print(pred_y, 'prediction number')
print(test_y, 'real number')
Accurate.append(sum(test_y.cpu().numpy() == pred_y.cpu().numpy()) / (1.0 * len(test_y.cpu().numpy())))
print(Accurate[-1])

for i in range(len(Astore)):
    Astore[i] = (Astore[i] - Astore[-1]).norm()
    bstore[i] = (bstore[i] - bstore[-1]).norm()

plt.plot(Astore, label='A')
plt.plot(bstore, label='b')
plt.legend()
plt.show()
plt.cla()
plt.plot(Accurate)
plt.show()
  • batchsize=4
  • With the checkpoint interval set to 4000 steps, the plotted indices here should line up with the plot above, and training is clearly much faster.

  • Possibly because the algorithm is still fairly crude (fixed step size), these experiments show smaller batches performing better. In practice, though, a moderate batch size is usually the better choice; generally something around batch = 8.
