PyTorch in 60 Minutes: A Quick Start

Published: 2023/12/10

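PyTorch is a deep learning framework developed by Facebook. It is based on Torch, which was a well-known deep learning framework before TensorFlow was open-sourced, and it replaces Torch's little-used Lua front end with Python. Since its release, PyTorch has attracted a great deal of attention for its ease of use and its dynamic computation graph, and it has become TensorFlow's main competitor. At the time of writing, its GitHub repository had more than 28,000 stars.
GitHub: https://github.com/pytorch/pytorch
Website: https://pytorch.org/
Forum: https://discuss.pytorch.org/

This article is a translation of the official tutorial "Deep Learning with PyTorch: A 60 Minute Blitz", a 60-minute quick start to PyTorch.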

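1. What is PyTorch

PyTorch is a Python-based scientific computing package aimed at two audiences:

Those who want a replacement for NumPy that can use the power of GPUs;
Those who want a deep learning research platform that offers more flexibility and speed.

1.1 Installation

Installation instructions are on the official site: https://pytorch.org/get-started/locally/

On that page, choose your operating system (Linux, Mac, or Windows), the install method (Conda, Pip, LibTorch, or building from source), and the language (Python 2.7, Python 3.5/3.6/3.7, or C++). For the GPU build you also need to pick a CUDA version. For example, with Conda and CUDA 9.0 the install command is:

conda install pytorch torchvision cudatoolkit=9.0 -c pytorch

Installing with Conda (that is, Anaconda) is recommended, mainly because it lets you keep separate environments with different configurations. For more on Anaconda, see my earlier post "Python Basics: Introduction and Environment Setup".

This installs the latest PyTorch release, which is 1.1 at the time of writing. To install an earlier version, see:

http://pytorch.org/get-started/previous-versions/

That page lists, for example, the commands for installing PyTorch 0.4.1 with different CUDA versions or without CUDA.

It also describes other install methods.

After installing, run the following:

from __future__ import print_function
import torch

x = torch.rand(5, 3)
print(x)

If the output looks like this, the installation succeeded:

tensor([[0.3380, 0.3845, 0.3217],
        [0.8337, 0.9050, 0.2650],
        [0.2979, 0.7141, 0.9069],
        [0.1449, 0.1132, 0.1375],
        [0.4675, 0.3947, 0.1426]])

Next, check that PyTorch can run on the GPU. In the code below, cuda.is_available() reports whether a usable GPU is present: if it returns True, GPU computation is available; otherwise it is not.

import torch
torch.cuda.is_available()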

1.2 Tensors

One major use of PyTorch is as a replacement for NumPy, so we start with tensors. A tensor is the counterpart of a NumPy multi-dimensional array (ndarray); the difference is that tensors can also be placed on a GPU to speed up computation.

First import the required library, which is mainly torch:

from __future__ import print_function
import torch

1.2.1 Declaring and defining tensors

There are several ways to declare and define a tensor:

torch.empty(): declares an uninitialized matrix.

# create a 5*3 matrix
x = torch.empty(5, 3)
print(x)

Output:

tensor([[9.2737e-41, 8.9074e-01, 1.9286e-37],
        [1.7228e-34, 5.7064e+01, 9.2737e-41],
        [2.2803e+02, 1.9288e-37, 1.7228e-34],
        [1.4609e+04, 9.2737e-41, 5.8375e+04],
        [1.9290e-37, 1.7228e-34, 3.7402e+06]])

torch.rand(): creates a randomly initialized matrix.

# create a randomly initialized 5*3 matrix
rand_x = torch.rand(5, 3)
print(rand_x)

Output:

tensor([[0.4311, 0.2798, 0.8444],
        [0.0829, 0.9029, 0.8463],
        [0.7139, 0.4225, 0.5623],
        [0.7642, 0.0329, 0.8816],
        [1.0000, 0.9830, 0.9256]])

torch.zeros(): creates a matrix filled with zeros.

# create a matrix of zeros with dtype long
zero_x = torch.zeros(5, 3, dtype=torch.long)
print(zero_x)

Output:

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

Similarly, torch.ones creates a matrix filled with ones.

torch.tensor(): creates a tensor directly from the given values.

# the tensor values are [5.5, 3]
tensor1 = torch.tensor([5.5, 3])
print(tensor1)

Output:

tensor([5.5000, 3.0000])

Besides the methods above, you can create a new tensor from an existing one. The new tensor keeps properties of the existing tensor, such as its size and dtype, unless they are explicitly redefined. The corresponding methods are:

tensor.new_ones(): the new_*() methods take the size as input.

# explicitly set the new size to 5*3 and the dtype to torch.double
tensor2 = tensor1.new_ones(5, 3, dtype=torch.double)  # new_* methods take the tensor size
print(tensor2)

Output:

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

torch.randn_like(old_tensor): keeps the same size.

# override the dtype
tensor3 = torch.randn_like(tensor2, dtype=torch.float)
print('tensor3: ', tensor3)

In the output below, tensor3 is created from tensor2 declared above, so its size is still 5*3, but its dtype has changed.

tensor3: tensor([[-0.4491, -0.2634, -0.0040],
        [-0.1624,  0.4475, -0.8407],
        [-0.6539, -1.2772,  0.6060],
        [ 0.2304,  0.0879, -0.3876],
        [ 1.2900, -0.7475, -1.8212]])

Finally, the size of a tensor is available through tensor.size():

print(tensor3.size())
# prints: torch.Size([5, 3])

Note: torch.Size is in fact a tuple, so it supports all tuple operations.
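As a small illustration of that note, the sketch below treats the result of size() like any other tuple. The variable names are made up for this example.

```python
import torch

t = torch.zeros(5, 3)
size = t.size()

# torch.Size is a subclass of tuple, so unpacking, indexing and len() all work
rows, cols = size
print(isinstance(size, tuple))
print(rows, cols, len(size), size[0])
```

Unpacking like this is a common way to read the batch size or feature count without indexing into the size by hand.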

1.2.2 Operations

Operations also come in many syntaxes. As this is a quick start, only addition is covered here. The official documentation describes many more operations, including transposing, indexing, slicing, mathematical operations, linear algebra, and random numbers:

https://pytorch.org/docs/stable/torch.html

Addition can be written in several ways:

the + operator
torch.add(tensor1, tensor2, [out=tensor3])
tensor1.add_(tensor2): modifies the tensor in place

tensor4 = torch.rand(5, 3)
print('tensor3 + tensor4= ', tensor3 + tensor4)
print('tensor3 + tensor4= ', torch.add(tensor3, tensor4))
# store the result of the addition in a newly declared tensor
result = torch.empty(5, 3)
torch.add(tensor3, tensor4, out=result)
print('add result= ', result)
# modify the variable in place
tensor3.add_(tensor4)
print('tensor3= ', tensor3)

Output:

tensor3 + tensor4= tensor([[ 0.1000,  0.1325,  0.0461],
        [ 0.4731,  0.4523, -0.7517],
        [ 0.2995, -0.9576,  1.4906],
        [ 1.0461,  0.7557, -0.0187],
        [ 2.2446, -0.3473, -1.0873]])
tensor3 + tensor4= tensor([[ 0.1000,  0.1325,  0.0461],
        [ 0.4731,  0.4523, -0.7517],
        [ 0.2995, -0.9576,  1.4906],
        [ 1.0461,  0.7557, -0.0187],
        [ 2.2446, -0.3473, -1.0873]])
add result= tensor([[ 0.1000,  0.1325,  0.0461],
        [ 0.4731,  0.4523, -0.7517],
        [ 0.2995, -0.9576,  1.4906],
        [ 1.0461,  0.7557, -0.0187],
        [ 2.2446, -0.3473, -1.0873]])
tensor3= tensor([[ 0.1000,  0.1325,  0.0461],
        [ 0.4731,  0.4523, -0.7517],
        [ 0.2995, -0.9576,  1.4906],
        [ 1.0461,  0.7557, -0.0187],
        [ 2.2446, -0.3473, -1.0873]])

Note: any operation that modifies a tensor in place has a trailing underscore, for example x.copy_(y) and x.t_() both change x.
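The difference between the two forms is easiest to see with a second name bound to the same tensor. This is a minimal sketch; `alias` and the values are chosen only for illustration.

```python
import torch

a = torch.ones(3)
alias = a                 # a second name for the same tensor object

b = a.add(1)              # out-of-place: returns a new tensor, a is untouched
print(a.tolist(), b.tolist())

a.add_(5)                 # in-place: a itself changes, so alias sees the change too
print(a.tolist(), alias.tolist())
```

In-place operations save memory, but because every reference to the tensor sees the change, the out-of-place form is the safer default.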

Besides arithmetic, tensors can be indexed in the same way as NumPy arrays to access data along a dimension:

# access the first column of tensor3
print(tensor3[:, 0])

Output:

tensor([0.1000, 0.4731, 0.2995, 1.0461, 2.2446])

To change the shape of a tensor, use tensor.view():

x = torch.randn(4, 4)
y = x.view(16)
# -1 means this dimension is inferred from the remaining ones
z = x.view(-1, 8)
print(x.size(), y.size(), z.size())

Output:

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])

If a tensor has only one element, .item() returns its value as a plain Python number:

x = torch.randn(1)
print(x)
print(x.item())

Output:

tensor([0.4549])
0.4549027979373932

More operations are described in the official documentation:

https://pytorch.org/docs/stable/torch.html

1.3 Converting to and from NumPy arrays

Tensors and NumPy arrays can be converted into each other. When the tensor is on the CPU, the two share the same underlying memory, so changing one also changes the other.

1.3.1 Converting a tensor to a NumPy array

Call tensor.numpy() to convert a tensor to a NumPy array:

a = torch.ones(5)
print(a)
b = a.numpy()
print(b)

Output:

tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]

As mentioned, the two share the same memory. To see this, modify the tensor a and check whether the NumPy array b derived from it also changes:

a.add_(1)
print(a)
print(b)

The output shows that b changes along with a:

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]

1.3.2 Converting a NumPy array to a tensor

Call torch.from_numpy(numpy_array) to convert in the other direction:

import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

Output:

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

All tensor types on the CPU except CharTensor support conversion to and from NumPy arrays.
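When the sharing is not wanted, one option is to copy the tensor before converting it. The sketch below contrasts the two cases; calling clone() first is one way to get an independent array, and the names `shared` and `copied` are only illustrative.

```python
import numpy as np
import torch

t = torch.ones(3)
shared = t.numpy()            # shares memory with the CPU tensor t
copied = t.clone().numpy()    # clone() first, so this array has its own memory

t.add_(1)                     # in-place change to t
print(shared)                 # follows t
print(copied)                 # keeps the old values
```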

1.4 CUDA tensors

Tensors can be moved to a different device, CPU or GPU, with the .to method:

# Run this block only when CUDA is available.
# torch.device objects are used to move tensors onto and off the GPU.
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # create a tensor directly on the GPU
    x = x.to(device)                       # or simply .to("cuda")
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # .to() can also change the dtype

In the output, the first result lives on the GPU, so it is printed with device='cuda:0'; the second is on the CPU.

tensor([1.4549], device='cuda:0')
tensor([1.4549], dtype=torch.float64)

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html

Code for this section:

https://github.com/ccc013/DeepLearning_Notes/blob/master/Pytorch/practise/basic_practise.ipynb

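2. autograd

For neural networks in PyTorch, the key package is autograd. It provides automatic differentiation, that is, gradient computation, for all operations on tensors. It is a define-by-run framework: backpropagation is defined by how your code runs, so every iteration can be different.

The examples below show what the package does.

2.1 Tensors

torch.Tensor is the central class of PyTorch. If you set its attribute .requires_grad to True, it starts tracking all operations on that tensor. When the computation is finished, call .backward() to compute all gradients automatically; the resulting gradients are stored in the .grad attribute.

Calling .detach() separates a tensor from its computation history. This stops the tensor from tracking history and also prevents future computation on it from being tracked.

To prevent history tracking (and the memory it uses) for a whole block of code, wrap the block in with torch.no_grad():. This is especially useful when evaluating a model, because the model has trainable parameters with requires_grad=True whose gradients are not needed during evaluation.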
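The following sketch puts the two mechanisms side by side: .detach() on a single tensor and torch.no_grad() around a block. The variable names are illustrative only.

```python
import torch

x = torch.ones(2, requires_grad=True)

tracked = x * 2                 # recorded in the computation graph
detached = tracked.detach()     # same values, but cut off from the graph

with torch.no_grad():
    untracked = x * 2           # computed without recording any history

print(tracked.requires_grad, detached.requires_grad, untracked.requires_grad)
print(detached.grad_fn, untracked.grad_fn)
```

Only `tracked` carries a grad_fn; the other two behave like tensors created directly by the user.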

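One more class is essential to the implementation of autograd: Function.

Tensor and Function are interconnected and together build an acyclic graph that encodes the complete history of a computation. Every tensor has a .grad_fn attribute that references the Function that created it (except tensors created by the user, whose grad_fn is None).

To compute derivatives, call .backward() on a tensor. If the tensor is a scalar, that is, it holds a single element, .backward() needs no arguments. If it has more elements, you must pass a gradient argument, a tensor of matching shape; this is covered in Section 2.2 on gradients.

The code below illustrates all of this.

First import the required library:

import torch

Create a tensor with requires_grad=True to track computation on it:

x = torch.ones(2, 2, requires_grad=True)
print(x)

Output:

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Perform any operation; here, a simple addition:

y = x + 2
print(y)

Output:

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward>)

y is the result of an operation, so it has a grad_fn attribute:

print(y.grad_fn)

This prints the AddBackward function object that created y; its memory address varies from run to run.

Apply more operations to y:

z = y * y * 3
out = z.mean()

print('z=', z)
print('out=', out)

Output:

z= tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward>)
out= tensor(27., grad_fn=<MeanBackward1>)

By default, requires_grad is False for a tensor. You can set it to True when the tensor is defined, as above, or afterwards by calling .requires_grad_(True). The trailing underscore means the call changes the tensor itself, as explained for add_() in the previous section. For example:

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

In the output, the first line is the result before requires_grad is set; after the explicit call to .requires_grad_(True), the value is True.

False
True
<SumBackward0 object at 0x00000216D25ED710>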

2.2 Gradients

Now compute the gradients by backpropagation. The tensor out was defined in the previous subsection; it is a scalar, so out.backward() is equivalent to out.backward(torch.tensor(1.)):

out.backward()
# print the gradient d(out)/dx
print(x.grad)

Output:

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

The result should be a matrix in which every entry is 4.5. Writing o for the tensor out, the earlier definitions give:
$o = \frac{1}{4}\sum_i z_i,\quad z_i = 3(x_i + 2)^2,\quad z_i|_{x_i=1} = 27$
In detail, x starts as a matrix of ones; the addition x+2 gives y; then y*y*3 gives z, which is a 2*2 matrix, so taking the mean to obtain out divides by 4. That yields the three formulas above.

The gradient is therefore:
$\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2),\quad \frac{\partial o}{\partial x_i}\Big|_{x_i=1} = \frac{9}{2} = 4.5$
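The value 4.5 can also be checked without autograd. The sketch below, in plain Python, compares the analytic derivative 1.5 * (x_i + 2) with a central finite difference; the helper names `out_fn` and `numerical_grad` are invented for this illustration.

```python
def out_fn(xs):
    """o = mean of 3 * (x + 2)**2 over all elements."""
    return sum(3 * (x + 2) ** 2 for x in xs) / len(xs)

def numerical_grad(f, xs, i, eps=1e-6):
    """Central finite difference of f with respect to xs[i]."""
    hi = list(xs)
    lo = list(xs)
    hi[i] += eps
    lo[i] -= eps
    return (f(hi) - f(lo)) / (2 * eps)

xs = [1.0, 1.0, 1.0, 1.0]                      # the 2*2 matrix of ones, flattened
analytic = [1.5 * (x + 2) for x in xs]          # d o / d x_i from the formula above
numeric = [numerical_grad(out_fn, xs, i) for i in range(len(xs))]

print(out_fn(xs))
print(analytic)
print([round(g, 4) for g in numeric])
```

Both lists agree with the x.grad printed by autograd, up to floating-point error in the finite difference.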
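Mathematically, if you have a vector-valued function:
$\vec{y}=f(\vec{x})$
then the gradient of $\vec{y}$ with respect to $\vec{x}$ is a Jacobian matrix:
$J=\begin{pmatrix}\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}\end{pmatrix}$
Generally speaking, torch.autograd is an engine for computing vector-Jacobian products. The rest of the mathematics is skipped here in favor of a code example:

x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

Output:

tensor([ 237.5009, 1774.2396, 274.0625], grad_fn=<MulBackward>)

Here y is no longer a scalar. torch.autograd cannot compute the full Jacobian matrix directly, but passing a vector to backward() as its argument gives the vector-Jacobian product:

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

Output:

tensor([ 102.4000, 1024.0000, 0.1024])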
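The printed gradient can be reproduced by hand. In that run the printed x.grad implies the loop stopped at y = 1024 * x, so the Jacobian is 1024 times the identity matrix. The plain-Python sketch below computes the vector-Jacobian product under that assumption; `vjp` is a helper written for this example, not a PyTorch function.

```python
def vjp(jacobian, v):
    """Return v^T J as a list: entry j is sum_i v[i] * J[i][j]."""
    n_cols = len(jacobian[0])
    return [sum(v[i] * jacobian[i][j] for i in range(len(v))) for j in range(n_cols)]

scale = 1024.0   # assumed from the printed result: y = 1024 * x in that run
J = [[scale if i == j else 0.0 for j in range(3)] for i in range(3)]
v = [0.1, 1.0, 0.0001]

result = vjp(J, v)
print(result)
```

With a different random x the loop may stop at another power of two, and the gradient scales accordingly.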

Finally, wrapping code in with torch.no_grad() stops autograd from tracking history on tensors that have requires_grad=True:

print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

Output:

True
True
False

More on autograd and Function:

https://pytorch.org/docs/autograd

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

Code for this section:

https://github.com/ccc013/DeepLearning_Notes/blob/master/Pytorch/practise/autograd.ipynb

3. Neural networks

In PyTorch, torch.nn is the package for building neural networks. An nn.Module contains the layers of a network and a method forward(input) that returns the network's output.

A classic example is LeNet, a network for classifying character images.

A typical training procedure for a neural network is:

Define a multi-layer neural network
Preprocess the dataset and prepare it as input to the network
Feed the data into the network
Compute the loss
Backpropagate to compute the gradients
Update the weights of the network; a simple update rule is weight = weight - learning_rate * gradient

3.1 Defining the network

First define a neural network. The one below is a 5-layer convolutional network with two convolutional layers and three fully connected layers:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # single-channel input image; conv1 has a 5*5 kernel and 6 output channels
        self.conv1 = nn.Conv2d(1, 6, 5)
        # conv2 has a 5*5 kernel and 16 output channels
        self.conv2 = nn.Conv2d(6, 16, 5)
        # fully connected layers
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # max-pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # for a square window a single number is enough, e.g. 2 instead of (2, 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        # all dimensions except the batch dimension
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

The printed network structure:

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

You only have to implement the forward function; the backward function is defined automatically by autograd. Any tensor operation can be used in forward.
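The 16*5*5 input size of fc1 follows from the layer arithmetic. The sketch below traces the spatial size through the network in plain Python, assuming a 32*32 input, unpadded 5*5 convolutions with stride 1, and 2*2 max-pooling with stride 2; `conv_out` is a helper written for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                    # assumed input height and width
size = conv_out(size, kernel=5)              # conv1:    32 -> 28
size = conv_out(size, kernel=2, stride=2)    # max-pool: 28 -> 14
size = conv_out(size, kernel=5)              # conv2:    14 -> 10
size = conv_out(size, kernel=2, stride=2)    # max-pool: 10 -> 5

flat_features = 16 * size * size             # 16 channels out of conv2
print(size, flat_features)
```

The result matches the in_features=400 reported for fc1 in the printed structure.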

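net.parameters() returns the trainable parameters of the network:

params = list(net.parameters())
print('Number of parameters: ', len(params))
# conv1.weight
print('Size of the first parameter: ', params[0].size())

Output:

Number of parameters: 10
Size of the first parameter: torch.Size([6, 1, 5, 5])

Now test the network with a randomly generated 32*32 input:

# feed a random input into the network
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

Output:

tensor([[ 0.1005,  0.0263,  0.0013, -0.1157, -0.1197, -0.0141,  0.1425, -0.0521,
          0.0689,  0.0220]], grad_fn=<ThAddmmBackward>)

Before backpropagating, clear the gradient buffers; then backpropagate a random gradient:

# zero the gradient buffers of all parameters, then backpropagate a random gradient
net.zero_grad()
out.backward(torch.randn(1, 10))

Note:

torch.nn only supports mini-batches, so the input cannot be a single sample. For example, nn.Conv2d takes a 4-D tensor of nSamples * nChannels * Height * Width.

If you have a single sample, use input.unsqueeze(0) to add a fake batch dimension, turning a 3-D tensor into a 4-D one.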
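A minimal sketch of that call, using a made-up single-channel 32*32 sample:

```python
import torch

single_image = torch.randn(1, 32, 32)    # nChannels * Height * Width, no batch dimension
batch = single_image.unsqueeze(0)        # insert a batch dimension of size 1 at position 0

print(single_image.shape, batch.shape)
```

The data is unchanged; only the shape gains a leading dimension, which is what layers such as nn.Conv2d expect.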

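3.2 Loss function

A loss function takes the pair (output, target), that is, the network output and the ground-truth label, and returns a value that measures how far the output is from the target.

PyTorch already defines many loss functions. Here we use a simple one, the mean squared error nn.MSELoss:

output = net(input)
# a dummy target
target = torch.randn(10)
# reshape it to the same size as output
target = target.view(1, -1)
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

Output:

tensor(0.6524, grad_fn=<MseLossBackward>)

The computation graph from the network's input to the loss is shown below; it is simply the path the data takes from the input layer to the output layer and on to the loss.

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

When loss.backward() is called, the whole graph is differentiated with respect to the loss, and every tensor in the graph with requires_grad=True has the gradient accumulated into its .grad tensor.

To illustrate in code:

# MSELoss
print(loss.grad_fn)
# Linear layer
print(loss.grad_fn.next_functions[0][0])
# Relu
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])

Each call prints the backward function object for that step of the graph; the exact class names and addresses depend on the PyTorch version.

3.3 Backpropagation

Backpropagation only requires calling loss.backward(). First, though, clear the existing gradient buffers with .zero_grad(); otherwise the new gradients are added to the previous ones, which distorts the weight update.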
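The accumulation is easy to observe on a single scalar parameter. In this sketch, `w` stands in for a network weight, and the manual w.grad.zero_() plays the role that zero_grad() plays for all parameters of a network or optimizer.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(w * w).backward()
first = w.grad.item()          # d(w^2)/dw at w = 2

(w * w).backward()             # second backward pass without clearing
accumulated = w.grad.item()    # the new gradient was added to the old one

w.grad.zero_()                 # reset the buffer by hand
cleared = w.grad.item()

print(first, accumulated, cleared)
```

Without the reset, every further backward pass would keep adding 4.0 to w.grad.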

The simple example below looks at the bias of the conv1 layer before and after backpropagation:

# zero the gradient buffers of all parameters
net.zero_grad()
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

Output:

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0069,  0.0021,  0.0090, -0.0060, -0.0008, -0.0073])

To learn more about the torch.nn package, see the official documentation:

https://pytorch.org/docs/stable/nn.html

3.4 Updating the weights

The simplest weight update rule, used by stochastic gradient descent (SGD), is:

weight = weight - learning_rate * gradient

This rule can be implemented as follows:

# a simple weight update
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
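To see the rule converge, the sketch below applies it in plain Python to a toy one-parameter loss, (w - 3)**2, whose minimum is at w = 3. The loss, the starting point, and the learning rate are all assumptions made for this illustration.

```python
def grad(w):
    """Gradient of the toy loss (w - 3)**2."""
    return 2 * (w - 3)

weight = 0.0
learning_rate = 0.1
for step in range(50):
    # the same rule as above: weight = weight - learning_rate * gradient
    weight = weight - learning_rate * grad(weight)

print(round(weight, 4))
```

Each step shrinks the distance to the minimum by a constant factor of 0.8, so 50 steps bring the weight very close to 3.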

This is only the simplest rule, however. Deep learning uses many optimization algorithms besides SGD, such as Nesterov-SGD, Adam, and RMSProp. The torch.optim package implements them:

import torch.optim as optim
# create the optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in the training loop:
optimizer.zero_grad()  # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
# update the weights
optimizer.step()

Note that optimizer.zero_grad() must still be called to clear the gradient buffers.

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html

Code for this section:

https://github.com/ccc013/DeepLearning_Notes/blob/master/Pytorch/practise/neural_network.ipynb

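4. Training a classifier

The previous section showed how to build a neural network, compute the loss, and update the weights. The next step is to build an image classifier.

4.1 Training data

Before training a classifier, we need data. When working with images, text, audio, or video, the usual approach is to load the data into a NumPy array with standard Python packages and then convert the array into a PyTorch tensor.

For images, Pillow and OpenCV are useful;
For audio, there are scipy and librosa;
For text, you can load data with plain Python or Cython, or use NLTK and SpaCy.

For computer vision, PyTorch provides the torchvision package. It contains data loaders for common datasets such as Imagenet, CIFAR10, and MNIST, as well as data transformers for images; the modules used are torchvision.datasets and torch.utils.data.DataLoader.

This tutorial uses the CIFAR10 dataset. It has 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The images in the dataset are of size 3x32x32.

4.2 Training an image classifier

The training procedure is:

Load and normalize the CIFAR10 training and test sets with torchvision;

Build a convolutional neural network;

Define a loss function;

Train the network on the training set;

Test the network on the test set.

4.2.1 Loading and normalizing CIFAR10

First import the required packages:

import torch
import torchvision
import torchvision.transforms as transforms

The torchvision datasets return PILImage images with values in the range [0, 1]. Here they are converted to tensors normalized to the range [-1, 1]:

# normalize the image data from [0, 1] to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')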
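The arithmetic behind Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) is (pixel - mean) / std per channel. The plain-Python sketch below applies it to a few pixel values and inverts it again; the function names are illustrative and not part of torchvision.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Map a pixel value in [0, 1] to [-1, 1]."""
    return (pixel - mean) / std

def unnormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, back to [0, 1]."""
    return value * std + mean

for p in (0.0, 0.5, 1.0):
    n = normalize(p)
    print(p, n, unnormalize(n))
```

The inverse, value * 0.5 + 0.5, is the same computation as the img / 2 + 0.5 step used when displaying images.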

Once the data is downloaded, some of the training images can be visualized:

import matplotlib.pyplot as plt
import numpy as np

# function to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show the images
imshow(torchvision.utils.make_grid(images))
# print the class labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

The class labels printed for the displayed images:

frog plane dog ship

4.2.2 Building a convolutional neural network

The network from the previous section can be reused directly, except that the number of input channels of conv1 changes from 1 to 3, because the inputs are now 3-channel color images.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

4.2.3 Defining the loss function and optimizer

Here we use the categorical cross-entropy loss and SGD with momentum:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

4.2.4 Training the network

The fourth step is to train the network: choose the number of epochs, feed in the data, and at a fixed interval print information about the network, such as the loss or the accuracy.

import time
start = time.time()
for epoch in range(2):

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # zero the gradient buffers
        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:
            # print once every 2000 iterations
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i+1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training! Total cost time: ', time.time()-start)

Training runs for 2 epochs here and takes about 77 s. The training log:

[1,  2000] loss: 2.226
[1,  4000] loss: 1.897
[1,  6000] loss: 1.725
[1,  8000] loss: 1.617
[1, 10000] loss: 1.524
[1, 12000] loss: 1.489
[2,  2000] loss: 1.407
[2,  4000] loss: 1.376
[2,  6000] loss: 1.354
[2,  8000] loss: 1.347
[2, 10000] loss: 1.324
[2, 12000] loss: 1.311
Finished Training! Total cost time: 77.24696755409241

4.2.5 Testing the model

After training, the network is evaluated on the test set to check how well it generalizes. For image classification, accuracy is the usual metric.

First, a small test on one batch of images; the batch size is 4, so that is 4 images:

dataiter = iter(testloader)
images, labels = next(dataiter)

# show the images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

The labels of the displayed images:

GroundTruth: cat ship ship plane

Now feed these four images into the network and look at its predictions:

# network output
outputs = net(images)

# predictions
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))

Output:

Predicted: cat ship ship ship

The first three images are predicted correctly; the fourth, a plane, is wrongly predicted as a ship.

Next, the accuracy on the whole test set:

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

Output:

Accuracy of the network on the 10000 test images: 55 %

Your accuracy may differ; the result in the tutorial is 51%, and random weight initialization causes some variation. Compared with random guessing over 10 classes (10%), the result is decent, although in absolute terms it is poor. Keep in mind that this is only a 5-layer network used as tutorial example code.

We can go one step further and look at the accuracy of each class. The difference from the code above is the line c = (predicted == labels).squeeze(), which yields 1 or 0 (true or false) depending on whether the prediction equals the label. These values are added directly to the per-class count of correct predictions: a correct prediction adds 1, and a wrong one adds 0, leaving the count unchanged.

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (classes[i], 100 * class_correct[i] / class_total[i]))

In the output, cat, deer, and bird have the three highest error rates, that is, they are the three least accurately predicted classes, while ship and truck are the most accurate.

Accuracy of plane : 58 %
Accuracy of   car : 59 %
Accuracy of  bird : 40 %
Accuracy of   cat : 33 %
Accuracy of  deer : 39 %
Accuracy of   dog : 60 %
Accuracy of  frog : 54 %
Accuracy of horse : 66 %
Accuracy of  ship : 70 %
Accuracy of truck : 72 %
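The per-class counting can be followed on a handful of values without a trained network. The sketch below uses three invented classes with hypothetical labels and predictions, in plain Python.

```python
classes = ("cat", "dog", "ship")
labels    = [0, 1, 2, 0, 1, 2]     # hypothetical ground-truth class indices
predicted = [0, 1, 2, 1, 1, 0]     # hypothetical network predictions

class_correct = [0] * len(classes)
class_total = [0] * len(classes)
for label, pred in zip(labels, predicted):
    class_correct[label] += int(pred == label)   # adds 1 if correct, 0 otherwise
    class_total[label] += 1

for name, c, t in zip(classes, class_correct, class_total):
    print('Accuracy of %5s : %2d %%' % (name, 100 * c / t))
```

The counts are indexed by the true label, so each class's accuracy is the fraction of its own samples that were predicted correctly.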

4.3 Training on a GPU

Deep learning relies on GPUs to speed up training, so this subsection shows how to train on one.

First check whether a GPU is available:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

The output below means that your first (or only) GPU is available; otherwise cpu is printed.

cuda:0

With a GPU available, training can run on it. Two changes are needed: move the network parameters to the GPU, and move the data to the GPU:

net.to(device)
inputs, labels = inputs.to(device), labels.to(device)

The modified training code:

import time
# when training on the GPU, both the network and the data must be on the GPU
net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

start = time.time()
for epoch in range(2):

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        # zero the gradient buffers
        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:
            # print once every 2000 iterations
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i+1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training! Total cost time: ', time.time() - start)

Note that the optimizer is defined after net.to(device) is called, so the parameters passed to it are CUDA tensors. The training results are similar to before. Because this network is very small, moving it to the GPU does not bring much of a speed-up; in my run training was actually slower, which may also be down to my laptop's GPU.

To speed things up further, consider using multiple GPUs, which is the topic of the next section.

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

Code for this section:

https://github.com/ccc013/DeepLearning_Notes/blob/master/Pytorch/practise/train_classifier_example.ipynb

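5. Data parallelism

This section shows how to train a network on multiple GPUs with DataParallel.

Training a model on a GPU is simple: define a device object, then use .to() to put the model parameters on that GPU.

device = torch.device("cuda:0")
model.to(device)

Then move all tensors to the GPU:

mytensor = my_tensor.to(device)

Note that my_tensor.to(device) returns a new copy of my_tensor instead of modifying my_tensor itself, so you need to assign the result to a new tensor and use that tensor.

By default PyTorch uses only one GPU. To use several, wrap the model in DataParallel:

model = nn.DataParallel(model)

This line is the core of this section and is explained in detail below.

5.1 Imports and parameters

First import the required packages and define some parameters:

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

This defines the network's input and output sizes, the batch size, and the size of the dataset, along with a device object.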

5.2 Building a dummy dataset

Next, build a dummy (random) dataset:

class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)

5.3 A simple model

Now build a simple model: a network with a single fully connected layer, plus a print() call to monitor the sizes of the input and output tensors:

class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())

        return output

5.4 Creating the model and DataParallel

This is the core of the section. First create a model instance and check whether there are multiple GPUs. If so, wrap the model in nn.DataParallel, and then call model.to(device):

model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

model.to(device)

5.5 Running the model

Now run the model and look at what is printed:

for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())

Output:

    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

5.6 Results

With no GPU or only one GPU, a batch of 30 gives the model inputs and outputs of size 30. With multiple GPUs, the results are as follows:

2 GPUs

# on 2 GPUs
Let's use 2 GPUs!
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

3 GPUs

Let's use 3 GPUs!
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

8 GPUs

Let's use 8 GPUs!
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

5.7 Summary

DataParallel automatically splits each batch of data and sends the pieces to copies of the model on multiple GPUs. Once every copy has finished its work, DataParallel collects and merges the results and returns them.
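The per-GPU sizes in the logs above are consistent with splitting each batch along dimension 0 into chunks of ceil(batch / n_gpus), with a smaller last chunk. The plain-Python sketch below reproduces those sizes under that assumption; `split_sizes` is a helper written for this illustration and is not part of PyTorch.

```python
import math

def split_sizes(batch, n_gpus):
    """Chunk sizes when a batch is cut into pieces of ceil(batch / n_gpus)."""
    chunk = math.ceil(batch / n_gpus)
    sizes = []
    remaining = batch
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= sizes[-1]
    return sizes

for n in (2, 3, 8):
    # 30 is a full batch; 10 is the last, partial batch of the 100-sample dataset
    print(n, split_sizes(30, n), split_sizes(10, n))
```

Note that with 8 GPUs the final batch of 10 only fills 5 chunks, which is why only five "In Model" lines appear for it; the order of the printed lines can differ because the replicas run concurrently.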

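A more detailed tutorial on data parallelism:

https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html

Conclusion

This tutorial started with the basics of tensors, then covered autograd, the package that computes gradients automatically, then showed how to build a neural network and train an image classifier, and finally gave a brief introduction to speeding up training with multiple GPUs.

That completes the quick start. From here you can:

Train a neural network to play video games
Train a ResNet on imagenet
Train a face generator with a GAN
Train a word-level language model with a recurrent LSTM network
Look at more examples
Look at more tutorials
Discuss PyTorch on the Forums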
