Udacity "Deep Learning" 2-1: Convolutional Neural Networks

Published: 2023/11/27

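This part consists of three sections:

Visualizing a convolutional neural network.
Designing and training a CNN to classify MNIST handwritten digits.
Designing and training a CNN to classify images from the CIFAR-10 dataset.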

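Core deep learning concepts encountered in this part:

SGD optimizer: GD is gradient descent and SGD is stochastic gradient descent. Compared with full-batch GD, SGD has two advantages: (1) it does not compute the gradient over the entire training set for every update; it updates the network from a small batch of images, which greatly speeds up training. (2) Its noisy, zigzagging path makes it easier to escape poor local optima, so in practice the final accuracy is often better than with full-batch GD.
Sobel operator: a discrete differentiation operator that combines Gaussian smoothing with differentiation. It is mainly used to approximate the horizontal/vertical gradient at each point of an image; if the gradient value exceeds a threshold, the point is treated as an edge point (a place where the pixel value changes sharply).
1. The approximate image gradient is obtained by convolving the image $I$ with the two kernels: $G_x = S_x * I$ and $G_y = S_y * I$, with magnitude $G = \sqrt{G_x^2 + G_y^2}$.
2. The Sobel x and Sobel y kernels are usually:
$$S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$
Cross-entropy loss:
1. Binary cross-entropy loss ($y$ is the label, $\hat{y}$ is the predicted probability of the positive class):
$$L(\hat{y}, y) = -\left[ y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right]$$
2. During training, the cost function is the sum of the per-sample losses over $m$ samples, divided by $m$:
$$J = \frac{1}{m} \sum_{j=1}^{m} L(\hat{y}^{(j)}, y^{(j)})$$
3. Multi-class cross-entropy loss:
$$L = -\sum_{i=1}^{K} y_i \log p_i$$
  1. $K$ is the number of classes.
  2. $y$ is the label: $y_i = 1$ if the class is $i$, otherwise $y_i = 0$.
  3. $p$ is the output of the network, so $p_i$ is the predicted probability of class $i$. These values are computed with softmax.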
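The cross-entropy definitions above can be checked numerically. The following is a minimal NumPy sketch, not code from the course notebooks; the function names and all input values are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def binary_cross_entropy(y, y_hat):
    """Mean over samples of -[y log(y_hat) + (1 - y) log(1 - y_hat)]."""
    return float(np.mean(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))))

def multiclass_cross_entropy(labels, probs):
    """Mean over samples of -sum_i y_i log(p_i); with one-hot y this is -log p[label]."""
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))

# Illustrative values only (not taken from the notebooks).
y = np.array([1.0, 0.0])
y_hat = np.array([0.9, 0.2])
bce = binary_cross_entropy(y, y_hat)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])
probs = softmax(logits)
ce = multiclass_cross_entropy(labels, probs)

print('binary cross-entropy:', bce)
print('multi-class cross-entropy:', ce)
```

Because the label vector is one-hot, the sum over the K classes collapses to the log-probability of the true class, which is why the multi-class function only indexes one entry per row.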

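1 Visualizing a convolutional neural network

1.1 Custom filters

1.2 Visualizing a convolutional layer

1.3 Visualizing a pooling layer

1.3.1 Import the image

1.3.2 Define and visualize the filters

1.3.3 Define convolutional and pooling layers

1.3.4 Visualize the output of each filter

1.3.5 Visualize the output of the pooling layer

2 Designing and training a CNN to classify MNIST handwritten digits

2.1 Loading and visualizing the data

2.1.1 Visualizing a batch of training images

2.1.2 Viewing a single image in more detail

2.2 Defining the network architecture

2.3 Specifying the loss function and optimizer

2.4 Training the network

2.5 Testing the trained network

2.6 Visualizing test-set predictions

3 Designing and training a CNN to classify images from the CIFAR-10 dataset

3.1 Testing for CUDA

3.2 Loading the data

3.3 Visualizing a batch of training data

3.4 Viewing an image in more detail

3.5 Defining the network architecture

3.6 Specifying the loss function and optimizer

3.7 Training the network

3.8 Loading the model

3.9 Testing the trained model

3.10 Question: what are your model's weaknesses, and how could it be improved?

3.11 Visualizing test-set predictions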

1 Visualizing a convolutional neural network

1.1 Custom filters

Import resources and display the image:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import cv2
import numpy as np

%matplotlib inline

# Read in the image
image = mpimg.imread('data/curved_lane.jpg')
plt.imshow(image)

Convert the image to grayscale:

# Convert to grayscale for filtering
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
plt.imshow(gray, cmap='gray')

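TODO: Create a custom kernel

Below, you are given a common edge-detection filter: the Sobel operator.

The Sobel filter is commonly used for edge detection and for extracting intensity patterns in an image. Applying a Sobel filter to an image is a way of taking (an approximation of) the image's derivative in the x or y direction separately. The operators are shown below.

It is up to you to create a Sobel x operator and apply it to the given image.

As a challenge, see whether you can chain filtering operations on the image: first blur it (average neighboring pixels), then detect edges.

# Create a custom kernel
# 3x3 array for edge detection
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])

## TODO: Create and apply a Sobel x operator
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Filter the image using filter2D, which has inputs: (grayscale image, output depth, kernel)
# An output depth of -1 keeps the depth of the input image
filtered_image_x = cv2.filter2D(gray, -1, sobel_x)
filtered_image_y = cv2.filter2D(gray, -1, sobel_y)

# figsize is given in inches; the size in pixels depends on the figure DPI
plt.figure(figsize=(14, 14))

# plt.subplot(rows, columns, index): one row, two columns, this is the first panel
plt.subplot(1, 2, 1)
plt.title('sobel x')
plt.imshow(filtered_image_x, cmap='gray')

plt.subplot(1, 2, 2)
plt.title('sobel y')
plt.imshow(filtered_image_y, cmap='gray')

plt.show()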

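Result:

Test other filters!

You are encouraged to create other types of filters and apply them to see what happens! As an optional exercise, try the following:

Create a filter with decimal-valued weights.
Create a 5x5 filter.
Apply your filters to the other images in the images directory.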


image = mpimg.imread('data/bridge_trees_example.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])

# Sobel y with decimal-valued weights
sobel_y_2 = np.array([[-1.5, -2.5, -1.5],
                      [ 0,    0,    0],
                      [ 1.5,  2.5,  1.5]])

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# 5x5 variant of Sobel x
sobel_x_5x5 = np.array([[-1, 0, 0, 0, 1],
                        [-1, 0, 0, 0, 1],
                        [-2, 0, 0, 0, 2],
                        [-1, 0, 0, 0, 1],
                        [-1, 0, 0, 0, 1]])

# Filter the image using filter2D, which has inputs: (grayscale image, output depth, kernel)
filtered_image_y = cv2.filter2D(gray, -1, sobel_y)
filtered_image_y_2 = cv2.filter2D(gray, -1, sobel_y_2)
filtered_image_x = cv2.filter2D(gray, -1, sobel_x)
filtered_image_x_5x5 = cv2.filter2D(gray, -1, sobel_x_5x5)

# figsize is given in inches; the size in pixels depends on the figure DPI
plt.figure(figsize=(14, 14))

plt.subplot(3, 2, 1)
plt.title('image')
plt.imshow(image)

plt.subplot(3, 2, 2)
plt.title('gray')
plt.imshow(gray, cmap='gray')

plt.subplot(3, 2, 3)
plt.title('sobel y')
plt.imshow(filtered_image_y, cmap='gray')

plt.subplot(3, 2, 4)
plt.title('sobel y decimal')
plt.imshow(filtered_image_y_2, cmap='gray')

plt.subplot(3, 2, 5)
plt.title('sobel x')
plt.imshow(filtered_image_x, cmap='gray')

plt.subplot(3, 2, 6)
plt.title('sobel x 5*5')
plt.imshow(filtered_image_x_5x5, cmap='gray')

plt.show()

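Result:

1.2 Visualizing a convolutional layer

In this notebook, we visualize four filtered outputs (also called activation maps) of a convolutional layer.

In this example, we define four filters and apply them to the input image by initializing the weights of a convolutional layer with them; a trained CNN would instead learn the values of these weights.

Import the image: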

import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# TODO: Feel free to try out your own images here by changing img_path
# to a file path to another image on your computer!
img_path = 'data/udacity_sdc.png'

# load color image
bgr_img = cv2.imread(img_path)
# convert to grayscale
gray_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)

# normalize, rescale entries to lie in [0,1]
gray_img = gray_img.astype("float32")/255

# plot image
plt.imshow(gray_img, cmap='gray')
plt.show()

Define and visualize the filters:
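The code that builds `filters` is not included in this post. A minimal sketch follows, assuming four 4x4 grayscale edge filters (the 4x4 size matches the description of the Net class later in this section; the specific values and the `filter_vals` name are assumptions, not confirmed by the post).

```python
import numpy as np

# Assumed filter values: left half negative, right half positive,
# which responds to vertical edges.
filter_vals = np.array([[-1, -1, 1, 1],
                        [-1, -1, 1, 1],
                        [-1, -1, 1, 1],
                        [-1, -1, 1, 1]])

# Derive three more filters by negating and transposing the first one.
filter_1 = filter_vals
filter_2 = -filter_1
filter_3 = filter_1.T
filter_4 = -filter_3
filters = np.array([filter_1, filter_2, filter_3, filter_4])

print('Filters shape:', filters.shape)
```

Any four 4x4 arrays stacked into a (4, 4, 4) array would work with the visualization and the convolutional layer used in this section.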

# visualize all four filters
fig = plt.figure(figsize=(10, 5))
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))
    width, height = filters[i].shape
    # annotate every cell with its weight
    for x in range(width):
        for y in range(height):
            ax.annotate(str(filters[i][x][y]), xy=(y,x),
                        horizontalalignment='center',
                        verticalalignment='center',
                        color='white' if filters[i][x][y]<0 else 'black')

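Define a convolutional layer

Initialize a single convolutional layer so that it contains all the filters you created. Note that you are not training this network; you are initializing the weights of a convolutional layer so that you can see what happens after a forward pass through it.

Below, I define a class named Net with one convolutional layer that can hold four 4x4 grayscale filters.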

import torch
import torch.nn as nn
import torch.nn.functional as F

# define a neural network with a single convolutional layer with four filters
class Net(nn.Module):

    def __init__(self, weight):
        super(Net, self).__init__()
        # initializes the weights of the convolutional layer to be the weights of the 4 defined filters
        k_height, k_width = weight.shape[2:]
        # assumes there are 4 grayscale filters
        self.conv = nn.Conv2d(1, 4, kernel_size=(k_height, k_width), bias=False)
        self.conv.weight = torch.nn.Parameter(weight)

    def forward(self, x):
        # calculates the output of a convolutional layer
        # pre- and post-activation
        conv_x = self.conv(x)
        activated_x = F.relu(conv_x)

        # returns both layers
        return conv_x, activated_x

# instantiate the model and set the weights
weight = torch.from_numpy(filters).unsqueeze(1).type(torch.FloatTensor)
model = Net(weight)

# print out the layer in the network
print(model)

Visualize the output of each filter

First, we define a helper function, viz_layer, that takes a specific layer and a number of filters (optional argument) and displays the output of that layer once an image has been passed through.

# helper function for visualizing the output of a given layer
# default number of filters is 4
def viz_layer(layer, n_filters= 4):
    fig = plt.figure(figsize=(20, 20))

    for i in range(n_filters):
        ax = fig.add_subplot(1, n_filters, i+1, xticks=[], yticks=[])
        # grab layer outputs
        ax.imshow(np.squeeze(layer[0,i].data.numpy()), cmap='gray')
        ax.set_title('Output %s' % str(i+1))

Let's look at the output of the convolutional layer before and after the ReLU activation function is applied.

# plot original image
plt.imshow(gray_img, cmap='gray')

# visualize all filters
fig = plt.figure(figsize=(12, 6))
fig.subplots_adjust(left=0, right=1.5, bottom=0.8, top=1, hspace=0.05, wspace=0.05)
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))

# convert the image into an input Tensor
gray_img_tensor = torch.from_numpy(gray_img).unsqueeze(0).unsqueeze(1)

# get the convolutional layer (pre and post activation)
conv_layer, activated_layer = model(gray_img_tensor)

# visualize the output of a conv layer
viz_layer(conv_layer)

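Result:

ReLU activation function

In this model, we use an activation function to rescale the output of the convolutional layer. We chose ReLU for this; it simply turns all negative pixel values into 0 (black). For an input pixel value x, the function is ReLU(x) = max(0, x).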

# after a ReLu is applied
# visualize the output of an activated conv layer
viz_layer(activated_layer)
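As a quick numeric illustration of what ReLU does to a feature map, here is a small NumPy sketch. The array values are made up and the `relu` helper is not part of the notebook, which uses `F.relu` from PyTorch.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: max(0, x)."""
    return np.maximum(0, x)

# Illustrative feature-map values (made up for this example).
feature_map = np.array([[-2.0, 0.5],
                        [ 3.0, -0.1]])

activated = relu(feature_map)
print(activated)
```

Negative entries become 0 while positive entries pass through unchanged, which is why the activated maps show black wherever the filter response was negative.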

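Result:

1.3 Visualizing a pooling layer

In this notebook, we add a max-pooling layer to a CNN and visualize its output.

Convolutional layers with activation functions, pooling layers, and linear layers (used to produce the desired output size) are the basic layers of a CNN.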

1.3.1 Import the image

1.3.2 Define and visualize the filters

1.3.3 Define convolutional and pooling layers

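In the next cell, we initialize a convolutional layer so that it contains all the created filters. We then add a max-pooling layer with a 2x2 kernel, so you can see that the image resolution is reduced after this step.

A max-pooling layer reduces the size of its input and keeps only the most active pixel values. Below is an example of a 2x2 pooling kernel with a stride of 2 applied to a small patch of grayscale pixel values; it reduces the patch's width and height by a factor of 2. Only the maximum pixel value in each 2x2 window is kept in the pooled output.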
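The class definition for this step is not shown in the post. A minimal sketch of what it might look like follows, assuming it extends the single-layer Net from section 1.2 with an `nn.MaxPool2d(2, 2)` layer and returns three outputs (the later visualization code unpacks three values). The stand-in filters and the random input tensor are hypothetical.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """One conv layer with fixed weights, followed by ReLU and 2x2 max pooling."""

    def __init__(self, weight):
        super().__init__()
        k_height, k_width = weight.shape[2:]
        # one input channel (grayscale), one output channel per filter
        self.conv = nn.Conv2d(1, weight.shape[0], kernel_size=(k_height, k_width), bias=False)
        self.conv.weight = nn.Parameter(weight)
        # 2x2 window with stride 2 halves the height and width
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        conv_x = self.conv(x)
        activated_x = F.relu(conv_x)
        pooled_x = self.pool(activated_x)
        # return all three stages so each can be visualized
        return conv_x, activated_x, pooled_x

# Hypothetical stand-ins for the post's `filters` and `gray_img`.
base = np.array([[-1, -1, 1, 1]] * 4)
filters = np.array([base, -base, base.T, -base.T])
weight = torch.from_numpy(filters).unsqueeze(1).type(torch.FloatTensor)
model = Net(weight)

gray_img_tensor = torch.rand(1, 1, 28, 28)
conv_layer, activated_layer, pooled_layer = model(gray_img_tensor)
print(activated_layer.shape, pooled_layer.shape)
```

With a model like this, calling the `viz_layer` helper on `pooled_layer` would display the pooled maps, and the axis ticks would show the halved resolution.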

1.3.4 Visualize the output of each filter

First, we define a helper function, viz_layer, that takes a specific layer and a number of filters (optional argument) and displays the output of that layer once an image has been passed through.

# helper function for visualizing the output of a given layer
# default number of filters is 4
def viz_layer(layer, n_filters= 4):
    fig = plt.figure(figsize=(20, 20))

    for i in range(n_filters):
        # axis ticks are kept here so the change in image size is visible
        ax = fig.add_subplot(1, n_filters, i+1)
        # grab layer outputs
        ax.imshow(np.squeeze(layer[0,i].data.numpy()), cmap='gray')
        ax.set_title('Output %s' % str(i+1))

Let's look at the output of the convolutional layer after the ReLU activation function is applied:

# plot original image
plt.imshow(gray_img, cmap='gray')

# visualize all filters
fig = plt.figure(figsize=(12, 6))
fig.subplots_adjust(left=0, right=1.5, bottom=0.8, top=1, hspace=0.05, wspace=0.05)
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))

# convert the image into an input Tensor
gray_img_tensor = torch.from_numpy(gray_img).unsqueeze(0).unsqueeze(1)

# get all the layers
conv_layer, activated_layer, pooled_layer = model(gray_img_tensor)

# visualize the output of the activated conv layer
viz_layer(activated_layer)

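Result:

1.3.5 Visualize the output of the pooling layer

Then, look at the output of the pooling layer. The pooling layer takes the feature maps shown above as input and reduces their dimensions by some pooling factor, building a new, smaller image that contains only the maximum (brightest) value in each kernel region.

Look closely at the values on the x and y axes to see how the image size changes.

2 Designing and training a CNN to classify MNIST handwritten digits

In this notebook, we train an MLP (multi-layer perceptron) to classify images from the MNIST database of handwritten digits. Note that the network defined in this section is fully connected rather than convolutional.

The process is broken down into the following steps:

Load and visualize the data
Define a neural network
Train the model
Evaluate the performance of the trained model on a test dataset!

Before we begin, we have to import the libraries needed for working with data and PyTorch.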

# import libraries
import torch
import numpy as np

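2.1 Loading and visualizing the data

Downloading may take a few moments, and you should see your progress as the data loads. You can also change the batch size if you want to load more data at a time.

This cell creates a data loader for each dataset.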

# The MNIST datasets are hosted on yann.lecun.com that has moved under CloudFlare protection
# Run this script to enable the datasets download
# Reference: https://github.com/pytorch/vision/issues/1938
import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

from torchvision import datasets
import torchvision.transforms as transforms

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True,
                            download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                           download=True, transform=transform)

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          num_workers=num_workers)

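2.1.1 Visualizing a batch of training images

The first step in a classification task is to look at the data, make sure it loaded correctly, and then make any initial observations about patterns in it.

2.1.2 Viewing a single image in more detail

2.2 Defining the network architecture

The network takes a 784-dimensional tensor as input and outputs a tensor of length 10 (our number of classes) that holds the class scores for the input image. This particular example uses two hidden layers and dropout to avoid overfitting.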

import torch.nn as nn
import torch.nn.functional as F

## TODO: Define the NN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # linear layer (784 -> 256 hidden nodes)
        self.fc1 = nn.Linear(28 * 28, 256)
        # linear layer (256 -> 64 hidden nodes)
        self.fc2 = nn.Linear(256, 64)
        # linear layer (64 -> 10 class scores)
        self.fc3 = nn.Linear(64, 10)
        # dropout layer (p=0.2) to reduce overfitting
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)
        # add hidden layers, with relu activation function and dropout
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        # return raw class scores; nn.CrossEntropyLoss applies log-softmax itself
        x = self.fc3(x)
        return x

# initialize the NN
model = Net()
print(model)

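2.3 Specifying the loss function and optimizer

Cross-entropy loss is recommended for classification. If you look at the documentation, you can see that PyTorch's cross-entropy function applies a softmax function to the output layer and then computes the negative log-likelihood loss.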

## TODO: Specify loss and optimization functions
from torch import nn, optim

# specify loss function
criterion = nn.CrossEntropyLoss()

# specify optimizer
optimizer = optim.SGD(model.parameters(), lr=0.01)
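The statement that PyTorch's cross-entropy loss applies softmax internally can be verified with a small experiment. This is a sketch with random scores and made-up labels, comparing `nn.CrossEntropyLoss` on raw scores against `F.log_softmax` followed by `nn.NLLLoss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw class scores: 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])   # made-up labels

# CrossEntropyLoss takes raw scores (logits) ...
ce = nn.CrossEntropyLoss()(logits, target)
# ... and matches log-softmax followed by the negative log-likelihood loss.
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

print(ce.item(), nll.item())
```

Because of this, a model trained with `nn.CrossEntropyLoss` can return the raw scores of its last linear layer without applying softmax itself.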

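2.4 Training the network

The steps for training/learning from a batch of data are described in the comments below:

1. Clear the gradients of all optimized variables
2. Forward pass: compute predicted outputs by passing inputs to the model
3. Calculate the loss
4. Backward pass: compute the gradient of the loss with respect to the model parameters
5. Perform a single optimization step (parameter update)
6. Update the average training loss

The following loop trains for 30 epochs; feel free to change this value. For now, we suggest somewhere between 20 and 50 epochs. As you train, watch how the training loss decreases over time. We want it to decrease while also avoiding overfitting the training data.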

# number of epochs to train the model
n_epochs = 30  # suggest training between 20-50 epochs

model.train() # prep model for training

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0

    ###################
    # train the model #
    ###################
    for data, target in train_loader:
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item()*data.size(0)

    # print training statistics
    # calculate average loss over an epoch
    train_loss = train_loss/len(train_loader.dataset)

    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch+1, train_loss))

Training results:

Epoch: 1     Training Loss: 0.950629
Epoch: 2     Training Loss: 0.378016
Epoch: 3     Training Loss: 0.292131
Epoch: 4     Training Loss: 0.237494
Epoch: 5     Training Loss: 0.203416
Epoch: 6     Training Loss: 0.178869
Epoch: 7     Training Loss: 0.157555
Epoch: 8     Training Loss: 0.143985
Epoch: 9     Training Loss: 0.132015
Epoch: 10    Training Loss: 0.122434
Epoch: 11    Training Loss: 0.113976
Epoch: 12    Training Loss: 0.105239
Epoch: 13    Training Loss: 0.098839
Epoch: 14    Training Loss: 0.093791
Epoch: 15    Training Loss: 0.088727
Epoch: 16    Training Loss: 0.081909
Epoch: 17    Training Loss: 0.079282
Epoch: 18    Training Loss: 0.074924
Epoch: 19    Training Loss: 0.071149
Epoch: 20    Training Loss: 0.068345
Epoch: 21    Training Loss: 0.065399
Epoch: 22    Training Loss: 0.062431
Epoch: 23    Training Loss: 0.060230
Epoch: 24    Training Loss: 0.056332
Epoch: 25    Training Loss: 0.055859
Epoch: 26    Training Loss: 0.053873
Epoch: 27    Training Loss: 0.050490
Epoch: 28    Training Loss: 0.049184
Epoch: 29    Training Loss: 0.046799
Epoch: 30    Training Loss: 0.047051

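2.5 Testing the trained network

Finally, we test our best model on previously unseen test data and evaluate its performance. Testing on unseen data is a good way to check whether the model generalizes well. It can also be useful to make this analysis more granular and look at how the model performs on each class, as well as at its overall loss and accuracy.

model.eval() sets all the layers in the model to evaluation mode. This affects layers such as dropout, which switch off nodes with some probability during training; in evaluation mode, dropout is disabled.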
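The effect of evaluation mode on dropout can be seen on a single `nn.Dropout` layer. This is a small sketch with an all-ones input, using the same drop probability (0.2) as the network above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.2)
x = torch.ones(1000)

dropout.train()            # training mode: each element is zeroed with probability p
train_out = dropout(x)     # surviving elements are scaled by 1 / (1 - p)

dropout.eval()             # evaluation mode: dropout acts as the identity
eval_out = dropout(x)

print('zeroed in train mode:', int((train_out == 0).sum()))
print('unchanged in eval mode:', bool(torch.equal(eval_out, x)))
```

Calling `model.eval()` on a whole network switches every such layer at once, and `model.train()` switches them back.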

# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model.eval() # prep model for *evaluation*

for data, target in test_loader:
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the loss
    loss = criterion(output, target)
    # update test loss
    test_loss += loss.item()*data.size(0)
    # convert output scores to predicted class
    _, pred = torch.max(output, 1)
    # compare predictions to true label
    correct = pred.eq(target.data.view_as(pred))
    # calculate test accuracy for each object class
    # (iterate over the actual batch length so a smaller final batch is handled)
    for i in range(len(target)):
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

# calculate and print avg test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            str(i), 100 * class_correct[i] / class_total[i],
            class_correct[i], class_total[i]))
    else:
        # the digit itself is the class name here
        print('Test Accuracy of %5s: N/A (no test examples)' % (str(i)))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))

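2.6 Visualizing test-set predictions

This cell displays test images and their labels in the format: predicted (ground-truth). The text is green for correctly classified examples and red for incorrect predictions.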

import matplotlib.pyplot as plt

# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)

# get sample outputs
output = model(images)
# convert output scores to predicted class
_, preds = torch.max(output, 1)
# prep images for display
images = images.numpy()

# plot the images in the batch, along with predicted and true labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    # subplot grid sizes must be integers, hence the floor division
    ax = fig.add_subplot(2, 20//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title("{} ({})".format(str(preds[idx].item()), str(labels[idx].item())),
                 color=("green" if preds[idx]==labels[idx] else "red"))

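3 Designing and training a CNN to classify images from the CIFAR-10 dataset

In this notebook, we train a CNN to classify images from the CIFAR-10 database.

The images in this database are small color images that fall into 10 classes; some example images are shown below.

3.1 Testing for CUDA

Since these are larger (32x32x3) images, it may be useful to speed up training with a GPU. CUDA is a parallel computing platform, and CUDA tensors are the same as regular tensors except that they use the GPU for computation.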
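The code for this check is not shown in the post, yet later cells use a `train_on_gpu` flag. A minimal sketch of how that flag could be set, using `torch.cuda.is_available()` (the printed messages are illustrative):

```python
import torch

# check whether a CUDA-capable GPU is available to PyTorch
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    print('CUDA is available!  Training on GPU ...')
```

The later cells then move the model and each batch to the GPU only when this flag is true.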

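3.2 Loading the data

Downloading may take a minute. We load the training and test data, split the training data into training and validation sets, and then create a data loader for each of these datasets.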

import numpy as np
import torch
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20
# fraction of the training set to use as validation
valid_size = 0.2

# convert data to a normalized torch.FloatTensor
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

# choose the training and test datasets
train_data = datasets.CIFAR10('data', train=True,
                              download=True, transform=transform)
test_data = datasets.CIFAR10('data', train=False,
                             download=True, transform=transform)

# obtain training indices that will be used for validation
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           sampler=train_sampler, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           sampler=valid_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          num_workers=num_workers)

# specify the image classes
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

3.3 Visualizing a batch of training data
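The code for this step is not included in the post, but later cells call an `imshow` helper and index into an `images` batch. The following is a sketch of what such a helper might look like, assuming it only has to undo the Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) transform from section 3.2. The `unnormalize` name and the random stand-in batch are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def unnormalize(img):
    """Undo Normalize with mean 0.5 and std 0.5: map [-1, 1] back to [0, 1]."""
    return img / 2 + 0.5

def imshow(img):
    """Display one image given as a (channels, height, width) array or CPU tensor."""
    img = unnormalize(np.asarray(img))
    # matplotlib expects (height, width, channels)
    plt.imshow(np.transpose(img, (1, 2, 0)))

# Stand-in batch with the same layout as a CIFAR-10 batch
# (random values, for illustration only).
rng = np.random.default_rng(0)
images = rng.uniform(-1.0, 1.0, size=(20, 3, 32, 32)).astype(np.float32)
labels = rng.integers(0, 10, size=20)
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in range(20):
    ax = fig.add_subplot(2, 20 // 2, idx + 1, xticks=[], yticks=[])
    imshow(images[idx])
    ax.set_title(classes[labels[idx]])
```

With the real data loaders, the stand-in batch would be replaced by `images, labels = next(iter(train_loader))` followed by `images = images.numpy()`.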

3.4 Viewing an image in more detail

Here, we look at the normalized red, green, and blue (RGB) color channels as three separate grayscale intensity images.

rgb_img = np.squeeze(images[6])  # image at index 6 of the batch above (the red bird)
channels = ['red channel', 'green channel', 'blue channel']

fig = plt.figure(figsize = (36, 36))
for idx in np.arange(rgb_img.shape[0]):
    ax = fig.add_subplot(1, 3, idx + 1)
    img = rgb_img[idx]
    ax.imshow(img, cmap='gray')
    ax.set_title(channels[idx])
    width, height = img.shape
    thresh = img.max()/2.5
    # annotate every pixel with its normalized value
    for x in range(width):
        for y in range(height):
            val = round(img[x][y],2) if img[x][y] !=0 else 0
            ax.annotate(str(val), xy=(y,x),
                        horizontalalignment='center',
                        verticalalignment='center', size=8,
                        color='white' if img[x][y]<thresh else 'black')

The result is shown below (zoom in to read the values):

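3.5 Defining the network architecture

This time, you will define a CNN architecture:

Convolutional layers, which can be thought of as a stack of filters applied to the image.
Max-pooling layers, which reduce the x-y size of the input and keep only the most active pixels from the previous layer.
The usual linear + dropout layers, to avoid overfitting and to produce a 10-dimensional output.

The image and code below show a network with two convolutional layers; you have been given starter code with one convolutional layer and one max-pooling layer.

TODO: Define a model with multiple convolutional layers, and define the feedforward behavior of the network.

The more convolutional layers you include, the more complex the patterns of color and shape the model can detect. It is suggested that your final model include 2 or 3 convolutional layers, as well as linear layers + dropout to avoid overfitting.

It is good practice to use existing research and implementations of related models as a starting point for defining your own. You may find it helpful to look at this PyTorch classification example or this more complex Keras example when deciding on the final architecture.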

https://github.com/pytorch/tutorials/blob/master/beginner_source/blitz/cifar10_tutorial.py

https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py

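Output size of a convolutional layer:

To compute the output size of a given convolutional layer, we can do the following calculation (taken from Stanford's cs231n course):

We can compute the spatial size of the output volume as a function of the input volume size (W), the kernel size (F), the stride with which the kernel is applied (S), and the amount of zero padding used on the border (P). The formula for the output size is: (W−F+2P)/S + 1.

For example, for a 7x7 input and a 3x3 filter with stride 1 and pad 0, we get a 5x5 output. With stride 2, we get a 3x3 output.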
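The formula is easy to wrap in a small helper for checking layer sizes. This is an illustrative sketch; the function name `conv_output_size` is made up, and the floor division reflects how PyTorch's `nn.Conv2d` rounds down when the stride does not divide evenly.

```python
def conv_output_size(w, f, s=1, p=0):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1, rounded down."""
    return (w - f + 2 * p) // s + 1

# The two examples from the text: 7x7 input, 3x3 filter, pad 0.
print(conv_output_size(7, 3, s=1, p=0))   # 5
print(conv_output_size(7, 3, s=2, p=0))   # 3

# A 3x3 filter with padding 1 and stride 1 keeps a 32x32 input at 32x32.
print(conv_output_size(32, 3, s=1, p=1))  # 32
```

The last case is why convolutional layers with `kernel_size=3, padding=1` preserve the spatial size, leaving any downsampling to the pooling layers.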

import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer (3 -> 16 channels)
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # convolutional layer (16 -> 32 channels)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # convolutional layer (32 -> 64 channels)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 4 * 4 -> 200)
        self.fc1 = nn.Linear(64 * 4 * 4, 200)
        # linear layer (200 -> 10)
        self.fc2 = nn.Linear(200, 10)
        # dropout layer (p=0.2)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))  # output shape: 16 x 16 x 16
        x = self.pool(F.relu(self.conv2(x)))  # output shape: 32 x 8 x 8
        x = self.pool(F.relu(self.conv3(x)))  # output shape: 64 x 4 x 4
        # flatten image input
        x = x.view(-1, 64 * 4 * 4)
        # add dropout layer
        x = self.dropout(x)
        # add 1st hidden layer, with relu activation function
        x = F.relu(self.fc1(x))  # output size: 200
        # add dropout layer
        x = self.dropout(x)
        x = self.fc2(x)  # output size: 10
        return x

# create a complete CNN
model = Net()
print(model)

# move tensors to GPU if CUDA is available
if train_on_gpu:
    model.cuda()

3.6 Specifying the loss function and optimizer

import torch.optim as optim

# specify loss function
criterion = nn.CrossEntropyLoss()

# specify optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

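3.7 Training the network

Remember to watch how the training and validation losses decrease over time; if the validation loss starts to increase, that indicates possible overfitting.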

# number of epochs to train the model
n_epochs = 8 # you may increase this number to train a final model

valid_loss_min = np.inf # track change in validation loss

for epoch in range(1, n_epochs+1):

    # keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0

    ###################
    # train the model #
    ###################
    model.train()
    for data, target in train_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update training loss
        train_loss += loss.item()*data.size(0)

    ######################
    # validate the model #
    ######################
    model.eval()
    for data, target in valid_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # update average validation loss
        valid_loss += loss.item()*data.size(0)

    # calculate average losses
    # both loaders wrap the full train_data, so divide by the sampler sizes,
    # i.e. the number of samples each loader actually draws
    train_loss = train_loss/len(train_loader.sampler)
    valid_loss = valid_loss/len(valid_loader.sampler)

    # print training/validation statistics
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch, train_loss, valid_loss))

    # save model if validation loss has decreased
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            valid_loss_min,
            valid_loss))
        torch.save(model.state_dict(), 'model_cifar.pt')
        valid_loss_min = valid_loss

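Result:

3.8 Loading the model

model.load_state_dict(torch.load('model_cifar.pt'))

3.9 Testing the trained model

Test your trained model on previously unseen data! A "good" result is around 70% classification accuracy (or more; try your best!).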

# track test loss
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model.eval()
# iterate over test data
for data, target in test_loader:
    # move tensors to GPU if CUDA is available
    if train_on_gpu:
        data, target = data.cuda(), target.cuda()
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the batch loss
    loss = criterion(output, target)
    # update test loss
    test_loss += loss.item()*data.size(0)
    # convert output scores to predicted class
    _, pred = torch.max(output, 1)
    # compare predictions to true label
    correct_tensor = pred.eq(target.data.view_as(pred))
    correct = correct_tensor.cpu().numpy()
    # calculate test accuracy for each object class
    # (iterate over the actual batch length so a smaller final batch is handled)
    for i in range(len(target)):
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

# average test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no test examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))

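Result:

3.10 Question: what are your model's weaknesses, and how could it be improved?

Answer:

At the end of training the loss was still dropping quickly, so the number of training epochs was far from sufficient.
Test results differ considerably between classes. Classes whose appearance is complex and highly variable (such as dog, automobile, and bird) are generally predicted worse; compared with the other classes, they have larger intra-class distances. This suggests either that the model has not trained long enough to learn the complex classes, or that the architecture is still too simple to represent them.

3.11 Visualizing test-set predictions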

# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)

# move model inputs to cuda, if GPU available
if train_on_gpu:
    images = images.cuda()

# get sample outputs
output = model(images)
# convert output scores to predicted class
_, preds_tensor = torch.max(output, 1)
preds = np.squeeze(preds_tensor.cpu().numpy())

# move the images back to the CPU for plotting
if train_on_gpu:
    images = images.cpu()

# plot the images in the batch, along with predicted and true labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    # subplot grid sizes must be integers, hence the floor division
    ax = fig.add_subplot(2, 20//2, idx+1, xticks=[], yticks=[])
    imshow(images[idx])
    ax.set_title("{} ({})".format(classes[preds[idx]], classes[labels[idx]]),
                 color=("green" if preds[idx]==labels[idx].item() else "red"))

Result:
