
Convolutional Neural Network (CNN): Principles and Implementation


This post walks through one application of deep learning, the Convolutional Neural Network, covering some basic usage with reference to LeCun's Document 0.1, plus a few extensions and result displays (in Python).

It is divided into the following parts:

1. Convolution

2. Pooling (downsampling)

3. CNN structure

4. Running the experiments

Each is covered in turn below.


PS: This post is reference material for the ese machine learning short course (2014-05-16 session). It only sketches the most naive, simplest ideas and focuses on the practical side; the theory is covered in detail in class.


1. Convolution

As with Gaussian filtering, we convolve every image in an image batch. For a single image, one filter combines all of its input feature maps into a single output feature map. In the code below, we operate on an image batch containing two images; each image starts with 3 feature maps (its R, G, B channels) and is convolved with two 9×9 filters, so each image ends up with two output feature maps.

The convolution itself is done by Theano's conv.conv2d; here we use random parameters W and b. The results look a bit like an edge detector, don't they?

Code (see the comments for details):


# -*- coding: utf-8 -*-
"""
Created on Sat May 10 18:55:26 2014

@author: rachel

Function: convolution operation on two pictures with the same size (width, height)
input: 3 feature maps (3 channels <RGB> of a picture)
convolution: two 9*9 convolutional filters
"""

from theano.tensor.nnet import conv
import theano.tensor as T
import numpy, theano


rng = numpy.random.RandomState(23455)

# symbolic variable
input = T.tensor4(name='input')

# initial weights
w_shape = (2, 3, 9, 9)  # 2 convolutional filters, 3 channels, filter shape: 9*9
w_bound = numpy.sqrt(3 * 9 * 9)
W = theano.shared(numpy.asarray(rng.uniform(low=-1.0/w_bound, high=1.0/w_bound, size=w_shape),
                                dtype=input.dtype), name='W')

b_shape = (2,)
b = theano.shared(numpy.asarray(rng.uniform(low=-.5, high=.5, size=b_shape),
                                dtype=input.dtype), name='b')

conv_out = conv.conv2d(input, W)

# T.TensorVariable.dimshuffle() can reshape or broadcast (add dimensions)
# dimshuffle(self, *pattern)
# >>> b1 = b.dimshuffle('x', 0, 'x', 'x')
# >>> b1.shape.eval()
# array([1, 2, 1, 1])
output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))
f = theano.function([input], output)


# demo
import pylab
from PIL import Image

# ------------- img1 ---------------
img1 = Image.open(open('//home//rachel//Documents//ZJU_Projects//DL//Dataset//rachel.jpg'))
width1, height1 = img1.size
img1 = numpy.asarray(img1, dtype='float32') / 256.  # (height, width, 3)

# put the image into a 4D tensor of shape (1, 3, height, width)
img1_rgb = img1.swapaxes(0, 2).swapaxes(1, 2).reshape(1, 3, height1, width1)

# ------------- img2 ---------------
img2 = Image.open(open('//home//rachel//Documents//ZJU_Projects//DL//Dataset//rachel1.jpg'))
width2, height2 = img2.size
img2 = numpy.asarray(img2, dtype='float32') / 256.
img2_rgb = img2.swapaxes(0, 2).swapaxes(1, 2).reshape(1, 3, height2, width2)


# stack the two images into one minibatch: (2, 3, height, width)
# (a symbolic alternative: minibatch_img = T.join(0, img1_rgb, img2_rgb))
minibatch_img = numpy.concatenate((img1_rgb, img2_rgb), axis=0)
filtered_img = f(minibatch_img)


# plot the original images and the two convolved results for each
pylab.subplot(2, 3, 1); pylab.axis('off')
pylab.imshow(img1)

pylab.subplot(2, 3, 4); pylab.axis('off')
pylab.imshow(img2)

pylab.gray()
pylab.subplot(2, 3, 2); pylab.axis('off')
pylab.imshow(filtered_img[0, 0, :, :])  # image 0, filter 0

pylab.subplot(2, 3, 3); pylab.axis('off')
pylab.imshow(filtered_img[0, 1, :, :])  # image 0, filter 1

pylab.subplot(2, 3, 5); pylab.axis('off')
pylab.imshow(filtered_img[1, 0, :, :])  # image 1, filter 0

pylab.subplot(2, 3, 6); pylab.axis('off')
pylab.imshow(filtered_img[1, 1, :, :])  # image 1, filter 1
pylab.show()
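A note on output sizes, as a sanity check you can run after the script above: conv.conv2d defaults to 'valid' border mode, so each 9×9 filter shrinks a feature map by filter_size - 1 = 8 pixels in each dimension.

# with border_mode='valid' (the conv.conv2d default), each output map
# loses 8 pixels per dimension relative to the input
print(filtered_img.shape)  # expected: (2, 2, height1 - 8, width1 - 8)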






2. Pooling (Downsampling)


Max pooling is the most commonly used variant. It addresses two issues:

1. It reduces the amount of computation.

2. It buys a degree of invariance to small shifts and distortions of the input (work out why for yourself).

PS: for rotation invariance proper, recall SIFT and LBP, which assign a dominant orientation, and HOG, which uses templates at multiple orientations.

Max pooling's downsampling halves the width and height of each feature map. (The result figures below don't show this, because pylab stretches all images to the same display size, but the pixel count really is halved.) A minimal numeric illustration follows, ahead of the full Theano demo.
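As a quick sketch of what 2×2 max pooling does, here is a pure-NumPy toy example (not from the original post; the array is hypothetical):

import numpy

a = numpy.array([[ 1,  2,  3,  4],
                 [ 5,  6,  7,  8],
                 [ 9, 10, 11, 12],
                 [13, 14, 15, 16]], dtype='float32')

# take the max over each non-overlapping 2x2 block: shape (4,4) -> (2,2)
pooled = a.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[ 6.  8.]
#  [14. 16.]]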


Code (see the comments for details):


# -*- coding: utf-8 -*-
"""
Created on Sat May 10 18:55:26 2014

@author: rachel

Function: convolution followed by max pooling
input: 3 feature maps (3 channels <RGB> of a picture)
convolution: two 9*9 convolutional filters
"""

from theano.tensor.nnet import conv
import theano.tensor as T
import numpy, theano


rng = numpy.random.RandomState(23455)

# symbolic variable
input = T.tensor4(name='input')

# initial weights
w_shape = (2, 3, 9, 9)  # 2 convolutional filters, 3 channels, filter shape: 9*9
w_bound = numpy.sqrt(3 * 9 * 9)
W = theano.shared(numpy.asarray(rng.uniform(low=-1.0/w_bound, high=1.0/w_bound, size=w_shape),
                                dtype=input.dtype), name='W')

b_shape = (2,)
b = theano.shared(numpy.asarray(rng.uniform(low=-.5, high=.5, size=b_shape),
                                dtype=input.dtype), name='b')

conv_out = conv.conv2d(input, W)

# T.TensorVariable.dimshuffle() can reshape or broadcast (add dimensions)
# dimshuffle(self, *pattern)
# >>> b1 = b.dimshuffle('x', 0, 'x', 'x')
# >>> b1.shape.eval()
# array([1, 2, 1, 1])
output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))
f = theano.function([input], output)


# demo
import pylab
from PIL import Image
from matplotlib.pyplot import title

# open an image
img = Image.open(open('//home//rachel//Documents//ZJU_Projects//DL//Dataset//rachel.jpg'))
width, height = img.size
img = numpy.asarray(img, dtype='float32') / 256.  # (height, width, 3)

# put the image into a 4D tensor of shape (1, 3, height, width)
img_rgb = img.swapaxes(0, 2).swapaxes(1, 2)  # (3, height, width)
minibatch_img = img_rgb.reshape(1, 3, height, width)
filtered_img = f(minibatch_img)


# plot the original image and the two convolved results
pylab.figure(1)
pylab.subplot(1, 3, 1); pylab.axis('off')
pylab.imshow(img)
title('original image')

pylab.gray()
pylab.subplot(2, 3, 2); pylab.axis('off')
pylab.imshow(filtered_img[0, 0, :, :])  # image 0, filter 0
title('convolution 1')

pylab.subplot(2, 3, 3); pylab.axis('off')
pylab.imshow(filtered_img[0, 1, :, :])  # image 0, filter 1
title('convolution 2')


# maxpooling
from theano.tensor.signal import downsample

input = T.tensor4('input')
maxpool_shape = (2, 2)
pooled_img = downsample.max_pool_2d(input, maxpool_shape, ignore_border=False)

maxpool = theano.function(inputs=[input],
                          outputs=[pooled_img])

# maxpool returns a one-element list; squeeze drops the list and batch dims
pooled_res = numpy.squeeze(maxpool(filtered_img))

pylab.subplot(2, 3, 5); pylab.axis('off')
pylab.imshow(pooled_res[0, :, :])
title('downsampled 1')

pylab.subplot(2, 3, 6); pylab.axis('off')
pylab.imshow(pooled_res[1, :, :])
title('downsampled 2')

pylab.show()





3. CNN Structure

Google around and you will find CNN architecture diagrams everywhere; the one here is a figure I drew back when I was first learning CNNs, which, paired with the explanation, I think is fairly easy to follow.

Without further ado, the LeNet structure diagram (follow the arrows from bottom to top; the bottom layer is the original input):
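(The original figure is not reproduced here. As a rough text sketch, reconstructed from the code in the next section, the pipeline from bottom to top is:

original input 28×28
→ convolution 5×5, 20 feature maps → max pooling 2×2
→ convolution 5×5, 50 feature maps → max pooling 2×2
→ fully-connected hidden layer, 500 units (tanh)
→ logistic regression, 10 classes)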





4. CNN Code


Download it from the resources section; I have uploaded it there (in Python).

Only a small part of the code is pasted here, just the part that builds the network:

rng = numpy.random.RandomState(23455)

# transform x from (batchsize, 28*28) to (batchsize, feature, 28, 28)
# I_shape = (28,28), F_shape = (5,5)
N_filters_0 = 20
D_features_0 = 1
layer0_input = x.reshape((batch_size, D_features_0, 28, 28))
layer0 = LeNetConvPoolLayer(rng, input=layer0_input, filter_shape=(N_filters_0, D_features_0, 5, 5),
                            image_shape=(batch_size, 1, 28, 28))
# layer0.output: (batch_size, N_filters_0, (28-5+1)/2, (28-5+1)/2) -> (20, 20, 12, 12)

N_filters_1 = 50
D_features_1 = N_filters_0
layer1 = LeNetConvPoolLayer(rng, input=layer0.output, filter_shape=(N_filters_1, D_features_1, 5, 5),
                            image_shape=(batch_size, N_filters_0, 12, 12))
# layer1.output: (20, 50, 4, 4)

layer2_input = layer1.output.flatten(2)  # (20, 50, 4, 4) -> (20, 50*4*4)
layer2 = HiddenLayer(rng, layer2_input, n_in=50*4*4, n_out=500, activation=T.tanh)

layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

layer0, layer1: each is a convolution + downsampling layer.

layer2 + layer3: together they form an MLP (a plain ANN). The feature-map sizes in the comments above can be checked with the sketch below.
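A short sketch of the shape bookkeeping behind those comments (valid convolution followed by 2×2 max pooling; plain Python, the helper name is hypothetical):

# each conv+pool layer maps a (size x size) map to ((size - filt + 1) // pool)^2
def conv_pool_out(size, filt=5, pool=2):
    return (size - filt + 1) // pool

s0 = conv_pool_out(28)       # (28-5+1)/2 = 12 -> layer0 output 12x12
s1 = conv_pool_out(s0)       # (12-5+1)/2 = 4  -> layer1 output 4x4
print(s0, s1, 50 * s1 * s1)  # 12 4 800 -> layer2 n_in = 50*4*4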

Training the model:

cost = layer3.negative_log_likelihood(y)
params = layer3.params + layer2.params + layer1.params + layer0.params
gparams = T.grad(cost, params)

# vanilla SGD: one update pair (param, param - lr * grad) per parameter
updates = []
for par, gpar in zip(params, gparams):
    updates.append((par, par - learning_rate * gpar))

train_model = theano.function(inputs=[minibatch_index],
                              outputs=[cost],
                              updates=updates,
                              givens={x: train_set_x[minibatch_index * batch_size : (minibatch_index+1) * batch_size],
                                      y: train_set_y[minibatch_index * batch_size : (minibatch_index+1) * batch_size]})


Driven by the cost (the NLL output of the top MLP), the parameters of all layers are trained together; a minimal loop sketch follows.

The remaining details are in the code and its comments.
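For reference, a minimal sketch of the loop that drives train_model (hypothetical; the full script in the resources adds validation, early stopping, and so on):

# iterate over minibatches; each call to train_model does one SGD step
n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size
for epoch in range(n_epochs):
    for minibatch_index in range(n_train_batches):
        avg_cost = train_model(minibatch_index)  # returns [cost]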

PS: the data is the full MNIST dataset.






    final result:
    Optimization complete. Best validation score of 0.990000 % obtained at iteration 122500, with test performance 0.950000 %

From: http://blog.csdn.net/abcjennifer/article/details/25912675
