
[theano-windows] Study Notes 10: Handwritten Digit Classification with a Multilayer Perceptron

Published: 2023/12/13


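Preface

The previous post covered softmax; the natural next step is the basic multilayer perceptron (MLP). An MLP is really just w*x+b passed through an activation function, with the result serving as the input x to the next layer of neurons, along the lines of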

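$$\text{output} = f_3\big(f_2\big(f_1(x \cdot w_1 + b_1) \cdot w_2 + b_2\big) \cdot w_3 + b_3\big)$$

For classification with a perceptron, the usual approach is to attach a softmax at the end; for regression or fitting, well, we'll get to that when we need it. With sigmoid as the activation function, each layer's units are computed as

$$y_i = \begin{cases} \dfrac{1}{1 + e^{-w_i \cdot x - b_i}} & 1 \le i < n-1 \\[2ex] \dfrac{e^{w_{n-1}^{j} \cdot y_{n-1} + b_j}}{\sum_{k=1}^{\text{total neurons}} e^{w_{n-1}^{k} \cdot y_{n-1} + b_k}} & \text{(softmax output layer)} \end{cases}$$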
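To make the layer-by-layer computation concrete, here is a minimal sketch in plain Python (no Theano) of a forward pass with one sigmoid hidden layer followed by a softmax output. The helper names (`dense`, `forward`) and the toy weights are made up for illustration; they are not part of the code in this post.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, W, b):
    # x.W + b for one input vector x; W has len(x) rows and len(b) columns.
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def forward(x, W1, b1, W2, b2):
    # Hidden layer: sigmoid(x.W1 + b1); output layer: softmax(hidden.W2 + b2).
    hidden = [sigmoid(z) for z in dense(x, W1, b1)]
    return softmax(dense(hidden, W2, b2))

# Toy weights, chosen arbitrarily for illustration.
x = [1.0, 0.5]
W1 = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.2, -0.3], [0.1, 0.5], [-0.4, 0.2]]
b2 = [0.0, 0.0]
probs = forward(x, W1, b1, W2, b2)
```

`probs` holds one probability per class; the entries are positive and sum to 1, which is what lets the softmax output be read as a class distribution.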
As usual, the reference:

Multilayer Perceptron

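Prerequisites

Hyperparameters

These parameters cannot be optimized by gradient descent. Strictly speaking, finding their optimal values is an intractable problem: we cannot optimize each parameter in isolation, and we cannot use the gradient methods introduced earlier (some parameters are discrete, others real-valued). On top of that, the optimization problem is non-convex, so finding a (local) minimum can take a lot of effort. (Author's note: in short, things like the number of neurons and the learning rate are all hyperparameters.)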

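Nonlinear functions

This is simply the activation function. Many activation functions have appeared so far; the official Caffe documentation lists quite a few. Early work mainly used sigmoid and tanh, which can be transformed into one another:

$$2 \cdot \mathrm{sigmoid}(x) - 1 = \tanh\left(\frac{x}{2}\right)$$

For the differences in detail, see the discussion "In neural networks, how do the sigmoid and tanh activation functions differ, apart from their output range?"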
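The standard form of the relation between the two functions is 2*sigmoid(x) - 1 = tanh(x/2). A quick numerical check in plain Python (the function name `max_identity_gap` is made up for this sketch):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def max_identity_gap(points):
    # Largest absolute difference between 2*sigmoid(x) - 1 and tanh(x/2).
    return max(abs((2.0 * sigmoid(x) - 1.0) - math.tanh(x / 2.0))
               for x in points)

gap = max_identity_gap([-3.0, -0.5, 0.0, 1.0, 4.0])
```

The gap is at the level of floating-point rounding, so tanh is just a rescaled and shifted sigmoid: its output lies in (-1, 1) and is centered at 0.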

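Weight initialization

Never initialize the weights to 0: with all zeros every unit produces the same output, which destroys the diversity of the gradients across neurons. We do want the initial weights to be close to 0, so that the activation function operates in its near-linear region (sigmoid and tanh are both approximately linear around the origin), where the gradient is largest. In addition, especially for deep networks, it is desirable to preserve the variance of the activations and of the back-propagated gradients from layer to layer; this lets information flow properly both upward and downward through the network and reduces the discrepancies between layers. A common rule, known as fan-in and fan-out, comes from the paper Understanding the difficulty of training deep feedforward neural networks: sample the weights uniformly from the following distributions:

$$\text{uniform}\left[-\frac{\sqrt{6}}{\sqrt{fan_{in} + fan_{out}}},\ \frac{\sqrt{6}}{\sqrt{fan_{in} + fan_{out}}}\right] \quad \text{for tanh}$$

$$\text{uniform}\left[-4 \cdot \frac{\sqrt{6}}{\sqrt{fan_{in} + fan_{out}}},\ 4 \cdot \frac{\sqrt{6}}{\sqrt{fan_{in} + fan_{out}}}\right] \quad \text{for sigmoid}$$

where $fan_{in}$ is the number of input units and $fan_{out}$ is the number of hidden units.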
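As a rough sketch of this rule outside Theano, the following plain-Python function samples a weight matrix from the interval above. The function name `glorot_uniform` and its arguments are assumptions made for this example only.

```python
import math
import random

def glorot_uniform(fan_in, fan_out, activation="tanh", seed=1234):
    # Bound sqrt(6 / (fan_in + fan_out)); four times larger for sigmoid.
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    if activation == "sigmoid":
        bound *= 4.0
    rng = random.Random(seed)
    # One row per input unit, one column per hidden unit.
    return [[rng.uniform(-bound, bound) for _ in range(fan_out)]
            for _ in range(fan_in)]

# 28*28 inputs and 500 hidden units, the sizes used later in this post.
W_tanh = glorot_uniform(28 * 28, 500)
tanh_bound = math.sqrt(6.0 / (28 * 28 + 500))
```

The wider the layer, the narrower the sampling interval, which keeps the summed input to each unit at a comparable scale regardless of layer size.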

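Learning rate

The simplest choice is a constant: try a few values on a log scale ($10^{-1}, 10^{-2}, \ldots$) and narrow down until the validation error is lowest.

Another good approach is to decrease the learning rate over time, using $\frac{\mu_0}{1 + d \cdot t}$, where $\mu_0$ is the initial learning rate, $d$ is the decay constant that controls how fast the rate drops (usually no larger than $10^{-3}$), and $t$ is the iteration count.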
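A small sketch of this decay schedule; the function name and the sample values are illustrative only:

```python
def decayed_learning_rate(mu0, d, t):
    # mu0: initial learning rate, d: decay constant, t: iteration count.
    return mu0 / (1.0 + d * t)

# With mu0 = 0.01 and d = 1e-3 the rate halves after 1000 iterations
# and drops to a tenth of its initial value after 9000.
rates = [decayed_learning_rate(0.01, 1e-3, t) for t in (0, 1000, 9000)]
```

Note that the training code in this post uses a constant learning rate; the schedule is shown only to illustrate the formula.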

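Number of hidden units

This hyperparameter depends heavily on the dataset: the more complex the data distribution, the more hidden units are needed. (Perhaps this can be read as "a larger dataset does not by itself call for a more complex network"?) Unless we use regularization (early stopping or an L1/L2 penalty), a plot of generalization performance against the number of hidden units is typically U-shaped.

Penalty terms

The typical ones are the L1/L2 regularization coefficients; common values of $\lambda$ are $10^{-2}, 10^{-3}, \ldots$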

Implementation

Importing packages

Nothing much to say here: import the modules for Theano, decompression, data loading, and timing.

# -*- coding:utf-8 -*-
# Import modules
import theano
import theano.tensor as T
import numpy as np
import cPickle, gzip
import os
import timeit

Loading the dataset

Nothing much to say here either: every Theano handwritten-digit blog post uses this same code to read the data.

# Load the dataset
def load_data(dataset):
    data_dir, data_file = os.path.split(dataset)
    if os.path.isfile(dataset):
        with gzip.open(dataset, 'rb') as f:
            train_set, valid_set, test_set = cPickle.load(f)

    # Put a dataset into shared variables
    def shared_dataset(data_xy, borrow=True):
        data_x, data_y = data_xy
        shared_x = theano.shared(np.asarray(data_x, dtype=theano.config.floatX), borrow=borrow)
        shared_y = theano.shared(np.asarray(data_y, dtype=theano.config.floatX), borrow=borrow)
        return shared_x, T.cast(shared_y, 'int32')

    # Return three tuples: training set, validation set, test set
    train_set_x, train_set_y = shared_dataset(train_set)
    valid_set_x, valid_set_y = shared_dataset(valid_set)
    test_set_x, test_set_y = shared_dataset(test_set)
    rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y), (test_set_x, test_set_y)]
    return rval

The classifier

Note that the last layer of the MLP outputs a softmax, while each hidden layer before it is obtained by multiplying the previous layer by the weights, adding the bias, and applying the activation (see the per-layer formula in the preface). So we need to define two kinds of layer: a softmax layer and a HiddenLayer.

The softmax layer

Just copy and paste the definition from the previous post.

# Define the final softmax layer
class LogisticRegression(object):
    def __init__(self, input, n_in, n_out):
        # Shared weights
        self.W = theano.shared(value=np.zeros((n_in, n_out), dtype=theano.config.floatX), name='W', borrow=True)
        # Shared biases
        self.b = theano.shared(value=np.zeros((n_out,), dtype=theano.config.floatX), name='b', borrow=True)
        # Softmax function
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
        # Predicted class
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        self.params = [self.W, self.b]  # model parameters
        self.input = input  # model input

    # Negative log-likelihood
    def negative_log_likelihood(self, y):
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

    # Classification error
    def errors(self, y):
        # check if y has same dimension of y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError('y should have the same shape as self.y_pred',
                            ('y', y.type, 'y_pred', self.y_pred.type))
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()
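To see what the two quantities of the softmax layer compute, here is a plain-Python sketch on a tiny hand-made batch. The probabilities and labels are invented for illustration; in the real model they come from `p_y_given_x` and the MNIST labels.

```python
import math

def negative_log_likelihood(probs, labels):
    # Mean of -log p(correct class) over the batch.
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def error_rate(probs, labels):
    # Fraction of samples whose most probable class differs from the label.
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    wrong = sum(1 for pred, y in zip(preds, labels) if pred != y)
    return wrong / float(len(labels))

probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.3, 0.4, 0.3]]
labels = [0, 1, 2]

nll = negative_log_likelihood(probs, labels)
err = error_rate(probs, labels)  # the third sample is misclassified
```

The negative log-likelihood is differentiable and is what gets minimized, while the error rate is only used to monitor validation and test performance.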

The HiddenLayer

The MLP's loss function is handled entirely by the softmax layer, so HiddenLayer only has to compute the values of the intermediate hidden units.

# Operations for the hidden units of the multilayer perceptron
class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None, activation=T.tanh):
        self.input = input
        if W is None:
            W_values = np.asarray(rng.uniform(low=-np.sqrt(6. / (n_in + n_out)),
                                              high=np.sqrt(6. / (n_in + n_out)),
                                              size=(n_in, n_out)),
                                  dtype=theano.config.floatX)
            if activation == T.nnet.sigmoid:
                W_values *= 4
            W = theano.shared(value=W_values, name='W', borrow=True)
        if b is None:
            b_values = np.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)
        self.W = W
        self.b = b
        lin_output = T.dot(input, self.W) + self.b  # linear output before activation
        self.output = (lin_output if activation is None else activation(lin_output))
        self.params = [self.W, self.b]

Assembling the MLP

Building an MLP with a single hidden layer means stacking the two layers: feed the output of HiddenLayer into the input of the softmax layer, and collect the parameters of both layers into one list, which becomes the parameter list of the MLP.

# Define the perceptron
class MLP(object):
    def __init__(self, rng, input, n_in, n_hidden, n_out):
        self.hiddenLayer = HiddenLayer(rng=rng, input=input, n_in=n_in, n_out=n_hidden, activation=T.tanh)
        self.logRegressitionLayer = LogisticRegression(input=self.hiddenLayer.output, n_in=n_hidden, n_out=n_out)
        # Regularization terms
        self.L1 = (abs(self.hiddenLayer.W).sum() + abs(self.logRegressitionLayer.W).sum())
        self.L2 = ((self.hiddenLayer.W ** 2).sum() + (self.logRegressitionLayer.W ** 2).sum())
        # Loss function
        self.negative_log_likelihood = (self.logRegressitionLayer.negative_log_likelihood)
        self.errors = self.logRegressitionLayer.errors
        # Parameters of both layers, stored together
        self.params = self.hiddenLayer.params + self.logRegressitionLayer.params

Training

Next comes training, which boils down to computing gradients, updating the parameters, and early stopping. All of the code below goes inside the test_mlp() function.

def test_mlp(learning_rate=0.01, L1_reg=0.00, L2_reg=0.0001, n_epochs=1000,
             dataset='mnist.pkl.gz', batch_size=20, n_hidden=500):

First, load the data and compute the number of batches.

    # Load the data
    datasets = load_data(dataset)
    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]
    # Number of batches
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] // batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] // batch_size

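Then create the symbolic containers for the data and labels, and instantiate a classifier.

    # Build the model
    print('建立模型......')  # "Building the model......"
    index = T.iscalar()  # minibatch index
    x = T.matrix('x')  # holds the input data
    y = T.ivector('y')  # holds the labels
    rng = np.random.RandomState(1234)
    # Create the classifier
    classifier = MLP(rng=rng, input=x, n_in=28 * 28, n_hidden=n_hidden, n_out=10)

Define the loss function with regularization terms (the softmax negative log-likelihood + λ1·L1 + λ2·L2), take its gradient with respect to the parameters (the weights and biases of both the softmax layer and the HiddenLayer), and apply the gradient update.

    # Loss function with regularization terms
    cost = (classifier.negative_log_likelihood(y) + L1_reg * classifier.L1 + L2_reg * classifier.L2)
    # Gradients
    gparams = [T.grad(cost, param) for param in classifier.params]
    updates = [(param, param - learning_rate * gparam)
               for param, gparam in zip(classifier.params, gparams)]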
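Stripped of Theano's symbolic machinery, the regularized cost and the update rule amount to the following plain-Python sketch. The helper names and the toy numbers are invented for illustration, and the gradients are passed in by hand here, whereas in the post they come from `T.grad`.

```python
def l1_penalty(weight_matrices):
    # Sum of absolute values over all weight matrices (biases excluded).
    return sum(abs(w) for W in weight_matrices for row in W for w in row)

def l2_penalty(weight_matrices):
    # Sum of squares over all weight matrices (biases excluded).
    return sum(w * w for W in weight_matrices for row in W for w in row)

def regularized_cost(nll, weight_matrices, l1_reg, l2_reg):
    return (nll + l1_reg * l1_penalty(weight_matrices)
            + l2_reg * l2_penalty(weight_matrices))

def sgd_step(params, grads, learning_rate):
    # One gradient-descent update: param <- param - learning_rate * grad.
    return [p - learning_rate * g for p, g in zip(params, grads)]

W_hidden = [[1.0, -2.0], [3.0, 0.0]]
W_out = [[0.5], [-0.5]]
cost = regularized_cost(0.5, [W_hidden, W_out], l1_reg=0.1, l2_reg=0.01)
new_params = sgd_step([1.0, 2.0], [0.5, -1.0], learning_rate=0.1)
```

As in the MLP class, only the weight matrices enter the penalties; the biases are updated by the gradient step but are not regularized.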

Next come the three functions for training, validating, and testing the model.

    # Train the model
    train_model = theano.function(
        inputs=[index], outputs=cost, updates=updates,
        givens={x: train_set_x[index * batch_size:(index + 1) * batch_size],
                y: train_set_y[index * batch_size:(index + 1) * batch_size]})
    # Validate the model
    valid_model = theano.function(
        inputs=[index], outputs=classifier.errors(y),
        givens={x: valid_set_x[index * batch_size:(index + 1) * batch_size],
                y: valid_set_y[index * batch_size:(index + 1) * batch_size]})
    # Test the model
    test_model = theano.function(
        inputs=[index], outputs=classifier.errors(y),
        givens={x: test_set_x[index * batch_size:(index + 1) * batch_size],
                y: test_set_y[index * batch_size:(index + 1) * batch_size]})
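The `givens` slices select one minibatch by its index. The same indexing on an ordinary Python list looks like this (the `minibatch` helper and the toy data are made up for this sketch):

```python
def minibatch(data, index, batch_size):
    # Same slice as in the givens dictionaries above.
    return data[index * batch_size:(index + 1) * batch_size]

data = list(range(10))
batch_size = 4
# Integer division: a trailing partial batch is never visited.
n_batches = len(data) // batch_size
batches = [minibatch(data, i, batch_size) for i in range(n_batches)]
```

With 10 samples and a batch size of 4 there are 2 full batches, and the last 2 samples are skipped; this is a consequence of computing the batch count with `//`.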

Start training with the early-stopping algorithm.

    # Training with early stopping
    patiences = 10000
    patiences_increase = 2
    improvement_threshold = 0.995  # threshold for a significant improvement
    validation_frequency = min(n_train_batches, patiences // 2)
    best_validation_loss = np.inf  # best validation loss so far
    best_iter = 0  # iteration of the best model
    best_score = 0  # best score
    start_time = timeit.default_timer()
    epoch = 0
    done_looping = False
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in range(n_train_batches):
            minibatch_avg_cost = train_model(minibatch_index)
            # Iteration count
            iter = (epoch - 1) * n_train_batches + minibatch_index
            if (iter + 1) % validation_frequency == 0:
                validation_loss = [valid_model(i) for i in range(n_valid_batches)]
                this_validation_loss = np.mean(validation_loss)
                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches, this_validation_loss * 100.))
                if this_validation_loss < best_validation_loss:
                    if this_validation_loss < best_validation_loss * improvement_threshold:
                        patiences = max(patiences, iter * patiences_increase)
                    best_validation_loss = this_validation_loss
                    best_iter = iter
                    # Performance on the test set
                    test_losses = [test_model(i) for i in range(n_test_batches)]
                    test_score = np.mean(test_losses)
                    print((' epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches, test_score * 100.))
            if patiences < iter:
                done_looping = True
                break
    end_time = timeit.default_timer()
    print(('Optimization complete. Best validation score of %f %% '
           'obtained at iteration %i, with test performance %f %%') %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))

To recap the early-stopping algorithm: the upper limit on training is n_epochs. During training there is a maximum patience value, patiences. Each minibatch counts as one gradient update (so the counter iter increases monotonically and is never reset inside a loop). Every validation_frequency updates, the model is evaluated on the validation set. If the model is still improving and the improvement exceeds the threshold, the new patience becomes max(current patience, iter*patiences_increase). Once the model stops improving, or improves only marginally (so the patience is no longer extended), training is forcibly terminated as soon as the number of gradient updates exceeds the patience.
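The patience rule is easier to follow on a small simulation. The sketch below replays a made-up list of validation losses through the same logic, with no model involved; the function name `simulate_early_stopping` and all numbers are assumptions for illustration.

```python
def simulate_early_stopping(validation_losses, validation_frequency,
                            patience=10, patience_increase=2,
                            improvement_threshold=0.995):
    # validation_losses[k] is the loss seen at the k-th validation check.
    # Returns (iteration at which training stopped, best iteration, best loss).
    best_loss = float("inf")
    best_iter = 0
    it = 0
    n_iters = len(validation_losses) * validation_frequency
    for it in range(n_iters):
        if (it + 1) % validation_frequency == 0:
            loss = validation_losses[(it + 1) // validation_frequency - 1]
            if loss < best_loss:
                # Extend the patience only for a significant improvement.
                if loss < best_loss * improvement_threshold:
                    patience = max(patience, it * patience_increase)
                best_loss = loss
                best_iter = it
        if patience < it:
            break
    return it, best_iter, best_loss

# The loss improves twice, then stays flat.
losses = [0.5, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
result = simulate_early_stopping(losses, validation_frequency=5)
```

Here the improvement at iteration 9 raises the patience to 18; since nothing improves afterwards, the loop stops at iteration 19 even though 40 iterations were available.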

Next, run the training [don't train just yet, keep reading].

if __name__ == '__main__':
    test_mlp()

Here is the accuracy from the last iteration of my training run:

......
epoch 1000, minibatch 2500/2500, validation error 1.700000 %
Optimization complete. Best validation score of 1.690000 % obtained at iteration 2367500, with test performance 1.650000 %

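And here a problem shows up: I never saved the model, so how am I supposed to test it later? So I tried adding a save step to test_mlp() above:

                    print(('epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches, test_score * 100.))
                    # Save the best model
                    with open('best_model_MPL.pkl', 'wb') as f:
                        pickle.dump(classifier, f)
            if patiences < iter:
                done_looping = True
                break

Argh, it raised an error:

TypeError: can't pickle instancemethod objects

Allow this Python newbie to dodge the error rather than fix it, and try another way of saving the model. (The likely cause: the MLP class stores the bound methods negative_log_likelihood and errors as instance attributes, and Python 2's pickle cannot serialize instancemethod objects.)

After thinking it over for a while: fine, pull the parameters out and save those (in practice, the two layer objects that hold them).

                    print(('epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches, test_score * 100.))
                    # Save the best model
                    with open('best_model_MLP.pkl', 'wb') as save_file:
                        model = [classifier.hiddenLayer, classifier.logRegressitionLayer]
                        cPickle.dump(model, save_file)
            if patiences < iter:
                done_looping = True
                break

It actually worked, hahahaha o(╯□╰)o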
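An alternative that avoids pickling class instances altogether is to save only the raw parameter values. The sketch below is a hypothetical variant, not what this post does: plain nested lists stand in for the arrays that `W.get_value()` and `b.get_value()` would return, an in-memory buffer stands in for the file, and the dictionary keys are invented.

```python
import io
import pickle

def save_params(params, fileobj):
    # params: a dict mapping a parameter name to its plain value.
    pickle.dump(params, fileobj, protocol=2)

def load_params(fileobj):
    return pickle.load(fileobj)

# Stand-ins for the weights and biases of the two layers.
params = {
    "hidden_W": [[0.1, -0.2], [0.3, 0.4]],
    "hidden_b": [0.0, 0.0],
    "softmax_W": [[0.5], [-0.5]],
    "softmax_b": [0.0],
}

buf = io.BytesIO()  # an in-memory file; a real file opened with 'wb' works the same way
save_params(params, buf)
buf.seek(0)
restored = load_params(buf)
```

Saving plain values keeps the file independent of the class definitions, so the model classes can be renamed or refactored later without invalidating the saved file.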

Testing

With the model saved, it's time for a round of testing.

Load it:

with open('best_model_MLP.pkl', 'rb') as f:
    classifier = cPickle.load(f)

Initialize an MLP; note that it must be configured exactly like the one used in training.

x = T.matrix('x')
n_hidden = 500
classifier_test = MLP(rng=np.random.RandomState(1234), input=x, n_in=28 * 28, n_hidden=n_hidden, n_out=10)

Then use set_value to overwrite the weights and biases of this freshly initialized MLP.

classifier_test.hiddenLayer.W.set_value(classifier[0].W.get_value())
classifier_test.hiddenLayer.b.set_value(classifier[0].b.get_value())
classifier_test.logRegressitionLayer.W.set_value(classifier[1].W.get_value())
classifier_test.logRegressitionLayer.b.set_value(classifier[1].b.get_value())

Read one sample from the test set.

dataset = 'mnist.pkl.gz'
datasets = load_data(dataset)
test_set_x, test_set_y = datasets[2]
test_set_x = test_set_x.get_value()
test_data = test_set_x[10:11]

As in the previous softmax post, use y_pred to check the prediction.

predict_model = theano.function(inputs=[x], outputs=classifier_test.logRegressitionLayer.y_pred)
predicted_value = predict_model(test_data)
print(predicted_value)

Well, what do you know: no errors, and it produced a result. To be rigorous, let's display the image itself.

from skimage import io
import matplotlib.pyplot as plt
img = np.ceil(test_data * 255)
img_res = np.asarray(img.reshape(28, 28), dtype=np.int32)
io.imshow(img_res)
plt.show()

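Completely correct, and a few more samples came out right as well. A small confession: to get this model saved, I later trained for only 2 epochs.

建立模型......
epoch 1, minibatch 2500/2500, validation error 9.620000 %
epoch 1, minibatch 2500/2500, test error of best model 10.090000 %
epoch 2, minibatch 2500/2500, validation error 8.610000 %
epoch 2, minibatch 2500/2500, test error of best model 8.740000 %
Optimization complete. Best validation score of 8.610000 % obtained at iteration 5000, with test performance 8.740000 %

I won't go through the remaining tests; as with softmax, batch testing and testing on your own handwritten digits work the same way.

Afterword

The main takeaway this time was learning how to save the parameters of each part of the model separately. As for the rest, feel free to share: what did you all learn?

code: link: https://pan.baidu.com/s/1c1GDh5Y password: wpvc

Trained model: link: https://pan.baidu.com/s/1gf1ohSR password: dv6r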
