Deep Learning (10) -- Capsule Networks (CapsNet)
Contents
- Lecture 11: Capsule Networks (CapsNet)
- 1. Capsule networks
- 1.1 CNNs have important drawbacks
- 1.2 Hardcoding the 3D world into a neural network: the inverse graphics approach
- 2. Dynamic routing
- 2.1 Background
- 2.2 Solving the routing problem
- 2.3 Building the network
- 3. Code
1. Capsule networks
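1.1 CNNs have important drawbacks
CNNs (convolutional neural networks) are impressive. They are one of the reasons deep learning is so popular today, and they can do remarkable things that people long thought computers could not do. Even so, they have limitations and fundamental flaws.
Let us consider a very simple, non-technical example. Imagine a face. What are its components? We have an oval face, two eyes, a nose, and a mouth. For a CNN, the mere presence of these objects can be a very strong indicator that there is a face in the image. The orientation of these components and their relative spatial relationships matter little to a CNN.
To a CNN, two images are similar if they contain similar elements, even when those elements are arranged differently.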
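How does a CNN work?
- The main building block of a CNN is the convolutional layer; its job is to detect important features in the image pixels.
- Lower layers (closer to the input) learn to detect simple features such as edges and color gradients,
- while higher layers combine simple features into more complex ones.
- Finally, dense layers at the top of the network combine very high-level features and produce classification predictions.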
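An important thing to understand is that higher-level features combine lower-level features as a weighted sum:
(1) The activations of the previous layer are multiplied by the weights of the next layer's neurons and summed, then passed through a nonlinear activation.
(2) Nowhere in this setup is there a pose (translational and rotational) relationship between the simple features that make up a higher-level feature.
(3) CNNs approach this problem by using max pooling or successive convolutional layers to reduce the spatial size of the data flowing through the network. This enlarges the "field of view" of higher-layer neurons and lets them detect higher-order features over a larger region of the input image.
(4) Max pooling is a crutch that made convolutional networks work surprisingly well, achieving superhuman performance in many areas.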
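But do not be fooled by this performance:
- Although CNNs work better than any model before them, max pooling loses valuable information.
- Hinton: "The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster."
- Of course, you can drop max pooling from a traditional CNN and rely on successive convolutional layers instead, but that still does not solve the key problem:
- The internal data representation of a convolutional neural network does not take into account the important spatial hierarchies between simple and complex objects.
In the example above, the mere presence of two eyes, a mouth, and a nose in a picture does not mean there is a face; we also need to know how these objects are positioned relative to each other.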
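The information loss from max pooling can be illustrated with a minimal NumPy sketch. The helper `max_pool_2x2` and the two toy feature maps are made up for this illustration and are not part of the article's code. The two maps have their activations at different positions inside each pooling window, yet after 2x2 max pooling they can no longer be told apart.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D array with even side lengths."""
    h, w = feature_map.shape
    # Split into non-overlapping 2x2 blocks, then take the max of each block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Two hypothetical 4x4 feature maps: same activation values, different positions.
map_a = np.array([[1, 0, 0, 2],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [3, 0, 0, 4]], dtype=float)
map_b = np.array([[0, 0, 0, 0],
                  [0, 1, 2, 0],
                  [0, 3, 4, 0],
                  [0, 0, 0, 0]], dtype=float)

pooled_a = max_pool_2x2(map_a)
pooled_b = max_pool_2x2(map_b)

# The inputs differ, but the pooled outputs are identical:
# the exact position inside each window has been discarded.
same_after_pooling = np.array_equal(pooled_a, pooled_b)
print(pooled_a)
print(same_after_pooling)
```

In `map_a` the features sit in the corners of the image, in `map_b` they are packed into the center; a layer that only sees the pooled output receives the same input in both cases.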
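1.2 Hardcoding the 3D world into a neural network: the inverse graphics approach
Computer graphics deals with constructing a visual image from an internal hierarchical representation of geometric data. Note that the structure of this representation must take the relative positions of objects into account. The internal representation is stored in the computer's memory as arrays of geometric objects and matrices that represent their relative positions and orientations. Special software then converts this representation into an image on the screen. This is called rendering. Inverse graphics runs this process in reverse: starting from the image, it tries to recover the hierarchical representation, including the relative poses of the objects.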
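As a rough illustration of both directions, here is a small NumPy sketch using 2D homogeneous transforms. The `pose` helper, the scene, and all numbers are hypothetical and chosen only for this example. Rendering composes poses from the whole (a face) down to a part (an eye); the inverse direction starts from the observed pose of the part and the known part-to-whole relationship and recovers the pose of the whole.

```python
import numpy as np

def pose(angle_deg, tx, ty):
    """3x3 homogeneous matrix: rotate by angle_deg, then translate by (tx, ty)."""
    a = np.deg2rad(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), tx],
                     [np.sin(a),  np.cos(a), ty],
                     [0.0,        0.0,       1.0]])

# Hypothetical scene: a face placed in the image, and an eye placed relative to the face.
face_in_image = pose(30.0, 5.0, 2.0)
eye_in_face = pose(0.0, -1.0, 1.5)

# Rendering direction: compose poses from the whole down to the part.
eye_in_image = face_in_image @ eye_in_face

# Inverse direction: from the observed eye pose and the known
# part-to-whole relationship, recover the pose of the face.
recovered_face = eye_in_image @ np.linalg.inv(eye_in_face)

print(np.allclose(recovered_face, face_in_image))
```

If several parts (eyes, nose, mouth) each lead to roughly the same recovered face pose, that agreement is evidence that a face is really there, which is the intuition behind the routing scheme in section 2.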
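Capsule networks are much better than other models at recognizing that images showing the same object from different viewpoints belong to the same class. The latest paper reports reducing the error rate by 45%.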
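2. Dynamic routing
2.1 Background
In current neural networks, the neurons within a layer all do similar things; for example, every neuron in a convolutional layer performs the same convolution operation.
- Hinton, however, is convinced that different neurons can attend to different entities or attributes; for instance, different neurons could attend to different classes from the very beginning (rather than only at the final normalized classification).
- Specifically, some neurons attend to position, some to size, and some to orientation. This resembles how the human brain has separate regions responsible for language and vision, rather than spreading these functions across the whole brain.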
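To keep the network structure from becoming chaotic, Hinton proposed packing together the neurons that attend to the same class or the same attribute, like a capsule.
- While the network is running, the pathways between these capsules form a sparsely activated tree (only the capsules on some paths of the tree are active). This is the basis of his capsule theory, and it also makes capsules more interpretable.
- It is worth noting that Jeff Dean, also at Google Brain (though not in the same office), likewise sees sparsely activated neural networks as an important future direction; it remains to be seen whether he will propose different ways to implement them.
A network structure like capsules matches the intuition of "perceiving several attributes at once", but it also raises an obvious question: how should the different capsules be trained, and how can the network decide for itself which capsules activate which? The main problem Hinton's paper addresses is learning the connection weights (the routing) between capsules.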
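2.2 Solving the routing problem
- First, the neurons in each layer are grouped into capsules. Each capsule has an "activity vector" that represents the class or attribute it attends to.
- Each node in the tree corresponds to one active capsule.
- Through an iterative routing process, each active capsule chooses one capsule in the layer above to be its parent node.
- For the higher levels of a visual system, this iterative process has the potential to solve the problem of how the parts of an object are assembled, layer by layer, into a whole.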
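Among the attributes used to represent an entity in the network, one is special: the probability that the entity is present (the network's confidence that a certain kind of object has been detected).
- The typical approach is a separate regression unit that outputs a value between 0 and 1, where 0 means absent and 1 means present.
- In this paper, Hinton wants a single activity vector to represent both whether an entity is present and what its attributes are.
- His approach is to let the values along the different dimensions of the vector represent different attributes, and to let the length (norm) of the whole vector represent the probability that the entity is present.
- To ensure that the vector's length, and therefore the probability that the entity is present, does not exceed 1, the vector is normalized by a nonlinear function. The entity's attributes are then effectively expressed by the direction of the vector in high-dimensional space.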
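The normalization step can be sketched in NumPy as follows. This mirrors the `squash` function in the Keras code of section 3, rewritten with NumPy so it runs on its own; the `eps` parameter and the two test vectors are assumptions added for this example. The function scales a vector s by ||s||^2 / (1 + ||s||^2) along its own direction, so short vectors shrink toward 0 and long vectors approach length 1.

```python
import numpy as np

def squash(vectors, axis=-1, eps=1e-7):
    """Shrink each vector so its length lies in [0, 1) while keeping its direction."""
    squared_norm = np.sum(np.square(vectors), axis=axis, keepdims=True)
    # eps (an assumption here) avoids division by zero for all-zero vectors.
    scale = squared_norm / (1.0 + squared_norm) / np.sqrt(squared_norm + eps)
    return scale * vectors

short_vec = squash(np.array([0.1, 0.0]))    # weak evidence: length stays near 0
long_vec = squash(np.array([30.0, 40.0]))   # strong evidence: length approaches 1

short_len = np.linalg.norm(short_vec)
long_len = np.linalg.norm(long_vec)
print(short_len, long_len)
print(long_vec / long_len)  # direction of the input is preserved
```

Because only the length is rescaled, the direction of the vector is free to encode the entity's attributes while the length can be read as a presence probability.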
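Using activity vectors like this has a major benefit: it helps a lower-level capsule choose which higher-level capsule to connect to. The procedure is:
- Initially, a lower-level capsule provides input to all higher-level capsules;
- the lower-level capsule then multiplies its output by a weight matrix to obtain a prediction vector.
- If the scalar product of the prediction vector and the output vector of some higher-level capsule is large, top-down feedback increases the coupling coefficient between those two capsules and decreases the coupling coefficients between the lower-level capsule and the other higher-level capsules.
- After a few iterations, the connections between the lower-level capsules that contribute more and the higher-level capsules that receive their contributions become increasingly dominant.
In the authors' view, this "routing-by-agreement" is far more effective than earlier routing methods such as max pooling, which keep only the single most active feature.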
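The routing steps above can be sketched in plain NumPy for a single example. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the tensor layout `[num_in, num_out, dim_out]`, and the toy prediction vectors are all chosen for this example, and the prediction vectors are given directly instead of being computed from a learned weight matrix.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-7):
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return sq / (1.0 + sq) / np.sqrt(sq + eps) * s

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, n_iters=3):
    """Routing-by-agreement for one example.

    u_hat: prediction vectors, shape [num_in, num_out, dim_out].
    Returns the higher-level outputs v and the final coupling coefficients c.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))              # routing logits, start uniform
    for it in range(n_iters):
        c = softmax(b, axis=1)                   # each lower capsule splits its vote
        s = np.einsum('ij,ijd->jd', c, u_hat)    # weighted sum per higher capsule
        v = squash(s)                            # shape [num_out, dim_out]
        if it < n_iters - 1:
            # Agreement = scalar product of prediction and output vectors.
            b = b + np.einsum('ijd,jd->ij', u_hat, v)
    return v, c

# Toy setup (hypothetical): 3 lower capsules, 2 higher capsules, 2-D outputs.
u_hat = np.zeros((3, 2, 2))
u_hat[0, 0] = [2.0, 0.0]   # capsules 0 and 1 agree about higher capsule 0
u_hat[1, 0] = [2.0, 0.0]
u_hat[2, 0] = [0.0, 2.0]   # capsule 2 predicts something different
u_hat[0, 1] = [1.0, 0.0]   # predictions for higher capsule 1 cancel out
u_hat[1, 1] = [-1.0, 0.0]

v, c = route(u_hat)
print(c)
```

In this toy case the two lower capsules whose predictions agree end up more strongly coupled to higher capsule 0 than the capsule that disagrees, which is the behavior the bullet list describes.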
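2.3 Building the network
The authors built a simple CapsNet. Apart from the last layer, every layer is convolutional, but they are now layers of "capsules": vector outputs replace the CNN's scalar feature outputs, and routing-by-agreement replaces max pooling. As in a CNN, higher layers see larger regions of the image, but because max pooling is gone, position information is preserved throughout. For the lower layers, spatial position is determined simply by which capsules are active.
The lowest-level multi-dimensional capsules in this network behave differently from ordinary feature detectors: they act like the different elements in traditional computer graphics rendering, with each capsule attending to its own part of the features. This is computationally very different from current computer vision approaches, which combine elements from different spatial positions in the image to form an overall understanding (in other words, every region of the image first activates the whole network and the results are then combined). After an initial convolutional layer, the network stacks a PrimaryCaps layer (the lowest-level capsules) and a DigitCaps layer.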
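To get a feel for the sizes involved, the number of capsules produced by the PrimaryCaps layer can be computed from the convolution settings. The sketch below uses two hypothetical helper functions; the first call uses the MNIST configuration from the paper (28x28 input, 9x9 convolution, 9x9 stride-2 PrimaryCaps with 32 channels), and the second uses the settings from the code in section 3 (5x5 convolution, 9x9 stride-2 PrimaryCaps with 16 channels) together with an assumed 64x64 input, since the article does not state the input size.

```python
def conv_out_size(size, kernel, stride):
    """Output side length of a 'valid' (no padding) convolution."""
    return (size - kernel) // stride + 1

def num_primary_capsules(input_size, conv_kernel, caps_kernel, caps_stride, n_channels):
    """Capsule count after Conv (stride 1) followed by the PrimaryCaps convolution."""
    after_conv = conv_out_size(input_size, conv_kernel, 1)
    after_caps = conv_out_size(after_conv, caps_kernel, caps_stride)
    # One capsule per spatial position and per capsule channel.
    return after_caps * after_caps * n_channels

# Paper's MNIST setup: 28 -> 20 -> 6, so 6 * 6 * 32 capsules.
paper_count = num_primary_capsules(28, 9, 9, 2, 32)

# Settings from this article's code with an ASSUMED 64x64 input: 64 -> 60 -> 26.
article_count = num_primary_capsules(64, 5, 9, 2, 16)

print(paper_count, article_count)
```

Every one of these primary capsules sends a prediction vector to every DigitCaps capsule, which is why the transform matrix `W` in `CapsuleLayer` has one block per (output capsule, input capsule) pair.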
3. Code
capsulelayers.py
import keras.backend as K
import tensorflow as tf
from keras import initializers, layers


class Length(layers.Layer):
    """
    Compute the length of vectors. This is used to compute a Tensor that has the same shape with y_true in margin_loss.
    Using this layer as model's output can directly predict labels by using `y_pred = np.argmax(model.predict(x), 1)`
    inputs: shape=[None, num_vectors, dim_vector]
    output: shape=[None, num_vectors]
    """
    def call(self, inputs, **kwargs):
        return K.sqrt(K.sum(K.square(inputs), -1))

    def compute_output_shape(self, input_shape):
        return input_shape[:-1]


class Mask(layers.Layer):
    """
    Mask a Tensor with shape=[None, num_capsule, dim_vector] either by the capsule with max length or by an additional
    input mask. Except the max-length capsule (or specified capsule), all vectors are masked to zeros. Then flatten the
    masked Tensor.
    For example:
        ```
        x = keras.layers.Input(shape=[8, 3, 2])  # batch_size=8, each sample contains 3 capsules with dim_vector=2
        y = keras.layers.Input(shape=[8, 3])  # True labels. 8 samples, 3 classes, one-hot coding.
        out = Mask()(x)  # out.shape=[8, 6]
        # or
        out2 = Mask()([x, y])  # out2.shape=[8,6]. Masked with true labels y. Of course y can also be manipulated.
        ```
    """
    def call(self, inputs, **kwargs):
        if type(inputs) is list:  # true label is provided with shape = [None, n_classes], i.e. one-hot code.
            assert len(inputs) == 2
            inputs, mask = inputs
        else:  # if no true label, mask by the max length of capsules. Mainly used for prediction
            # compute lengths of capsules
            x = K.sqrt(K.sum(K.square(inputs), -1))
            # generate the mask which is a one-hot code.
            # mask.shape=[None, n_classes]=[None, num_capsule]
            mask = K.one_hot(indices=K.argmax(x, 1), num_classes=x.get_shape().as_list()[1])

        # inputs.shape=[None, num_capsule, dim_capsule]
        # mask.shape=[None, num_capsule]
        # masked.shape=[None, num_capsule * dim_capsule]
        masked = K.batch_flatten(inputs * K.expand_dims(mask, -1))
        return masked

    def compute_output_shape(self, input_shape):
        if type(input_shape[0]) is tuple:  # true label provided
            return tuple([None, input_shape[0][1] * input_shape[0][2]])
        else:  # no true label provided
            return tuple([None, input_shape[1] * input_shape[2]])


def squash(vectors, axis=-1):
    """
    The non-linear activation used in Capsule. It drives the length of a large vector to near 1 and small vector to 0
    :param vectors: some vectors to be squashed, N-dim tensor
    :param axis: the axis to squash
    :return: a Tensor with same shape as input vectors
    """
    s_squared_norm = K.sum(K.square(vectors), axis, keepdims=True)
    scale = s_squared_norm / (1 + s_squared_norm) / K.sqrt(s_squared_norm + K.epsilon())
    return scale * vectors


class CapsuleLayer(layers.Layer):
    """
    The capsule layer. It is similar to Dense layer. Dense layer has `in_num` inputs, each is a scalar, the output of the
    neuron from the former layer, and it has `out_num` output neurons. CapsuleLayer just expand the output of the neuron
    from scalar to vector. So its input shape = [None, input_num_capsule, input_dim_capsule] and output shape =
    [None, num_capsule, dim_capsule]. For Dense Layer, input_dim_capsule = dim_capsule = 1.

    :param num_capsule: number of capsules in this layer
    :param dim_capsule: dimension of the output vectors of the capsules in this layer
    :param routings: number of iterations for the routing algorithm
    """
    def __init__(self, num_capsule, dim_capsule, routings=3,
                 kernel_initializer='glorot_uniform',
                 **kwargs):
        super(CapsuleLayer, self).__init__(**kwargs)
        self.num_capsule = num_capsule
        self.dim_capsule = dim_capsule
        self.routings = routings
        self.kernel_initializer = initializers.get(kernel_initializer)

    def build(self, input_shape):
        assert len(input_shape) >= 3, "The input Tensor should have shape=[None, input_num_capsule, input_dim_capsule]"
        self.input_num_capsule = input_shape[1]
        self.input_dim_capsule = input_shape[2]

        # Transform matrix
        self.W = self.add_weight(shape=[self.num_capsule, self.input_num_capsule,
                                        self.dim_capsule, self.input_dim_capsule],
                                 initializer=self.kernel_initializer,
                                 name='W')

        self.built = True

    def call(self, inputs, training=None):
        # inputs.shape=[None, input_num_capsule, input_dim_capsule]
        # inputs_expand.shape=[None, 1, input_num_capsule, input_dim_capsule]
        inputs_expand = K.expand_dims(inputs, 1)

        # Replicate num_capsule dimension to prepare being multiplied by W
        # inputs_tiled.shape=[None, num_capsule, input_num_capsule, input_dim_capsule]
        inputs_tiled = K.tile(inputs_expand, [1, self.num_capsule, 1, 1])

        # Compute `inputs * W` by scanning inputs_tiled on dimension 0.
        # x.shape=[num_capsule, input_num_capsule, input_dim_capsule]
        # W.shape=[num_capsule, input_num_capsule, dim_capsule, input_dim_capsule]
        # Regard the first two dimensions as `batch` dimension,
        # then matmul: [input_dim_capsule] x [dim_capsule, input_dim_capsule]^T -> [dim_capsule].
        # inputs_hat.shape = [None, num_capsule, input_num_capsule, dim_capsule]
        inputs_hat = K.map_fn(lambda x: K.batch_dot(x, self.W, [2, 3]), elems=inputs_tiled)

        # Begin: Routing algorithm ---------------------------------------------------------------------#
        # The prior for coupling coefficient, initialized as zeros.
        # b.shape = [None, self.num_capsule, self.input_num_capsule].
        b = tf.zeros(shape=[K.shape(inputs_hat)[0], self.num_capsule, self.input_num_capsule])

        assert self.routings > 0, 'The routings should be > 0.'
        for i in range(self.routings):
            # Softmax over the output-capsule axis (`axis` replaces the deprecated `dim` argument).
            # c.shape=[batch_size, num_capsule, input_num_capsule]
            c = tf.nn.softmax(b, axis=1)

            # c.shape = [batch_size, num_capsule, input_num_capsule]
            # inputs_hat.shape=[None, num_capsule, input_num_capsule, dim_capsule]
            # The first two dimensions as `batch` dimension,
            # then matmul: [input_num_capsule] x [input_num_capsule, dim_capsule] -> [dim_capsule].
            # outputs.shape=[None, num_capsule, dim_capsule]
            outputs = squash(K.batch_dot(c, inputs_hat, [2, 2]))

            if i < self.routings - 1:
                # outputs.shape = [None, num_capsule, dim_capsule]
                # inputs_hat.shape=[None, num_capsule, input_num_capsule, dim_capsule]
                # The first two dimensions as `batch` dimension,
                # then matmul: [dim_capsule] x [input_num_capsule, dim_capsule]^T -> [input_num_capsule].
                # b.shape=[batch_size, num_capsule, input_num_capsule]
                b += K.batch_dot(outputs, inputs_hat, [2, 3])
        # End: Routing algorithm -----------------------------------------------------------------------#

        return outputs

    def compute_output_shape(self, input_shape):
        return tuple([None, self.num_capsule, self.dim_capsule])


def PrimaryCap(inputs, dim_capsule, n_channels, kernel_size, strides, padding):
    """
    Apply Conv2D `n_channels` times and concatenate all capsules
    :param inputs: 4D tensor, shape=[None, width, height, channels]
    :param dim_capsule: the dim of the output vector of capsule
    :param n_channels: the number of types of capsules
    :return: output tensor, shape=[None, num_capsule, dim_capsule]
    """
    output = layers.Conv2D(filters=dim_capsule*n_channels, kernel_size=kernel_size, strides=strides, padding=padding,
                           name='primarycap_conv2d')(inputs)
    outputs = layers.Reshape(target_shape=[-1, dim_capsule], name='primarycap_reshape')(output)
    return layers.Lambda(squash, name='primarycap_squash')(outputs)
capsulenet.py
import numpy as np
from keras import layers, models, optimizers, callbacks
from keras import backend as K
from keras.utils import to_categorical
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
from utils import combine_images
from PIL import Image
from capsulelayers import CapsuleLayer, PrimaryCap, Length, Mask
from resnets_utils import *  # provides load_dataset and convert_to_one_hot

K.set_image_data_format('channels_last')


def CapsNet(input_shape, n_class, routings):
    """
    A Capsule Network on MNIST.
    :param input_shape: data shape, 3d, [width, height, channels]
    :param n_class: number of classes
    :param routings: number of routing iterations
    :return: Three Keras Models: one for training, one for evaluation, and one for manipulating the latent capsules.
            `eval_model` can also be used for training.
    """
    x = layers.Input(shape=input_shape)

    # Layer 1: Just a conventional Conv2D layer
    conv1 = layers.Conv2D(filters=32, kernel_size=5, strides=1, padding='valid', activation='relu', name='conv1')(x)

    # Layer 2: Conv2D layer with `squash` activation, then reshape to [None, num_capsule, dim_capsule]
    primarycaps = PrimaryCap(conv1, dim_capsule=8, n_channels=16, kernel_size=9, strides=2, padding='valid')

    # Layer 3: Capsule layer. Routing algorithm works here.
    digitcaps = CapsuleLayer(num_capsule=n_class, dim_capsule=12, routings=routings,
                             name='digitcaps')(primarycaps)

    # Layer 4: This is an auxiliary layer to replace each capsule with its length. Just to match the true label's shape.
    # If using tensorflow, this will not be necessary. :)
    out_caps = Length(name='capsnet')(digitcaps)

    # Decoder network.
    y = layers.Input(shape=(n_class,))
    masked_by_y = Mask()([digitcaps, y])  # The true label is used to mask the output of capsule layer. For training
    masked = Mask()(digitcaps)  # Mask using the capsule with maximal length. For prediction

    # Shared Decoder model in training and prediction
    decoder = models.Sequential(name='decoder')
    decoder.add(layers.Dense(64, activation='relu', input_dim=12*n_class))
    decoder.add(layers.Dense(32, activation='relu'))
    decoder.add(layers.Dense(np.prod(input_shape), activation='sigmoid'))
    decoder.add(layers.Reshape(target_shape=input_shape, name='out_recon'))

    # Models for training and evaluation (prediction)
    train_model = models.Model([x, y], [out_caps, decoder(masked_by_y)])
    eval_model = models.Model(x, [out_caps, decoder(masked)])

    # manipulate model
    noise = layers.Input(shape=(n_class, 12))
    noised_digitcaps = layers.Add()([digitcaps, noise])
    masked_noised_y = Mask()([noised_digitcaps, y])
    manipulate_model = models.Model([x, y, noise], decoder(masked_noised_y))
    return train_model, eval_model, manipulate_model


def margin_loss(y_true, y_pred):
    """
    Margin loss for Eq.(4). When y_true[i, :] contains not just one `1`, this loss should work too. Not test it.
    :param y_true: [None, n_classes]
    :param y_pred: [None, num_capsule]
    :return: a scalar loss value.
    """
    L = y_true * K.square(K.maximum(0., 0.9 - y_pred)) + \
        0.5 * (1 - y_true) * K.square(K.maximum(0., y_pred - 0.1))

    return K.mean(K.sum(L, 1))


def train(model, data, args):
    """
    Training a CapsuleNet
    :param model: the CapsuleNet model
    :param data: a tuple containing training and testing data, like `((x_train, y_train), (x_test, y_test))`
    :param args: arguments
    :return: The trained model
    """
    # unpacking the data
    (x_train, y_train), (x_test, y_test) = data

    # callbacks
    log = callbacks.CSVLogger(args.save_dir + '/log.csv')
    tb = callbacks.TensorBoard(log_dir=args.save_dir + '/tensorboard-logs',
                               batch_size=args.batch_size, histogram_freq=int(args.debug))
    checkpoint = callbacks.ModelCheckpoint(args.save_dir + '/weights-{epoch:02d}.h5', monitor='val_capsnet_acc',
                                           save_best_only=True, save_weights_only=True, verbose=1)
    lr_decay = callbacks.LearningRateScheduler(schedule=lambda epoch: args.lr * (args.lr_decay ** epoch))

    # compile the model
    model.compile(optimizer=optimizers.Adam(lr=args.lr),
                  loss=[margin_loss, 'mse'],
                  loss_weights=[1., args.lam_recon],
                  metrics={'capsnet': 'accuracy'})

    """
    # Training without data augmentation:
    model.fit([x_train, y_train], [y_train, x_train], batch_size=args.batch_size, epochs=args.epochs,
              validation_data=[[x_test, y_test], [y_test, x_test]], callbacks=[log, tb, checkpoint, lr_decay])
    """

    # Begin: Training with data augmentation ---------------------------------------------------------------------#
    def train_generator(x, y, batch_size, shift_fraction=0.):
        train_datagen = ImageDataGenerator(width_shift_range=shift_fraction,
                                           height_shift_range=shift_fraction)  # shift up to 2 pixel for MNIST
        generator = train_datagen.flow(x, y, batch_size=batch_size)
        while 1:
            x_batch, y_batch = generator.next()
            yield ([x_batch, y_batch], [y_batch, x_batch])

    # Training with data augmentation. If shift_fraction=0., also no augmentation.
    model.fit_generator(generator=train_generator(x_train, y_train, args.batch_size, args.shift_fraction),
                        steps_per_epoch=int(y_train.shape[0] / args.batch_size),
                        epochs=args.epochs,
                        validation_data=[[x_test, y_test], [y_test, x_test]],
                        callbacks=[log, tb, checkpoint, lr_decay])
    # End: Training with data augmentation -----------------------------------------------------------------------#

    model.save_weights(args.save_dir + '/trained_model.h5')
    print('Trained model saved to \'%s/trained_model.h5\'' % args.save_dir)

    from utils import plot_log
    plot_log(args.save_dir + '/log.csv', show=True)

    return model


def test(model, data, args):
    x_test, y_test = data
    y_pred, x_recon = model.predict(x_test, batch_size=32)
    print('-'*30 + 'Begin: test' + '-'*30)
    print('Test acc:', np.sum(np.argmax(y_pred, 1) == np.argmax(y_test, 1))/y_test.shape[0])

    img = combine_images(np.concatenate([x_test[:50], x_recon[:50]]))
    image = img * 255
    Image.fromarray(image.astype(np.uint8)).save(args.save_dir + "/real_and_recon.png")
    print()
    print('Reconstructed images are saved to %s/real_and_recon.png' % args.save_dir)
    print('-' * 30 + 'End: test' + '-' * 30)
    plt.imshow(plt.imread(args.save_dir + "/real_and_recon.png"))
    plt.show()


def manipulate_latent(model, data, args):
    print('-'*30 + 'Begin: manipulate' + '-'*30)
    x_test, y_test = data
    index = np.argmax(y_test, 1) == args.digit
    number = np.random.randint(low=0, high=sum(index) - 1)
    x, y = x_test[index][number], y_test[index][number]
    x, y = np.expand_dims(x, 0), np.expand_dims(y, 0)
    # Noise shape is [1, n_class, dim_capsule]: 6 classes, 12-dimensional digit capsules.
    noise = np.zeros([1, 6, 12])
    x_recons = []
    # Perturb each of the 12 capsule dimensions in turn (the loop bound must match dim_capsule).
    for dim in range(12):
        for r in [-0.25, -0.2, -0.15, -0.1, -0.05, 0, 0.05, 0.1, 0.15, 0.2, 0.25]:
            tmp = np.copy(noise)
            tmp[:, :, dim] = r
            x_recon = model.predict([x, y, tmp])
            x_recons.append(x_recon)

    x_recons = np.concatenate(x_recons)

    img = combine_images(x_recons, height=12)
    image = img*255
    Image.fromarray(image.astype(np.uint8)).save(args.save_dir + '/manipulate-%d.png' % args.digit)
    print('manipulated result saved to %s/manipulate-%d.png' % (args.save_dir, args.digit))
    print('-' * 30 + 'End: manipulate' + '-' * 30)


def load_mnist():
    # the data, shuffled and split between train and test sets
    X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

    # Normalize the data and one-hot encode the labels
    x_train = X_train_orig/255.
    x_test = X_test_orig/255.
    y_train = convert_to_one_hot(Y_train_orig, 6).T
    y_test = convert_to_one_hot(Y_test_orig, 6).T
    print(x_train.shape, y_train.shape)
    return (x_train, y_train), (x_test, y_test)


if __name__ == "__main__":
    import os
    import argparse

    # setting the hyper parameters
    parser = argparse.ArgumentParser(description="Capsule Network on MNIST.")
    parser.add_argument('--epochs', default=20, type=int)
    parser.add_argument('--batch_size', default=32, type=int)
    parser.add_argument('--lr', default=0.001, type=float,
                        help="Initial learning rate")
    parser.add_argument('--lr_decay', default=0.9, type=float,
                        help="The value multiplied by lr at each epoch. Set a larger value for larger epochs")
    parser.add_argument('--lam_recon', default=0.392, type=float,
                        help="The coefficient for the loss of decoder")
    parser.add_argument('-r', '--routings', default=3, type=int,
                        help="Number of iterations used in routing algorithm. should > 0")
    parser.add_argument('--shift_fraction', default=0.1, type=float,
                        help="Fraction of pixels to shift at most in each direction.")
    parser.add_argument('--debug', action='store_true',
                        help="Save weights by TensorBoard")
    parser.add_argument('--save_dir', default='./result')
    parser.add_argument('-t', '--testing', action='store_true',
                        help="Test the trained model on testing dataset")
    parser.add_argument('--digit', default=5, type=int,
                        help="Digit to manipulate")
    parser.add_argument('-w', '--weights', default=None,
                        help="The path of the saved weights. Should be specified when testing")
    args = parser.parse_args()
    print(args)

    if not os.path.exists(args.save_dir):
        os.makedirs(args.save_dir)

    # load data
    (x_train, y_train), (x_test, y_test) = load_mnist()

    # define model
    model, eval_model, manipulate_model = CapsNet(input_shape=x_train.shape[1:],
                                                  n_class=len(np.unique(np.argmax(y_train, 1))),
                                                  routings=args.routings)
    model.summary()

    # train or test
    if args.weights is not None:  # init the model weights with provided one
        model.load_weights(args.weights)
    if not args.testing:
        train(model=model, data=((x_train, y_train), (x_test, y_test)), args=args)
    else:  # as long as weights are given, will run testing
        if args.weights is None:
            print('No weights are provided. Will test using random initialized weights.')
        manipulate_latent(manipulate_model, (x_test, y_test), args)
        test(model=eval_model, data=(x_test, y_test), args=args)