TensorFlow2 - Neural Network Training
TensorFlow2 Neural Network Training
Table of Contents
- TensorFlow2 Neural Network Training
- Gradient Descent
- Backpropagation
- Training Visualization
- Additional Notes
Gradient Descent
- The gradient $\nabla f=\left(\frac{\partial f}{\partial x_{1}} ; \frac{\partial f}{\partial x_{2}} ; \ldots ; \frac{\partial f}{\partial x_{n}}\right)$ is the vector of partial derivatives of a function with respect to its variables $x$. Its direction is the direction in which the function value increases, and its norm measures how fast the function increases. By repeatedly moving the parameters a certain step in the direction opposite to the gradient, we can reach a minimum of the function (global or local).
- $\theta_{t+1}=\theta_{t}-\alpha_{t} \nabla f\left(\theta_{t}\right)$. This parameter update rule is called gradient descent. In practice the gradient is multiplied by a learning rate smaller than 1, because the raw gradient usually has a fairly large norm; using it directly makes the function value fluctuate and makes it hard to converge to an equilibrium point (this is also why the learning rate should not be too large).
- For many functions, however, GD (gradient descent) does not find the global optimum: it often converges to a local optimum and stops there (even though that local optimum may already be close to the global one). This is determined by the properties of the function; in practice, gradient descent performs well on convex functions.
- Deep-learning frameworks such as TensorFlow and PyTorch support automatic differentiation. In TensorFlow2, simply wrap the computation whose gradients you need inside a GradientTape context and TensorFlow will compute the gradients of the recorded operations for you. Note that tape.gradient(loss, [w1, w2, ...]) can be called only once: the recorded gradients occupy a lot of memory and are released after being retrieved. To call it multiple times, create the tape with tf.GradientTape(persistent=True) (and remember to release the tape when you are done). TensorFlow2 also supports higher-order derivatives by nesting tapes. An example follows.
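As a minimal sketch (not from the original post), the snippet below shows a single gradient call, a gradient-descent update of the form $\theta \leftarrow \theta - \alpha \nabla f(\theta)$, a persistent tape that can be queried repeatedly, and a nested tape for a second-order derivative. The variables `w`, `b`, and `x` are illustrative.

```python
import tensorflow as tf

w = tf.Variable(2.0)
b = tf.Variable(1.0)
x = tf.constant(3.0)

# 1. A plain tape: tape.gradient() may only be called once,
#    after which the recorded resources are released.
with tf.GradientTape() as tape:
    loss = (w * x + b) ** 2
dw, db = tape.gradient(loss, [w, b])

# A single gradient-descent step: theta <- theta - lr * grad
lr = 0.01
w.assign_sub(lr * dw)
b.assign_sub(lr * db)

# 2. A persistent tape: gradient() can be called repeatedly;
#    delete the tape yourself to free its resources.
with tf.GradientTape(persistent=True) as tape:
    y = w * x + b
    loss = y ** 2
dy_dw = tape.gradient(y, w)
dloss_dw = tape.gradient(loss, w)
del tape

# 3. Nested tapes give higher-order derivatives.
with tf.GradientTape() as t2:
    with tf.GradientTape() as t1:
        loss = (w * x + b) ** 3
    first = t1.gradient(loss, w)   # d(loss)/dw
second = t2.gradient(first, w)     # d2(loss)/dw2

print(float(dw), float(first), float(second))
```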
Backpropagation
- Backpropagation (BP) is the core algorithm for training deep neural networks, and it is built on the chain rule. The loss at the output layer is propagated backwards through the weights (the reverse of the forward pass) to layer $i$, and the gradient at layer $i$ is then used to update that layer's parameters; this is repeated layer by layer back through the network. See the earlier BP neural network post for the detailed derivation; a small hand-written chain-rule check also follows the full example below.
- In TensorFlow2 the classic BP neural-network layer is wrapped as the fully connected (Dense) layer, which performs the hidden-layer operations of a BP network automatically. The code below uses Dense layers to build a BP network and trains it to classify the Fashion-MNIST dataset.

```python
"""
Author: Zhou Chen
Date: 2019/10/15
Desc: About
"""
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics

(x, y), (x_test, y_test) = datasets.fashion_mnist.load_data()
print(x.shape, y.shape)


def preprocess(x, y):
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    return x, y


batch_size = 64
db = tf.data.Dataset.from_tensor_slices((x, y))
db = db.map(preprocess).shuffle(10000).batch(batch_size)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.map(preprocess).shuffle(10000).batch(batch_size)

model = Sequential([
    layers.Dense(256, activation=tf.nn.relu),  # [b, 784] => [b, 256]
    layers.Dense(128, activation=tf.nn.relu),  # [b, 256] => [b, 128]
    layers.Dense(64, activation=tf.nn.relu),   # [b, 128] => [b, 64]
    layers.Dense(32, activation=tf.nn.relu),   # [b, 64] => [b, 32]
    layers.Dense(10),                          # [b, 32] => [b, 10]
])
model.build(input_shape=[None, 28 * 28])
optimizer = optimizers.Adam(lr=1e-3)


def main():
    for epoch in range(30):
        # forward pass
        for step, (x, y) in enumerate(db):
            x = tf.reshape(x, [-1, 28 * 28])
            with tf.GradientTape() as tape:
                logits = model(x)
                y_onehot = tf.one_hot(y, depth=10)
                loss_mse = tf.reduce_mean(tf.losses.MSE(y_onehot, logits))
                loss_ce = tf.reduce_mean(tf.losses.categorical_crossentropy(y_onehot, logits, from_logits=True))
            grads = tape.gradient(loss_ce, model.trainable_variables)
            # backward pass: apply the gradients
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
            if step % 100 == 0:
                print(epoch, step, "loss:", float(loss_mse), float(loss_ce))

        # test
        total_correct, total_num = 0, 0
        for x, y in db_test:
            x = tf.reshape(x, [-1, 28 * 28])
            logits = model(x)
            prob = tf.nn.softmax(logits, axis=1)
            pred = tf.cast(tf.argmax(prob, axis=1), dtype=tf.int32)
            correct = tf.reduce_sum(tf.cast(tf.equal(pred, y), dtype=tf.int32))
            total_correct += int(correct)
            total_num += int(x.shape[0])
        acc = total_correct / total_num
        print("acc", acc)


if __name__ == '__main__':
    main()
```
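To make the chain-rule description above concrete, here is a minimal sketch (not part of the original post) that backpropagates a mean-squared-error loss through a tiny two-layer network by hand and checks the result against tf.GradientTape. The names `w1`, `w2`, `x`, and `y_true` are illustrative.

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0]])      # input, shape [1, 2]
y_true = tf.constant([[1.0]])      # target, shape [1, 1]
w1 = tf.Variable(tf.random.normal([2, 3]))
w2 = tf.Variable(tf.random.normal([3, 1]))

with tf.GradientTape() as tape:
    h = tf.nn.sigmoid(x @ w1)      # hidden layer activation, shape [1, 3]
    y_pred = h @ w2                # linear output layer, shape [1, 1]
    loss = tf.reduce_mean((y_pred - y_true) ** 2)

auto_dw1, auto_dw2 = tape.gradient(loss, [w1, w2])

# Hand-derived chain rule for the same graph (batch size 1):
dy = 2.0 * (y_pred - y_true)               # dL/dy_pred
manual_dw2 = tf.transpose(h) @ dy          # dL/dw2
dh = dy @ tf.transpose(w2)                 # propagate the error back to h
dz = dh * h * (1.0 - h)                    # sigmoid'(z) = h * (1 - h)
manual_dw1 = tf.transpose(x) @ dz          # dL/dw1

# Both differences should be ~0 (up to float32 precision).
print(tf.reduce_max(tf.abs(auto_dw2 - manual_dw2)).numpy())
print(tf.reduce_max(tf.abs(auto_dw1 - manual_dw1)).numpy())
```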
Training Visualization
- TensorFlow ships with a companion visualization toolkit, TensorBoard (install it with pip; recent versions of TensorFlow install it automatically). It is a web-based tool for monitoring the training process and training data, and it reads its data from a designated directory on local disk. Using TensorBoard generally takes three steps: create a log directory, create a summary writer instance, and feed data to that writer (a minimal sketch of just these three steps appears at the end of this section).
- Run tensorboard --logdir logs to monitor the chosen log directory. At this point nothing has been written to the directory yet, so TensorBoard reports that there is no data to display.
- The latter two steps are usually embedded in the training loop, as in the example below. (Note that TensorBoard does not combine multiple sample images into a single grid by itself; it displays them one by one. Combining them requires your own helper, which the code below provides.)

```python
"""
Author: Zhou Chen
Date: 2019/10/15
Desc: About
"""
import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics
import datetime
from matplotlib import pyplot as plt
import io


def preprocess(x, y):
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    return x, y


def plot_to_image(figure):
    # Save the plot to a PNG in memory.
    buf = io.BytesIO()
    plt.savefig(buf, format='png')
    # Closing the figure prevents it from being displayed directly inside the notebook.
    plt.close(figure)
    buf.seek(0)
    # Convert PNG buffer to TF image
    image = tf.image.decode_png(buf.getvalue(), channels=4)
    # Add the batch dimension
    image = tf.expand_dims(image, 0)
    return image


def image_grid(images):
    """Return a 5x5 grid of the MNIST images as a matplotlib figure."""
    # Create a figure to contain the plot.
    figure = plt.figure(figsize=(10, 10))
    for i in range(25):
        # Start next subplot.
        plt.subplot(5, 5, i + 1, title='name')
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(images[i], cmap=plt.cm.binary)
    return figure


batchsz = 128
(x, y), (x_val, y_val) = datasets.mnist.load_data()
print('datasets:', x.shape, y.shape, x.min(), x.max())

db = tf.data.Dataset.from_tensor_slices((x, y))
db = db.map(preprocess).shuffle(60000).batch(batchsz).repeat(10)

ds_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
ds_val = ds_val.map(preprocess).batch(batchsz, drop_remainder=True)

network = Sequential([layers.Dense(256, activation='relu'),
                      layers.Dense(128, activation='relu'),
                      layers.Dense(64, activation='relu'),
                      layers.Dense(32, activation='relu'),
                      layers.Dense(10)])
network.build(input_shape=(None, 28 * 28))
network.summary()

optimizer = optimizers.Adam(lr=0.01)

current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
log_dir = 'logs/' + current_time
summary_writer = tf.summary.create_file_writer(log_dir)

# get x from (x, y)
sample_img = next(iter(db))[0]
# get first image instance
sample_img = sample_img[0]
sample_img = tf.reshape(sample_img, [1, 28, 28, 1])
with summary_writer.as_default():
    tf.summary.image("Training sample:", sample_img, step=0)

for step, (x, y) in enumerate(db):
    with tf.GradientTape() as tape:
        # [b, 28, 28] => [b, 784]
        x = tf.reshape(x, (-1, 28 * 28))
        # [b, 784] => [b, 10]
        out = network(x)
        # [b] => [b, 10]
        y_onehot = tf.one_hot(y, depth=10)
        # [b]
        loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_onehot, out, from_logits=True))

    grads = tape.gradient(loss, network.trainable_variables)
    optimizer.apply_gradients(zip(grads, network.trainable_variables))

    if step % 100 == 0:
        print(step, 'loss:', float(loss))
        with summary_writer.as_default():
            tf.summary.scalar('train-loss', float(loss), step=step)

    # evaluate
    if step % 500 == 0:
        total, total_correct = 0., 0
        for _, (x, y) in enumerate(ds_val):
            # [b, 28, 28] => [b, 784]
            x = tf.reshape(x, (-1, 28 * 28))
            # [b, 784] => [b, 10]
            out = network(x)
            # [b, 10] => [b]
            pred = tf.argmax(out, axis=1)
            pred = tf.cast(pred, dtype=tf.int32)
            # bool type
            correct = tf.equal(pred, y)
            # bool tensor => int tensor => numpy
            total_correct += tf.reduce_sum(tf.cast(correct, dtype=tf.int32)).numpy()
            total += x.shape[0]

        print(step, 'Evaluate Acc:', total_correct / total)

        val_images = x[:25]
        val_images = tf.reshape(val_images, [-1, 28, 28, 1])
        with summary_writer.as_default():
            tf.summary.scalar('test-acc', float(total_correct / total), step=step)
            tf.summary.image("val-onebyone-images:", val_images, max_outputs=25, step=step)

            val_images = tf.reshape(val_images, [-1, 28, 28])
            figure = image_grid(val_images)
            tf.summary.image('val-images:', plot_to_image(figure), step=step)
```
- Once data is being written, the TensorBoard dashboard shows the logged training loss, test accuracy, and validation images.
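As a minimal sketch (not part of the original post) of just the three steps listed earlier, the snippet below picks a log directory, creates a summary writer, and writes a scalar to it on every step. The directory name, tag string, and the stand-in loss value are illustrative.

```python
import datetime
import tensorflow as tf

# 1. Choose a log directory (one subdirectory per run).
log_dir = 'logs/' + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# 2. Create a summary writer bound to that directory.
writer = tf.summary.create_file_writer(log_dir)

# 3. Write data under the writer during training.
for step in range(100):
    fake_loss = 1.0 / (step + 1)   # stand-in for a real training loss
    with writer.as_default():
        tf.summary.scalar('train-loss', fake_loss, step=step)

# Then run `tensorboard --logdir logs` and open the printed URL in a browser.
```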
Additional Notes
- This post gave a brief overview of the Dense layer in TensorFlow2's layers module, backpropagation, and training visualization.
- The post is also published on my personal blog; feel free to browse the other articles there.
- If you spot any mistakes, corrections are welcome.