chatbot2: RNN Language Models
RNN-based Language Models
Theoretical Foundations of RNN Language Models
References
Limitations of CBOW/Skip-gram
- Solution
RNN model details
- Mathematical representation (a brief recurrence sketch follows below)
A model with a single input and a single output is not a recurrent neural network; an RNN carries a hidden state forward from one time step to the next.
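For reference, a minimal sketch of the standard RNN recurrence, in generic notation (the weight names below are illustrative and are not the variable names used in the code later in this post): at each time step the hidden state is updated from the current input and the previous hidden state, and the output distribution is read off the hidden state.

$$
h_t = \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right), \qquad
\hat{y}_t = \operatorname{softmax}\left(W_{hy}\, h_t + b_y\right)
$$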
RNN Language Models in Practice
demo1
- 1A. Improving the RNN model from the previous lecture
In this first version, we wrap the code from the previous lecture into a class and use TensorFlow's built-in RNN to implement forward propagation.
We also discuss how to use cross-validation.
- Defining the CharRNN model
As in the previous lecture, the RNN model's input and output are sequences of the same length, so we call it a char-level RNN model (a small example of such input/target pairs follows below).
Next week we will look at models whose inputs and outputs are whole sentences.
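To make "input and output are sequences of the same length" concrete, here is a minimal sketch of one typical way to build char-level (input, target) pairs for next-character prediction. The string, the `char2id` mapping, and the shift-by-one rule are illustrative assumptions, not taken from the lecture's data pipeline:

```python
# Illustrative sketch: the target sequence is the input sequence shifted left by one
# character, so both sequences have the same length.
text = "hello world"
vocab = sorted(set(text))
char2id = {ch: i for i, ch in enumerate(vocab)}

x = [char2id[ch] for ch in text[:-1]]  # input characters as integer ids
y = [char2id[ch] for ch in text[1:]]   # target: the next character at each position

assert len(x) == len(y)                # input and target have the same length
```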
BasicRNNCell is the simplest concrete implementation of the abstract RNNCell class, provided by TensorFlow's Python API. The cell's output is emitted twice: once as the output exposed to the outside, and once as the state passed on to the next cell. The input depth is the dimension of the word embedding.
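A minimal sketch (shapes and names are our own assumptions) of calling a BasicRNNCell for a single time step; it illustrates the point above that, for this cell type, the tensor handed to the outside and the state handed to the next step are the same thing:

```python
import tensorflow as tf

# Build a BasicRNNCell and run it for one time step.
cell = tf.nn.rnn_cell.BasicRNNCell(num_units=16)

x_t = tf.placeholder(tf.float32, [8, 32])                 # one step of input: [batch_size, input_size]
h_prev = cell.zero_state(batch_size=8, dtype=tf.float32)  # previous hidden state

output_t, h_t = cell(x_t, h_prev)  # for BasicRNNCell, output_t and h_t carry the same value

print(cell.output_size == cell.state_size)  # True: output and state share the same size
```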
```python
class CharRNNLM(object):
    def __init__(self, batch_size, num_unrollings, vocab_size,
                 hidden_size, embedding_size, learning_rate):
        """Character-to-character RNN model.

        The training data consists of two sequences of the same length:
        one sequence is the input, the other is the target output.
        """
        self.batch_size = batch_size
        self.num_unrollings = num_unrollings
        self.hidden_size = hidden_size
        self.vocab_size = vocab_size
        self.embedding_size = embedding_size

        self.input_data = tf.placeholder(tf.int64, [self.batch_size, self.num_unrollings], name='inputs')
        self.targets = tf.placeholder(tf.int64, [self.batch_size, self.num_unrollings], name='targets')

        cell_fn = tf.nn.rnn_cell.BasicRNNCell
        params = dict()
        cell = cell_fn(self.hidden_size, **params)

        with tf.name_scope('initial_state'):
            self.zero_state = cell.zero_state(self.batch_size, tf.float32)
            self.initial_state = tf.placeholder(tf.float32,
                                                [self.batch_size, cell.state_size],
                                                'initial_state')

        with tf.name_scope('embedding_layer'):
            # Define the embedding parameters and convert each element of the integer
            # input sequence into its embedding vector via a lookup.
            # If an embedding size is given, we declare an embedding matrix
            # (one row per vocabulary entry); otherwise we use the identity matrix
            # as the embedding matrix.
            if embedding_size > 0:
                self.embedding = tf.get_variable('embedding', [self.vocab_size, self.embedding_size])
            else:
                self.embedding = tf.constant(np.eye(self.vocab_size), dtype=tf.float32)
            inputs = tf.nn.embedding_lookup(self.embedding, self.input_data)

        with tf.name_scope('slice_inputs'):
            # We are going to use static_rnn, which needs the length-num_unrollings
            # sequence split into num_unrollings pieces stored in a list,
            # i.e. the input format is:
            #   [num_unrollings, (batch_size, embedding_size)]
            sliced_inputs = [tf.squeeze(input_, [1]) for input_ in
                             tf.split(axis=1, num_or_size_splits=self.num_unrollings, value=inputs)]

        # Call static_rnn to do the forward propagation.
        # For readability, the gist of the static_rnn documentation:
        # tf.nn.static_rnn creates an unrolled graph of a fixed length; with 200 input
        # steps you get a 200-step static graph. Graph creation is slower, and you cannot
        # feed sequences longer than the length you specified (> 200). tf.nn.dynamic_rnn
        # solves this: it builds the graph with a loop at execution time, so graph
        # creation is faster and variable-sized batches can be fed.
        # In short, static_rnn first builds all the cells and only then runs.
        #
        # Inputs:
        #   inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size]
        #   initial_state: An initial state for the RNN.
        #     If cell.state_size is an integer, this must be a Tensor of appropriate
        #     type and shape [batch_size, cell.state_size]
        # Outputs:
        #   outputs: a length T list of outputs (one for each input), or a nested tuple of such elements.
        #   state: the final state
        outputs, final_state = tf.nn.static_rnn(
            cell=cell,                          # the internal structure of our RNN cell
            inputs=sliced_inputs,               # the input list
            initial_state=self.initial_state)
        self.final_state = final_state

        with tf.name_scope('flatten_outputs'):
            flat_outputs = tf.reshape(tf.concat(axis=1, values=outputs), [-1, hidden_size])

        with tf.name_scope('flatten_targets'):
            flat_targets = tf.reshape(tf.concat(axis=1, values=self.targets), [-1])

        with tf.variable_scope('softmax') as sm_vs:
            softmax_w = tf.get_variable('softmax_w', [hidden_size, vocab_size])
            softmax_b = tf.get_variable('softmax_b', [vocab_size])
            self.logits = tf.matmul(flat_outputs, softmax_w) + softmax_b
            self.probs = tf.nn.softmax(self.logits)

        with tf.name_scope('loss'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=self.logits, labels=flat_targets)
            self.mean_loss = tf.reduce_mean(loss)

        with tf.name_scope('loss_montor'):
            count = tf.Variable(1.0, name='count')
            sum_mean_loss = tf.Variable(1.0, name='sum_mean_loss')
            self.reset_loss_monitor = tf.group(sum_mean_loss.assign(0.0),
                                               count.assign(0.0),
                                               name='reset_loss_monitor')
            self.update_loss_monitor = tf.group(sum_mean_loss.assign(sum_mean_loss + self.mean_loss),
                                                count.assign(count + 1),
                                                name='update_loss_monitor')
            with tf.control_dependencies([self.update_loss_monitor]):
                self.average_loss = sum_mean_loss / count
                self.ppl = tf.exp(self.average_loss)

        self.global_step = tf.get_variable('global_step', [],
                                           initializer=tf.constant_initializer(0.0))
        self.learning_rate = tf.placeholder(tf.float32, [], name='learning_rate')

        tvars = tf.trainable_variables()
        grads = tf.gradients(self.mean_loss, tvars)
        optimizer = tf.train.AdamOptimizer(self.learning_rate)
        self.train_op = optimizer.apply_gradients(zip(grads, tvars), global_step=self.global_step)

    # Run one epoch.
    # Note that the session is passed in as an input argument.
    def run_epoch(self, session, batch_generator, learning_rate, freq=10):
        epoch_size = batch_generator.num_batches
        extra_op = self.train_op
        state = self.zero_state.eval()
        self.reset_loss_monitor.run()
        batch_generator.reset_batch_pointer()
        start_time = time.time()
        for step in range(epoch_size):
            x, y = batch_generator.next_batch()
            ops = [self.average_loss, self.ppl, self.final_state, extra_op, self.global_step]
            feed_dict = {self.input_data: x, self.targets: y,
                         self.initial_state: state,
                         self.learning_rate: learning_rate}
            results = session.run(ops, feed_dict)
            # Option 1: use the final state of the previous minibatch
            # as the initial state of the next minibatch.
            average_loss, ppl, state, _, global_step = results
            # Option 2: always use a zero tensor as the initial state of the next minibatch.
            # average_loss, ppl, final_state, _, global_step = results
        return ppl, global_step
```

Call the module that generates the synthetic data:
```python
from data.synthetic.synthetic_binary import gen_data
```

Demonstrating a variable scope conflict
If the code cell below is run twice in a row, the following error is raised (note the reuse hint):
ValueError: Variable embedding already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
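One possible way around this when re-running the cell (our own suggestion, not part of the lecture) is to clear the default graph first, so that `embedding` and the other variables are created fresh instead of colliding with the already-defined ones:

```python
import tensorflow as tf

tf.reset_default_graph()   # start from an empty default graph
# ... then re-run the cell that constructs CharRNNLM
```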
Let's test it:
```python
batch_size = 16
num_unrollings = 20
vocab_size = 2
hidden_size = 16
embedding_size = 16
learning_rate = 0.01

model = CharRNNLM(batch_size, num_unrollings, vocab_size,
                  hidden_size, embedding_size, learning_rate)

dataset = gen_data(size=1000000)
batch_size = 16
seq_length = num_unrollings
batch_generator = PredBatchGenerator(data_in=dataset[0],
                                     data_out=dataset[1],
                                     batch_size=batch_size,
                                     seq_length=seq_length)
# batch_generator = BatchGenerator(dataset[0], batch_size, seq_length)

session = tf.Session()
with session.as_default():
    for epoch in range(1):
        session.run(tf.global_variables_initializer())
        ppl, global_step = model.run_epoch(session, batch_generator, learning_rate, freq=10)
        print(ppl)
```

Output:
```
1.58694 1.59246 1.59855 1.59121 1.59335
```
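A quick sanity check on these numbers (our own remark, not from the lecture): the perplexity here is the exponential of the running average cross-entropy loss, and for a vocabulary of size 2 a uniform guesser would sit at exactly

$$
\text{ppl} = \exp(\bar{L}), \qquad \text{ppl}_{\text{uniform}} = \exp(\ln 2) = 2,
$$

so values around 1.59 indicate the model has picked up some of the structure in the synthetic data.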
Print the variables:

```python
all_vars = [node.name for node in tf.global_variables()]
for var in all_vars:
    print(var)
```

Printed result:

```
embedding:0
rnn/basic_rnn_cell/kernel:0
rnn/basic_rnn_cell/bias:0
softmax/softmax_w:0
softmax/softmax_b:0
loss_montor/count:0
loss_montor/sum_mean_loss:0
global_step:0
beta1_power:0
beta2_power:0
embedding/Adam:0
embedding/Adam_1:0
rnn/basic_rnn_cell/kernel/Adam:0
rnn/basic_rnn_cell/kernel/Adam_1:0
rnn/basic_rnn_cell/bias/Adam:0
rnn/basic_rnn_cell/bias/Adam_1:0
softmax/softmax_w/Adam:0
softmax/softmax_w/Adam_1:0
softmax/softmax_b/Adam:0
softmax/softmax_b/Adam_1:0
```

Improvement: how do we do cross-validation?
Define another CharRNN object and use the validation data to compute ppl:
```python
tf.get_variable_scope().reuse_variables()
valid_model = CharRNNLM(batch_size, num_unrollings, vocab_size,
                        hidden_size, embedding_size, learning_rate)
```
Key point: a debugging exercise
ValueError: Variable embedding/Adam_2/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
The key is Adam_2; compare it with the variable list above and note Adam_1: the training model's Adam optimizer already created the slot variables `.../Adam` and `.../Adam_1` for each parameter, so the optimizer built inside the validation model asks for a new `embedding/Adam_2`, which cannot be created under a reused variable scope.
When we create the validation (and test) objects, we should therefore disable the optimizer.
In addition, we add summary support; see the second version below.
To solve the problem above, version 2 adds an is_training parameter.
```python
import time
import numpy as np
import tensorflow as tf

class CharRNNLM(object):
    def __init__(self, is_training, batch_size, num_unrollings, vocab_size,
                 hidden_size, embedding_size, learning_rate):
        """New argument:
        is_training: whether we are in the training phase.
        """
```

The optimizer is only defined during training; when we are not training, no optimizer is created, so the error from version 1 can no longer occur.
```python
# mark: change from version 1 to version 2
# (inside __init__, after self.mean_loss and self.global_step have been defined)
if is_training:
    tvars = tf.trainable_variables()
    grads = tf.gradients(self.mean_loss, tvars)
    optimizer = tf.train.AdamOptimizer(self.learning_rate)
    self.train_op = optimizer.apply_gradients(zip(grads, tvars), global_step=self.global_step)
```
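With this flag in place, the reuse-based construction from the debugging exercise above should now go through; a minimal sketch, assuming the version-2 constructor signature shown earlier:

```python
# Build the training model first, then reuse its variables for a validation model
# that does not construct an optimizer (is_training=False), so no new Adam slot
# variables are requested under the reused scope.
train_model = CharRNNLM(True, batch_size, num_unrollings, vocab_size,
                        hidden_size, embedding_size, learning_rate)
tf.get_variable_scope().reuse_variables()
valid_model = CharRNNLM(False, batch_size, num_unrollings, vocab_size,
                        hidden_size, embedding_size, learning_rate)
```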
Summary support added:

```python
# mark: version 1 --> version 2
# Add summaries so the training process can be monitored with TensorBoard.
average_loss_summary = tf.summary.scalar(name='average_loss', tensor=self.average_loss)
ppl_summary = tf.summary.scalar(name='perplexity', tensor=self.ppl)
self.summaries = tf.summary.merge(inputs=[average_loss_summary, ppl_summary], name='loss_monitor')
```
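A minimal usage sketch for these summaries (the writer, the log directory, and the exact place in the training loop are our own assumptions, not shown in the lecture): evaluate `self.summaries` alongside the other ops and hand the result to a FileWriter, so that `average_loss` and `perplexity` show up in TensorBoard.

```python
# Before training starts: create a writer pointing at a log directory.
summary_writer = tf.summary.FileWriter('logs/char_rnn', graph=tf.get_default_graph())

# Inside the training loop, after feed_dict has been built for the current batch:
summary_str, _ = session.run([model.summaries, model.train_op], feed_dict)
summary_writer.add_summary(summary_str, global_step=step)  # step: current training step
summary_writer.flush()
```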
- ppl stands for perplexity. Does it have a computable theoretical floor? It will not reach 0. It also does not necessarily converge monotonically to a single value: the training perplexity does, but the test and validation perplexities may not; they can drop first and then rise again (overfitting).
- The optimizer is applied only after the loss function has been computed.
- char-RNN is a special case; conceptually it works the same way as a general RNN language model (rnnlm).
- Hyperparameter tuning: when the training and validation sets are generated by the same rule, the results are very good.
- What a DNN/RNN deep-learning chatbot will say is unpredictable, whereas the replies of a template (rule-based) chatbot can be predicted, which suits fixed-domain bots.
- Predict what the next character is.
Summary