Understanding LSTM Recurrent Neural Networks (RNNs) Through Keras Examples

Published: 2023/12/15 | Recurrent Neural Networks | 豆豆

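A translation of, and hands-on walkthrough of, the blog post:

Understanding Stateful LSTM Recurrent Neural Networks in Python with Keras

Main text

A powerful and popular variant of the recurrent neural network (RNN) is the Long Short-Term Memory network (LSTM).

It is widely used because its architecture is designed to overcome the vanishing and exploding gradient problems that plague recurrent neural networks, allowing very large and very deep networks to be created.

Like other recurrent neural networks, LSTM networks maintain state, and the specifics of how this is implemented in Keras can be confusing.

In this post you will learn exactly how state is maintained in LSTM networks by the Keras deep learning library.

Goals of this post:

How to implement a plain LSTM recurrent neural network in Keras

How to carefully make use of time steps and state in an LSTM

How to make predictions with a stateful LSTM

This post illustrates how LSTMs are used, and how they behave, on a very simple example. Understanding this simplified example makes it easier to understand and apply LSTMs to general sequence prediction problems.
Libraries used: Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.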

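Problem description: learning the alphabet

In this tutorial we will develop and compare a number of different LSTM recurrent neural network models.

The context for these comparisons is a simple sequence prediction problem: learning the alphabet. That is, given a letter of the alphabet, predict the next letter.

This is a simple sequence prediction problem that, once understood, can be generalized to other sequence prediction problems such as time series prediction and sequence classification.

Let's prepare the problem with some Python code that we can reuse from example to example.

First, let's import all of the classes and functions we plan to use in this tutorial.

import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils

Next, we seed the random number generator so that results are reproducible each time the code is run.

# fix random seed for reproducibility
numpy.random.seed(7)

We can now define our dataset, the alphabet. For readability, we define the alphabet in uppercase letters.

Neural networks model numbers, so we need to map the letters of the alphabet to integer values. We can do this easily by creating a dictionary (map) from each character to its index. We can also create a reverse lookup for converting predictions back into characters later.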

# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
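As a quick sanity check of the two lookup tables, the sketch below encodes a string to integers and decodes it back. The helper names `encode` and `decode` are illustrative assumptions and are not part of the original tutorial.

```python
# Sketch: round-trip a string through the two lookup tables.
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))

def encode(text):
    """Map each character to its integer index (hypothetical helper)."""
    return [char_to_int[c] for c in text]

def decode(indices):
    """Map integer indices back to characters (hypothetical helper)."""
    return "".join(int_to_char[i] for i in indices)

codes = encode("ABZ")
print(codes)          # [0, 1, 25]
print(decode(codes))  # ABZ
```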

Now we need to create the input/output pairs on which to train the network. We do this by defining an input sequence length and then reading sequences from the input alphabet.

For example, we use an input length of 1. Starting at the beginning of the raw input data, we read the first letter "A" as input and the next letter "B" as the prediction. We then move along one character at a time and repeat until we reach a prediction of "Z".

We first create a dataset like this, using one letter to predict the next.

# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)

Running the code above shows what the input and output pairs look like:

A -> B
B -> C
C -> D
D -> E
E -> F
F -> G
G -> H
H -> I
I -> J
J -> K
K -> L
L -> M
M -> N
N -> O
O -> P
P -> Q
Q -> R
R -> S
S -> T
T -> U
U -> V
V -> W
W -> X
X -> Y
Y -> Z

The input is a sequence of single letters, and so is the output.
OK, let's apply our LSTM to this dataset and see what happens.

At this point dataX is a list of letter sequences (encoded as integers), but it still has to be converted before Keras can use it. We need to reshape the NumPy array into the format expected by LSTM networks, that is [samples, time steps, features].

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))

We then normalize the integers to the range 0 to 1, which is the range of the sigmoid activation functions used by the LSTM network.

# normalize
X = X / float(len(alphabet))
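To see what this normalization does to the encoded letters, the small sketch below (plain Python, no Keras) divides each index by the alphabet size. Note that the largest value is 25/26, slightly below 1.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Same scaling as X / float(len(alphabet)), applied to plain integers.
scaled = [i / float(len(alphabet)) for i in range(len(alphabet))]
print(min(scaled))            # 0.0 for "A"
print(round(max(scaled), 4))  # 0.9615 for "Z", i.e. 25/26
```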

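Finally, we can treat this as a sequence classification task in which each of the 26 letters represents a different class. We therefore one-hot encode the output (y) with the built-in Keras function to_categorical() so that it matches the output layer. A one-hot vector is an n-dimensional unit vector a = (0, …, 0, 1, 0, …, 0).

# one hot encode the output variable
y = np_utils.to_categorical(dataY)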
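For intuition, a one-hot encoding can be written by hand in a few lines. This is only a plain-Python sketch of the idea: `one_hot` is a hypothetical helper, and in the tutorial the actual encoding is done by `to_categorical()`.

```python
def one_hot(labels, num_classes=None):
    """Return one row per label with a single 1.0 at the label's index."""
    if num_classes is None:
        num_classes = max(labels) + 1  # infer the class count from the largest label
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

# Labels 1..25 stand for the targets "B".."Z" of the one-char dataset.
y_sketch = one_hot(list(range(1, 26)))
print(len(y_sketch), len(y_sketch[0]))  # 25 26
print(y_sketch[0][:4])                  # [0.0, 1.0, 0.0, 0.0]
```

The second dimension (26 here) is what the tutorial later reads back as `y.shape[1]` to size the output layer.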

We are now ready to fit different LSTM models.

A simple LSTM for one-char to one-char mapping

Let's start by designing a simple LSTM that learns to predict the next character of the alphabet given the context of a single character.

We frame the problem as a random collection of one-letter input to one-letter output pairs. As we will see, this is a difficult framing for the LSTM to learn.

Let's define an LSTM network with 32 units (the LSTM units are the "memory units", or you can just call them neurons) and an output layer with a softmax activation function for making predictions. Because this is a multi-class classification problem, we can use the log loss function (called categorical_crossentropy in Keras) and optimize the network with the ADAM optimizer.
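The softmax output and the log loss mentioned above can be worked through by hand. The sketch below uses made-up scores for a three-class toy case (not the 26-class model) purely to show the arithmetic; the function names are illustrative and are not the Keras implementations.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probs):
    """Log loss for one sample: minus the log of the probability given to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))

probs = softmax([2.0, 1.0, 0.1])       # made-up scores for illustration
loss = categorical_crossentropy([1, 0, 0], probs)
print([round(p, 3) for p in probs])    # [0.659, 0.242, 0.099]
print(round(loss, 3))                  # 0.417
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is what training pushes the network toward.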

The model is fit over 500 epochs with a batch size of 1.

As the predictions will show, this framing is a hard problem for an LSTM recurrent network to solve.

The Keras code for this LSTM is:

# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)

After fitting the model we can evaluate and summarize its performance on the entire training dataset.

# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))

We can then re-run the training data through the network and generate predictions, converting both the input and output pairs back into their original character format to get a visual idea of how well the network learned the problem.

# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
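The decoding step in the loop above boils down to taking the index of the largest probability and looking it up in `int_to_char`. Here is a plain-Python sketch with a made-up probability vector (not real model output); the `argmax` helper is an illustrative stand-in for `numpy.argmax` on a 1-D vector.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
int_to_char = dict((i, c) for i, c in enumerate(alphabet))

def argmax(values):
    """Index of the largest value in a flat list."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical softmax output: most of the probability mass on index 3 ("D").
fake_prediction = [0.01] * 26
fake_prediction[3] = 0.75
print(int_to_char[argmax(fake_prediction)])  # D
```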

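We can see that this problem really is difficult for the network.
The reason is that the poor LSTM units have no context to work with.
Each input-output pattern is shown to the network in random order, and the state of the network is reset after each pattern (each batch, where each batch contains one pattern).

This is an abuse of the LSTM network architecture, because we are treating it like a standard multilayer perceptron.

Next, let's try a different framing of the problem in order to give the network more sequence from which to learn.

A simple LSTM for a three-char feature window to one-char mapping

A popular approach to adding more context for multilayer perceptrons is the feature window method.

Here, previous steps in the sequence are provided as additional input features to the network. We can try the same trick to give the LSTM network more context.

Here we increase the sequence length from 1 to 3, so the input goes from one character to three characters:

# prepare the dataset of input to output pairs encoded as integers
seq_length = 3

This gives training pairs like:

ABC -> D
BCD -> E
CDE -> F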
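The windowing can be wrapped in a small function to see how the number of training pairs depends on the sequence length. `make_pairs` is a hypothetical helper for illustration; the tutorial code does the same thing inline.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def make_pairs(text, seq_length):
    """Slide a window of seq_length characters over text; the next character is the target."""
    pairs = []
    for i in range(len(text) - seq_length):
        pairs.append((text[i:i + seq_length], text[i + seq_length]))
    return pairs

pairs = make_pairs(alphabet, 3)
print(len(pairs))   # 23, i.e. 26 - 3
print(pairs[0])     # ('ABC', 'D')
print(pairs[-1])    # ('WXY', 'Z')
```

With a window of 3 there are 23 pairs instead of the 25 pairs of the one-char framing, since the first target is now "D".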

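Each element of the sequence is then provided as a new input feature to the network. This requires changing how the input sequences are reshaped in the data preparation step:

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), 1, seq_length))

The reshape of the sample patterns also has to change when demonstrating the model's predictions.

x = numpy.reshape(pattern, (1, 1, len(pattern)))

The full code listing: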

# Naive LSTM to learn three-char window to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), 1, seq_length))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, 1, len(pattern)))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)

Running it produces:

Model Accuracy: 86.96%
['A', 'B', 'C'] -> D
['B', 'C', 'D'] -> E
['C', 'D', 'E'] -> F
['D', 'E', 'F'] -> G
['E', 'F', 'G'] -> H
['F', 'G', 'H'] -> I
['G', 'H', 'I'] -> J
['H', 'I', 'J'] -> K
['I', 'J', 'K'] -> L
['J', 'K', 'L'] -> M
['K', 'L', 'M'] -> N
['L', 'M', 'N'] -> O
['M', 'N', 'O'] -> P
['N', 'O', 'P'] -> Q
['O', 'P', 'Q'] -> R
['P', 'Q', 'R'] -> S
['Q', 'R', 'S'] -> T
['R', 'S', 'T'] -> U
['S', 'T', 'U'] -> V
['T', 'U', 'V'] -> Y
['U', 'V', 'W'] -> Z
['V', 'W', 'X'] -> Z
['W', 'X', 'Y'] -> Z
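The reported accuracy can be reproduced from the predictions listed above by comparing each predicted letter with the true next letter. The `predicted` string below is transcribed from that output.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
seq_length = 3
# Predicted next letters for ABC, BCD, ..., WXY, transcribed from the output above.
predicted = "DEFGHIJKLMNOPQRSTUVYZZZ"
expected = alphabet[seq_length:]  # the true next letters D..Z

correct = sum(1 for p, e in zip(predicted, expected) if p == e)
accuracy = 100.0 * correct / len(expected)
print(correct, len(expected))   # 20 23
print("%.2f%%" % accuracy)      # 86.96%
```

The three misses are the windows TUV, UVW and VWX near the end of the alphabet.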

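We see a small lift in performance, but it may not be real, because gradient descent is itself stochastic.

In other words, we have once again misused the LSTM recurrent network.
We did provide context, but not in the right way.
In fact, the letter sequence A-B-C should be three time steps of one feature, not one time step of three separate features.
We have given the network more context, but not more sequence as it expects.

In the next section we will give the network more context in the form of time steps.

The right way to use recurrent networks in Keras!

In Keras, the key to using an LSTM is to provide context in the form of time steps, rather than as windowed features as with other network types (such as CNNs).

This time we keep the same framing: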

seq_length = 3

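Input-output pairs:

ABC -> D
BCD -> E
CDE -> F
DEF -> G

The only thing we change this time is the reshape:

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))

The time steps dimension is now set to 3, rather than 1 as before.

The difference is that the reshape now treats the input sequence as a sequence of time steps of one feature, rather than as a single time step of multiple features.
In other words, we treat ABC as three time steps with one feature each, rather than as one time step with three features.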
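The two framings differ only in how the same 23 x 3 block of numbers is arranged. The sketch below builds both layouts with nested lists and a small hypothetical `shape` helper (plain Python rather than `numpy.reshape`) to make the difference concrete.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
seq_length = 3
windows = [[alphabet.index(c) for c in alphabet[i:i + seq_length]]
           for i in range(len(alphabet) - seq_length)]

def shape(nested):
    """Return the dimensions of a regular nested list (hypothetical helper)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

# Feature-window framing: one time step holding three features.
as_features = [[window] for window in windows]
# Time-step framing: three time steps holding one feature each.
as_time_steps = [[[value] for value in window] for window in windows]

print(shape(as_features))     # (23, 1, 3)
print(shape(as_time_steps))   # (23, 3, 1)
print(as_features[0])         # [[0, 1, 2]]
print(as_time_steps[0])       # [[0], [1], [2]]
```

In the second layout the recurrent layer sees A, then B, then C, one step at a time, which is what lets it use its internal state across the window.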

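This is the right way to use LSTM recurrent networks in Keras.
My understanding is that, framed this way, BCD and CDE can also contribute when training on ABC -> D, whereas the earlier approach only made use of ABC -> D as a single training sample.

The full code listing: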

# Naive LSTM to learn three-char time steps to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)

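The final result:

Model Accuracy: 100.00%
['A', 'B', 'C'] -> D
['B', 'C', 'D'] -> E
['C', 'D', 'E'] -> F
['D', 'E', 'F'] -> G
['E', 'F', 'G'] -> H
['F', 'G', 'H'] -> I
['G', 'H', 'I'] -> J
['H', 'I', 'J'] -> K
['I', 'J', 'K'] -> L
['J', 'K', 'L'] -> M
['K', 'L', 'M'] -> N
['L', 'M', 'N'] -> O
['M', 'N', 'O'] -> P
['N', 'O', 'P'] -> Q
['O', 'P', 'Q'] -> R
['P', 'Q', 'R'] -> S
['Q', 'R', 'S'] -> T
['R', 'S', 'T'] -> U
['S', 'T', 'U'] -> V
['T', 'U', 'V'] -> W
['U', 'V', 'W'] -> X
['V', 'W', 'X'] -> Y
['W', 'X', 'Y'] -> Z

The model has learned to predict the next letter from three letters of the alphabet. It can be shown any run of three consecutive letters from the alphabet, in random order, and predict the next letter.

We have not yet shown the real power of recurrent networks, because a multilayer perceptron with enough neurons and enough training epochs could solve this problem just as well (in principle, a three-layer network can approximate any such function).

LSTM networks are stateful. They should be able to learn the whole alphabet sequence, but by default Keras resets the network state after each training batch.

So next we show what sets recurrent networks apart!

LSTM state within a batch

The Keras implementation of LSTMs resets the state of the network after each batch.

This suggests that if the batch size is large enough to hold all input patterns, and if all input patterns are ordered sequentially, the LSTM could use the context of the sequence within the batch to better learn the sequence.

We can demonstrate this easily by modifying the first example for learning a one-to-one mapping and increasing the batch size from 1 to the size of the training dataset.

In addition, Keras shuffles the training dataset before each epoch. To ensure the training patterns stay in order, we can disable this shuffling.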

model.fit(X, y, epochs=5000, batch_size=len(dataX), verbose=2, shuffle=False)
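One side effect of this change is worth spelling out: with a single batch per epoch there is a single weight update per epoch, so the number of updates changes along with the batch size. A back-of-the-envelope sketch, assuming one gradient update per batch (the usual behaviour of mini-batch training); `weight_updates` is a hypothetical helper:

```python
import math

def weight_updates(num_samples, batch_size, epochs):
    """Total gradient updates, assuming one update per batch."""
    return int(math.ceil(num_samples / float(batch_size))) * epochs

num_samples = 25  # one-char pairs A->B ... Y->Z
print(weight_updates(num_samples, 1, 500))             # 12500 for the first example
print(weight_updates(num_samples, num_samples, 5000))  # 5000 for the full-batch run
```

So the two runs are not directly comparable by epoch count alone; each full-batch update averages the gradient over all 25 pairs.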

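The network will learn the mapping of characters using the within-batch sequence, but this context will not be available to the network when making predictions. We can evaluate the network's ability to make predictions on both random and sequential inputs.

The full code listing: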

# Naive LSTM to learn one-char to one-char mapping with all data in each batch
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# convert list of lists to array and pad sequences if needed
X = pad_sequences(dataX, maxlen=seq_length, dtype='float32')
# reshape X to be [samples, time steps, features]
X = numpy.reshape(X, (X.shape[0], seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(16, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=5000, batch_size=len(dataX), verbose=2, shuffle=False)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
# demonstrate predicting random patterns
print("Test a Random Pattern:")
for i in range(0, 20):
    pattern_index = numpy.random.randint(len(dataX))
    pattern = dataX[pattern_index]
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)

Output:

Model Accuracy: 100.00%
['A'] -> B
['B'] -> C
['C'] -> D
['D'] -> E
['E'] -> F
['F'] -> G
['G'] -> H
['H'] -> I
['I'] -> J
['J'] -> K
['K'] -> L
['L'] -> M
['M'] -> N
['N'] -> O
['O'] -> P
['P'] -> Q
['Q'] -> R
['R'] -> S
['S'] -> T
['T'] -> U
['U'] -> V
['V'] -> W
['W'] -> X
['X'] -> Y
['Y'] -> Z
Test a Random Pattern:
['T'] -> U
['V'] -> W
['M'] -> N
['Q'] -> R
['D'] -> E
['V'] -> W
['T'] -> U
['U'] -> V
['J'] -> K
['F'] -> G
['N'] -> O
['B'] -> C
['M'] -> N
['F'] -> G
['F'] -> G
['P'] -> Q
['A'] -> B
['K'] -> L
['W'] -> X
['E'] -> F

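As we expected, the network is able to use the within-sequence context to learn the alphabet, achieving 100% accuracy on the training data.

Importantly, the network can also make accurate predictions for the next letter of randomly selected characters. Very impressive.

A stateful LSTM for one-char to one-char mapping

We have seen that we can break the raw data into fixed-size sequences and that this representation can be learned by the LSTM, but only as random mappings of 3 characters to 1 character.

We have also seen that we can manipulate the batch size to offer more sequence to the network, but only during training.

Ideally, we want to expose the network to the entire sequence and let it learn the inter-dependencies, rather than defining those dependencies explicitly in the framing of the problem.

We can do this in Keras by making the LSTM layers stateful and manually resetting the state of the network at the end of each epoch, which is also the end of the training sequence.

This is truly how LSTM networks are intended to be used. We find that by letting the network itself learn the dependencies between characters, we need a smaller network (half the number of units) and fewer training epochs (almost half).

First we need to define the LSTM layer as stateful. In doing so, we must explicitly specify the batch size as a dimension of the input shape. This also means that when we evaluate the network or make predictions with it, we must specify and adhere to the same batch size. That is not a problem now because we are using a batch size of 1. It could make prediction harder when the batch size is not 1, because predictions then have to be made in batches and in sequence.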

batch_size = 1
model.add(LSTM(16, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))

An important difference when training a stateful LSTM is that we train it manually one epoch at a time and reset the state after each epoch. We can do this in a for loop. Again, we do not shuffle the input, so the order in which the training data was created is preserved.

for i in range(300):
    model.fit(X, y, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()

As mentioned, we specify the batch size when evaluating the performance of the network on the entire training dataset.

# summarize performance of the model
scores = model.evaluate(X, y, batch_size=batch_size, verbose=0)
model.reset_states()
print("Model Accuracy: %.2f%%" % (scores[1]*100))

Finally, we can demonstrate that the network has indeed learned the entire alphabet. We seed it with the first letter "A", request a prediction, feed the prediction back in as the next input, and repeat the process all the way to "Z".

# demonstrate some model predictions
seed = [char_to_int[alphabet[0]]]
for i in range(0, len(alphabet)-1):
    x = numpy.reshape(seed, (1, len(seed), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed[0]], "->", int_to_char[index])
    seed = [index]
model.reset_states()

We can also see whether the network can make predictions starting from an arbitrary letter.

# demonstrate a random starting point
letter = "K"
seed = [char_to_int[letter]]
print("New start: ", letter)
for i in range(0, 5):
    x = numpy.reshape(seed, (1, len(seed), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed[0]], "->", int_to_char[index])
    seed = [index]
model.reset_states()

The full code listing:

# Stateful LSTM to learn one-char to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
batch_size = 1
model = Sequential()
model.add(LSTM(16, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
for i in range(300):
    model.fit(X, y, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
# summarize performance of the model
scores = model.evaluate(X, y, batch_size=batch_size, verbose=0)
model.reset_states()
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
seed = [char_to_int[alphabet[0]]]
for i in range(0, len(alphabet)-1):
    x = numpy.reshape(seed, (1, len(seed), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed[0]], "->", int_to_char[index])
    seed = [index]
model.reset_states()
# demonstrate a random starting point
letter = "K"
seed = [char_to_int[letter]]
print("New start: ", letter)
for i in range(0, 5):
    x = numpy.reshape(seed, (1, len(seed), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed[0]], "->", int_to_char[index])
    seed = [index]
model.reset_states()

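Output:

Model Accuracy: 100.00%
A -> B
B -> C
C -> D
D -> E
E -> F
F -> G
G -> H
H -> I
I -> J
J -> K
K -> L
L -> M
M -> N
N -> O
O -> P
P -> Q
Q -> R
R -> S
S -> T
T -> U
U -> V
V -> W
W -> X
X -> Y
Y -> Z
New start: K
K -> B
B -> C
C -> D
D -> E
E -> F

We can see that the network has memorized the entire alphabet perfectly. It used the context of the samples themselves and learned whatever dependency it needed to predict the next character in the sequence.

We can also see that if we seed the network with the first letter, it correctly rattles off the rest of the alphabet.

We can also see that it has only learned the full alphabet sequence from a cold start. When asked to predict the letter after "K", it predicts "B" and falls back into reciting the alphabet from the beginning.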
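Why can the same input "K" give a different answer right after a reset? In a stateful network the output depends both on the input and on the state carried over from earlier steps. The toy cell below is a deliberately simplified stand-in (a single tanh unit with made-up weights, not an LSTM and not Keras code) that shows this dependence; the class and its parameters are assumptions for illustration only.

```python
import math

class ToyStatefulCell(object):
    """A one-unit recurrent cell: h_new = tanh(w_x * x + w_h * h)."""

    def __init__(self, w_x=0.5, w_h=0.9):
        self.w_x = w_x      # made-up input weight
        self.w_h = w_h      # made-up recurrent weight
        self.h = 0.0        # carried-over state

    def step(self, x):
        """Consume one input, update the state and return it as the output."""
        self.h = math.tanh(self.w_x * x + self.w_h * self.h)
        return self.h

    def reset_states(self):
        """Forget everything seen so far."""
        self.h = 0.0

cell = ToyStatefulCell()
k = 10 / 26.0  # "K" encoded and normalized as in the tutorial

cold = cell.step(k)              # "K" seen right after a reset
cell.reset_states()
for i in range(10):              # warm up on "A".."J" first
    cell.step(i / 26.0)
warm = cell.step(k)              # "K" seen after the warm-up

print(round(cold, 4), round(warm, 4))  # the two outputs differ
```

The input is identical in both cases; only the carried-over state differs, and with it the output.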

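To truly predict the letter after "K", the state of the network would first need to be warmed up by feeding it the letters from "A" to "J" in turn. This tells us that we could achieve the same effect with a "stateless" LSTM by preparing training data like this:

---a -> b
--ab -> c
-abc -> d
abcd -> e

Here the input sequence length is fixed at 25 (a to y, to predict z) and the patterns are prefixed with zero padding.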
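A sketch of how such prefix-padded training pairs could be generated. `padded_prefixes` is a hypothetical helper, and "-" stands in for the zero padding, as in the listing above.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def padded_prefixes(text, width, pad="-"):
    """Pair every prefix of text (left-padded to width) with the character that follows it."""
    pairs = []
    for end in range(1, len(text)):
        prefix = text[max(0, end - width):end]
        pairs.append((prefix.rjust(width, pad), text[end]))
    return pairs

for seq_in, seq_out in padded_prefixes(alphabet.lower(), 4)[:4]:
    print(seq_in, "->", seq_out)
# ---a -> b
# --ab -> c
# -abc -> d
# abcd -> e

full = padded_prefixes(alphabet.lower(), 25)
print(len(full), len(full[-1][0]))  # 25 25
```

With a width of 25 the last pattern is the full run a to y with no padding left, and its target is z.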

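Finally, this raises the question of whether an LSTM network can be trained on variable-length input sequences to predict the next character.

An LSTM with variable-length input to one-char output

In the previous section, we discovered that the Keras "stateful" LSTM was really only a shortcut for replaying the first n sequences, and did not really help us learn a generic model of the alphabet.

In this section we explore a variation of the "stateless" LSTM that learns random subsequences of the alphabet and can predict the next letter of the alphabet given an arbitrary letter or subsequence of letters.

First, we change the framing of the problem. To simplify, we define a maximum input sequence length and set it to a small value like 5 to speed up training. This defines the maximum length of the subsequences of the alphabet drawn for training. In extensions, this could be set to the full alphabet (26), or longer if we allow looping back to the start of the sequence.

We also need to define the number of random sequences to create, in this case 1000. This too could be more or less. I expect fewer patterns are actually required.

# prepare the dataset of input to output pairs encoded as integers
num_inputs = 1000
max_len = 5
dataX = []
dataY = []
for i in range(num_inputs):
    start = numpy.random.randint(len(alphabet)-2)
    end = numpy.random.randint(start, min(start+max_len,len(alphabet)-1))
    sequence_in = alphabet[start:end+1]
    sequence_out = alphabet[end + 1]
    dataX.append([char_to_int[char] for char in sequence_in])
    dataY.append(char_to_int[sequence_out])
    print(sequence_in, '->', sequence_out)

The input looks something like this:

PQRST -> U
W -> X
O -> P
OPQ -> R
IJKLM -> N
QRSTU -> V
ABCD -> E
X -> Y
GHIJ -> K

The input sequences vary in length between 1 and max_len and therefore require zero padding. Here we use left-hand-side (prefix) padding with the built-in Keras function pad_sequences().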

X = pad_sequences(dataX, maxlen=max_len, dtype='float32')
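What `pad_sequences()` does with its default settings can be approximated in plain Python: shorter sequences get zeros added on the left, and longer ones keep only their last `maxlen` items. `pad_left` is a hypothetical stand-in for illustration, not the Keras function itself.

```python
def pad_left(sequences, maxlen, value=0):
    """Left-pad (and left-truncate) each sequence to exactly maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                      # keep the last maxlen items
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

# "OPQ" -> [14, 15, 16] and "W" -> [22] with the tutorial's encoding.
print(pad_left([[14, 15, 16], [22]], 5))
# [[0, 0, 14, 15, 16], [0, 0, 0, 0, 22]]
```

One thing to keep in mind: the padding value 0 is also the code for "A" in this encoding, so a padded "A" consists of zeros only and cannot be told apart from padding.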

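The trained model is evaluated on randomly selected input patterns. These could just as easily be new, randomly generated sequences of characters. I believe this could also be a linear sequence seeded with "A", with each output fed back in as a single-character input.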

# LSTM with Variable Length Input Sequences to One Character Output
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
num_inputs = 1000
max_len = 5
dataX = []
dataY = []
for i in range(num_inputs):
    start = numpy.random.randint(len(alphabet)-2)
    end = numpy.random.randint(start, min(start+max_len,len(alphabet)-1))
    sequence_in = alphabet[start:end+1]
    sequence_out = alphabet[end + 1]
    dataX.append([char_to_int[char] for char in sequence_in])
    dataY.append(char_to_int[sequence_out])
    print(sequence_in, '->', sequence_out)
# convert list of lists to array and pad sequences if needed
X = pad_sequences(dataX, maxlen=max_len, dtype='float32')
# reshape X to be [samples, time steps, features]
X = numpy.reshape(X, (X.shape[0], max_len, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
batch_size = 1
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], 1)))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=batch_size, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for i in range(20):
    pattern_index = numpy.random.randint(len(dataX))
    pattern = dataX[pattern_index]
    x = pad_sequences([pattern], maxlen=max_len, dtype='float32')
    x = numpy.reshape(x, (1, max_len, 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)

Output:

Model Accuracy: 98.90%
['Q', 'R'] -> S
['W', 'X'] -> Y
['W', 'X'] -> Y
['C', 'D'] -> E
['E'] -> F
['S', 'T', 'U'] -> V
['G', 'H', 'I', 'J', 'K'] -> L
['O', 'P', 'Q', 'R', 'S'] -> T
['C', 'D'] -> E
['O'] -> P
['N', 'O', 'P'] -> Q
['D', 'E', 'F', 'G', 'H'] -> I
['X'] -> Y
['K'] -> L
['M'] -> N
['R'] -> T
['K'] -> L
['E', 'F', 'G'] -> H
['Q'] -> R
['Q', 'R', 'S'] -> T

We can see that although the model did not learn the alphabet perfectly from the randomly generated subsequences, it did very well. The model was not tuned and may need more training, a larger network, or both (an exercise for the reader).

This is a good natural extension of the "all sequential input examples in each batch" model learned above: it can handle ad hoc queries, but this time of arbitrary sequence length (up to the maximum length).

Summary

In this post you should have learned:

How to develop a simple LSTM network for one-character to one-character prediction.
How to configure a simple LSTM to learn a sequence across time steps within a sample.
How to configure an LSTM to learn a sequence across samples by manually managing state.
