05. Sequence Models W3. Sequence Models and Attention Mechanism (Assignments: Machine Translation + Trigger Word Detection)
Table of Contents
- Assignment 1: Machine Translation
- 1. Date Conversion
- 1.1 Dataset
- 2. Machine Translation with an Attention Model
- 2.1 Attention Mechanism
- 3. Visualizing Attention
- Assignment 2: Trigger Word Detection
- 1. Data Synthesis: Creating a Speech Dataset
- 1.1 Listening to the Data
- 1.2 Audio to Spectrogram
- 1.3 Generating a Training Example
- 1.4 The Full Training Set
- 1.5 The Dev Set
- 2. Model
- 2.1 Building the Model
- 2.2 Training
- 2.3 Testing the Model
- 3. Predictions
- 3.3 Testing on the Dev Set
- 4. Testing with Your Own Sample
Quiz: see the reference blog post
Notes: W3. Sequence Models and Attention Mechanism
Assignment 1: Machine Translation
Build a Neural Machine Translation (NMT) model that translates human-readable dates ("25th of June, 2009") into machine-readable dates ("2009-06-25").
We will do this with an attention model, one of the most sophisticated sequence-to-sequence models.
Note the packages to install:
```
pip install Faker==2.0.0
pip install babel
```
- Import packages
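The import cell is not reproduced here; below is a minimal sketch of the imports this assignment typically relies on. The nmt_utils module is the course-provided helper file, so treat that import (and its contents) as an assumption rather than a documented API.

```python
import numpy as np
from faker import Faker              # generates fake human-readable dates
from babel.dates import format_date  # formats dates in many locales/styles
from keras.layers import Bidirectional, Concatenate, Dot, Input, LSTM
from keras.layers import RepeatVector, Dense, Activation
from keras.models import Model
from keras.optimizers import Adam
# from nmt_utils import *            # assumed course helpers: load_dataset, plot_attention_map, ...
```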
1. Date Conversion
The model takes a date written in any of a variety of possible formats (e.g. "the 29th of August 1958", "03/30/1968", "24 JUNE 1987") and converts it into a standardized, machine-readable date (e.g. "1958-08-29", "1968-03-30", "1987-06-24"). We train the model to output dates in the common machine-readable format YYYY-MM-DD.
1.1 Dataset
- 10,000 examples
- Print a few of them

Output:
```
[('9 may 1998', '1998-05-09'),
 ('10.11.19', '2019-11-10'),
 ('9/10/70', '1970-09-10'),
 ('saturday april 28 1990', '1990-04-28'),
 ('thursday january 26 1995', '1995-01-26'),
 ('monday march 7 1983', '1983-03-07'),
 ('sunday may 22 1988', '1988-05-22'),
 ('08 jul 2008', '2008-07-08'),
 ('8 sep 1999', '1999-09-08'),
 ('thursday january 1 1981', '1981-01-01')]
```
This loads:
- dataset
- human_vocab: dictionary mapping each character of the human-readable dates to an integer-valued index
- machine_vocab: dictionary mapping each character of the machine-readable dates to an integer-valued index
- inv_machine_vocab: dictionary, the inverse mapping of machine_vocab, from indices to characters
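The shapes printed below come from a preprocessing step that pads every source string to $T_x = 30$ characters, maps characters to indices with the vocabularies above, and one-hot encodes both source and target. The course notebook does this with a helper from nmt_utils; the self-contained sketch below, with illustrative helper names, shows the idea rather than the exact course code.

```python
import numpy as np

Tx, Ty = 30, 10  # padded length of the human-readable input / length of "YYYY-MM-DD"

def to_indices(text, vocab, length):
    """Illustrative helper: map a string to `length` vocabulary indices, padding if needed."""
    text = text.lower().replace(',', '')[:length]
    indices = [vocab.get(ch, vocab.get('<unk>', 0)) for ch in text]
    indices += [vocab.get('<pad>', 0)] * (length - len(indices))
    return indices

def to_one_hot(indices, vocab_size):
    """Illustrative helper: list of indices -> (length, vocab_size) one-hot matrix."""
    return np.eye(vocab_size)[indices]

# With dataset, human_vocab and machine_vocab loaded as described above:
X   = np.array([to_indices(h, human_vocab, Tx) for h, m in dataset])    # (10000, 30)
Y   = np.array([to_indices(m, machine_vocab, Ty) for h, m in dataset])  # (10000, 10)
Xoh = np.array([to_one_hot(x, len(human_vocab)) for x in X])            # (10000, 30, 37)
Yoh = np.array([to_one_hot(y, len(machine_vocab)) for y in Y])          # (10000, 10, 11)
```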
Output:
```
X.shape: (10000, 30)
Y.shape: (10000, 10)
Xoh.shape: (10000, 30, 37)  # 37 is len(human_vocab)
Yoh.shape: (10000, 10, 11)  # 11 characters can appear in a machine date: the digits 0-9 and '-'
```
- Look at one example (inputs shorter than the maximum length are padded, so every x has length 30)
Output:
```
Source date: saturday october 9 1976
Target date: 1976-10-09
Source after preprocessing (indices): [29 13 30 31 28 16 13 34  0 26 15 30 26 14 17 28  0 12  0  4 12 10  9 36
 36 36 36 36 36 36]
Target after preprocessing (indices): [ 2 10  8  7  0  2  1  0  1 10]
Source after preprocessing (one-hot): [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]]
Target after preprocessing (one-hot): [[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
```
2. Machine Translation with an Attention Model
2.1 Attention Mechanism
$$context^{<t>} = \sum_{t' = 0}^{T_x} \alpha^{<t,t'>} a^{<t'>}$$
RepeatVector, to repeat the input several times: https://keras.io/zh/layers/core/#repeatvector
Concatenate, to join input tensors along a given axis: https://keras.io/zh/layers/merge/#concatenate_1
Bidirectional RNN wrapper: https://keras.io/zh/layers/wrappers/#bidirectional
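These layers are exactly what one step of attention combines, following the formula above. The sketch below is minimal and self-contained; the hidden sizes are partly assumptions ($n_s = 64$ matches the plot_attention_map call at the end of this assignment, while $n_a = 32$ and the 10-unit "energy" layer are guesses), so treat it as an illustration rather than the course's exact code.

```python
from keras.layers import RepeatVector, Concatenate, Dense, Activation, Dot
import keras.backend as K

Tx, n_a, n_s = 30, 32, 64   # input length, encoder Bi-LSTM units, decoder LSTM units (n_a assumed)

def softmax_over_time(x):
    """Softmax over axis=1 (the Tx axis) so the attention weights sum to 1 across input steps."""
    e = K.exp(x - K.max(x, axis=1, keepdims=True))
    return e / K.sum(e, axis=1, keepdims=True)

# Shared layers: defined once so the same weights are reused at every output step t.
repeator     = RepeatVector(Tx)
concatenator = Concatenate(axis=-1)
densor1      = Dense(10, activation="tanh")
densor2      = Dense(1, activation="relu")
activator    = Activation(softmax_over_time, name="attention_weights")
dotor        = Dot(axes=1)

def one_step_attention(a, s_prev):
    """
    a      -- encoder Bi-LSTM hidden states, shape (m, Tx, 2*n_a)
    s_prev -- previous decoder hidden state, shape (m, n_s)
    Returns context^<t>, shape (m, 1, 2*n_a): the weighted sum of the a^<t'>.
    """
    s_prev   = repeator(s_prev)            # (m, Tx, n_s): copy s_prev once per input step
    concat   = concatenator([a, s_prev])   # (m, Tx, 2*n_a + n_s)
    e        = densor1(concat)             # (m, Tx, 10): small "energy" network
    energies = densor2(e)                  # (m, Tx, 1)
    alphas   = activator(energies)         # (m, Tx, 1): the alpha^<t,t'> weights
    context  = dotor([alphas, a])          # (m, 1, 2*n_a)
    return context
```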
- Attention computation
- Define the model
- Define the optimizer and compile the model
- Training
- To save time, the instructor provides pre-trained weights (using them for inference is sketched right after this list)
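With the pre-trained weights loaded, translating a date means encoding the source string, running the model, taking the argmax at each of the 10 output positions, and mapping the indices back through inv_machine_vocab. A sketch, assuming (as in this assignment) that the model takes [Xoh, s0, c0] as inputs and returns one softmax vector per output step; to_indices and to_one_hot are the illustrative helpers from the preprocessing sketch above.

```python
s0 = np.zeros((1, n_s))   # initial decoder hidden state
c0 = np.zeros((1, n_s))   # initial decoder cell state

EXAMPLES = ['3 May 1979', '5 April 09', 'Tue 10 Jul 2007', '1 March 2001']
for example in EXAMPLES:
    source = to_one_hot(to_indices(example, human_vocab, Tx), len(human_vocab))  # (30, 37)
    source = source[np.newaxis, ...]                                             # (1, 30, 37)
    prediction = model.predict([source, s0, c0])                       # list of Ty arrays, each (1, 11)
    prediction = np.argmax(np.array(prediction), axis=-1).reshape(-1)  # best index per output step
    print("source:", example, " output:", ''.join(inv_machine_vocab[i] for i in prediction))
```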
Output:
```
source: 5th Otc 2019          output: 2019-10-05
source: 5 April 09            output: 2009-04-05
source: 21th of August 2016   output: 2016-08-20
source: Tue 10 Jul 2007       output: 2007-07-10
source: Saturday May 9 2018   output: 2018-05-09
source: March 3 2001          output: 2001-03-03
source: March 3rd 2001        output: 2001-03-03
source: 1 March 2001          output: 2001-03-01
```
3. Visualizing Attention
```python
attention_map = plot_attention_map(model, human_vocab, inv_machine_vocab, "Tuesday 09 Oct 1993", num = 7, n_s = 64)
```
You can see that most of the attention is used when predicting the year.
Assignment 2: Trigger Word Detection
- Import packages
1. Data Synthesis: Creating a Speech Dataset
1.1 Listening to the Data
There are positive audio clips of "activate" (the trigger word), negative clips (words that are not the trigger word), and background noise.
```python
IPython.display.Audio("./raw_data/backgrounds/1.wav")
```
1.2 Audio to Spectrogram
The audio is sampled at 44100 Hz and each clip is 10 seconds long.
```python
x = graph_spectrogram("audio_examples/example_train.wav")
```
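graph_spectrogram comes from the course's td_utils helper, which is not shown here. A rough, self-contained equivalent can be built on matplotlib's specgram; the nfft/fs/noverlap values below are assumptions chosen so that a 10 s, 44100 Hz clip yields the (101, 5511) shape reported next, not necessarily the course's exact parameters.

```python
import matplotlib.pyplot as plt
from scipy.io import wavfile

def graph_spectrogram_sketch(wav_file, nfft=200, fs=8000, noverlap=120):
    """Plot and return the spectrogram of a wav file (rough stand-in for the course helper).

    With 441000 samples, a 200-sample window and an 80-sample step give
    (441000 - 200) / 80 + 1 = 5511 time steps and nfft/2 + 1 = 101 frequency bins.
    """
    rate, data = wavfile.read(wav_file)
    if data.ndim == 2:          # stereo -> keep a single channel
        data = data[:, 0]
    pxx, freqs, bins, im = plt.specgram(data, NFFT=nfft, Fs=fs, noverlap=noverlap)
    return pxx                  # shape (101, 5511) for the clips used here
```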
The training examples in this assignment are 10 seconds long and their spectrograms have 5511 time steps, so $T_x = 5511$.
Output:
```
Time steps in audio recording before spectrogram (441000,)
Time steps in input after spectrogram (101, 5511)
```
- Define parameters
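The "define parameters" step simply fixes the dimensions used throughout the rest of the assignment; the values below are the ones implied by the shapes printed above and by the model summary later on.

```python
Tx = 5511      # number of spectrogram time steps fed into the model
n_freq = 101   # number of frequency bins at each spectrogram time step
Ty = 1375      # number of time steps in the model's output
```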
1.3 Generating a Training Example
- Randomly pick a 10-second background noise clip
- Randomly insert 0-4 clips of the trigger word ("activate")
- Randomly insert 0-2 clips of negative (non-trigger) words

Output:
```
background len: 10000
activate[0] len: 721
activate[1] len: 731
```
- Get a random time segment from the background audio
- Check whether an inserted clip overlaps with previously inserted segments
- Insert an audio clip
- Insert the label 1s (the overlap check and label insertion are sketched right after this list)
- Synthesize a training example
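A minimal sketch of the two less obvious helpers. In this assignment the label is set to 1 for the 50 output steps immediately following the end of an inserted "activate" clip; the conversion from milliseconds to output steps assumes $T_y = 1375$ over a 10,000 ms clip. The audio-insertion step then combines the random-segment pick with the overlap check before overlaying the clip on the background.

```python
Ty = 1375  # number of model output steps over a 10,000 ms clip

def is_overlapping(segment_time, previous_segments):
    """Return True if the (start, end) segment overlaps any previously inserted segment."""
    segment_start, segment_end = segment_time
    for previous_start, previous_end in previous_segments:
        if segment_start <= previous_end and segment_end >= previous_start:
            return True
    return False

def insert_ones(y, segment_end_ms):
    """Set y to 1 for the 50 output steps that follow the end of an 'activate' clip.

    y              -- label array of shape (1, Ty)
    segment_end_ms -- end time of the inserted clip, in milliseconds
    """
    segment_end_y = int(segment_end_ms * Ty / 10000.0)  # convert ms -> output step index
    for i in range(segment_end_y + 1, segment_end_y + 51):
        if i < Ty:
            y[0, i] = 1
    return y
```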
1.4 The Full Training Set
The instructor has already preprocessed all of the data.
```python
# Load preprocessed training examples
X = np.load("./XY_train/X.npy")
Y = np.load("./XY_train/Y.npy")
```
1.5 The Dev Set
The dev set uses audio recorded by real people.
```python
# Load preprocessed dev set examples
X_dev = np.load("./XY_dev/X_dev.npy")
Y_dev = np.load("./XY_dev/Y_dev.npy")
```
2. Model
- Import packages
2.1 Building the Model
The model starts with a 1-D convolution that extracts low-level features from the spectrogram; it also speeds up the network, because the GRUs then only have to process 1375 time steps instead of 5511.
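The 1375 comes straight from the convolution arithmetic: with 'valid' padding, kernel size 15 and stride 4 (the values used in the model below), the output length is floor((5511 - 15) / 4) + 1 = 1375. A one-line check:

```python
Tx, kernel_size, stride = 5511, 15, 4
output_steps = (Tx - kernel_size) // stride + 1   # conv output length with 'valid' padding
print(output_steps)  # 1375
```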
Note: do not use a bidirectional RNN here. We want to trigger an action as soon as the trigger word is detected; with a bidirectional RNN, we would have to wait for the full 10 seconds of audio to be recorded before making a prediction.
- Some Keras references
conv1d https://keras.io/zh/layers/convolutional/#conv1d
BN https://keras.io/zh/layers/normalization/#batchnormalization
GRU https://keras.io/zh/layers/recurrent/#gru
timedistributed https://keras.io/zh/layers/wrappers/#timedistributed
```python
# GRADED FUNCTION: model

def model(input_shape):
    """
    Function creating the model's graph in Keras.

    Argument:
    input_shape -- shape of the model's input data (using Keras conventions)

    Returns:
    model -- Keras model instance
    """
    X_input = Input(shape = input_shape)

    ### START CODE HERE ###

    # Step 1: CONV layer (≈4 lines)
    X = Conv1D(filters=196, kernel_size=15, strides=4)(X_input)  # CONV1D
    X = BatchNormalization()(X)                                  # Batch normalization
    X = Activation('relu')(X)                                    # ReLu activation
    X = Dropout(rate=0.8)(X)                                     # dropout (use 0.8)

    # Step 2: First GRU Layer (≈4 lines)
    X = GRU(128, return_sequences=True)(X)  # GRU (use 128 units and return the sequences)
    X = Dropout(rate=0.8)(X)                # dropout (use 0.8)
    X = BatchNormalization()(X)             # Batch normalization

    # Step 3: Second GRU Layer (≈4 lines)
    X = GRU(128, return_sequences=True)(X)  # GRU (use 128 units and return the sequences)
    X = Dropout(rate=0.8)(X)                # dropout (use 0.8)
    X = BatchNormalization()(X)             # Batch normalization
    X = Dropout(rate=0.8)(X)                # dropout (use 0.8)

    # Step 4: Time-distributed dense layer (≈1 line)
    X = TimeDistributed(Dense(1, activation = "sigmoid"))(X)  # time distributed (sigmoid)

    ### END CODE HERE ###

    model = Model(inputs = X_input, outputs = X)

    return model

model = model(input_shape = (Tx, n_freq))
model.summary()
```
Output:
```
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, 5511, 101)         0
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 1375, 196)         297136
_________________________________________________________________
batch_normalization_2 (Batch (None, 1375, 196)         784
_________________________________________________________________
activation_2 (Activation)    (None, 1375, 196)         0
_________________________________________________________________
dropout_2 (Dropout)          (None, 1375, 196)         0
_________________________________________________________________
gru_2 (GRU)                  (None, 1375, 128)         124800
_________________________________________________________________
dropout_3 (Dropout)          (None, 1375, 128)         0
_________________________________________________________________
batch_normalization_3 (Batch (None, 1375, 128)         512
_________________________________________________________________
gru_3 (GRU)                  (None, 1375, 128)         98688
_________________________________________________________________
dropout_4 (Dropout)          (None, 1375, 128)         0
_________________________________________________________________
batch_normalization_4 (Batch (None, 1375, 128)         512
_________________________________________________________________
dropout_5 (Dropout)          (None, 1375, 128)         0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 1375, 1)           129
=================================================================
Total params: 522,561
Trainable params: 521,657
Non-trainable params: 904
```
2.2 Training
Training is time-consuming, so the instructor has already trained this model on about 4000 examples.
```python
model = load_model('./models/tr_model.h5')
```
Then fine-tune it on our dataset for one epoch:
```python
opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=["accuracy"])
model.fit(X, Y, batch_size = 5, epochs=1)
```
2.3 Testing the Model
```python
loss, acc = model.evaluate(X_dev, Y_dev)
print("Dev set accuracy = ", acc)
```
Output:
```
25/25 [==============================] - 1s 46ms/step
Dev set accuracy =  0.9427199959754944
```
However, accuracy is not a good metric here: most labels are 0, so a model that always predicts 0 would also achieve high accuracy. A metric such as the F1 score should be used instead.
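A sketch of how the F1 score could be computed instead, thresholding the per-step probabilities at 0.5 and flattening all time steps (the 0.5 threshold is an assumption, and scikit-learn is used purely for convenience):

```python
from sklearn.metrics import f1_score

probs = model.predict(X_dev)                 # (num_examples, Ty, 1) per-step probabilities
y_pred = (probs > 0.5).astype(int).ravel()   # threshold and flatten all time steps
y_true = Y_dev.astype(int).ravel()
print("Dev set F1 =", f1_score(y_true, y_pred))
```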
3. Predictions
```python
def detect_triggerword(filename):
    plt.subplot(2, 1, 1)

    x = graph_spectrogram(filename)
    # the spectrogram outputs (freqs, Tx) and we want (Tx, freqs) to input into the model
    x = x.swapaxes(0, 1)
    x = np.expand_dims(x, axis=0)
    predictions = model.predict(x)

    plt.subplot(2, 1, 2)
    plt.plot(predictions[0, :, 0])
    plt.ylabel('probability')
    plt.show()
    return predictions
```
Once the model has estimated the probability of the word "activate" having been detected at each output step, you can play a "chiming" sound whenever that probability rises above some threshold. In addition, after "activate" has been said, many consecutive values of y may be close to 1, yet we only want to chime once, so a chime is inserted at most once every 75 output steps. This helps prevent inserting two chimes for a single instance of "activate" (it plays a role similar to non-max suppression in computer vision).
```python
chime_file = "audio_examples/chime.wav"

def chime_on_activate(filename, predictions, threshold):
    audio_clip = AudioSegment.from_wav(filename)
    chime = AudioSegment.from_wav(chime_file)
    Ty = predictions.shape[1]
    # Step 1: Initialize the number of consecutive output steps to 0
    consecutive_timesteps = 0
    # Step 2: Loop over the output steps in the y
    for i in range(Ty):
        # Step 3: Increment consecutive output steps
        consecutive_timesteps += 1
        # Step 4: If prediction is higher than the threshold and more than 75 consecutive output steps have passed
        if predictions[0, i, 0] > threshold and consecutive_timesteps > 75:
            # Step 5: Superpose audio and background using pydub
            audio_clip = audio_clip.overlay(chime, position = ((i / Ty) * audio_clip.duration_seconds) * 1000)
            # Step 6: Reset consecutive output steps to 0
            consecutive_timesteps = 0

    audio_clip.export("chime_output.wav", format='wav')
```
3.3 Testing on the Dev Set
- The first audio clip contains 1 trigger (the calls for both tests are sketched right after this list)
- The second audio clip contains 2 triggers
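Both tests follow the same pattern; the file paths below are assumptions based on the ./raw_data/... layout shown earlier.

```python
filename = "./raw_data/dev/1.wav"        # assumed path of the first dev clip
prediction = detect_triggerword(filename)
chime_on_activate(filename, prediction, 0.5)
IPython.display.Audio("./chime_output.wav")

filename = "./raw_data/dev/2.wav"        # assumed path of the second dev clip
prediction = detect_triggerword(filename)
chime_on_activate(filename, prediction, 0.5)
IPython.display.Audio("./chime_output.wav")
```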
4. Testing with Your Own Sample
```python
# Preprocess the audio to the correct format
def preprocess_audio(filename):
    # Trim or pad audio segment to 10000ms
    padding = AudioSegment.silent(duration=10000)
    segment = AudioSegment.from_wav(filename)[:10000]
    segment = padding.overlay(segment)
    # Set frame rate to 44100
    segment = segment.set_frame_rate(44100)
    # Export as wav
    segment.export(filename, format='wav')

your_filename = "audio_examples/my_audio.wav"
preprocess_audio(your_filename)
IPython.display.Audio(your_filename)  # listen to the audio you uploaded

chime_threshold = 0.5
prediction = detect_triggerword(your_filename)
chime_on_activate(your_filename, prediction, chime_threshold)
IPython.display.Audio("./chime_output.wav")
```
Link to this article: https://michael.blog.csdn.net/article/details/108933798
My CSDN blog: https://michael.blog.csdn.net/
Long-press or scan the QR code to follow my WeChat official account (Michael阿明). Let's keep at it and keep learning together!