

05. Sequence Models W3. Sequence Models and Attention Mechanism (Assignments: Machine Translation + Trigger Word Detection)


Table of Contents

  • Assignment 1: Machine Translation
    • 1. Date Conversion
      • 1.1 Dataset
    • 2. Machine Translation with an Attention Model
      • 2.1 Attention Mechanism
    • 3. Visualizing Attention
  • Assignment 2: Trigger Word Detection
    • 1. Data Synthesis: Creating a Speech Dataset
      • 1.1 Listen to the Data
      • 1.2 Audio to Spectrogram
      • 1.3 Generate a Training Example
      • 1.4 Full Training Set
      • 1.5 Dev Set
    • 2. Model
      • 2.1 Building the Model
      • 2.2 Training
      • 2.3 Testing the Model
    • 3. Prediction
      • 3.3 Testing on the Dev Set
    • 4. Testing with Your Own Sample

Quiz: see the reference blog post

Notes: W3. Sequence Models and Attention Mechanism

Assignment 1: Machine Translation

Build a Neural Machine Translation (NMT) model that translates human-readable dates ("25th of June, 2009") into machine-readable dates ("2009-06-25").

We will use an attention model to do this, one of the most sophisticated sequence-to-sequence models.

Note: install the required packages first.

pip install Faker==2.0.0
pip install babel
  • Import packages
from keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply
from keras.layers import RepeatVector, Dense, Activation, Lambda
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.models import load_model, Model
import keras.backend as K
import numpy as np

from faker import Faker
import random
from tqdm import tqdm
from babel.dates import format_date
from nmt_utils import *
import matplotlib.pyplot as plt
%matplotlib inline

1. Date Conversion

The model takes as input dates written in a variety of possible formats (e.g. "the 29th of August 1958", "03/30/1968", "24 JUNE 1987") and converts them into standardized, machine-readable dates (e.g. "1958-08-29", "1968-03-30", "1987-06-24"). We will have the model learn to output dates in the common machine-readable format YYYY-MM-DD.

1.1 Dataset

  • 10,000 examples
m = 10000
dataset, human_vocab, machine_vocab, inv_machine_vocab = load_dataset(m)
  • Print a few of them
dataset[:10]

Output:

[('9 may 1998', '1998-05-09'),
 ('10.11.19', '2019-11-10'),
 ('9/10/70', '1970-09-10'),
 ('saturday april 28 1990', '1990-04-28'),
 ('thursday january 26 1995', '1995-01-26'),
 ('monday march 7 1983', '1983-03-07'),
 ('sunday may 22 1988', '1988-05-22'),
 ('08 jul 2008', '2008-07-08'),
 ('8 sep 1999', '1999-09-08'),
 ('thursday january 1 1981', '1981-01-01')]

The loading above returns:

  • dataset
  • human_vocab: a dictionary mapping characters of the human-readable dates to integer indices
  • machine_vocab: a dictionary mapping characters of the machine-readable dates to integer indices
  • inv_machine_vocab: a dictionary, the inverse mapping of machine_vocab, from indices to characters
Tx = 30  # maximum input length; longer inputs are truncated
Ty = 10  # output date length, YYYY-MM-DD
X, Y, Xoh, Yoh = preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty)

print("X.shape:", X.shape)
print("Y.shape:", Y.shape)
print("Xoh.shape:", Xoh.shape)
print("Yoh.shape:", Yoh.shape)

Output:

X.shape: (10000, 30)
Y.shape: (10000, 10)
Xoh.shape: (10000, 30, 37)   # 37 is len(human_vocab)
Yoh.shape: (10000, 10, 11)   # 11 is the number of characters in the machine dates: digits 0-9 and '-'
  • Inspect an example (inputs shorter than the maximum are padded, so every x has length 30)
index = 52
print("Source date:", dataset[index][0])
print("Target date:", dataset[index][1])
print()
print("Source after preprocessing (indices):", X[index])
print("Target after preprocessing (indices):", Y[index])
print()
print("Source after preprocessing (one-hot):", Xoh[index])
print("Target after preprocessing (one-hot):", Yoh[index])

Output:

Source date: saturday october 9 1976
Target date: 1976-10-09

Source after preprocessing (indices): [29 13 30 31 28 16 13 34  0 26 15 30 26 14 17 28  0 12  0  4 12 10  9 36
 36 36 36 36 36 36]
Target after preprocessing (indices): [ 2 10  8  7  0  2  1  0  1 10]

Source after preprocessing (one-hot): [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]]
Target after preprocessing (one-hot): [[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
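
A rough sketch of what preprocess_data presumably does for a single example: characters are mapped to indices via human_vocab, the sequence is truncated or padded to length Tx, and then one-hot encoded. The real helper lives in nmt_utils, and the '<pad>'/'<unk>' token names used here are assumptions, so treat this only as an illustration.

def to_indices(date_str, vocab, length):
    s = date_str.lower()[:length]                              # truncate to the maximum length
    idx = [vocab.get(ch, vocab.get('<unk>', 0)) for ch in s]   # map characters to indices
    idx += [vocab.get('<pad>', 0)] * (length - len(idx))       # pad up to the fixed length
    return np.array(idx)

x_idx = to_indices('9 may 1998', human_vocab, Tx)               # shape (30,)
x_oh = to_categorical(x_idx, num_classes=len(human_vocab))      # shape (30, 37)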

2. Machine Translation with an Attention Model

2.1 Attention Mechanism


context^{<t>} = \sum_{t'=0}^{T_x} \alpha^{<t,t'>} a^{<t'>}
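
As a purely illustrative numpy sketch of this weighted sum (shapes chosen to match the notebook's Tx = 30 and n_a = 32; this is not the graded code, which builds the same computation out of the Keras layers below):

import numpy as np

Tx, n_a = 30, 32
a = np.random.randn(Tx, 2 * n_a)              # Bi-LSTM hidden states a^<t'>, shape (Tx, 2*n_a)
alphas = np.random.rand(Tx)
alphas /= alphas.sum()                        # attention weights alpha^<t,t'> sum to 1 over t'
context = (alphas[:, None] * a).sum(axis=0)   # context^<t>, shape (2*n_a,)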

RepeatVector repeats the input several times: https://keras.io/zh/layers/core/#repeatvector
Concatenate joins input tensors along a given axis: https://keras.io/zh/layers/merge/#concatenate_1
Bidirectional wrapper: https://keras.io/zh/layers/wrappers/#bidirectional

# Defined shared layers as global variables
repeator = RepeatVector(Tx)
concatenator = Concatenate(axis=-1)
densor1 = Dense(10, activation = "tanh")
densor2 = Dense(1, activation = "relu")
activator = Activation(softmax, name='attention_weights')  # We are using a custom softmax(axis = 1) loaded in this notebook
dotor = Dot(axes = 1)
  • Attention computation
# GRADED FUNCTION: one_step_attention

def one_step_attention(a, s_prev):
    """
    Performs one step of attention: Outputs a context vector computed as a dot product of the attention weights
    "alphas" and the hidden states "a" of the Bi-LSTM.

    Arguments:
    a -- hidden state output of the Bi-LSTM, numpy-array of shape (m, Tx, 2*n_a)
    s_prev -- previous hidden state of the (post-attention) LSTM, numpy-array of shape (m, n_s)

    Returns:
    context -- context vector, input of the next (post-attention) LSTM cell
    """

    ### START CODE HERE ###
    # Use repeator to repeat s_prev to be of shape (m, Tx, n_s) so that you can concatenate it with all hidden states "a" (≈ 1 line)
    s_prev = repeator(s_prev)
    # Use concatenator to concatenate a and s_prev on the last axis (≈ 1 line)
    concat = concatenator(inputs=[a, s_prev])
    # Use densor1 to propagate concat through a small fully-connected neural network to compute the "intermediate energies" variable e. (≈1 lines)
    e = densor1(concat)
    # Use densor2 to propagate e through a small fully-connected neural network to compute the "energies" variable energies. (≈1 lines)
    energies = densor2(e)
    # Use "activator" on "energies" to compute the attention weights "alphas" (≈ 1 line)
    alphas = activator(energies)
    # Use dotor together with "alphas" and "a" to compute the context vector to be given to the next (post-attention) LSTM-cell (≈ 1 line)
    context = dotor([alphas, a])
    ### END CODE HERE ###

    return context

n_a = 32
n_s = 64
post_activation_LSTM_cell = LSTM(n_s, return_state = True)
output_layer = Dense(len(machine_vocab), activation=softmax)

# GRADED FUNCTION: model

def model(Tx, Ty, n_a, n_s, human_vocab_size, machine_vocab_size):
    """
    Arguments:
    Tx -- length of the input sequence
    Ty -- length of the output sequence
    n_a -- hidden state size of the Bi-LSTM
    n_s -- hidden state size of the post-attention LSTM
    human_vocab_size -- size of the python dictionary "human_vocab"
    machine_vocab_size -- size of the python dictionary "machine_vocab"

    Returns:
    model -- Keras model instance
    """

    # Define the inputs of your model with a shape (Tx,)
    # Define s0 and c0, initial hidden state for the decoder LSTM of shape (n_s,)
    X = Input(shape=(Tx, human_vocab_size))
    s0 = Input(shape=(n_s,), name='s0')
    c0 = Input(shape=(n_s,), name='c0')
    s = s0
    c = c0

    # Initialize empty list of outputs
    outputs = []

    ### START CODE HERE ###

    # Step 1: Define your pre-attention Bi-LSTM. Remember to use return_sequences=True. (≈ 1 line)
    a = Bidirectional(LSTM(n_a, return_sequences=True))(X)

    # Step 2: Iterate for Ty steps
    for t in range(Ty):

        # Step 2.A: Perform one step of the attention mechanism to get back the context vector at step t (≈ 1 line)
        context = one_step_attention(a, s)

        # Step 2.B: Apply the post-attention LSTM cell to the "context" vector.
        # Don't forget to pass: initial_state = [hidden state, cell state] (≈ 1 line)
        s, _, c = post_activation_LSTM_cell(context, initial_state=[s, c])

        # Step 2.C: Apply Dense layer to the hidden state output of the post-attention LSTM (≈ 1 line)
        out = output_layer(s)

        # Step 2.D: Append "out" to the "outputs" list (≈ 1 line)
        outputs.append(out)

    # Step 3: Create model instance taking three inputs and returning the list of outputs. (≈ 1 line)
    model = Model(inputs=[X, s0, c0], outputs=outputs)

    ### END CODE HERE ###

    return model
  • Define the model
model = model(Tx, Ty, n_a, n_s, len(human_vocab), len(machine_vocab))
  • Define the optimizer and compile the model
### START CODE HERE ### (≈2 lines)
opt = Adam(learning_rate=0.005, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
### END CODE HERE ###
  • Training
s0 = np.zeros((m, n_s))
c0 = np.zeros((m, n_s))
outputs = list(Yoh.swapaxes(0,1))

model.fit([Xoh, s0, c0], outputs, epochs=1, batch_size=100)
  • To save time, the instructor provides pre-trained weights
model.load_weights('models/model.h5')

EXAMPLES = ['5th Otc 2019', '5 April 09', '21th of August 2016', 'Tue 10 Jul 2007', 'Saturday May 9 2018', 'March 3 2001', 'March 3rd 2001', '1 March 2001']
for example in EXAMPLES:
    source = string_to_int(example, Tx, human_vocab)
    source = np.array(list(map(lambda x: to_categorical(x, num_classes=len(human_vocab)), source))).swapaxes(0,1)
    source = source.transpose()              # swap the two axes
    source = np.expand_dims(source, axis=0)  # add a batch dimension
    prediction = model.predict([source, s0, c0])
    prediction = np.argmax(prediction, axis = -1)
    output = [inv_machine_vocab[int(i)] for i in prediction]
    print("source:", example)
    print("output:", ''.join(output))

Output:

source: 5th Otc 2019
output: 2019-10-05
source: 5 April 09
output: 2009-04-05
source: 21th of August 2016
output: 2016-08-20
source: Tue 10 Jul 2007
output: 2007-07-10
source: Saturday May 9 2018
output: 2018-05-09
source: March 3 2001
output: 2001-03-03
source: March 3rd 2001
output: 2001-03-03
source: 1 March 2001
output: 2001-03-01

3. Visualizing Attention

attention_map = plot_attention_map(model, human_vocab, inv_machine_vocab, "Tuesday 09 Oct 1993", num = 7, n_s = 64)


You can see that most of the attention is used to predict the year.

Assignment 2: Trigger Word Detection

  • Import packages
import numpy as np
from pydub import AudioSegment
import random
import sys
import io
import os
import glob
import IPython
from td_utils import *
%matplotlib inline

1. Data Synthesis: Creating a Speech Dataset

1.1 Listen to the Data

There are positive audio clips of "activate" (the trigger word), negative clips (words that are not the trigger word), and background noise.

IPython.display.Audio("./raw_data/backgrounds/1.wav")

1.2 Audio to Spectrogram

The audio is sampled at 44,100 Hz and each clip is 10 seconds long.

x = graph_spectrogram("audio_examples/example_train.wav")



The training examples in this assignment are 10 seconds long and their spectrograms have 5,511 time steps, so T_x = 5511.

_, data = wavfile.read("audio_examples/example_train.wav")
print("Time steps in audio recording before spectrogram", data[:,0].shape)
print("Time steps in input after spectrogram", x.shape)

Output:

Time steps in audio recording before spectrogram (441000,)
Time steps in input after spectrogram (101, 5511)
  • Define parameters
Tx = 5511     # The number of time steps input to the model from the spectrogram
n_freq = 101  # Number of frequencies input to the model at each time step of the spectrogram
Ty = 1375     # The number of time steps in the output of our model
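
The numbers 5511 and 101 are consistent with a short-time Fourier transform that uses a 200-sample window with a 120-sample overlap (an 80-sample hop) over the 441,000 raw samples. The actual settings live in td_utils.graph_spectrogram, so the window/overlap values below are an assumption used only to show where the shapes come from:

n_samples, window, overlap = 441000, 200, 120
hop = window - overlap                           # 80 samples per spectrogram step
n_time_steps = (n_samples - window) // hop + 1   # 5511 -> Tx
n_freq_bins = window // 2 + 1                    # 101  -> n_freq
print(n_time_steps, n_freq_bins)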

1.3 Generate a Training Example

  • Randomly pick a 10-second background noise clip
  • Randomly insert 0-4 "activate" (trigger word) clips
  • Randomly insert 0-2 negative (non-trigger word) clips
# Load audio segments using pydub
activates, negatives, backgrounds = load_raw_audio()

print("background len: " + str(len(backgrounds[0])))  # Should be 10,000, since it is a 10 sec clip
print("activate[0] len: " + str(len(activates[0])))   # Maybe around 1000, since an "activate" audio clip is usually around 1 sec (but varies a lot)
print("activate[1] len: " + str(len(activates[1])))   # Different "activate" clips can have different lengths

Output:

background len: 10000
activate[0] len: 721
activate[1] len: 731
  • Get a random time segment from the background audio
def get_random_time_segment(segment_ms):
    """
    Gets a random time segment of duration segment_ms in a 10,000 ms audio clip.

    Arguments:
    segment_ms -- the duration of the audio clip in ms ("ms" stands for "milliseconds")

    Returns:
    segment_time -- a tuple of (segment_start, segment_end) in ms
    """

    segment_start = np.random.randint(low=0, high=10000-segment_ms)  # Make sure segment doesn't run past the 10sec background
    segment_end = segment_start + segment_ms - 1

    return (segment_start, segment_end)
  • Check whether an inserted clip overlaps with existing segments
# GRADED FUNCTION: is_overlapping

def is_overlapping(segment_time, previous_segments):
    """
    Checks if the time of a segment overlaps with the times of existing segments.

    Arguments:
    segment_time -- a tuple of (segment_start, segment_end) for the new segment
    previous_segments -- a list of tuples of (segment_start, segment_end) for the existing segments

    Returns:
    True if the time segment overlaps with any of the existing segments, False otherwise
    """

    segment_start, segment_end = segment_time

    ### START CODE HERE ### (≈ 4 line)
    # Step 1: Initialize overlap as a "False" flag. (≈ 1 line)
    overlap = False

    # Step 2: loop over the previous_segments start and end times.
    # Compare start/end times and set the flag to True if there is an overlap (≈ 3 lines)
    for previous_start, previous_end in previous_segments:
        if previous_end >= segment_start and previous_start <= segment_end:
            overlap = True
    ### END CODE HERE ###

    return overlap
  • Insert an audio clip
# GRADED FUNCTION: insert_audio_clip

def insert_audio_clip(background, audio_clip, previous_segments):
    """
    Insert a new audio segment over the background noise at a random time step, ensuring that the
    audio segment does not overlap with existing segments.

    Arguments:
    background -- a 10 second background audio recording.
    audio_clip -- the audio clip to be inserted/overlaid.
    previous_segments -- times where audio segments have already been placed

    Returns:
    new_background -- the updated background audio
    """

    # Get the duration of the audio clip in ms
    segment_ms = len(audio_clip)

    ### START CODE HERE ###
    # Step 1: Use one of the helper functions to pick a random time segment onto which to insert
    # the new audio clip. (≈ 1 line)
    segment_time = get_random_time_segment(segment_ms)

    # Step 2: Check if the new segment_time overlaps with one of the previous_segments. If so, keep
    # picking new segment_time at random until it doesn't overlap. (≈ 2 lines)
    while is_overlapping(segment_time, previous_segments):
        segment_time = get_random_time_segment(segment_ms)

    # Step 3: Add the new segment_time to the list of previous_segments (≈ 1 line)
    previous_segments.append(segment_time)
    ### END CODE HERE ###

    # Step 4: Superpose audio segment and background
    new_background = background.overlay(audio_clip, position = segment_time[0])

    return new_background, segment_time
  • Insert the 1 labels
# GRADED FUNCTION: insert_ones

def insert_ones(y, segment_end_ms):
    """
    Update the label vector y. The labels of the 50 output steps strictly after the end of the segment
    should be set to 1. By strictly we mean that the label of segment_end_y should be 0 while the
    50 following labels should be ones.

    Arguments:
    y -- numpy array of shape (1, Ty), the labels of the training example
    segment_end_ms -- the end time of the segment in ms

    Returns:
    y -- updated labels
    """

    # duration of the background (in terms of spectrogram time-steps)
    segment_end_y = int(segment_end_ms * Ty / 10000.0)

    # Add 1 to the correct index in the background label (y)
    ### START CODE HERE ### (≈ 3 lines)
    for i in range(segment_end_y+1, segment_end_y+51):
        if i < Ty:
            y[0, i] = 1
    ### END CODE HERE ###

    return y
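
For example, an "activate" clip ending at 9,700 ms maps to output step int(9700 * 1375 / 10000) = 1333, so steps 1334 onward are labeled 1 until the 50-step window runs past Ty. A small usage sketch (the specific end time is just an illustration):

y_demo = insert_ones(np.zeros((1, Ty)), segment_end_ms=9700)
print(y_demo[0, 1333], y_demo[0, 1334], y_demo[0, 1374])  # 0.0 1.0 1.0 -- only 41 of the 50 steps fit before Ty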
  • Synthesize a training example
# GRADED FUNCTION: create_training_example

def create_training_example(background, activates, negatives):
    """
    Creates a training example with a given background, activates, and negatives.

    Arguments:
    background -- a 10 second background audio recording
    activates -- a list of audio segments of the word "activate"
    negatives -- a list of audio segments of random words that are not "activate"

    Returns:
    x -- the spectrogram of the training example
    y -- the label at each time step of the spectrogram
    """

    # Set the random seed
    np.random.seed(18)

    # Make background quieter
    background = background - 20

    ### START CODE HERE ###
    # Step 1: Initialize y (label vector) of zeros (≈ 1 line)
    y = np.zeros((1, Ty))

    # Step 2: Initialize segment times as empty list (≈ 1 line)
    previous_segments = []
    ### END CODE HERE ###

    # Select 0-4 random "activate" audio clips from the entire list of "activates" recordings
    number_of_activates = np.random.randint(0, 5)
    random_indices = np.random.randint(len(activates), size=number_of_activates)
    random_activates = [activates[i] for i in random_indices]

    ### START CODE HERE ### (≈ 3 lines)
    # Step 3: Loop over randomly selected "activate" clips and insert in background
    for random_activate in random_activates:
        # Insert the audio clip on the background
        background, segment_time = insert_audio_clip(background, random_activate, previous_segments)
        # Retrieve segment_start and segment_end from segment_time
        segment_start, segment_end = segment_time
        # Insert labels in "y"
        y = insert_ones(y, segment_end)
    ### END CODE HERE ###

    # Select 0-2 random negatives audio recordings from the entire list of "negatives" recordings
    number_of_negatives = np.random.randint(0, 3)
    random_indices = np.random.randint(len(negatives), size=number_of_negatives)
    random_negatives = [negatives[i] for i in random_indices]

    ### START CODE HERE ### (≈ 2 lines)
    # Step 4: Loop over randomly selected negative clips and insert in background
    for random_negative in random_negatives:
        # Insert the audio clip on the background
        background, _ = insert_audio_clip(background, random_negative, previous_segments)
    ### END CODE HERE ###

    # Standardize the volume of the audio clip
    background = match_target_amplitude(background, -20.0)

    # Export new training example
    file_handle = background.export("train" + ".wav", format="wav")
    print("File (train.wav) was saved in your directory.")

    # Get and plot spectrogram of the new recording (background with superposition of positive and negatives)
    x = graph_spectrogram("train.wav")

    return x, y

x, y = create_training_example(backgrounds[0], activates, negatives)

plt.plot(y[0])

1.4 Full Training Set

The instructor has already preprocessed all of the data.

# Load preprocessed training examples
X = np.load("./XY_train/X.npy")
Y = np.load("./XY_train/Y.npy")

1.5 Dev Set

The dev set uses audio recorded by real people.

# Load preprocessed dev set examples
X_dev = np.load("./XY_dev/X_dev.npy")
Y_dev = np.load("./XY_dev/Y_dev.npy")

2. Model

  • Import packages
from keras.callbacks import ModelCheckpoint
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking, TimeDistributed, LSTM, Conv1D
from keras.layers import GRU, Bidirectional, BatchNormalization, Reshape
from keras.optimizers import Adam

2.1 Building the Model


The model first uses a 1-D convolution to extract features; this also speeds up the GRU, which then only has to process 1,375 time steps instead of 5,511 (see the quick check below).
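
The reduction from 5,511 to 1,375 time steps follows directly from the Conv1D hyper-parameters used in the model below (kernel size 15, stride 4, 'valid' padding); a quick arithmetic check:

Tx_in, kernel_size, stride = 5511, 15, 4
conv_steps = (Tx_in - kernel_size) // stride + 1  # output length with 'valid' padding
print(conv_steps)  # 1375 == Ty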

Note: do not use a bidirectional RNN here. We want to output an action as soon as the trigger word is detected; with a bidirectional RNN we would have to wait for the full 10 seconds of audio to be recorded before making a prediction.

  • Some Keras references

Conv1D https://keras.io/zh/layers/convolutional/#conv1d

BatchNormalization https://keras.io/zh/layers/normalization/#batchnormalization

GRU https://keras.io/zh/layers/recurrent/#gru

TimeDistributed https://keras.io/zh/layers/wrappers/#timedistributed

# GRADED FUNCTION: model

def model(input_shape):
    """
    Function creating the model's graph in Keras.

    Argument:
    input_shape -- shape of the model's input data (using Keras conventions)

    Returns:
    model -- Keras model instance
    """

    X_input = Input(shape = input_shape)

    ### START CODE HERE ###

    # Step 1: CONV layer (≈4 lines)
    X = Conv1D(filters=196, kernel_size=15, strides=4)(X_input)  # CONV1D
    X = BatchNormalization()(X)                                  # Batch normalization
    X = Activation('relu')(X)                                    # ReLu activation
    X = Dropout(rate=0.8)(X)                                     # dropout (use 0.8)

    # Step 2: First GRU Layer (≈4 lines)
    X = GRU(128, return_sequences=True)(X)  # GRU (use 128 units and return the sequences)
    X = Dropout(rate=0.8)(X)                # dropout (use 0.8)
    X = BatchNormalization()(X)             # Batch normalization

    # Step 3: Second GRU Layer (≈4 lines)
    X = GRU(128, return_sequences=True)(X)  # GRU (use 128 units and return the sequences)
    X = Dropout(rate=0.8)(X)                # dropout (use 0.8)
    X = BatchNormalization()(X)             # Batch normalization
    X = Dropout(rate=0.8)(X)                # dropout (use 0.8)

    # Step 4: Time-distributed dense layer (≈1 line)
    X = TimeDistributed(Dense(1, activation = "sigmoid"))(X)  # time distributed (sigmoid)

    ### END CODE HERE ###

    model = Model(inputs = X_input, outputs = X)

    return model

model = model(input_shape = (Tx, n_freq))
model.summary()

Output:

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, 5511, 101)         0
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 1375, 196)         297136
_________________________________________________________________
batch_normalization_2 (Batch (None, 1375, 196)         784
_________________________________________________________________
activation_2 (Activation)    (None, 1375, 196)         0
_________________________________________________________________
dropout_2 (Dropout)          (None, 1375, 196)         0
_________________________________________________________________
gru_2 (GRU)                  (None, 1375, 128)         124800
_________________________________________________________________
dropout_3 (Dropout)          (None, 1375, 128)         0
_________________________________________________________________
batch_normalization_3 (Batch (None, 1375, 128)         512
_________________________________________________________________
gru_3 (GRU)                  (None, 1375, 128)         98688
_________________________________________________________________
dropout_4 (Dropout)          (None, 1375, 128)         0
_________________________________________________________________
batch_normalization_4 (Batch (None, 1375, 128)         512
_________________________________________________________________
dropout_5 (Dropout)          (None, 1375, 128)         0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 1375, 1)           129
=================================================================
Total params: 522,561
Trainable params: 521,657
Non-trainable params: 904

2.2 Training

Training is time-consuming; the instructor has already trained this model on 4,000 examples.

model = load_model('./models/tr_model.h5')

Then fine-tune it on our dataset for one epoch:

opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=["accuracy"])
model.fit(X, Y, batch_size = 5, epochs=1)

2.3 Testing the Model

loss, acc = model.evaluate(X_dev, Y_dev)
print("Dev set accuracy = ", acc)

Output:

25/25 [==============================] - 1s 46ms/step
Dev set accuracy =  0.9427199959754944

However, accuracy is not a good metric here: most of the labels are 0, so a model that predicts all zeros would still achieve high accuracy. A metric such as the F1 score should be used instead (a sketch follows below).
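
A minimal sketch of how an F1-style check could be run on the dev set, assuming a 0.5 decision threshold and scikit-learn (neither appears in the original notebook):

from sklearn.metrics import f1_score

y_prob = model.predict(X_dev)                # (m, Ty, 1) per-step probabilities
y_pred = (y_prob > 0.5).astype(int).ravel()  # threshold the probabilities
y_true = Y_dev.astype(int).ravel()
print("Dev F1:", f1_score(y_true, y_pred))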

3. Prediction

def detect_triggerword(filename):
    plt.subplot(2, 1, 1)

    x = graph_spectrogram(filename)
    # the spectrogram outputs (freqs, Tx) and we want (Tx, freqs) to input into the model
    x = x.swapaxes(0,1)
    x = np.expand_dims(x, axis=0)
    predictions = model.predict(x)

    plt.subplot(2, 1, 2)
    plt.plot(predictions[0,:,0])
    plt.ylabel('probability')
    plt.show()
    return predictions

Once you have estimated the probability of detecting the word "activate" at each output step, you can trigger a "chiming" sound when the probability exceeds a certain threshold. Furthermore, after "activate" is said, many consecutive values of y may be close to 1, yet we only want to chime once. So we insert a chime at most once every 75 output steps. This helps prevent inserting two chimes for a single instance of "activate". (This is analogous to non-max suppression in computer vision.)

chime_file = "audio_examples/chime.wav"
def chime_on_activate(filename, predictions, threshold):
    audio_clip = AudioSegment.from_wav(filename)
    chime = AudioSegment.from_wav(chime_file)
    Ty = predictions.shape[1]
    # Step 1: Initialize the number of consecutive output steps to 0
    consecutive_timesteps = 0
    # Step 2: Loop over the output steps in the y
    for i in range(Ty):
        # Step 3: Increment consecutive output steps
        consecutive_timesteps += 1
        # Step 4: If prediction is higher than the threshold and more than 75 consecutive output steps have passed
        if predictions[0,i,0] > threshold and consecutive_timesteps > 75:
            # Step 5: Superpose audio and background using pydub
            audio_clip = audio_clip.overlay(chime, position = ((i / Ty) * audio_clip.duration_seconds)*1000)
            # Step 6: Reset consecutive output steps to 0
            consecutive_timesteps = 0

    audio_clip.export("chime_output.wav", format='wav')

3.3 Testing on the Dev Set

  • The first clip contains 1 trigger
filename = "./raw_data/dev/1.wav"
prediction = detect_triggerword(filename)
chime_on_activate(filename, prediction, 0.5)
IPython.display.Audio("./chime_output.wav")

  • The second clip contains 2 triggers

4. Testing with Your Own Sample

# Preprocess the audio to the correct format
def preprocess_audio(filename):
    # Trim or pad audio segment to 10000ms
    padding = AudioSegment.silent(duration=10000)
    segment = AudioSegment.from_wav(filename)[:10000]
    segment = padding.overlay(segment)
    # Set frame rate to 44100
    segment = segment.set_frame_rate(44100)
    # Export as wav
    segment.export(filename, format='wav')

your_filename = "audio_examples/my_audio.wav"
preprocess_audio(your_filename)
IPython.display.Audio(your_filename)  # listen to the audio you uploaded

chime_threshold = 0.5
prediction = detect_triggerword(your_filename)
chime_on_activate(your_filename, prediction, chime_threshold)
IPython.display.Audio("./chime_output.wav")


Original article: https://michael.blog.csdn.net/article/details/108933798

My CSDN blog: https://michael.blog.csdn.net/

