Andrew Ng Deep Learning Course, Course 5 Week 2 Programming Assignment (PyTorch Implementation)
Table of Contents
- Preface
- I. Word Vector Operations
- 1. Data Preparation
- 2. Cosine Similarity
- 3. Word Analogy
- II. Emoji Generator V1
- III. Emoji Generator V2
- 1. Building the Embedding Layer (embedding_layer)
- 2. DataLoader
- 3. Building the LSTM
- 4. Model Training
- 5. Experimental Results
Preface
This blog post is simply a record of my study notes and programming experience while learning deep learning. Most of the code is adapted from the blog post 【中文】【吴恩达课后编程作业】Course 5 - 序列模型 - 第二周作业 - 词向量的运算与Emoji生成器 ("[Chinese] [Andrew Ng Course Assignments] Course 5 - Sequence Models - Week 2 Assignment - Word Vector Operations and Emoji Generator"), which I reproduced here. The original blog uses TensorFlow, while I mainly work with PyTorch, so I completed this assignment with the PyTorch framework. There may still be some issues in the code or the writing; please bear with me. My earlier posts also mainly referenced the same author. The complete code below has been uploaded to Baidu Netdisk, extraction code: 00cz.
Before starting the assignment, please make sure you have a working PyTorch environment. My code runs on a server with GPU acceleration, but the CPU version of PyTorch works as well, just more slowly.
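For reference, here is a minimal sketch of the device setup assumed by the training code later in this post (the `device` variable is used there but never shown explicitly):

```python
import torch

# Use the GPU when available, otherwise fall back to the CPU.
# The `device` variable is referenced by the LSTM training code later in this post.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
```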
I. Word Vector Operations
1. Data Preparation
Training word embeddings from scratch requires enormous resources, so here we use pre-trained GloVe word vectors instead.
Load the GloVe English word vectors and inspect one of them:
```python
words, word_to_vec_map = w2v_utils_pytorch.read_glove_vecs('data/glove.6B.50d.txt')
print(word_to_vec_map["hello"])
```

```
[-0.38497   0.80092   0.064106 -0.28355  -0.026759 -0.34532  -0.64253
 -0.11729  -0.33257   0.55243  -0.087813  0.9035    0.47102   0.56657
  0.6985   -0.35229  -0.86542   0.90573   0.03576  -0.071705 -0.12327
  0.54923   0.47005   0.35572   1.2611   -0.67581  -0.94983   0.68666
  0.3871   -1.3492    0.63512   0.46416  -0.48814   0.83827  -0.9246
 -0.33722   0.53741  -1.0616   -0.081403 -0.67111   0.30923  -0.3923
 -0.55002  -0.68827   0.58049  -0.11626   0.013139 -0.57654   0.048833
  0.67204 ]
```

The GloVe file contains 50-dimensional vectors for 400,000 English words. With a basic understanding of the data, we can now do some simple computations with these word vectors.
2. Cosine Similarity
Based on the cosine similarity formula (recalled below), we can write a small function that measures how similar two words are.
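As a reminder, this is the standard definition (not specific to this assignment): for two word vectors $u$ and $v$,

$$\text{CosineSimilarity}(u, v) = \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2} = \cos(\theta),$$

where $\theta$ is the angle between the two vectors; values close to 1 mean the vectors point in nearly the same direction. The function below implements exactly this: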
```python
import numpy as np

def cosine_similarity(u, v):
    """
    Compute the cosine similarity between two word vectors.
    :param u: word vector of word u
    :param v: word vector of word v
    :return: cosine similarity between u and v
    """
    dot = np.dot(u, v)
    norm_u = np.sqrt(np.sum(np.power(u, 2)))
    norm_v = np.sqrt(np.sum(np.power(v, 2)))
    distance = np.divide(dot, norm_v * norm_u)
    return distance
```

Compute the cosine similarity for a few word pairs:
```python
words, word_to_vec_map = w2v_utils_pytorch.read_glove_vecs('data/glove.6B.50d.txt')

father = word_to_vec_map["father"]
mother = word_to_vec_map["mother"]
ball = word_to_vec_map["ball"]
crocodile = word_to_vec_map["crocodile"]
france = word_to_vec_map["france"]
italy = word_to_vec_map["italy"]
paris = word_to_vec_map["paris"]
rome = word_to_vec_map["rome"]

print("cosine_similarity(father, mother) = ", w2v_utils_pytorch.cosine_similarity(father, mother))
print("cosine_similarity(ball, crocodile) = ", w2v_utils_pytorch.cosine_similarity(ball, crocodile))
print("cosine_similarity(france - paris, rome - italy) = ", w2v_utils_pytorch.cosine_similarity(france - paris, rome - italy))
```

```
cosine_similarity(father, mother) =  0.8909038442893616
cosine_similarity(ball, crocodile) =  0.27439246261379424
cosine_similarity(france - paris, rome - italy) =  -0.6751479308174201
```

As expected, the more similar two words are, the smaller the angle between their vectors and the larger their cosine similarity, which shows that the GloVe vectors are of good quality.
3. Word Analogy
??當(dāng)我們擁有優(yōu)秀的詞向量后可以完成詞類類比任務(wù):“A與B相比就類似于C與____相比一樣”,比如:“男人與女人相比就像國王與 女皇 相比”。具體原理就是在詞典中找到一個(gè)詞D,使得vector(B)-vector(A) ≈\approx≈ vector(D)-vector?,依舊采用余弦公式計(jì)算兩者的相似度。
```python
def complete_analogy(word_a, word_b, word_c, word_to_vec_map):
    """
    Word analogy: solve "A is to B as C is to ____", e.g. "man is to woman as king is to queen".
    We search the vocabulary for a word word_d such that
    word_b - word_a is approximately equal to word_d - word_c.
    :param word_a: word A
    :param word_b: word B
    :param word_c: word C
    :param word_to_vec_map: dictionary mapping words to their GloVe vectors
    :return: the best matching word D
    """
    # Convert the words to lowercase
    word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower()

    # Look up the word vectors
    e_a, e_b, e_c = word_to_vec_map[word_a], word_to_vec_map[word_b], word_to_vec_map[word_c]

    words = word_to_vec_map.keys()
    max_cosine_similarity = -100
    best_word = None

    # Loop over the whole vocabulary
    for word in words:
        if word in [word_a, word_b, word_c]:
            continue
        cosine_sim = cosine_similarity((e_b - e_a), (word_to_vec_map[word] - e_c))
        if cosine_sim > max_cosine_similarity:
            max_cosine_similarity = cosine_sim
            best_word = word

    return best_word
```

A quick test:
```python
triads_to_try = [('italy', 'italian', 'spain'), ('india', 'delhi', 'japan'),
                 ('man', 'woman', 'boy'), ('small', 'smaller', 'large')]
for triad in triads_to_try:
    print('{} -> {} <====> {} -> {}'.format(*triad, w2v_utils_pytorch.complete_analogy(*triad, word_to_vec_map)))
```

```
italy -> italian <====> spain -> spanish
india -> delhi <====> japan -> tokyo
man -> woman <====> boy -> girl
small -> smaller <====> large -> larger
```

As you can see, the GloVe vectors handle the word analogy task very well.
The original assignment also has an optional section on removing bias from word vectors. I have not fully understood it yet; interested readers can refer to the original blog mentioned in the preface.
II. Emoji Generator V1
The emoji generator is really sentiment classification, essentially a multi-class classification problem. In the original assignment, printing the emoji symbols requires installing the emoji package; here I simplify things and focus only on the classification task itself.
We first use a simple feed-forward network for this classification task: the network averages the word vectors of a sentence and feeds the average through a single fully connected layer followed by a softmax over the 5 emotion classes. The forward pass, loss, and gradients are summarized below.
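For clarity, this is the math that the training code below implements for a single example (it matches what the code computes):

$$
\begin{aligned}
\text{avg} &= \frac{1}{m}\sum_{i=1}^{m} e_{w_i} \\
z &= W \cdot \text{avg} + b \\
a &= \text{softmax}(z) \\
\mathcal{L} &= -\sum_{k=1}^{5} y_k \log a_k \\
\frac{\partial \mathcal{L}}{\partial z} &= a - y, \qquad
\frac{\partial \mathcal{L}}{\partial W} = (a - y)\,\text{avg}^{\top}, \qquad
\frac{\partial \mathcal{L}}{\partial b} = a - y
\end{aligned}
$$

where $e_{w_i}$ are the GloVe vectors of the words in the sentence and $y$ is the one-hot label.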
The main training routine:
```python
def model(X, Y, word_to_vec_map, learning_rate=0.01, num_iterations=400):
    np.random.seed(1)
    m = Y.shape[0]
    n_y = 5
    n_h = 50

    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))

    Y_oh = emo_utils.convert_to_one_hot(Y, C=n_y)

    for epoch in range(num_iterations):
        for i in range(m):
            avg = sentence_to_avg(X[i], word_to_vec_map)

            # Forward propagation
            z = np.dot(W, avg) + b
            a = emo_utils.softmax(z)

            # Loss of the i-th training example
            cost = -np.sum(Y_oh[i] * np.log(a))

            # Gradients
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y, 1), avg.reshape(1, n_h))
            db = dz

            # Parameter update
            W = W - learning_rate * dW
            b = b - learning_rate * db

        if epoch % 100 == 0:
            print("Epoch {epoch}, loss {cost}".format(epoch=epoch, cost=cost))
            pred = emo_utils.predict(X, Y, W, b, word_to_vec_map)

    return pred, W, b
```

Computing the average word vector of a sentence:
```python
def sentence_to_avg(sentence, word_to_vec_map):
    """
    Split the sentence into words, look up their GloVe vectors, and average them.
    :param sentence: input sentence
    :param word_to_vec_map: dictionary mapping words to their GloVe vectors
    :return: the average word vector of the sentence
    """
    # Split the sentence into a list of words
    words = sentence.lower().split()

    # Initialize the average vector
    avg = np.zeros(50, )

    for w in words:
        avg = avg + word_to_vec_map[w]
    avg = np.divide(avg, len(words))

    return avg
```

Train and test the model:
```python
words, word_to_vec_map = w2v_utils_pytorch.read_glove_vecs('data/glove.6B.50d.txt')

pred, W, b = model(X_train, Y_train, word_to_vec_map)

print("===== training set ====")
pred_train = emo_utils.predict(X_train, Y_train, W, b, word_to_vec_map)
print("===== test set ====")
pred_test = emo_utils.predict(X_test, Y_test, W, b, word_to_vec_map)

X_my_sentences = np.array(["i adore you", "i love you", "funny lol",
                           "lets play with a ball", "food is ready", "you are not happy"])
Y_my_labels = np.array([[0], [0], [2], [1], [4], [3]])

pred = emo_utils.predict(X_my_sentences, Y_my_labels, W, b, word_to_vec_map)
emo_utils.print_predictions(X_my_sentences, pred)
```

The training results are as follows:
```
Epoch 0, loss 1.952049881281007
Accuracy: 0.3484848484848485
Epoch 100, loss 0.07971818726014794
Accuracy: 0.9318181818181818
Epoch 200, loss 0.04456369243681379
Accuracy: 0.9545454545454546
Epoch 300, loss 0.03432267378786059
Accuracy: 0.9696969696969697
===== training set ====
Accuracy: 0.9772727272727273
===== test set ====
Accuracy: 0.8571428571428571

i adore you ❤️
i love you ❤️
funny lol 😄
lets play with a ball ⚾
```

A single fully connected layer already reaches decent accuracy after training, but there is a problem: the model's input is just the average of the word vectors, so word order is ignored entirely, and some sentences are classified completely wrong:
```
you are not happy ❤️
```
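To make the word-order problem concrete, here is a small illustrative check (not from the original post): because the model only sees the average of the word vectors, any permutation of the same words produces exactly the same input.

```python
import numpy as np

# Illustrative only: two word sequences with the same words in different order
# produce identical averaged inputs, so the V1 model cannot tell them apart.
avg1 = sentence_to_avg("you are not happy", word_to_vec_map)
avg2 = sentence_to_avg("not happy you are", word_to_vec_map)  # hypothetical re-ordering
print(np.allclose(avg1, avg2))  # True
```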
III. Emoji Generator V2

In Emoji Generator V2, we use a two-layer LSTM to perform the same sentiment classification task.
1. Building the Embedding Layer (embedding_layer)
The purpose of the embedding layer is to quickly convert an English sentence into a matrix of word vectors. First, read the GloVe word vector file:
```python
def read_glove_vecs(glove_file):
    with open(glove_file, 'r', encoding='utf8') as f:
        words = set()
        word_to_vec_map = {}
        for line in f:
            line = line.strip().split()
            curr_word = line[0]
            words.add(curr_word)
            word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)

        i = 1
        words_to_index = {}
        index_to_words = {}
        for w in sorted(words):
            words_to_index[w] = i
            index_to_words[i] = w
            i = i + 1
    return words_to_index, index_to_words, word_to_vec_map
```

- words_to_index: dictionary mapping each word to its index
- index_to_words: dictionary mapping each index to its word
- word_to_vec_map: dictionary mapping each word to its word vector

Here we mainly need words_to_index and word_to_vec_map, from which we build the embedding layer:
```python
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Create the embedding layer and load the 50-dimensional GloVe vectors.
    :param word_to_vec_map: dictionary mapping words to their GloVe vectors
    :param word_to_index: dictionary mapping words to their indices
    :return: a torch.nn.Embedding layer initialized with the GloVe vectors
    """
    vocab_len = len(word_to_index) + 1
    embedding_size = word_to_vec_map["cucumber"].shape[0]

    # Initialize the embedding matrix
    embedding_matrix = np.zeros((vocab_len, embedding_size))
    for word, index in word_to_index.items():
        embedding_matrix[index, :] = word_to_vec_map[word]
    embedding_matrix = torch.Tensor(embedding_matrix)

    # Build the embedding layer
    embedding_layer = torch.nn.Embedding.from_pretrained(embedding_matrix)
    return embedding_layer
```

PyTorch already provides an embedding layer; we only need to pass it the embedding matrix embedding_matrix. As the code shows, embedding_matrix has shape (number of words, word vector dimension). If we split a sentence into words and convert each word to its index, the embedding layer gives us the word vector of every word in the sentence:
```python
words_to_index, index_to_words, word_to_vec_map = emo_utils.read_glove_vecs('data/glove.6B.50d.txt')
embedding = pretrained_embedding_layer(word_to_vec_map, words_to_index)

sentence = "i love you"
words = sentence.split()
words_index = [words_to_index[word] for word in words]
words_index = torch.LongTensor(words_index)
words_vec = embedding(words_index)
words_vec2 = [word_to_vec_map[word] for word in words]
```

```
tensor([[ 1.1891e-01,  1.5255e-01, ......,  9.2121e-01],
        [-1.3886e-01,  1.1401e+00, ......,  2.8980e-01],
        [-1.0919e-03,  3.3324e-01, ......,  1.1316e+00]])
[array([ 1.1891e-01,  1.5255e-01, ......,  9.2121e-01]),
 array([-0.13886 ,  1.1401 , ......,  0.2898 ]),
 array([-1.0919e-03,  3.3324e-01, ......,  1.1316e+00])]
```

Looking up the word vectors through the embedding layer gives the same result as reading them from word_to_vec_map directly; the only difference is that the embedding layer returns a tensor.
2. DataLoader
??當(dāng)完成嵌入層后,我們可以根據(jù)訓(xùn)練數(shù)據(jù)封裝Dataloader:
```python
import torch
import numpy as np
from torch.utils.data import Dataset, DataLoader

import emo_utils


class Sentence_Data(Dataset):
    def __init__(self, filename):
        super(Sentence_Data, self).__init__()
        self.max_len = 20
        data, label = emo_utils.read_csv(filename)
        self.label = torch.from_numpy(label)
        self.len = self.label.size()[0]
        words_to_index, index_to_words, word_to_vec_map = emo_utils.read_glove_vecs('data/glove.6B.50d.txt')
        self.embedding = self.pretrained_embedding_layer(word_to_vec_map, words_to_index)
        self.data = self.sentence_to_vec(data, words_to_index=words_to_index)

    def __getitem__(self, item):
        return self.data[item], self.label[item]

    def __len__(self):
        return self.len

    def pretrained_embedding_layer(self, word_to_vec_map, word_to_index):
        """
        Create the embedding layer and load the 50-dimensional GloVe vectors.
        """
        vocab_len = len(word_to_index) + 1
        embedding_size = word_to_vec_map["cucumber"].shape[0]

        # Initialize the embedding matrix
        embedding_matrix = np.zeros((vocab_len, embedding_size))
        for word, index in word_to_index.items():
            embedding_matrix[index, :] = word_to_vec_map[word]
        embedding_matrix = torch.Tensor(embedding_matrix)

        # Build the embedding layer
        embedding_layer = torch.nn.Embedding.from_pretrained(embedding_matrix)
        return embedding_layer

    def sentence_to_vec(self, data, words_to_index):
        vec_list = []
        for sentence in data:
            words_index = self.sentences_to_indices(sentence, words_to_index, self.max_len)
            words_index = torch.LongTensor(words_index)
            words_vec = self.embedding(words_index)
            vec_list.append(words_vec)
        return vec_list

    def sentences_to_indices(self, x, words_to_index, max_len):
        """
        Convert a sentence (string) into a zero-padded array of word indices.
        :param x: the input sentence
        :param words_to_index: dictionary mapping words to their indices
        :param max_len: maximum sentence length
        :return: an index array of length max_len
        """
        X_indices = np.zeros(max_len)
        sentences_words = x.lower().split()
        j = 0
        for w in sentences_words:
            X_indices[j] = words_to_index[w]
            j += 1
        return X_indices
```

After reading the training data, each sentence is split into words and converted into a list of indices (sentences_to_indices), and the embedding layer then turns that index list into vectors (sentence_to_vec), which completes the vectorization of each sentence. Since sentences have different lengths, we set a maximum length max_len (20 here); shorter sentences are padded with zero vectors (index 0 maps to the all-zero row of the embedding matrix). Each sentence therefore becomes a (20, 50) matrix, where 20 is the maximum sentence length and 50 is the word vector dimension.
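As a quick sanity check (a sketch only; the CSV path is an assumption based on the assignment's data files, adjust it to your own setup), each item returned by the Dataset should be a (20, 50) tensor together with its label:

```python
# Hypothetical check of the Dataset defined above; 'data/train_emoji.csv' is an assumed path.
train_data = Sentence_Data('data/train_emoji.csv')
x0, y0 = train_data[0]
print(x0.shape)          # expected: torch.Size([20, 50])
print(len(train_data))   # number of training sentences
```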
3. Building the LSTM
```python
import torch


class LSTM_EMO(torch.nn.Module):
    def __init__(self, input_size, num_classes):
        super(LSTM_EMO, self).__init__()
        self.lstm = torch.nn.LSTM(input_size=input_size, hidden_size=128, num_layers=2,
                                  dropout=0.5, batch_first=True)
        self.dropout = torch.nn.Dropout(0.5)
        self.fc = torch.nn.Linear(128, num_classes)
        self.softmax = torch.nn.Softmax(dim=1)

    def forward(self, x):
        out, (h_n, c_n) = self.lstm(x)
        out = self.dropout(h_n[-1])
        linear_out = self.fc(out)
        return linear_out

    def predict(self, x):
        out, (h_n, c_n) = self.lstm(x)
        out = self.dropout(h_n[-1])
        linear_out = self.fc(out)
        y_pre = self.softmax(linear_out)
        return y_pre
```

Following the network structure, we stack two LSTM layers and take the last layer's final hidden state as the input to the fully connected layer.
```python
self.lstm = torch.nn.LSTM(input_size=input_size, hidden_size=128, num_layers=2, dropout=0.5, batch_first=True)
```

- input_size: dimension of the input at each time step; with the 50-dimensional GloVe vectors, input_size is 50
- hidden_size: size of the LSTM hidden state
- num_layers: number of stacked LSTM layers; two layers here
- dropout: dropout applied between the LSTM layers
- batch_first: controls the input layout; when batch_first is True the input shape is (batch_size, sentence length, word vector dimension), and when it is False the input shape is (sentence length, batch_size, word vector dimension)

The output of torch.nn.LSTM has two parts, output and (h_n, c_n):
- output: the outputs of the last LSTM layer at every time step, with shape (batch size, sentence length, hidden size)
- h_n: the hidden state at the last time step, with shape (number of LSTM layers, batch size, hidden size)
- c_n: the cell state at the last time step, with shape (number of LSTM layers, batch size, hidden size)

We take the second LSTM layer's output at the last time step as the input to the fully connected layer, i.e. h_n[-1].
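A minimal sketch to confirm these shapes, using random input to stand in for a batch of 32 sentences of length 20 with 50-dimensional word vectors:

```python
import torch

lstm = torch.nn.LSTM(input_size=50, hidden_size=128, num_layers=2, dropout=0.5, batch_first=True)
x = torch.randn(32, 20, 50)          # (batch size, sentence length, word vector dimension)
output, (h_n, c_n) = lstm(x)

print(output.shape)    # torch.Size([32, 20, 128])  outputs at every time step
print(h_n.shape)       # torch.Size([2, 32, 128])   final hidden state of each layer
print(h_n[-1].shape)   # torch.Size([32, 128])      what we feed into the fully connected layer
```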
4. Model Training
```python
if __name__ == "__main__":
    # Training hyperparameters
    batch_size = 32
    epoch_nums = 1000
    learning_rate = 0.001
    costs = []
    input_size = 50
    num_classes = 5

    # Load the training data
    train_data = Sentence_Data(train_data_path)
    train_data_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)

    # Initialize the model
    m = lstm_pytorch.LSTM_EMO(input_size=input_size, num_classes=num_classes)
    m.to(device)

    # Define the loss function and the optimizer
    loss_fn = torch.nn.CrossEntropyLoss().to(device)
    optimizer = torch.optim.Adam(m.parameters(), lr=learning_rate)

    # Start training
    print("learning_rate=" + str(learning_rate))
    for epoch in range(epoch_nums):
        cost = 0
        index = 0
        for data, label in train_data_loader:
            data, label = data.to(device), label.to(device)
            optimizer.zero_grad()
            y_pred = m.forward(data)
            loss = loss_fn(y_pred, label.long())
            loss.backward()
            optimizer.step()
            cost = cost + loss.cpu().detach().numpy()
            index = index + 1
        if epoch % 50 == 0:
            # Average loss per batch in this epoch
            costs.append(cost / index)
            print("epoch=" + str(epoch) + ": " + "loss=" + str(cost / index))
```

The usual steps for training a model: set the hyperparameters -> load the dataset -> initialize the model -> define the optimizer and loss function -> train. A rough evaluation sketch is shown below.
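The accuracies reported in the next section come from the full code package on the netdisk; as a rough idea, evaluation can look like the following sketch (the evaluate function and the test_data_path variable are my own placeholders, not part of the original code):

```python
def evaluate(model, data_path):
    """Compute classification accuracy of the trained LSTM on one CSV file."""
    dataset = Sentence_Data(data_path)
    loader = DataLoader(dataset, shuffle=False, batch_size=32)
    correct, total = 0, 0
    model.eval()  # disable dropout for evaluation
    with torch.no_grad():
        for data, label in loader:
            data, label = data.to(device), label.to(device)
            pred = model.predict(data).argmax(dim=1)   # class with the highest softmax score
            correct += (pred == label.long()).sum().item()
            total += label.size(0)
    return correct / total

print("Training set accuracy:", evaluate(m, train_data_path))
print("Test set accuracy:", evaluate(m, test_data_path))  # test_data_path is assumed
```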
5. Experimental Results
The test set accuracy of my PyTorch reproduction did not reach the level reported in the original blog; I am still looking into the reason:
```
epoch=700: loss=0.0019042102503590286
epoch=750: loss=0.0015947955350081127
epoch=800: loss=0.0009102935218834318
epoch=850: loss=0.0009600761889790496
epoch=900: loss=0.0004162280577778195
epoch=950: loss=0.0004672826180467382
Training set accuracy: 1.0
Test set accuracy: 0.83928573
```