
【Hung-yi Lee Machine Learning HW2】


Following my own schedule, I plan to finish one assignment every week or two, aiming to reach the boss baseline each time. There is plenty of reference material around, but merely learning to call libraries is clearly not enough, so I will also spend time summarizing the underlying principles. Beyond the algorithmic ideas and model construction, I think coding ability matters too, yet well-annotated code is still scarce — even some of the TA's comments are not very detailed — so I will also write up the functions I don't understand and publish annotated code on GitHub.

Contents

  • Foreword
  • Part One: What it takes to pass the strong baseline
      • 1. concat_nframes
      • 2. Small details
      • 3. Cosine annealing
  • Part Two: Further improvements
    • 1. RNN (Recurrent Neural Network)
    • 2. LSTM (Long Short-term Memory)
    • 3. LSTM + CRF
    • 4. Open questions
  • Summary
  • References


Foreword

This post records my thoughts on the second assignment of Hung-yi Lee's 2022 machine learning course, as part of documenting my learning process; the main references are listed at the end.
It is also my first blog post. I plan to keep publishing solutions to Hung-yi Lee's assignments and to computer-vision coursework — feel free to follow my blog and GitHub.

HW2 task:
This assignment comes from the speech-recognition pipeline: given audio material, we predict phonemes. The data preprocessing — extracting MFCC features from the raw waveform — has already been done by the TAs, so our job is frame-level phoneme classification using those pre-extracted MFCC features.
Phoneme classification means predicting phonemes from speech data. A phoneme is the smallest unit of sound that distinguishes meaning in a given human language and is the basic concept of phonemic analysis; every language has its own phoneme inventory.
The baselines are as follows:

Level     Accuracy
simple    0.45797
medium    0.69747
strong    0.75028
boss      0.82324

Below is a summary of the changes I made to the TA's code and of the methods used.

Part One: What it takes to pass the strong baseline

1. concat_nframes


First, as the TA hints, a phoneme does not occupy just one frame, so concatenating the neighboring frames during training gives better results. We therefore increase concat_nframes; the window is symmetric, e.g. concat_n = 19 attaches 9 frames on each side. A toy sketch of the windowing idea is shown below.
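Here is a minimal, self-contained sketch of symmetric frame concatenation (toy sizes; the homework's actual helper, concat_feat, appears in the full code later):

import torch

def concat_window(x, concat_n):
    # x: (seq_len, feat_dim); concat_n must be odd
    k = concat_n // 2
    # pad the edges by repeating the first/last frame k times
    padded = torch.cat([x[:1].repeat(k, 1), x, x[-1:].repeat(k, 1)], dim=0)
    # each output row = the frame plus its k neighbors on each side, flattened
    return torch.stack([padded[i:i + concat_n].flatten() for i in range(x.size(0))])

x = torch.arange(12.).view(4, 3)   # 4 frames, 3 features each
y = concat_window(x, 3)            # attach 1 frame on each side
print(y.shape)                     # torch.Size([4, 9]) — 3 frames × 3 features per row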

2. Small details

I added Batch Normalization and dropout; a minimal sketch of these tweaks follows the links below.
On the benefits of Batch Normalization:
Link: link
On weight decay:
Link: link
On dropout:
Link: link

3. Cosine annealing

Reference: link
Official documentation: link
We add the following code:

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=8, T_mult=2, eta_min=learning_rate/2)

The cosine-annealing learning-rate schedule is

$\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right)$

where $T_{cur}$ is the number of epochs since the last restart and $T_i$ is the length of the current cycle; at a restart $T_{cur} = 0$ and the rate jumps back to $\eta_{max}$.

The function signature is:

torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)

T_0 is the initial period — precisely, the number of epochs for the learning rate to travel from one maximum to the next — and each subsequent cycle takes T_mult times as many epochs as the previous one. eta_min is the minimum learning rate. last_epoch is the index of the last epoch, default -1. With verbose=True the scheduler prints the learning rate at every epoch.
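With the values used above (T_0=8, T_mult=2) over a 100-epoch run, the warm restarts land at epochs 8, 24 and 56 — a few lines confirm the arithmetic:

# compute where the warm restarts fall for the settings used above
T_0, T_mult, total_epochs = 8, 2, 100
restarts, t, period = [], 0, T_0
while t + period <= total_epochs:
    t += period            # a restart happens at the end of each cycle
    restarts.append(t)
    period *= T_mult       # each cycle is T_mult times longer than the last
print(restarts)            # [8, 24, 56]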
We can verify the full schedule with the following code:

import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts
import matplotlib.pyplot as plt

# a trivial throwaway model, just to give the optimizer some parameters
class Simple_Model(nn.Module):
    def __init__(self):
        super(Simple_Model, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=1)

    def forward(self, x):
        pass

learning_rate = 0.0001
model = Simple_Model()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# the same scheduler settings we use in the training code
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=8, T_mult=2, eta_min=learning_rate/2)

print('initial learning rate:', optimizer.defaults['lr'])
lr_get = []  # record the lr at each epoch for plotting

for epoch in range(1, 100):
    # train
    optimizer.zero_grad()
    optimizer.step()
    lr_get.append(optimizer.param_groups[0]['lr'])
    scheduler.step()

# plot how the learning rate evolves under CosineAnnealingWarmRestarts
plt.plot(list(range(1, 100)), lr_get)
plt.xlabel('epoch')
plt.ylabel('learning_rate')
plt.title('How CosineAnnealingWarmRestarts goes')
plt.show()

(Figure: learning-rate curve over the 100 epochs — cosine decay with warm restarts at epochs 8, 24 and 56; image not preserved.)

With the steps above, the model reaches the following score (the result screenshot has not been preserved).

Here is the complete code for passing the strong baseline (change the data paths and it will run, either on Kaggle or locally with the dataset downloaded):

# Preparing Data

import os
import random
import pandas as pd
import torch
from tqdm import tqdm

# the helper functions below are all used by concat_feat for window concatenation
def load_feat(path):
    feat = torch.load(path)
    return feat

def shift(x, n):
    if n < 0:
        left = x[0].repeat(-n, 1)
        right = x[:n]
    elif n > 0:
        right = x[-1].repeat(n, 1)
        left = x[n:]
    else:
        return x

    return torch.cat((left, right), dim=0)

# A phoneme rarely spans a single frame, so concatenating neighboring frames
# at training time gives better results.
# The window is symmetric: e.g. concat_n = 11 attaches 5 frames on each side.
def concat_feat(x, concat_n):
    assert concat_n % 2 == 1  # n must be odd
    if concat_n < 2:
        return x
    seq_len, feature_dim = x.size(0), x.size(1)
    x = x.repeat(1, concat_n)
    x = x.view(seq_len, concat_n, feature_dim).permute(1, 0, 2)  # concat_n, seq_len, feature_dim
    mid = (concat_n // 2)
    for r_idx in range(1, mid + 1):
        x[mid + r_idx, :] = shift(x[mid + r_idx], r_idx)
        x[mid - r_idx, :] = shift(x[mid - r_idx], -r_idx)

    return x.permute(1, 0, 2).view(seq_len, concat_n * feature_dim)

# x = torch.tensor([[ 1,  2,  3],
#                   [ 4,  5,  6],
#                   [ 7,  8,  9],
#                   [10, 11, 12]])
# y = concat_feat(x, 3)
# print(y)

def preprocess_data(split, feat_dir, phone_path, concat_nframes, train_ratio=0.8, train_val_seed=1337):
    class_num = 41  # NOTE: pre-computed, should not need change
    mode = 'train' if (split == 'train' or split == 'val') else 'test'

    label_dict = {}
    if mode != 'test':
        phone_file = open(os.path.join(phone_path, f'{mode}_labels.txt')).readlines()
        for line in phone_file:
            line = line.strip('\n').split(' ')
            label_dict[line[0]] = [int(p) for p in line[1:]]

    if split == 'train' or split == 'val':
        # split training and validation data
        usage_list = open(os.path.join(phone_path, 'train_split.txt')).readlines()
        random.seed(train_val_seed)
        random.shuffle(usage_list)
        percent = int(len(usage_list) * train_ratio)
        usage_list = usage_list[:percent] if split == 'train' else usage_list[percent:]
    elif split == 'test':
        usage_list = open(os.path.join(phone_path, 'test_split.txt')).readlines()
    else:
        raise ValueError('Invalid \'split\' argument for dataset: PhoneDataset!')

    # get the ID of every utterance
    usage_list = [line.strip('\n') for line in usage_list]
    print('[Dataset] - # phone classes: ' + str(class_num) + ', number of utterances for ' + split + ': ' + str(len(usage_list)))

    max_len = 3000000
    X = torch.empty(max_len, 39 * concat_nframes)
    if mode != 'test':
        y = torch.empty(max_len, dtype=torch.long)

    # read the audio data: X holds the features, y the labels
    idx = 0
    for i, fname in tqdm(enumerate(usage_list)):
        feat = load_feat(os.path.join(feat_dir, mode, f'{fname}.pt'))
        cur_len = len(feat)
        feat = concat_feat(feat, concat_nframes)
        if mode != 'test':
            label = torch.LongTensor(label_dict[fname])

        X[idx: idx + cur_len, :] = feat
        if mode != 'test':
            y[idx: idx + cur_len] = label

        idx += cur_len

    X = X[:idx, :]
    if mode != 'test':
        y = y[:idx]

    print(f'[INFO] {split} set')
    print(X.shape)
    if mode != 'test':
        print(y.shape)
        return X, y
    else:
        return X

# Define Dataset

from torch.utils.data import Dataset
from torch.utils.data import DataLoader

class LibriDataset(Dataset):
    def __init__(self, X, y=None):
        self.data = X
        if y is not None:
            self.label = torch.LongTensor(y)
        else:
            self.label = None

    def __getitem__(self, idx):
        if self.label is not None:
            return self.data[idx], self.label[idx]
        else:
            return self.data[idx]

    def __len__(self):
        return len(self.data)

# Define Model

import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(BasicBlock, self).__init__()
        self.block = nn.Sequential(
            nn.Linear(input_dim, output_dim),
            nn.ReLU(),
            nn.BatchNorm1d(output_dim, eps=1e-05, momentum=0.1, affine=True),
            # num_features: the expected input feature count, C from an input of
            #   size (N, C, L) or L from an input of size (N, L)
            # eps: added to the denominator for numerical stability (default 1e-5)
            # momentum: momentum for the running mean and variance (default 0.1)
            # affine: when True, the layer has learnable affine parameters
            nn.Dropout(0.35),
        )

    def forward(self, x):
        x = self.block(x)
        return x

class Classifier(nn.Module):
    def __init__(self, input_dim, output_dim=41, hidden_layers=1, hidden_dim=256):
        super(Classifier, self).__init__()
        self.fc = nn.Sequential(
            BasicBlock(input_dim, hidden_dim),
            *[BasicBlock(hidden_dim, hidden_dim) for _ in range(hidden_layers)],
            nn.Linear(hidden_dim, output_dim)
        )

    def forward(self, x):
        x = self.fc(x)
        return x

## Hyper-parameters

# data parameters
concat_nframes = 19   # the number of frames to concat with, n must be odd (total 2k+1 = n frames)
train_ratio = 0.95    # the ratio of data used for training, the rest will be used for validation
# Train/validation split at million-sample scale: a common heuristic reserves
# 30% of the data for testing, which suits moderately sized datasets (roughly
# 100 to 10,000 samples). With big data the proportions shift: at ~1M samples,
# 98% (train) / 1% (val) / 1% (test); beyond that, 95% / 2.5% / 2.5%.
# - "Machine Learning Yearning", Andrew Ng

# training parameters
seed = 0                     # random seed
batch_size = 1024            # batch size (was 512)
num_epoch = 100              # the number of training epochs
learning_rate = 0.0001       # learning rate
model_path = './model.ckpt'  # the path where the checkpoint will be saved

# model parameters
input_dim = 39 * concat_nframes  # the input dim of the model, you should not change the value
hidden_layers = 2                # the number of hidden layers
hidden_dim = 1024                # the hidden dim

## Prepare dataset and model

import gc  # needed to trigger garbage collection manually

# preprocess data
train_X, train_y = preprocess_data(split='train', feat_dir=r'F:\kaggle\HW2\libriphone\libriphone\feat',
                                   phone_path=r'F:\kaggle\HW2\libriphone\libriphone',
                                   concat_nframes=concat_nframes, train_ratio=train_ratio)
val_X, val_y = preprocess_data(split='val', feat_dir=r'F:\kaggle\HW2\libriphone\libriphone\feat',
                               phone_path=r'F:\kaggle\HW2\libriphone\libriphone',
                               concat_nframes=concat_nframes, train_ratio=train_ratio)

# get dataset
train_set = LibriDataset(train_X, train_y)
val_set = LibriDataset(val_X, val_y)

# remove raw features to save memory
del train_X, train_y, val_X, val_y
gc.collect()

# get dataloader
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print(f'DEVICE: {device}')

import numpy as np

# fix seed
def same_seeds(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

# fix random seed
same_seeds(seed)

# create model, define a loss function, and optimizer
model = Classifier(input_dim=input_dim, hidden_layers=hidden_layers, hidden_dim=hidden_dim).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=8, T_mult=2, eta_min=learning_rate/2)

# import torchsummary
# torchsummary.summary(model, input_size=(input_dim,))
# TorchSummary prints a more detailed analysis: per-module info (layer type,
# output shape, parameter count), total parameter count, model size, and the
# memory needed for one forward/backward pass.

## Training

best_acc = 0.0
early_stop_count = 0
early_stopping = 8
for epoch in range(num_epoch):
    train_acc = 0.0
    train_loss = 0.0
    val_acc = 0.0
    val_loss = 0.0

    # training
    model.train()  # set the model to training mode
    pbar = tqdm(train_loader, ncols=110)  # progress bar; ncols sets the display width
    pbar.set_description(f'T: {epoch + 1}/{num_epoch}')
    samples = 0
    for i, batch in enumerate(pbar):
        features, labels = batch
        features = features.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()  # walk over all parameters and clear the gradients left from the previous step
        outputs = model(features)
        loss = criterion(outputs, labels)  # compute the classification loss
        loss.backward()   # backpropagate to compute the gradients
        optimizer.step()  # update the parameters
        # for a walkthrough of zero_grad/backward/step see
        # https://blog.csdn.net/PanYHHH/article/details/107361827

        _, train_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability
        correct = (train_pred.detach() == labels.detach()).sum().item()
        # t.item() converts a one-element Tensor into a plain python scalar (int, float, ...)
        # detach() gives a view with requires_grad=False, so it is ignored by backprop
        train_acc += correct
        samples += labels.size(0)
        train_loss += loss.item()
        lr = optimizer.param_groups[0]["lr"]
        # progress-bar readout
        pbar.set_postfix({'lr': lr, 'batch acc': correct / labels.size(0),
                          'acc': train_acc / samples, 'loss': train_loss / (i + 1)})

    scheduler.step()  # update the learning rate
    # (the various .step() calls each "advance" something: the optimizer its
    # parameters, the scheduler its learning rate)
    pbar.close()  # flush and close the progress bar

    # validation
    if len(val_set) > 0:
        model.eval()  # set the model to evaluation mode
        # in eval mode, layers added only to help training (BatchNorm, Dropout, ...)
        # switch to inference behavior so the evaluation is not biased
        with torch.no_grad():
            pbar = tqdm(val_loader, ncols=110)
            pbar.set_description(f'V: {epoch + 1}/{num_epoch}')
            samples = 0
            for i, batch in enumerate(pbar):
                features, labels = batch  # unpack one batch into features and labels
                features = features.to(device)
                labels = labels.to(device)
                outputs = model(features)  # get the predictions
                loss = criterion(outputs, labels)

                _, val_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability
                # torch.max(input, dim): dim=0 takes the max over each column,
                # dim=1 over each row — here, the argmax over the class scores
                val_acc += (val_pred.cpu() == labels.cpu()).sum().item()
                samples += labels.size(0)
                val_loss += loss.item()
                pbar.set_postfix({'val acc': val_acc / samples, 'val loss': val_loss / (i + 1)})
            pbar.close()

        # save a checkpoint whenever the model improves on the validation set
        if val_acc > best_acc:
            best_acc = val_acc
            torch.save(model.state_dict(), model_path)
            print('saving model with acc {:.3f}'.format(best_acc / len(val_set)))
            early_stop_count = 0
        else:
            early_stop_count += 1
            if early_stop_count >= early_stopping:
                print(f"Epoch: {epoch + 1}, model not improving, early stopping.")
                break

# if not validating, save the last epoch (unused here, since we do hold out a validation set)
if len(val_set) == 0:
    torch.save(model.state_dict(), model_path)
    print('saving model at last epoch')

# as before, free the memory
del train_loader, val_loader
gc.collect()

## Testing
# Create a testing dataset, and load the best model from the saved checkpoint to make predictions.

# load data
test_X = preprocess_data(split='test', feat_dir=r'F:\kaggle\HW2\libriphone\libriphone\feat',
                         phone_path=r'F:\kaggle\HW2\libriphone\libriphone',
                         concat_nframes=concat_nframes)
test_set = LibriDataset(test_X, None)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

# load model
model = Classifier(input_dim=input_dim, hidden_layers=hidden_layers, hidden_dim=hidden_dim).to(device)
model.load_state_dict(torch.load(model_path))

# make prediction
test_acc = 0.0
test_lengths = 0
pred = np.array([], dtype=np.int32)

model.eval()
with torch.no_grad():
    for i, batch in enumerate(tqdm(test_loader)):
        features = batch
        features = features.to(device)

        outputs = model(features)

        _, test_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability
        pred = np.concatenate((pred, test_pred.cpu().numpy()), axis=0)

with open('prediction.csv', 'w') as f:
    f.write('Id,Class\n')
    for i, y in enumerate(pred):
        f.write('{},{}\n'.format(i, y))

Part Two: Further improvements

1. RNN (Recurrent Neural Network)

Reference for the RNN and LSTM code with parameter explanations — Link: link

2. LSTM (Long Short-term Memory)


3. LSTM + CRF

In the end I chose a model that chains a bidirectional LSTM (BiLSTM) with a CRF; it combines the strengths of both and reaches higher accuracy.

A bidirectional LSTM can use information from both the preceding and the following context. It can be understood as two LSTMs trained simultaneously in opposite directions, with different parameters since they read the sequence in different orders; the output at each time step is the concatenation of the two directions' outputs.
The CRF descends from the HMM. The HMM's homogeneous Markov assumption says the hidden chain's state at any time t depends only on the state at the previous time step — not on other states or observations, nor on t itself. A CRF, by contrast, conditions not only on the previous state but on multiple states before and after.

Both the BiLSTM and the CRF can capture past and future features; the reason to chain them is that the BiLSTM exploits hidden-state information while the CRF can use the label information directly, which speeds up training. A quick dimensional check of the bidirectional concatenation is sketched below.
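The following toy check (using this assignment's dimensions; not part of the final model) confirms that a bidirectional LSTM outputs the two directions concatenated:

import torch
import torch.nn as nn

# hidden_size=192 per direction matches the model below (hidden_dim=384, halved)
lstm = nn.LSTM(input_size=39, hidden_size=192, bidirectional=True, batch_first=True)
x = torch.randn(4, 19, 39)   # (batch, 19 concatenated frames, 39 MFCC dims)
out, _ = lstm(x)
print(out.shape)             # torch.Size([4, 19, 384]) — forward and backward halves concatenated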

The LSTM constructor parameters are:

class torch.nn.LSTM(*args, **kwargs)
    input_size:    feature dimension of the input x
    hidden_size:   feature dimension of the hidden state
    num_layers:    number of stacked LSTM layers, default 1
    bias:          default True
    batch_first:   if True, inputs and outputs are shaped (batch, seq, feature)
    dropout:       dropout applied to the output of every layer except the last, default 0
    bidirectional: if True, a bidirectional LSTM; default False

The main model code is as follows:

import torch
import torch.nn as nn
from torchcrf import CRF  # the CRF layer here is assumed to come from the pytorch-crf package (pip install pytorch-crf)

class BiLSTM(nn.Module):
    def __init__(self, class_size=41, input_dim=39, hidden_dim=384,
                 linear_hidden=1024, dropout=0.5, concat_nframes=19):
        super().__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.class_size = class_size
        self.linear_hidden = linear_hidden
        self.concat_nframes = concat_nframes
        self.lstm = nn.LSTM(input_dim, hidden_dim // 2, dropout=dropout,
                            num_layers=3, bidirectional=True, batch_first=True)
        self.hidden2tag = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, linear_hidden),
            nn.BatchNorm1d(self.concat_nframes),
            # torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            # num_features - the dimension to normalize over (here the frame axis)
            # eps - added to the denominator for numerical stability
            # momentum - momentum for the running mean and variance
            # affine - when True, the module has learnable affine parameters
            nn.ReLU(inplace=True),
            nn.Linear(linear_hidden, class_size)
        )

    def forward(self, x):
        feats, _ = self.lstm(x)
        return self.hidden2tag(feats)

class Crf(nn.Module):
    def __init__(self, class_size=41):
        super().__init__()
        self.class_size = class_size
        self.crf = CRF(self.class_size, batch_first=True)

    def likelihood(self, x, y):
        # torchcrf's forward returns the log-likelihood of the tag sequence y
        return self.crf(x, y)

    def forward(self, x):
        # Viterbi decoding: the most likely tag sequence for the emissions x
        return torch.LongTensor(self.crf.decode(x))
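For completeness, here is a minimal sketch of how the two modules might be combined in a training step — assumed usage, not the author's exact loop: since the CRF's forward pass returns the log-likelihood of the label sequence, the loss is its negative, and decoding yields the predicted sequence.

import torch

bilstm = BiLSTM()                            # the classes defined above
crf = Crf()
features = torch.randn(4, 19, 39)            # (batch, frames, MFCC dims) — toy batch
labels = torch.randint(0, 41, (4, 19))       # one phoneme id per frame
emissions = bilstm(features)                 # (4, 19, 41) emission scores
loss = -crf.likelihood(emissions, labels)    # negative log-likelihood as the training loss
loss.backward()
pred = crf(emissions)                        # Viterbi decoding → (4, 19) predicted ids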

The result: the model reached 0.805 accuracy as early as the seventh epoch, but unfortunately it did not improve much beyond that and still falls short of the boss baseline.

4. Open questions

A blogger I read earlier reported that, without any CRF, simply stacking more BiLSTM hidden layers already reaches above 0.83 accuracy. Adding hidden layers to my BiLSTM+CRF model did not seem to help much, so I will try removing the CRF later.
In theory, though, a CRF layer should help RNN-family models on a labeled sequence task like this one: an RNN classifier ignores the dependencies between output classes, while in speech the phonemes within a word are not random but ordered, so neighboring phonemes are correlated. The CRF can extract exactly the structure the RNN cannot, so BiLSTM-CRF's ceiling should be higher than a plain BiLSTM's.
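That inter-label structure lives in the CRF's learned transition matrix; a minimal sketch, again assuming the pytorch-crf package used in the model code above:

from torchcrf import CRF

# the CRF layer learns a (num_tags × num_tags) transition score matrix —
# exactly the label-to-label structure a plain (Bi)LSTM classifier ignores
crf = CRF(41, batch_first=True)   # 41 phoneme classes
print(crf.transitions.shape)      # torch.Size([41, 41]): score of moving from tag i to tag j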

Summary

References

Link: link
【ML2021李宏毅機器學習】作業2 TIMIT phoneme classification 思路講解 (Bilibili video)
Link: link
Link: link
