[NLP] LSTM Tang Poetry Generator, PyTorch Version
This post follows up on the article LSTM Tang Poetry Generator, Keras Version.
The Keras model code from that article is rewritten here as an equivalent PyTorch model; only the parts that differ are shown.
Training the Model
Building the Network
```python
# Rebuild the Keras model as a PyTorch model
# Build the LSTM model
import torch
import torch.nn as nn
import torch.nn.functional as F

# Set up CUDA
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The original Keras model:
# model = Sequential()
# model.add(Embedding(10000, 128, input_length=20))
# model.add(LSTM(128, return_sequences=True))
# model.add(Dropout(0.2))
# model.add(LSTM(128))
# model.add(Dropout(0.2))
# model.add(Dense(10000, activation='softmax'))

# Build the equivalent PyTorch model based on the Keras model above.
# The second LSTM keeps only its last output, i.e. return_sequences=False in Keras.
# Note: there is no softmax layer here, because nn.CrossEntropyLoss applies log-softmax internally.
class LSTMNet(nn.Module):
    def __init__(self):
        super(LSTMNet, self).__init__()
        self.embedding = nn.Embedding(10000, 128)
        self.lstm1 = nn.LSTM(input_size=128, hidden_size=128, num_layers=1, batch_first=True)
        self.dropout1 = nn.Dropout(0.2)
        self.lstm2 = nn.LSTM(input_size=128, hidden_size=128, num_layers=1, batch_first=True)
        self.dropout2 = nn.Dropout(0.2)
        self.fc = nn.Linear(128, 10000)

    def forward(self, x):
        x = self.embedding(x)  # [batch_size, seq_len, embedding_size]
        x, _ = self.lstm1(x)   # [batch_size, seq_len, hidden_size]
        x = self.dropout1(x)   # [batch_size, seq_len, hidden_size]
        x, _ = self.lstm2(x)   # [batch_size, seq_len, hidden_size]
        x = self.dropout2(x)   # [batch_size, seq_len, hidden_size]
        x = x[:, -1, :]        # keep only the last time step: [batch_size, hidden_size]
        x = self.fc(x)         # [batch_size, 10000]
        return x

# Instantiate the model
model = LSTMNet().to(device)
model
```

```
LSTMNet(
  (embedding): Embedding(10000, 128)
  (lstm1): LSTM(128, 128, batch_first=True)
  (dropout1): Dropout(p=0.2, inplace=False)
  (lstm2): LSTM(128, 128, batch_first=True)
  (dropout2): Dropout(p=0.2, inplace=False)
  (fc): Linear(in_features=128, out_features=10000, bias=True)
)
```
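As a quick sanity check (an addition, not part of the original post), a dummy batch of token indices can be pushed through the network to confirm the expected output shape:

```python
# Hypothetical sanity check: random token indices shaped like the real input
dummy = torch.randint(0, 10000, (3, 20), device=device)  # [batch=3, seq_len=20]
with torch.no_grad():
    out = model(dummy)
print(out.shape)  # torch.Size([3, 10000]) -- one logit per vocabulary entry
```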
PyTorch Data Conversion
Note: y_train and y_test come out with shape [batch, 1], and the trailing dimension is not needed;
it has to be squeezed away so the targets have shape [batch] before they can be fed to the cross-entropy loss.
```
torch.Size([3, 20])
torch.Size([3, 10000])
(torch.Size([39405]), torch.Size([16889]))
```
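The conversion code itself is not reproduced in this excerpt, only the printed shapes above. A minimal sketch of what it might look like, assuming the arrays coming out of the Keras-style preprocessing are NumPy arrays named x_train, x_test, y_train, y_test, is:

```python
# Hypothetical data conversion (not the original code): NumPy arrays -> LongTensors on the GPU.
# The labels are squeezed from [batch, 1] to [batch] for nn.CrossEntropyLoss.
x_train = torch.from_numpy(x_train).long().to(device)
x_test  = torch.from_numpy(x_test).long().to(device)
y_train = torch.from_numpy(y_train).long().squeeze(-1).to(device)
y_test  = torch.from_numpy(y_test).long().squeeze(-1).to(device)
```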
Training the Model
```python
# Train the model
import torch.optim as optim
from tqdm import tqdm

optimizer = optim.Adam(model.parameters(), lr=0.001)
batch_size = 256
epochs = 20

# Note: y_train and y_test are plain class indices of shape [batch], not one-hot vectors,
# so CrossEntropyLoss (which expects class indices) is used as the loss function.
loss_func = nn.CrossEntropyLoss()

for epoch in range(epochs):
    print('Epoch: ', epoch)
    for i in tqdm(range(0, len(x_train), batch_size)):
        x_batch = x_train[i:i+batch_size]
        y_batch = y_train[i:i+batch_size]
        pred = model(x_batch)
        loss = loss_func(pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # At the end of each epoch, report accuracy
    # Training-set accuracy
    pred = model(x_train)
    pred = torch.argmax(pred, dim=1)
    acc = (pred == y_train).sum().item() / len(y_train)
    print('Train acc: ', acc)
    # Test-set accuracy
    pred = model(x_test)
    pred = torch.argmax(pred, dim=1)
    acc = (pred == y_test).sum().item() / len(y_test)
    print('Test acc: ', acc)
```

```
Epoch: 0
100%|██████████| 154/154 [00:38<00:00, 4.01it/s]
Train acc: 0.10216977540921203
Test acc: 0.10320326839955
Epoch: 1
…
Epoch: 19
100%|██████████| 154/154 [00:37<00:00, 4.09it/s]
Train acc: 0.20576069026773253
Test acc: 0.17970276511338742
```
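One detail worth noting: the per-epoch accuracy passes above push the full training and test sets through the model with dropout still active and autograd recording the graph. A small helper along these lines (an addition, not part of the original post) evaluates in batches, in eval mode, and without tracking gradients:

```python
def accuracy(model, x, y, batch_size=256):
    """Top-1 accuracy computed in eval mode without building the autograd graph."""
    model.eval()
    correct = 0
    with torch.no_grad():
        for i in range(0, len(x), batch_size):
            pred = model(x[i:i + batch_size]).argmax(dim=1)
            correct += (pred == y[i:i + batch_size]).sum().item()
    model.train()
    return correct / len(y)

# Example: print('Test acc: ', accuracy(model, x_test, y_test))
```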
Summary
The Keras LSTM from the original article maps cleanly onto PyTorch: an Embedding layer, two LSTM layers with dropout (keeping only the last time step of the second LSTM), and a Linear layer over the 10000-word vocabulary, trained with Adam and CrossEntropyLoss. After 20 epochs the model reaches roughly 21% training and 18% test next-word accuracy.