Bi-LSTM-CRF for Sequence Labeling


After working on sequence labeling for a while, I noticed that for NER tasks many papers adopt an LSTM-CRF architecture. Adding a CRF as the final layer lets the model score the most probable label path as a whole rather than picking each tag independently, which improves the metrics.

Most deep learning frameworks do not ship with a CRF layer, so it has to be implemented by hand. I have recently been learning PyTorch, which provides a Bi-LSTM-CRF tutorial implementation. The PyTorch tutorials are remarkably generous, covering implementations of most of the popular NLP models. Here I go through the model from scratch and note a few questions I ran into while studying it.

[Figure: Bi-LSTM-CRF model architecture]

The structure of a Bi-LSTM-CRF is generally as shown above; the final layer uses a CRF to learn an optimal tag path. The output dimension of the Bi-LSTM layer is the tag set size, so the output can be read as the emission score of each word for each tag. Let the Bi-LSTM output matrix be $P$, where $P_{i,j}$ is the unnormalized score of mapping word $w_i$ to tag $t_j$. For the CRF we assume a transition matrix $A$, where $A_{i,j}$ is the score of transitioning from tag $t_i$ to tag $t_j$.

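For concreteness, here is a minimal sketch of where $P$ and $A$ come from (the class name and layer sizes are assumptions for illustration, not the tutorial's exact module): the emission matrix is the Bi-LSTM output projected to tagset_size, and the transition matrix is simply a learnable parameter.

import torch
import torch.nn as nn

class EmissionTransitionSketch(nn.Module):
    # Hypothetical module for illustration; names and sizes are assumptions.
    def __init__(self, vocab_size, embedding_dim, hidden_dim, tagset_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        # Bidirectional LSTM; each direction gets hidden_dim // 2 units
        # (assumes hidden_dim is even), so outputs have size hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        # Projects each hidden state to a vector of per-tag emission scores (P).
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        # Transition scores A; following the tutorial's convention,
        # transitions[i][j] is the score of transitioning to tag i from tag j.
        self.transitions = nn.Parameter(torch.randn(tagset_size, tagset_size))

    def forward(self, sentence):              # sentence: (batch, seq_len) word ids
        embeds = self.embed(sentence)         # (batch, seq_len, embedding_dim)
        lstm_out, _ = self.lstm(embeds)       # (batch, seq_len, hidden_dim)
        return self.hidden2tag(lstm_out)      # P: (batch, seq_len, tagset_size)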

For an input sequence $X = (x_1, x_2, \dots, x_n)$ and a corresponding output tag sequence $y = (y_1, y_2, \dots, y_n)$, the score is defined as

$$s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}$$

where $y_0$ and $y_{n+1}$ are the special START and STOP tags.
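As a sketch of how this score can be computed in code (assuming emission scores feats of shape (seq_len, tagset_size), the transitions[to, from] indexing convention above, and integer START/STOP tag ids; the function name is made up for illustration):

def score_sentence(feats, tags, transitions, start_tag, stop_tag):
    # feats: emission matrix P, shape (seq_len, tagset_size)
    # tags:  gold tag ids y_1 .. y_n
    score = 0.0
    prev = start_tag
    for i, feat in enumerate(feats):
        # transition from the previous tag plus the emission for the gold tag
        score = score + transitions[tags[i], prev] + feat[tags[i]]
        prev = tags[i]
    # closing transition into the STOP tag
    return score + transitions[stop_tag, prev]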

Using the softmax function, we define a probability for the correct tag sequence $y$ ($Y_X$ denotes the set of all possible tag sequences, including ones that can never actually occur):

$$p(y \mid X) = \frac{e^{s(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}}$$

During training we therefore only need to maximize the likelihood of the correct sequence; taking the log-likelihood,

$$\log p(y \mid X) = s(X, y) - \log \sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}$$

So we define the loss function as $-\log p(y \mid X)$, and the network can then be trained with gradient descent.
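The tutorial wires these pieces together in a negative-log-likelihood method along the following lines (a sketch; _get_lstm_features produces the emission matrix $P$, _score_sentence computes $s(X, y)$, and _forward_alg, shown further below, computes the logsumexp term):

def neg_log_likelihood(self, sentence, tags):
    feats = self._get_lstm_features(sentence)       # emission scores P
    forward_score = self._forward_alg(feats)        # logsumexp over all paths
    gold_score = self._score_sentence(feats, tags)  # s(X, y) for the gold path
    return forward_score - gold_score               # -log p(y|X)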

When computing the loss, the term $s(X, y)$ is easy to evaluate, but $\log \sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}$ (written as logsumexp below) is a bit more involved, because it sums over the score of every possible path. There is a convenient shortcut: for the paths reaching word $w_{i+1}$, we can first compute the logsumexp of the paths reaching word $w_i$, because

$$\alpha_{i+1}(t) = \log \sum_{t'} \exp\big(\alpha_i(t') + A_{t', t} + P_{i+1, t}\big)$$

where $\alpha_i(t)$ denotes the logsumexp of the scores of all partial paths that end at word $w_i$ with tag $t$.

So accumulating the path scores step by step gives exactly the same result as computing the global score directly, while greatly reducing the computation time (linear instead of exponential in the sentence length). Below is the code from the PyTorch tutorial:


def _forward_alg(self, feats):
    # Do the forward algorithm to compute the partition function
    init_alphas = torch.Tensor(1, self.tagset_size).fill_(-10000.)
    # START_TAG has all of the score.
    init_alphas[0][self.tag_to_ix[START_TAG]] = 0.

    # Wrap in a variable so that we will get automatic backprop
    forward_var = autograd.Variable(init_alphas)

    # Iterate through the sentence
    for feat in feats:
        alphas_t = []  # The forward variables at this timestep
        for next_tag in range(self.tagset_size):
            # broadcast the emission score: it is the same regardless of
            # the previous tag
            emit_score = feat[next_tag].view(1, -1).expand(1, self.tagset_size)
            # the ith entry of trans_score is the score of transitioning to
            # next_tag from i
            trans_score = self.transitions[next_tag].view(1, -1)
            # The ith entry of next_tag_var is the value for the
            # edge (i -> next_tag) before we do log-sum-exp
            next_tag_var = forward_var + trans_score + emit_score
            # The forward variable for this tag is log-sum-exp of all the
            # scores.
            alphas_t.append(log_sum_exp(next_tag_var))
        forward_var = torch.cat(alphas_t).view(1, -1)
    terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
    alpha = log_sum_exp(terminal_var)
    return alpha
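To see why the step-by-step recursion matches the brute-force definition, a small exponential-time check like the following (a hypothetical helper, only feasible for short sentences) can be compared against _forward_alg; note that in the tutorial tagset_size also includes the START/STOP tags, whose forbidden transitions carry a -10000 penalty and so contribute almost nothing to the sum.

import itertools
import torch

def brute_force_partition(feats, transitions, start_tag, stop_tag):
    # Enumerate every possible tag path and logsumexp its total score.
    seq_len, tagset_size = feats.shape
    scores = []
    for path in itertools.product(range(tagset_size), repeat=seq_len):
        score = transitions[path[0], start_tag] + feats[0, path[0]]
        for i in range(1, seq_len):
            score = score + transitions[path[i], path[i - 1]] + feats[i, path[i]]
        score = score + transitions[stop_tag, path[-1]]
        scores.append(score)
    return torch.logsumexp(torch.stack(scores), dim=0)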

At decoding time, the Viterbi algorithm is used to recover the highest-scoring tag path:

def _viterbi_decode(self, feats):
    backpointers = []

    # Initialize the viterbi variables in log space
    init_vvars = torch.Tensor(1, self.tagset_size).fill_(-10000.)
    init_vvars[0][self.tag_to_ix[START_TAG]] = 0

    # forward_var at step i holds the viterbi variables for step i-1
    forward_var = autograd.Variable(init_vvars)
    for feat in feats:
        bptrs_t = []  # holds the backpointers for this step
        viterbivars_t = []  # holds the viterbi variables for this step

        for next_tag in range(self.tagset_size):
            # next_tag_var[i] holds the viterbi variable for tag i at the
            # previous step, plus the score of transitioning
            # from tag i to next_tag.
            # We don't include the emission scores here because the max
            # does not depend on them (we add them in below)
            next_tag_var = forward_var + self.transitions[next_tag]
            best_tag_id = argmax(next_tag_var)
            bptrs_t.append(best_tag_id)
            viterbivars_t.append(next_tag_var[0][best_tag_id])
        # Now add in the emission scores, and assign forward_var to the set
        # of viterbi variables we just computed
        forward_var = (torch.cat(viterbivars_t) + feat).view(1, -1)
        backpointers.append(bptrs_t)

    # Transition to STOP_TAG
    terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
    best_tag_id = argmax(terminal_var)
    path_score = terminal_var[0][best_tag_id]

    # Follow the back pointers to decode the best path.
    best_path = [best_tag_id]
    for bptrs_t in reversed(backpointers):
        best_tag_id = bptrs_t[best_tag_id]
        best_path.append(best_tag_id)
    # Pop off the start tag (we dont want to return that to the caller)
    start = best_path.pop()
    assert start == self.tag_to_ix[START_TAG]  # Sanity check
    best_path.reverse()
    return path_score, best_path
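At inference time the model's forward pass simply runs the Bi-LSTM features through this decoder, roughly as follows (a sketch of the tutorial's forward method):

def forward(self, sentence):
    # Get the emission scores from the Bi-LSTM, then decode the best path.
    lstm_feats = self._get_lstm_features(sentence)
    score, tag_seq = self._viterbi_decode(lstm_feats)
    return score, tag_seq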

The full implementation can be found at Bi-LSTM-CRF.

References

Bidirectional LSTM-CRF Models for Sequence Tagging
Neural Architectures for Named Entity Recognition
Advanced: Making Dynamic Decisions and the Bi-LSTM CRF
