Text Classification with Transformers (PyTorch Implementation)
‘Attention Is All You Need’
New deep learning models are introduced at an increasing rate, and sometimes it is hard to keep track of all the novelties.
In this article we will talk about Transformers, a type of neural network architecture that has been gaining popularity, with an attached notebook (a text classification example).
In this post, we will address the following questions related to Transformers:
Table of Contents:
Why do we need the Transformer?
Transformer and its architecture in detail.
Text classification with the Transformer.
Useful papers for working with the Transformer.
I - Why do we need the Transformer?
Transformers were developed to solve the problem of sequence transduction, or neural machine translation: any task that transforms an input sequence into an output sequence. This includes speech recognition, text-to-speech conversion, and so on.
For models to perform sequence transduction, it is necessary to have some sort of memory.
The limitations of long-term dependencies:
The Transformer is an architecture for transforming one sequence into another with the help of two parts (an Encoder and a Decoder), but it differs from previously existing sequence-to-sequence models because it does not rely on any recurrent networks (GRU, LSTM, etc.).
The Transformer architecture was introduced in the paper “Attention Is All You Need”, and as the title indicates, it is built around the attention mechanism (we will write a detailed article about it later).
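To give a feel for what that mechanism computes, here is a minimal sketch of scaled dot-product attention, the building block the paper stacks into multi-head attention; the function name and tensor shapes are illustrative and not taken from the notebook below.

import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k); every position can attend to every other position
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = F.softmax(scores, dim=-1)  # attention weights over the key positions
    return torch.matmul(weights, v), weights

Each output position is a weighted mix of all the value vectors, so the path between any two words has constant length no matter how far apart they are in the sequence.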
Let’s consider a language model that will predict the next word based on the previous ones!
Sentence: “bitcoin is the best cryptocurrency”
Here we don’t need any additional context: it is obvious that the next word will be “cryptocurrency”.
In this case an RNN can solve the issue and predict the answer using the past information.
But in other cases we need more context. For example, let’s say that you are trying to predict the last word of the text: “I grew up in Tunisia … I speak fluent …”. Recent information suggests that the next word is probably a language, but if we want to narrow down which language, we need the context of Tunisia, which is much further back in the text.
RNNs become very ineffective when the gap between the relevant information and the point where it is needed becomes very large. That is due to the fact that the information is passed along at each step, and the longer the chain is, the more likely it is that the information gets lost along the chain.
I recommend a nice article that talks in depth about the difference between seq2seq and the Transformer.
II - Transformer and its architecture in detail:
An image is worth a thousand words, so we will start with that!
The first thing that we can see is that it has a sequence-to-sequence encoder-decoder architecture.
Much of the literature on Transformers on the Internet uses this very architecture to explain Transformers.
But this is not the one used in OpenAI’s GPT model (or the GPT-2 model, which was just a larger version of its predecessor).
GPT is a decoder-only Transformer with 12 layers and 117M parameters.
Unlike a typical Seq2Seq model, the Transformer has a stack of 6 Encoder layers and 6 Decoder layers;
the Encoder contains two sub-layers: a multi-head self-attention layer and a fully connected feed-forward network.
The Decoder contains three sub-layers: a multi-head self-attention layer, an additional layer that performs multi-head attention over the Encoder outputs, and a fully connected feed-forward network.
Each sub-layer in the Encoder and Decoder has a residual connection followed by layer normalization.
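As a rough illustration of the two Encoder sub-layers and the residual-plus-layer-norm wrapping described above, here is a minimal PyTorch sketch; the class name and the default sizes (d_model=512, 8 heads, d_ff=2048, as in the original paper) are illustrative, not code from the notebook.

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        # batch_first=True expects inputs of shape (batch, seq_len, d_model)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Sub-layer 1: multi-head self-attention + residual connection + layer normalization
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))
        # Sub-layer 2: position-wise feed-forward network + residual connection + layer normalization
        return self.norm2(x + self.dropout(self.ff(x)))

A Decoder layer adds a third sub-layer of the same shape that attends over the Encoder outputs, and six such layers are stacked on each side.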
All input and output tokens to the Encoder/Decoder are converted to vectors using learned embeddings.
These input embeddings are then passed to the positional encoding.
The Transformer architecture does not contain any recurrence or convolution and hence has no notion of word order.
All the words of the input sequence are fed to the network with no special order or position, as they all flow simultaneously through the Encoder and Decoder stacks.
To understand the meaning of a sentence, it is essential to know the position and the order of the words.
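This is exactly what the positional encoding adds back. Below is a minimal sketch of the sinusoidal encoding used in the original paper, which is added to the token embeddings before the first Encoder layer; the function name and arguments are illustrative.

import math
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    # assumes d_model is even
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # shape (max_len, d_model), added element-wise to the embeddings

Because each position gets a unique, smoothly varying pattern, the model can recover both absolute and relative word order without any recurrence.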
III - Text classification using the Transformer (PyTorch implementation):
It is very simple to use the ClassificationModel from simpletransformers:
ClassificationModel('Architecture', 'model shortcut name', use_cuda=True, num_labels=4)
Architecture: Bert, Roberta, Xlnet, Xlm…
Shortcut-name models for Roberta: roberta-base, roberta-large… (more details here)
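Before walking through the full notebook, here is a minimal end-to-end sketch of that API; the DataFrame contents, the example sentence and the settings are placeholders, not values from the project (set use_cuda=False if no GPU is available).

import pandas as pd
from scipy.special import softmax
from simpletransformers.classification import ClassificationModel

# Placeholder training data: a 'text' column and an integer 'labels' column
train_df = pd.DataFrame({'text': ['the gallery opened a new exhibition',
                                  'parliament passed the new bill'],
                         'labels': [0, 1]})

model = ClassificationModel('roberta', 'roberta-base', use_cuda=True, num_labels=4)
model.train_model(train_df)

predictions, raw_outputs = model.predict(['a hypothetical sentence about a museum'])
probs = softmax(raw_outputs, axis=1)  # one probability column per class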
We create a model that classifies text into 4 classes: ['art', 'politics', 'health', 'tourism'].
We applied this model in our previous project and integrated it into our Flask application here (you can buy it to help us create better content and support the community).
Here you will find a commented notebook:
- Setup environment & configuration
!pip install simpletransformers
# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize
- Importing libraries
import warnings, gc, re, random
import numpy as np
import pandas as pd
import torch
from scipy.special import softmax
from simpletransformers.classification import ClassificationModel
from sklearn.model_selection import train_test_split, StratifiedKFold, KFold
from sklearn.metrics import log_loss
warnings.simplefilter('ignore')
pd.options.display.max_colwidth = 200
# choose the same seed to ensure that our model is reproducible
def seed_all(seed_value):
    random.seed(seed_value)                    # Python
    np.random.seed(seed_value)                 # numpy (cpu)
    torch.manual_seed(seed_value)              # torch (cpu)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed_value)
        torch.cuda.manual_seed_all(seed_value) # gpu
        torch.backends.cudnn.deterministic = True  # needed for reproducibility
        torch.backends.cudnn.benchmark = False
seed_all(2)
- Reading data
train = pd.read_csv("train.csv")  # hypothetical path; the original notebook leaves it blank
feat_cols = "text"
- Verify the topic classes in the data
- Train the model
train.head()
l = ['art', 'politics', 'health', 'tourism']
# Get the numerical ids of the label column
train['label'] = train.label.astype('category')
Y = train.label.cat.codes
train['label'] = Y
# Print initial shape
print(Y.shape)
from keras.utils import to_categorical
# One-hot encode the label indexes
Y = to_categorical(Y)
# Check the new shape of the variable
print(Y.shape)
# Print the first 5 rows
print(Y[0:5])
for i in range(len(l)):
    train[l[i]] = Y[:, i]
# Using K-fold cross-validation is important to test our model
%%time
err = []
y_pred_tot = []
fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=1997)
i = 1
for train_index, test_index in fold.split(train, train['label']):
    train1_trn, train1_val = train.iloc[train_index], train.iloc[test_index]
    model = ClassificationModel('roberta', 'roberta-base', use_cuda=True, num_labels=4, args={
        'train_batch_size': 16,
        'reprocess_input_data': True,
        'overwrite_output_dir': True,
        'fp16': False,
        'do_lower_case': False,
        'num_train_epochs': 4,
        'max_seq_length': 128,
        'regression': False,
        'manual_seed': 1997,
        'learning_rate': 2e-5,
        'weight_decay': 0,
        'save_eval_checkpoints': True,
        'save_model_every_epoch': False,
        'silent': True})
    model.train_model(train1_trn)
    raw_outputs_val = model.eval_model(train1_val)[1]
    raw_outputs_vals = softmax(raw_outputs_val, axis=1)
    print(f"Log_Loss: {log_loss(train1_val['label'], raw_outputs_vals)}")
    err.append(log_loss(train1_val['label'], raw_outputs_vals))
Output:
Log_Loss: 0.35637871529928816
CPU times: user 11min 2s, sys: 4min 21s, total: 15min 23s
Wall time: 16min 7s
Log loss:
print("Mean LogLoss: ",np.mean(err))output :
輸出:
Mean LogLoss: 0.34930175561484067
平均對(duì)數(shù)損失:0.34930175561484067
raw_outputs_vals
Output:
array([[9.9822301e-01, 3.4856689e-04, 3.8243082e-04, 1.0458552e-03],
       [9.9695909e-01, 1.1522240e-03, 5.9563853e-04, 1.2927916e-03],
       [9.9910539e-01, 2.3084633e-04, 2.5905663e-04, 4.0465154e-04],
       ...,
       [3.6545596e-04, 2.8826005e-04, 4.3145564e-04, 9.9891484e-01],
       [4.0789684e-03, 9.9224585e-01, 1.2752400e-03, 2.3997365e-03],
       [3.7382307e-04, 3.4797701e-04, 3.6257200e-04, 9.9891579e-01]], dtype=float32)
- Test our model
# raw outputs for a test example (hypothetical input; the original notebook does not show how `pred` was obtained)
pred = model.predict(["example text about tourism"])[1]
preds = softmax(pred, axis=1)
preds
Output:
array([[6.0461409e-04, 3.6119239e-04, 3.3729596e-04, 9.9869716e-01]], dtype=float32)
We create a function which takes the maximum probability and detects the topic. For example, if we have 0.6 politics, 0.1 art, 0.15 health, 0.15 tourism, then topic = politics.
def estm(raw_outputs_vals):
    # turn each row of probabilities into a one-hot vector marking the most likely topic
    for i in range(len(raw_outputs_vals)):
        row_max = max(raw_outputs_vals[i])
        for j in range(4):
            if raw_outputs_vals[i][j] == row_max:
                raw_outputs_vals[i][j] = 1
            else:
                raw_outputs_vals[i][j] = 0
    return raw_outputs_vals

estm(preds)
Output:
array([[0., 0., 0., 1.]], dtype=float32)
Our labels are ['art', 'politics', 'health', 'tourism'], so that's correct ;)
I hope you find it useful and helpful!
Download the source code from our GitHub.
IV - Useful papers for working with the Transformer:
Here is a list of recommended papers to go in depth with Transformers (mainly the BERT model):
* Cross-Linguistic Syntactic Evaluation of Word Prediction Models
* Emerging Cross-lingual Structure in Pretrained Language Models
* Finding Universal Grammatical Relations in Multilingual BERT
* On the Cross-lingual Transferability of Monolingual Representations
* How multilingual is Multilingual BERT?
* Is Multilingual BERT Fluent in Language Generation?
* Are All Languages Created Equal in Multilingual BERT?
* What’s so special about BERT’s layers? A closer look at the NLP pipeline in monolingual and multilingual models
* A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT
* Cross-Lingual Ability of Multilingual BERT: An Empirical Study
* Multilingual is not enough: BERT for Finnish
Download all the files from our GitHub repo.
V - Summary:
Transformers represent the next front in NLP.
In less than a couple of years since their introduction, this new architectural trend has surpassed the feats of RNN-based architectures.
This exciting pace of invention is perhaps the best part of being early to a new field like deep learning!
If you have any suggestions or questions, please contact the NeuroData team:
Facebook
LinkedIn
Website
Github
Authors:
Yassine Hamdaoui
Code credits go to Med Klai Helmi, Data Scientist and Zindi mentor.
Translated from: https://medium.com/swlh/text-classification-using-transformers-pytorch-implementation-5ff9f21bd106