T5: Text-to-Text Transfer Transformer

With the burgeoning of transfer learning, deep learning has achieved many wonders. More specifically, in NLP, with the rise of the Transformer (Vaswani et al.), various approaches for language modeling have arisen wherein we leverage transfer learning by pre-training a model on a very generic task and then fine-tuning it on specific downstream problems.

In this article, we'll discuss Google's state-of-the-art T5 (Text-to-Text Transfer Transformer) model, which was proposed earlier this year in the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". This paper is essentially a survey of modern transfer learning techniques used in language understanding, and it proposes a unified framework that attempts to combine all language problems into a text-to-text format. We will discuss this approach in greater detail in the coming sections. Moreover, the authors have also open-sourced a new dataset (to facilitate their work) called C4, the Colossal Clean Crawled Corpus.

T5: Text-To-Text Transfer Transformer

As mentioned earlier, T5 attempts to combine all the downstream tasks into a text-to-text format.

The Text-to-Text Framework

[Figure: A unified text-to-text framework for all downstream tasks (source: Google AI Blog)]

Consider the example of a BERT-style architecture that is pre-trained on the Masked LM and Next Sentence Prediction objectives and then fine-tuned on downstream tasks (for example, predicting a class label in classification or the answer span of the input in QnA). Here, we separately fine-tune different instances of the pre-trained model on different downstream tasks.

The text-to-text framework, on the contrary, suggests using the same model, the same loss function, and the same hyperparameters for all NLP tasks. In this approach, the inputs are modeled in such a way that the model can recognize the task, and the output is simply the "text" version of the expected outcome. Refer to the figure above for a clearer view of this.

Fun fact: We can even apply T5 to regression tasks by training it to output the string representation of the expected output.

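For instance, the paper casts the STS-B similarity-regression task as text by rounding the score to the nearest increment of 0.2 and emitting it as a literal string. The sketch below illustrates that idea; the exact prefix wording and formatting here are simplified assumptions, not the paper's verbatim preprocessing.

```python
# A minimal sketch of casting a regression example (STS-B style) into text-to-text form.
# The "stsb" prefix and the rounding to the nearest 0.2 follow the paper's description;
# treat the exact strings as illustrative assumptions, not the official preprocessing.

def stsb_to_text(sentence1, sentence2, score):
    source = f"stsb sentence1: {sentence1} sentence2: {sentence2}"
    # Round the continuous score to the nearest 0.2 and render it as plain text,
    # so the model can emit it with its ordinary vocabulary.
    target = f"{round(score * 5) / 5:.1f}"
    return source, target

src, tgt = stsb_to_text("A man is playing a guitar.",
                        "A person plays an instrument.", 4.37)
print(src)  # stsb sentence1: A man is playing a guitar. sentence2: A person plays an instrument.
print(tgt)  # 4.4
```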

C4: Colossal Clean Crawled Corpus

It is standard practice to pre-train language models on huge unlabeled datasets. Common Crawl is one such dataset. It is obtained by scraping web pages and ignoring the markup in the HTML, and it produces around 20TB of scraped data each month. However, Common Crawl contains large amounts of gibberish text, such as menus, error messages, or duplicate text. Moreover, there is also an appreciable amount of text that is useless for our tasks, such as offensive language, placeholder text, or source code.

For C4, the authors took the Common Crawl scrape from April 2019 and applied several cleansing filters to it (a rough sketch of these heuristics follows the list below):

對(duì)于C4,作者從2019年4月開(kāi)始抓取Common Crawl刮擦并在其上應(yīng)用了一些清理過(guò)濾器:

  • Retaining only lines that end with a valid terminal punctuation mark (a period, exclamation mark, question mark, or closing quotation mark).

  • Removing any page containing offensive words that appear on the “List of Dirty, Naughty, Obscene or Otherwise Bad Words”.

  • “JavaScript must be enabled” type warnings are removed by filtering out any line that contains the word JavaScript.

  • Pages with placeholder text like “l(fā)orem ipsum” are removed.

  • Source code is removed by dropping any page that contains a curly brace "{" (since curly braces appear in many well-known programming languages).

  • For removing duplicates, three-sentence spans are considered. Any duplicate occurrences of the same 3 sentences are filtered out.

  • Finally, since the downstream tasks are mostly in English, langdetect is used to filter out any page that is not classified as English with a probability of at least 0.99.

    最后,由于下游任務(wù)主要用于英語(yǔ), 因此使用langdetect過(guò)濾掉任何未歸類(lèi)為英語(yǔ)的頁(yè)面的可能性至少為0.99。

  • This resulted in a 750GB dataset that is not only considerably larger than most pre-training datasets but also contains relatively clean text.

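The snippet below is a rough, illustrative sketch of what such cleansing heuristics could look like in Python. It is not the authors' actual pipeline: the real filters, the bad-words list, and the dataset-wide deduplication are far more involved, and the helper names here are made up for illustration.

```python
# Illustrative C4-style cleaning heuristics (NOT the official pipeline).
# BAD_WORDS stands in for the "List of Dirty, Naughty, Obscene or Otherwise Bad Words".
import re
import langdetect  # pip install langdetect

TERMINAL_PUNCT = ('.', '!', '?', '"')
BAD_WORDS = set()  # placeholder for the real word list


def clean_page(text, seen_spans):
    lower = text.lower()
    if any(word in lower for word in BAD_WORDS):   # offensive content
        return None
    if "{" in text:                                # likely source code
        return None
    if "lorem ipsum" in lower:                     # placeholder text
        return None
    try:
        best = langdetect.detect_langs(text)[0]    # language filter
        if best.lang != "en" or best.prob < 0.99:
            return None
    except Exception:
        return None

    kept = []
    for line in text.splitlines():
        line = line.strip()
        if "javascript" in line.lower():           # "JavaScript must be enabled" warnings
            continue
        if line.endswith(TERMINAL_PUNCT):          # keep only well-terminated lines
            kept.append(line)

    # Deduplicate on three-sentence spans seen anywhere before.
    sentences = re.split(r"(?<=[.!?])\s+", " ".join(kept))
    deduped = []
    for i, sentence in enumerate(sentences):
        span = tuple(sentences[i:i + 3])
        if span in seen_spans:
            continue
        seen_spans.add(span)
        deduped.append(sentence)
    return " ".join(deduped) or None
```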

Input and Output Representations

This is one of the central design decisions of T5, as it is what makes the unified text-to-text approach possible. To use the same model for all the downstream tasks, a task-specific text prefix is added to the original input that is fed to the model. This text prefix is also treated as a hyperparameter.

As an example, to ask the model to translate the sentence "That is good." from English to German, the model would be fed the sequence "translate English to German: That is good." and would be trained to output "Das ist gut."

— T5 Paper

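As a brief, hedged illustration of the task prefix in practice, here is what inference could look like with the Hugging Face transformers library and the public t5-small checkpoint (see the API link at the end of the article); the checkpoint choice and generation settings are illustrative.

```python
# Minimal sketch: running a pre-trained T5 checkpoint on a prefixed input
# with the Hugging Face transformers library; t5-small is used only for illustration.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The task-specific prefix tells the model which text-to-text task to perform.
text = "translate English to German: That is good."
input_ids = tokenizer(text, return_tensors="pt").input_ids

output_ids = model.generate(input_ids, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# Expected output (approximately): "Das ist gut."
```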

Similarly, for classification tasks, the model predicts a single word corresponding to the target label.

For example, on the MNLI benchmark the goal is to predict whether a premise implies ("entailment"), contradicts ("contradiction"), or neither ("neutral") a hypothesis. With our preprocessing, the input sequence becomes "mnli premise: I hate pigeons. hypothesis: My feelings towards pigeons are filled with animosity." with the corresponding target word "entailment".

— T5 Paper

Here's an issue with this: what if the predicted word is something else, i.e. not "entailment", "contradiction", or "neutral"? Well, in that case, the model is trained to consider all other words as wrong predictions.

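Here is a small sketch of how such predictions could be scored; it simply mirrors the behaviour described above (any generated string outside the label set counts as wrong) and is not taken from the paper's codebase.

```python
# Illustrative scoring for text-to-text classification: any generated string
# that is not one of the label words simply counts as an incorrect prediction.
MNLI_LABELS = {"entailment", "contradiction", "neutral"}

def is_correct(predicted_text, gold_label):
    prediction = predicted_text.strip().lower()
    return prediction in MNLI_LABELS and prediction == gold_label

print(is_correct("entailment", "entailment"))  # True
print(is_correct("maybe", "neutral"))          # False: "maybe" is not a valid label word
```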

The Model

The proposed model is essentially an Encoder-Decoder Transformer (Vaswani et al.) with some architectural changes, such as applying Layer Normalization before each sub-block and then adding the sub-block's input to its output (also known as pre-norm; a small sketch of this residual pattern follows below). Moreover, the model configuration is similar to BERT base (Devlin et al.).

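As a quick illustration of the pre-norm arrangement mentioned above, here is a minimal PyTorch-style sketch. It is deliberately simplified: the actual T5 block uses a simplified (RMS-style) layer normalization without bias and adds relative position biases, none of which are shown here.

```python
import torch
import torch.nn as nn

class PreNormResidual(nn.Module):
    """y = x + sublayer(LayerNorm(x)): the 'pre-norm' pattern described above."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)  # T5 itself uses a simplified, bias-free norm
        self.sublayer = sublayer

    def forward(self, x):
        return x + self.sublayer(self.norm(x))

# Example: wrapping a feed-forward sublayer the way a Transformer block would.
d_model = 512
ffn = nn.Sequential(nn.Linear(d_model, 2048), nn.ReLU(), nn.Linear(2048, d_model))
block = PreNormResidual(d_model, ffn)
out = block(torch.randn(2, 10, d_model))  # shape: (batch, seq_len, d_model)
```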

We'll skip these architectures as they're out of scope for this article. If you're interested in knowing the specifications of these models in particular, I have already covered them in the following articles:

  • Transformers: https://towardsdatascience.com/transformers-explained-65454c0f3fa7

  • Transformers Implementation: https://medium.com/swlh/abstractive-text-summarization-using-transformers-3e774cc42453

  • BERT: https://medium.com/swlh/bert-pre-training-of-transformers-for-language-understanding-5214fba4a9af

Training Approach

[Figure: Architectural variants: Encoder-Decoder (left), Language Model (middle), Prefix LM (right) (source: the T5 paper)]

The paper is an exhaustive survey of many modern approaches to language understanding; hence, many architectural specifications have been explored and compared. At an architectural level, there are several options for selecting the training approach (a sketch of the corresponding attention-mask patterns follows the list below):

  • Encoder-Decoder (Left): This is the standard encoder-decoder, seq2seq architecture, wherein the encoder is trained in a BERT-style, fully visible manner (i.e. every token contributes to the attention calculation of every other token in the sequence) and the decoder is trained in a GPT-style causal manner (i.e. every token attends to all the tokens that occur before it in the sequence).

  • Language Model (Middle): This is essentially the causal attention mechanism that was discussed earlier. It is an autoregressive modeling approach.

  • Prefix LM (Right): This is a combination of the BERT-style and language-model approaches. For example, the task of translating from English to German can have BERT-style (fully visible) attention over the prefix "translate English to German: That is good. target:", and the translation "Das ist gut." is then attended to autoregressively.

  • With experimentation, the best results were obtained with the Encoder-Decoder approach.

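To make the three attention patterns above concrete, here is a small NumPy sketch of the corresponding self-attention masks (1 means "may attend"). This is an illustration of the idea only, not code from the paper.

```python
import numpy as np

def fully_visible_mask(n):
    """BERT-style: every position may attend to every other position."""
    return np.ones((n, n), dtype=int)

def causal_mask(n):
    """GPT-style language model: position i may attend only to positions <= i."""
    return np.tril(np.ones((n, n), dtype=int))

def prefix_lm_mask(n, prefix_len):
    """Prefix LM: fully visible within the prefix, causal afterwards."""
    mask = np.tril(np.ones((n, n), dtype=int))
    mask[:, :prefix_len] = 1  # every position may see the whole prefix
    return mask

print(prefix_lm_mask(n=5, prefix_len=2))
# Rows are query positions, columns are key positions:
# [[1 1 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]   <- target tokens see the prefix plus earlier target tokens
#  [1 1 1 1 0]
#  [1 1 1 1 1]]
```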

Unsupervised Objective

[Figure: The span-corruption objective (source: the T5 paper)]

With respect to the pre-training objective, too, the authors have explored several approaches used in practice (a sketch of the span-corruption objective appears after the list below):

    關(guān)于培訓(xùn)前的目標(biāo),作者還探索了實(shí)踐中的一些方法:

  • Language Modeling: This approach mainly includes the causal prediction task i.e. predicting the next word in the sentence considering all the words preceding that word.

  • Deshuffling: All the words in a sentence are shuffled and the model is trained to predict the original text.

  • Corrupting Spans: Masking a sequence of words from the sentence and training the model to predict these masked words as shown in the figure above. It is also known as a denoising objective.

  • After exploration, the denoising objective had the most promising results.

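Below is a simplified sketch of how span corruption turns a sentence into an (input, target) pair. The sentinel-token format follows the example given in the T5 paper, but the span selection here is hard-coded for clarity instead of being sampled randomly (the paper's baseline corrupts roughly 15% of tokens with an average span length of 3).

```python
# Simplified illustration of the span-corruption (denoising) objective.
# Sentinel tokens <extra_id_0>, <extra_id_1>, ... mark the dropped spans,
# following the formatting used in the T5 paper; span selection here is
# hard-coded for clarity rather than sampled randomly as in the real objective.
def corrupt_spans(tokens, spans):
    """spans: list of (start, end) index pairs to drop, non-overlapping and sorted."""
    inputs, targets, cursor = [], [], 0
    for sentinel_id, (start, end) in enumerate(spans):
        inputs.extend(tokens[cursor:start])
        inputs.append(f"<extra_id_{sentinel_id}>")
        targets.append(f"<extra_id_{sentinel_id}>")
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel closes the target
    return " ".join(inputs), " ".join(targets)

tokens = "Thank you for inviting me to your party last week .".split()
inp, tgt = corrupt_spans(tokens, spans=[(2, 4), (8, 9)])
print(inp)  # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(tgt)  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```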

[Figure: The unsupervised objectives explored in the paper (source: the T5 paper)]

Results

First things first: T5 has achieved state-of-the-art results on many GLUE and SuperGLUE tasks, along with translation and summarization benchmarks. Interestingly, T5 also performs well on closed-book question answering, where the model must answer questions using only the knowledge stored in its parameters during pre-training:

T5 is surprisingly good at this task. The full 11-billion parameter model produces the exact text of the answer 50.1%, 37.4%, and 34.5% of the time on TriviaQA, WebQuestions, and Natural Questions, respectively.

— Google AI Blog

To generate realistic text, T5 relies on a fill-in-the-blank type of task, with which it is familiar thanks to its pre-training. So, the authors created a new downstream task called sized fill-in-the-blank. For example, given the sentence "I like to eat peanut butter and _4_ sandwiches", the model will be trained to predict approximately 4 words for the blank.

Fun fact: The model also adjusts its predictions based on the requested size of the missing text.

For a demonstration of the above, refer to the official blog.

Putting it All Together

[Figure: Pre-training and fine-tuning of T5 (source: Google AI Blog)]
  • T5 is first pre-trained on the C4 dataset with the denoising, span-corruption objective, using an Encoder-Decoder architecture.

  • It is then fine-tuned on the downstream tasks with a supervised objective, with the inputs modeled appropriately for the text-to-text setting (a sketch of a fine-tuning step follows below).

      然后在帶有監(jiān)督目標(biāo)的下游任務(wù)上進(jìn)行微調(diào),并為文本到文本設(shè)置設(shè)置適當(dāng)?shù)妮斎肽P汀?

Conclusion

In this article, we dived deep into Google's T5 model, which is one of the state-of-the-art models for language understanding. We also saw the new dataset, C4. The main takeaway from this article is the set of empirical results obtained by the T5 authors regarding training approaches, model architectures, and datasets. Moreover, it can also be observed that deep learning is moving closer and closer to human-quality understanding; in this context, it is generalizing to just one model for many NLP tasks.

Github repo: https://github.com/google-research/text-to-text-transfer-transformer

API for the model architecture and pre-trained weights by huggingface: https://huggingface.co/transformers/model_doc/t5.html

C4 Tensorflow datasets: https://www.tensorflow.org/datasets/catalog/c4

Translated from: https://towardsdatascience.com/t5-text-to-text-transfer-transformer-643f89e8905e
