Introduction to BERT

BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art language model from Google that can be used for cutting-edge natural language processing (NLP) tasks.


After reading this article, you will have a basic understanding of BERT and will be able to utilize it for your own business applications. It would be helpful if you are familiar with Python and have a general idea of machine learning.


The BERT use cases I will cover in this article are:


  • Binary or multi-class classification
  • Regression model
  • Question-answering applications

Introduction to BERT


BERT is trained on the entirety of Wikipedia (~2.5 billion words), along with a book corpus (~800 million words). In order to utilize BERT, you won’t have to repeat this compute-intensive process.


BERT brings the transfer learning approach into the natural language processing area in a way that no language model has done before.


Transfer Learning


Transfer learning is a process where a machine learning model developed for a general task can be reused as a starting point for a specific business problem.


Imagine you want to teach someone named Amanda, who doesn’t speak English, how to take the SAT. The first step would be to teach Amanda the English language as thoroughly as possible. Then, you can teach her more specifically for the SAT.


In the context of a machine learning model, this idea is known as transfer learning. The first part of transfer learning is pre-training (similar to teaching Amanda English for the first time). After the pre-training is complete you can focus on a specific task (like teaching Amanda how to take the SAT). This is a process known as fine-tuning — changing the model so it can fit your specific business problem.


BERT Pre-training


This is a quick introduction to the BERT pre-training process. For practical purposes, you can use a pre-trained BERT model and do not need to perform this step.


BERT takes two chunks of text as input. In the simplified example above, I referred to these two inputs as Sentence 1 and Sentence 2. In the pre-training for BERT, Sentence 2 intentionally does not follow Sentence 1 in about half of the training examples.


Sentence 1 starts with a special token [CLS] and both sentences end with another special token [SEP]. There will be a single token for each word that is in the BERT vocabulary. If a word is not in the vocabulary, BERT will split that word into multiple tokens. Before feeding sentences to BERT, 15% of the tokens are masked.

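To make the input format concrete, here is a minimal sketch using the Hugging Face transformers library (an assumption on my part; the notebooks linked later in this article may use a different BERT implementation). The two sentences are made up for illustration.

```python
# Sketch of BERT tokenization with the Hugging Face "transformers" library.
# Assumes: pip install transformers. The sentences are illustrative only.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentence_1 = "My favorite fruit is the mango."
sentence_2 = "I like to eat it in the summer."

# The tokenizer adds [CLS] before Sentence 1 and [SEP] after each sentence.
encoded = tokenizer(sentence_1, sentence_2)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# -> ['[CLS]', 'my', 'favorite', ..., '[SEP]', 'i', 'like', 'to', ..., '[SEP]']

# A word outside the BERT vocabulary is split into multiple WordPiece tokens.
print(tokenizer.tokenize("tokenization"))  # e.g. ['token', '##ization']
```

The 15% masking mentioned above is not done by the tokenizer; it happens as part of the pre-training procedure itself.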

The pre-training process, the first step of transfer learning, is like teaching English to the BERT model so that it can be used for various tasks which require English knowledge. This is accomplished by the two practice tasks given to BERT:


  • Predict masked (hidden) tokens. To illustrate, the words “favorite” and “to” are masked in the diagram above. BERT will try to predict these masked tokens as part of the pre-training. This is similar to a “fill in the blanks” exercise we might give to a student who is learning English. While trying to fill in the missing words, the student learns the language. This is referred to as the Masked Language Model (MLM); a short code sketch of this task appears below.
  • BERT also tries to predict whether Sentence 2 logically follows Sentence 1, in order to build a deeper understanding of sentence dependencies. In the example above, Sentence 2 is a logical continuation of Sentence 1, so the prediction will be True. The special token [CLS] on the output side is used for this task, known as Next Sentence Prediction (NSP).
The BERT pre-trained model comes in many variants. The most common ones are BERT Base (12 Transformer layers, roughly 110 million parameters) and BERT Large (24 Transformer layers, roughly 340 million parameters).
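Before moving on to fine-tuning, here is an illustration of the masked-token task described above. It uses the fill-mask pipeline from the Hugging Face transformers library to probe the pre-trained BERT Base model; this is only a demonstration of the MLM objective, not the pre-training code itself, and the sentence is made up.

```python
# Probe BERT's Masked Language Model head with the transformers fill-mask pipeline.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the hidden token using context from both the left and the right.
for prediction in fill_mask("My [MASK] fruit is the mango."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

The next-sentence objective can be probed in a similar way with the BertForNextSentencePrediction class from the same library.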

BERT Fine-Tuning

Fine-tuning is the next part of transfer learning. For specific tasks, such as text classification or question-answering, you would perform incremental training on a much smaller dataset. This adjusts the parameters of the pre-trained model.
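To sketch what this looks like in code, the snippet below loads the pre-trained BERT Base encoder and attaches a small task-specific head. This is a minimal PyTorch/transformers illustration under my own assumptions, not the exact code used in the examples that follow.

```python
# Minimal sketch of the fine-tuning setup (PyTorch + transformers), assuming a
# two-class task. The pre-trained encoder provides the language knowledge;
# incremental training on your small, task-specific dataset adjusts its weights.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")   # pre-trained weights
head = torch.nn.Linear(encoder.config.hidden_size, 2)      # task-specific layer

inputs = tokenizer("BERT makes transfer learning practical.", return_tensors="pt")
outputs = encoder(**inputs)
logits = head(outputs.pooler_output)  # [CLS]-based sentence representation

# During fine-tuning, a task loss (e.g. cross-entropy) computed on these logits
# back-propagates through both the head and the encoder.
```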

Use Cases

To demonstrate practical uses of BERT, I am providing two examples below. The code and documentation are provided on both GitHub and Google Colab. You can use either option to follow along and try it out for yourself!

1. Text Classification or Regression

This is sample code for the binary classification of tweets. Here we have two types of tweets: disaster-related tweets (target = 1) and normal tweets (target = 0). We fine-tune the BERT Base model to classify tweets into these two groups.

GitHub: https://github.com/sanigam/BERT_Medium

Google Colab: https://colab.research.google.com/drive/1ARH9dnugVuKjRTNorKIVrgRKitjg051c?usp=sharing

This code can be used for multi-class classification or regression by using appropriate parameter values in the function bert_model_creation(). The code provides details on the parameter values. If you want, you can add additional dense layers inside this function.
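I have not reproduced bert_model_creation() here; its parameters are documented in the linked code. As a hedged alternative sketch, the same binary/multi-class/regression switch can be expressed with transformers' BertForSequenceClassification, which mirrors the idea without reusing the function's actual parameters.

```python
# Hedged sketch: choosing the output head with BertForSequenceClassification.
from transformers import BertForSequenceClassification

# Binary classification (e.g. disaster tweet vs. normal tweet): two labels.
binary_model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Multi-class classification: set num_labels to the number of classes.
multiclass_model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5
)

# Regression: a single output trained with a mean-squared-error loss.
regression_model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1, problem_type="regression"
)
```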

2. BERT for Question-Answering

This is another interesting use case for BERT, where you input a passage and a question into the BERT model. It can find the answer to the question based on the information given in the passage. In this code, I am using the BERT Large model, which has already been fine-tuned on the Stanford Question Answering Dataset (SQuAD). You will see how to use this fine-tuned model to get answers from a given passage.

GitHub: https://github.com/sanigam/BERT_QA_Medium

Google Colab: https://colab.research.google.com/drive/1ZpeVygQJW3O2Olg1kZuLnybxZMV1GpKK?usp=sharing

An example of this use case:

Passage: “John is a 10 year old boy. He is the son of Robert Smith. Elizabeth Davis is Robert’s wife. She teaches at UC Berkeley. Sophia Smith is Elizabeth’s daughter. She studies at UC Davis.”

Question: “Which college does John’s sister attend?”

When these two inputs are passed in, the model returns the correct answer: “uc davis”.
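For reference, here is a minimal sketch that reproduces this behaviour with the transformers question-answering pipeline and a publicly available BERT Large checkpoint fine-tuned on SQuAD. The linked notebook may load its model differently, so treat this as an illustration.

```python
# Hedged sketch: question answering with a SQuAD-fine-tuned BERT Large model.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

passage = (
    "John is a 10 year old boy. He is the son of Robert Smith. "
    "Elizabeth Davis is Robert's wife. She teaches at UC Berkeley. "
    "Sophia Smith is Elizabeth's daughter. She studies at UC Davis."
)
question = "Which college does John's sister attend?"

# Question + passage together must fit within BERT's 512-token limit;
# longer passages are truncated.
result = qa(question=question, context=passage)
print(result["answer"])  # expected: an answer like "UC Davis"
```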

This example shows that BERT can understand language structure and handle dependencies across sentences. It can apply simple logic to answer the question (e.g. figuring out who John’s sister is). Note that your passage can be much longer than the example shown above, but the combined length of the question and passage cannot exceed 512 tokens. If your passage is longer than that, the code will automatically truncate the extra part.

The code provides more examples in addition to the one shown above: a total of 3 passages and 22 questions. One of these passages is a version of my BERT article. You will see that BERT QA is able to answer any question whose answer can be found in the passage. You can customize the code for your own question-answering applications.

Hopefully this gives you a good jump start toward using BERT in your own practical applications. If you have any questions or feedback, feel free to let me know!

Source: https://medium.com/analytics-vidhya/introduction-to-bert-f9aa4075cf4f
