
Exploring Activation and Loss Functions in Machine Learning


In this post, we’re going to discuss the most widely-used activation and loss functions for machine learning models. We’ll take a brief look at the foundational mathematics of these functions and discuss their use cases, benefits, and limitations.


Without further ado, let’s get started!


Image source: https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/

What is an Activation Function?

To learn complex data patterns, the input of each node in a neural network is passed through a function that limits and defines that node’s output value. In other words, it takes the output signal from the previous node and converts it into a form interpretable by the next node. This is what an activation function allows us to do.

Need for an Activation Function

  • Restricting values: The activation function keeps a node’s output within a certain range. Without it, values can become vanishingly small or extremely large as they pass through the multiplications and other operations in successive layers (i.e. the vanishing and exploding gradient problems).

  • Adding non-linearity: Without an activation function, the operations performed by successive layers simply stack on top of one another, which amounts to a single linear combination applied to the input. A neural network without activation functions is therefore essentially a linear regression model (see the sketch after this list).
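To make the second point concrete, here is a minimal NumPy sketch (the weight matrices and input are made up purely for illustration) showing that stacking linear layers without an activation collapses into a single linear transformation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for two layers that have no activation function.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=(3,))

# Passing the input through both "layers"...
stacked = W2 @ (W1 @ x)

# ...is exactly the same as one linear layer whose weights are W2 @ W1.
collapsed = (W2 @ W1) @ x

print(np.allclose(stacked, collapsed))  # True: no extra expressive power
```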

Types of Activation Functions

Various types of activation functions are listed below:

Sigmoid Activation Function

The sigmoid function squashes its input into the range (0, 1) and was traditionally used for binary classification problems (along the lines of “if σ(x) ≤ 0.5, predict 0, else predict 1”). However, it tends to cause the vanishing gradient problem: when the output is close to 0 or 1, the curve is almost flat, so the gradient is tiny and learning becomes very slow.

It’s also relatively expensive to compute, since it requires evaluating an exponential for every input.
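As a quick sketch (written in plain NumPy rather than any particular framework, with the sample inputs chosen only for illustration), the sigmoid and its derivative below show how the gradient shrinks toward zero once the input moves away from the origin, which is the saturation behaviour described above:

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^(-x)); maps any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative: sigmoid(x) * (1 - sigmoid(x)); largest at x = 0 (0.25)
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.4f}  grad={sigmoid_grad(x):.6f}")
# At |x| = 10 the gradient is about 4.5e-05, illustrating the vanishing gradient.
```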

Tanh Activation Function

The tanh function was also traditionally used for binary classification problems (along the lines of “if tanh(x) ≤ 0, predict 0, else predict 1”).

It differs from sigmoid in that it is zero-centred, restricting its output values to between -1 and +1. Like sigmoid it saturates for large positive or negative inputs, and it is even more computationally expensive, since the exponentials involved must be evaluated for every input, at every iteration.
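A similar sketch for tanh (again plain NumPy, with the sample values chosen just for illustration) shows the zero-centred output range (-1, 1) and the same flattening of the gradient for large |x|:

```python
import numpy as np

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2; equals 1 at x = 0 and decays toward 0
    return 1.0 - np.tanh(x) ** 2

for x in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    print(f"x={x:5.1f}  tanh={np.tanh(x):+.4f}  grad={tanh_grad(x):.4f}")
# Outputs are centred around 0, unlike sigmoid, but still flatten out quickly.
```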

ReLU Activation Function

ReLU, which stands for Rectified Linear Unit, is a famous, widely-used non-linear activation function (along the lines of “if x ≤ 0, y = 0, else y = x”).

Thus, it only passes a signal through when the input is positive. ReLU is less computationally expensive than tanh and sigmoid because it involves much simpler operations, essentially just a thresholding at zero.

But it suffers from what’s known as the “dying ReLU” problem: for negative inputs the gradient is exactly zero, so the affected units stop receiving a learning signal and the model may learn slowly or not at all in those units. Even so, ReLU is considered the go-to choice if you are new to activation functions or unsure which one to pick.
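The following NumPy sketch implements ReLU and its gradient (the inputs are made up for illustration); it makes the “dying ReLU” behaviour visible: every negative input gets a gradient of exactly zero.

```python
import numpy as np

def relu(x):
    # if x <= 0 -> 0, else -> x
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 0 for negative inputs (the "dying ReLU" region), 1 otherwise
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.  0.5 3. ]
print(relu_grad(x))  # [0.  0.  0.  1.  1. ]  <- no learning signal for x <= 0
```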

Leaky ReLU Activation Function

The leaky ReLU function is one proposed solution to the dying ReLU problem. It has a small positive slope for negative inputs, so the model can keep learning even when a unit receives negative values.

Leaky ReLUs are widely used with generative adversarial networks. The slope for negative inputs is controlled by a small constant “alpha” (common choices fall roughly between 0.01 and 0.3); in the parametric ReLU (PReLU) variant, alpha is learned during training rather than fixed by hand.
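A leaky ReLU is a one-line change from ReLU. In the sketch below alpha is fixed at 0.1 purely for illustration (frameworks commonly default to smaller values such as 0.01), whereas a parametric ReLU would treat it as a learnable parameter:

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    # Same as ReLU for positive x, but keeps a small slope alpha for negative x
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.1):
    # Gradient is alpha (not zero) for negative inputs, so units never fully "die"
    return np.where(x > 0, 1.0, alpha)

x = np.array([-3.0, -0.5, 0.5, 3.0])
print(leaky_relu(x))       # [-0.3  -0.05  0.5   3.  ]
print(leaky_relu_grad(x))  # [ 0.1   0.1   1.    1.  ]
```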

Softmax Activation Function

Image source: https://medium.com/data-science-bootcamp/understand-the-softmax-function-in-minutes-f3a59641e86d

The softmax function lets us represent the outputs of a network as a discrete probability distribution: we apply the exponential function to each element of the output layer and then normalize the values so that they sum to 1. The output class is the one with the highest resulting score.

Softmax is mostly used as the last layer in classification problems, especially multi-class classification problems, where the model ultimately outputs a probability for each of the available classes and the most probable one is chosen as the answer.
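A minimal softmax sketch in NumPy (the logits are invented for illustration); subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(logits):
    # Exponentiate each score, then normalize so the outputs sum to 1
    shifted = logits - np.max(logits)   # numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])      # raw scores for 3 classes
probs = softmax(logits)
print(probs)                            # ~[0.659 0.242 0.099]
print(probs.sum())                      # 1.0 (up to floating-point error)
print(np.argmax(probs))                 # 0 -> the predicted class
```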


What is a Loss Function?

Image source: https://www.analyticsvidhya.com/blog/2019/08/detailed-guide-7-loss-functions-machine-learning-python-code/

To understand how well or poorly a model is working, we monitor the value of its loss function over many iterations. The loss measures how far the model’s predictions deviate from the correct classes or values for given inputs: the larger the value of the loss function, the further the model strays from making the correct prediction.

Types of Loss Functions

Depending on the type of learning task, loss functions can be broadly classified into two categories:

  • Regression loss functions

  • Classification loss functions

Regression Loss Functions

In this sub-section, we’ll discuss some of the more widely-used regression loss functions:

Mean Absolute Error, or L1 Loss (MAE)

The mean absolute error is the average of the absolute differences between the values predicted by the model and the actual values. Note that a plain (signed) mean error would have a cancellation problem: if some values are underestimated (negative error) and others are almost equally overestimated (positive error), the errors cancel each other out and give a misleading picture of the overall error. Taking the absolute value of each difference, as MAE does, avoids this.
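A short NumPy sketch of MAE with made-up target and prediction vectors; it also shows how the absolute value prevents positive and negative errors from cancelling:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average of the absolute differences between targets and predictions
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.5, 2.0, 8.0])    # errors: +0.5, -0.5, 0.0, -1.0

print(np.mean(y_true - y_pred))             # -0.25: signed errors partly cancel
print(mean_absolute_error(y_true, y_pred))  #  0.50: MAE does not cancel
```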

Mean Squared Error, or L2 Loss (MSE)

The mean squared error is the average of the squared differences between the values predicted by the model and the actual values. Squaring the differences likewise keeps positive and negative errors from cancelling out.

Because the error terms are squared, large errors have a much greater influence on MSE than small ones, so MSE effectively emphasizes the biggest mistakes the model makes.

However, this can backfire when there are a lot of outliers in your data: because their large errors are squared, outliers carry disproportionate weight and can bias the loss function. Outliers should therefore be handled (removed or down-weighted) before applying MSE.
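The sketch below (with an artificial outlier added to invented data) shows how much more strongly MSE reacts to a single large error than MAE does, which is why outliers should be dealt with first:

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.5, 2.0, 8.0])
print(mse(y_true, y_pred), mae(y_true, y_pred))  # 0.375 0.5

# Add one outlier prediction: the squared term blows up much faster.
y_pred_outlier = np.array([2.5, 5.5, 2.0, 17.0])  # error of 10 on one sample
print(mse(y_true, y_pred_outlier), mae(y_true, y_pred_outlier))  # 25.125 2.75
```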

Huber Loss

Huber loss combines the two previous losses: it behaves like the absolute error for large errors and becomes quadratic as the error grows smaller. With y the actual value, ŷ the predicted value, and δ a user-defined threshold hyper-parameter, it can be written as:

L_δ(y, ŷ) = ½ · (y − ŷ)²              if |y − ŷ| ≤ δ
L_δ(y, ŷ) = δ · |y − ŷ| − ½ · δ²      otherwise

This keeps MSE’s smooth gradient near the minimum while limiting the influence of outliers, as MAE does.
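A sketch of the piecewise Huber loss written directly from the formula above, with delta set to 1.0 and the data invented purely as an example:

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    error = y_true - y_pred
    small = np.abs(error) <= delta
    # Quadratic for small errors, linear (absolute) for large errors
    quadratic = 0.5 * error ** 2
    linear = delta * np.abs(error) - 0.5 * delta ** 2
    return np.mean(np.where(small, quadratic, linear))

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.5, 2.0, 17.0])  # one large (outlier-like) error
print(huber_loss(y_true, y_pred))          # 2.4375: grows only linearly in the outlier
```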

Classification Loss Functions

In this sub-section, we’ll discuss some of the more widely-used loss functions for classification tasks:

Cross-Entropy Loss

This loss is also called log loss. To understand cross-entropy loss, let’s first understand what entropy is. Entropy refers to the disorder or uncertainty in data. The larger the entropy value, the higher the level of disorder.

Formally, entropy is the negative sum, over all possible outcomes, of the probability of each outcome multiplied by its log:

H(p) = − Σ p(x) · log(p(x))

Thus, using cross-entropy as a loss function means reducing the entropy, or uncertainty, about the class to be predicted.

Cross-entropy loss is therefore defined as the negative sum, over all possible classes, of the expected (true) class indicator multiplied by the natural log of the predicted probability for that class. The negative sign is used because the log of a probability (a number < 1) is negative, and it is less confusing to work with positive loss values when evaluating model performance.

For example, if the problem at hand is binary classification, the value of y can be 0 or 1. In such a case, the above loss formula reduces to:

−(y · log(p) + (1 − y) · log(1 − p))

where p is the predicted probability that an observation O belongs to class C.

Thus, the loss over a complete set of N samples is the average of this term:

−(1/N) · Σᵢ [ yᵢ · log(pᵢ) + (1 − yᵢ) · log(1 − pᵢ) ]
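A small NumPy sketch of binary cross-entropy built directly from the formula above; the labels and predicted probabilities are invented, and the clipping guards against log(0):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Clip to avoid log(0), then average -(y*log(p) + (1-y)*log(1-p))
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y_true = np.array([1, 0, 1, 1])
p_good = np.array([0.9, 0.1, 0.8, 0.7])  # confident, mostly correct
p_bad  = np.array([0.4, 0.6, 0.3, 0.2])  # unconfident / wrong

print(binary_cross_entropy(y_true, p_good))  # ~0.20 (low loss)
print(binary_cross_entropy(y_true, p_bad))   # ~1.16 (high loss)
```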

Hinge Loss

Hinge loss penalizes wrongly-predicted values as well as values that are predicted correctly but with a low confidence (a small margin). It is primarily used with Support Vector Machines (SVMs), since penalizing low-margin predictions encourages the formation of a large-margin classifier. For labels y ∈ {−1, +1} and a raw model score f(x), the per-sample loss is max(0, 1 − y · f(x)).
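Finally, a sketch of the hinge loss for labels in {−1, +1} (the scores are made up for illustration); a correct prediction with a margin of at least 1 contributes zero loss, while wrong or low-margin predictions are penalized:

```python
import numpy as np

def hinge_loss(y_true, scores):
    # y_true in {-1, +1}; scores are raw (unbounded) model outputs
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))

y_true = np.array([ 1, -1,  1,  1])
scores = np.array([ 2.0, -0.5, 0.3, -1.0])
# margins y*f(x):   2.0   0.5   0.3  -1.0
print(hinge_loss(y_true, scores))  # (0 + 0.5 + 0.7 + 2.0) / 4 = 0.8
```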

Conclusion

In this post, we discussed various activation functions, including sigmoid, tanh, ReLU, leaky ReLU, and softmax, along with their primary use cases. These are the most widely-used activation functions and are essential for developing efficient neural networks.

We also discussed a few major loss functions: mean squared error, mean absolute error, Huber loss, cross-entropy loss, and hinge loss.

I hope this article has helped you learn and understand more about these fundamental ML concepts. All feedback is welcome. Please help me improve!

Until next time! 😊


Source: https://heartbeat.fritz.ai/exploring-activation-and-loss-functions-in-machine-learning-39d5cb3ba1fc
