

How to log-transform a variable in R: Log Transform for Positivity

Published: 2023/12/15

In simple terms, the log transform compresses the range of large numbers and expands the range of small numbers. The larger x is, the more slowly log(x) grows.


Figure: log transform on range(1, 1000). The x axis is the raw value; the y axis is the log-transformed value.

Look closely at the plot above, which shows the log transformation of values ranging from 1 to 1000. As the plot shows, the log has mapped values from [1, 1000] into the [0, 7] range.


Note how the x values from 200 to 1000 get compressed into just about 5 to 7. So the larger x is, the more slowly log(x) increments.

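The compression described above can be sketched quickly (a minimal example, assuming NumPy; the natural log is used here, matching the [0, 7] output range in the plot):

```python
import numpy as np

# Natural log of a few values spanning 1 to 1000: the raw range [1, 1000]
# collapses into roughly [0, 7], and increments slow down as x grows.
x = np.array([1, 10, 100, 200, 1000])
print(np.log(x).round(2))  # 1 -> 0.0, 200 -> 5.3, 1000 -> 6.91
```

Note that 200 and 1000 differ by 800 on the raw scale but by only about 1.6 on the log scale.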

Log is only defined for x > 0; log 0 is undefined. It is not a real number: suppose log (base 10) 0 = x, so 10^x = 0. If you try to solve this, you will find that no value of x makes 10 raised to the power of x equal zero. Note also that 10^0 = 1, so log 1 = 0.

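These facts show up directly in code (a small sketch, assuming NumPy; `np.log1p` is shown as a common practical workaround for data containing zeros, which the article does not mention explicitly):

```python
import numpy as np

with np.errstate(divide="ignore"):
    print(np.log(0.0))  # -inf: the log of zero is undefined
print(np.log(1.0))      # 0.0: e**0 == 1, just as 10**0 == 1 in base 10
print(np.log1p(0.0))    # 0.0: log1p computes log(1 + x), a common trick
                        # when the data contains exact zeros
```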

The log transform is also known as a variance-stabilizing transform, which is useful when dealing with heavy-tailed distributions. It can make highly skewed distributions less skewed; in other words, the log transform reduces or removes skewness in the data.


Figure: the log transform reduces or removes skewness and pushes the distribution toward normal.

Using the log transform as a feature engineering technique:

To reduce or remove skewness in our data distribution and make it closer to normal (a.k.a. the Gaussian distribution), we can apply a log transformation to our input features (X).


Real-world data often has heavy-tailed distributions, where values are right-skewed (a long tail of large values) or left-skewed (a long tail of small values). Algorithms can be sensitive to such distributions and can underperform if the range is not properly normalized.


Figures: a skewed distribution, and the same distribution after the log transform.

It is common practice to apply a logarithmic transformation to the data so that very large and very small values do not negatively affect the performance of a learning algorithm. The log transform reduces the range of values caused by outliers.

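The skewness reduction can be checked numerically (a sketch, assuming NumPy; the log-normal sample and the moment-based `skewness` helper are illustrative choices, not from the original article):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # heavy right tail

def skewness(a):
    """Sample skewness: the third standardized moment."""
    d = a - a.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

print(round(skewness(x), 2))          # strongly positive: right-skewed
print(round(skewness(np.log(x)), 2))  # near 0: roughly symmetric after log
```

A log-normal sample is exactly the case where the transform works perfectly, since log(x) is normal by construction; real data will usually land somewhere in between.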

However, it is important to remember that once the log transform is done, observations on the raw scale no longer carry the same interpretation as the log-transformed data.


The next question is: when we run a linear regression and get a coefficient for X (the independent variable), how do we interpret the coefficient of a log-transformed independent variable (its feature importance)?


For a log-transformed independent variable (X), divide the coefficient by 100. This tells us that a 1% increase in the independent variable increases (or decreases) the dependent variable by (coefficient / 100) units.


Example: the coefficient is 0.198. 0.198/100 = 0.00198. For every 1% increase in the independent variable, our dependent variable increases by about 0.002.

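The interpretation above can be verified on synthetic data (a sketch, assuming NumPy; the true coefficient 0.198 and the uniform x range are made up to match the worked example):

```python
import numpy as np

# Synthetic level-log data: y depends linearly on log(x), with slope 0.198.
rng = np.random.default_rng(1)
x = rng.uniform(1, 100, size=5_000)
y = 3.0 + 0.198 * np.log(x) + rng.normal(0, 0.01, size=5_000)

slope, intercept = np.polyfit(np.log(x), y, 1)  # regress y on log(x)
print(round(slope, 2))        # ~0.2: the fitted coefficient on log(x)
print(round(slope / 100, 5))  # ~0.00198: effect on y of a 1% increase in x
```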

Note: I'm also attaching a link below that dives deep into interpreting log-transformed features.


Using the log transform on the target variable:

For example, consider a machine learning problem where you want to predict the price of a house from input features such as area, number of bedrooms, etc.


In this problem, if you fit a linear regression model of price (y) on X (area, number of bedrooms, ...) and optimize it with gradient descent, the dataset will contain some extreme prices (high-valued properties). Gradient descent will then focus on optimizing those high-valued properties (because they produce large errors) and will yield a poor model. So log-transforming the target variable makes sense when performing linear regression. More importantly, linear regression can predict any real number, including negative values. If your model is far off, it can produce negative predictions, especially for some of the cheaper houses. Real-world values like price, income, and stock price are positive, so it is good to log-transform them before using linear regression; otherwise, the model may predict negative values, which make no sense.


Figure: predicting house prices.

In the example above, if you choose RMSE as the cost function, the model will focus on the high-valued properties and perform badly. If you instead choose log(actual) - log(predicted) as the error, it intuitively optimizes relative error and thereby produces a better model.


The model would otherwise be under pressure to correct the large errors caused by high-valued properties, so using the log here makes sense.

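The contrast between the two error measures can be seen on a toy example (a sketch, assuming NumPy; the three house prices are invented to illustrate the point):

```python
import numpy as np

actual = np.array([100_000.0, 150_000.0, 2_000_000.0])
pred   = np.array([110_000.0, 140_000.0, 1_500_000.0])

rmse = np.sqrt(np.mean((actual - pred) ** 2))
log_rmse = np.sqrt(np.mean((np.log(actual) - np.log(pred)) ** 2))
print(round(rmse))          # dominated by the expensive house's 500k error
print(round(log_rmse, 3))   # weighs the relative errors much more evenly
```

The two cheap houses are predicted within 10% yet contribute almost nothing to the RMSE; on the log scale, the 25% miss on the expensive house matters, but it no longer drowns out everything else.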

Converting log predictions back to actual values.

Converting back to actual predictions using np.exp: you need actual predictions, not the log of predictions, so you can always convert back by exponentiating the predicted value (log(price)).

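A minimal sketch of the round trip (the log-scale predictions below are hypothetical values, not model output):

```python
import numpy as np

log_price_pred = np.array([11.5, 12.9, 14.2])  # hypothetical predictions on the log scale
price_pred = np.exp(log_price_pred)            # back to actual prices

print((price_pred > 0).all())                           # True: exp is always positive
print(np.allclose(np.log(price_pred), log_price_pred))  # True: exp inverts log exactly
```

This is also why the negative-prediction problem disappears: whatever real number the model outputs on the log scale, exponentiating it yields a positive price.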

Log loss to improve models

Logarithmic loss (related to cross-entropy) measures the performance of a classification model whose prediction is a probability between 0 and 1. The goal of our machine learning model is to minimize this value; a perfect model has a log loss of 0. Log loss increases as the predicted probability diverges from the actual label, so predicting a probability of 0.012 when the actual label is 1 is bad and results in a high log loss.


Figure: log loss in a binary classification setting.

In the example above, when the true value is 1 and the predicted probability is 0.1, the log loss is high; when the true value is 1 and the predicted probability is 0.9, the log loss is low.

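The standard binary log loss formula reproduces exactly this behaviour (a sketch, assuming NumPy; the clipping constant is a common convention to avoid log(0), as in scikit-learn's implementation):

```python
import numpy as np

def log_loss(y_true, p):
    """Binary log loss: -[y*log(p) + (1-y)*log(1-p)]."""
    p = np.clip(p, 1e-15, 1 - 1e-15)  # guard against log(0)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(round(log_loss(1, 0.9), 3))  # 0.105: confident and correct -> low loss
print(round(log_loss(1, 0.1), 3))  # 2.303: confident and wrong -> high loss
```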

Log transformation in text classification (natural language processing)

We use the tf-idf method to encode text data for machine learning models. Tf-idf applies a log transform to the inverse document frequency, so a word that appears in every single document is effectively zeroed out, while a word that appears in very few documents gets an even larger weight than before.

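The zeroing-out effect can be shown with the plain idf formula, log(N / df) (a sketch, assuming NumPy; the three toy documents are invented, and real implementations such as sklearn's TfidfVectorizer add smoothing terms):

```python
import numpy as np

docs = [["log", "loss"], ["log", "model"], ["log", "tfidf"]]
vocab = ["log", "loss", "model", "tfidf"]
n = len(docs)

df = np.array([sum(w in d for d in docs) for w in vocab])  # document frequency
idf = np.log(n / df)  # plain idf, no smoothing
print(dict(zip(vocab, idf.round(3))))
# "log" appears in every document -> idf = log(3/3) = 0 (zeroed out);
# the rare words get idf = log(3/1) ~ 1.099
```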

Figure: TF-IDF.

Please share this article if it helped you understand how important the log is to machine learning. Comment if you have any questions.


GOOD DAY!


References:
1. https://data.library.virginia.edu/interpreting-log-transformations-in-a-linear-model/
2. http://wiki.fast.ai/index.php/Log_Loss


Translated from: https://medium.com/analytics-vidhya/log-transform-for-positivity-d3e1f183c804

