
Learn how to select the best performing linear regression for univariate models

Published: 2023/11/29


by Björn Hartmann

Find out which linear regression model is the best fit for your data

Inspired by a question after my previous article, I want to tackle an issue that often comes up after trying different linear models: you need to choose which model to use. More specifically, Khalifa Ardi Sidqi asked:

“How to determine which model suits best to my data? Do I just look at the R square, SSE, etc.? As the interpretation of that model (quadratic, root, etc.) will be very different, won’t it be an issue?”

The second part of the question can be answered easily. First, find the model that best suits your data, and then interpret its results. It helps if you have ideas about how your data might be explained. However, interpret only the best model.

The rest of this article will address the first part of his question. Please note that I will share my approach on how to select a model. There are multiple ways, and others might do it differently. But I will describe the way that works best for me.


In addition, this approach only applies to univariate models. Univariate models have just one input variable. I am planning a further article, where I will show you how to assess multivariate models with more input variables. For today, however, let us focus on the basics and univariate models.


To practice and get a feeling for this, I wrote a small ShinyApp. Use it and play around with different datasets and models. Notice how the parameters change, and become more confident in assessing simple linear models. Finally, you can also use the app as a framework for your own data. Just copy it from GitHub.

Use the adjusted R² for univariate models

If you only use one input variable, the adjusted R² value gives you a good indication of how well your model performs. It illustrates how much of the variation is explained by your model.

In contrast to the simple R², the adjusted R² takes the number of input factors into account. It penalizes too many input factors and favors parsimonious models.

In the screenshot above, you can see two models with values of 71.3% and 84.32%. By this measure, the second model is better than the first one. Models with low values, however, can still be useful, because the adjusted R² is sensitive to the amount of noise in your data. As such, compare this indicator only between models for the same dataset, not across different datasets.
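To make the comparison concrete, here is a minimal sketch of how the adjusted R² could be computed for two candidate models on the same dataset. The function name `adjusted_r2`, the parameter `n_inputs`, and all numbers are made up for illustration; they are not taken from the ShinyApp.

```python
def adjusted_r2(actual, predicted, n_inputs):
    """Adjusted R^2 from actual values, predictions, and the number of input terms."""
    n = len(actual)
    mean_y = sum(actual) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # sum of squared errors
    sst = sum((a - mean_y) ** 2 for a in actual)                # total sum of squares
    r2 = 1 - sse / sst
    # Penalize the plain R^2 for the number of input terms.
    return 1 - (1 - r2) * (n - 1) / (n - n_inputs - 1)


# Hypothetical data: the same actual values, predicted by two different models.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
model_a = [1.1, 1.9, 3.2, 3.8, 5.0]
model_b = [2.0, 2.0, 3.0, 4.0, 4.0]

score_a = adjusted_r2(actual, model_a, n_inputs=1)
score_b = adjusted_r2(actual, model_b, n_inputs=1)
print(f"model A: {score_a:.4f}, model B: {score_b:.4f}")
```

Because both scores are computed against the same `actual` values, the higher one points to the better-fitting model; the same comparison across two different datasets would not be meaningful.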


Usually, there is little need for the SSE

Before you read on, let’s make sure we are talking about the same SSE. On Wikipedia, SSE refers to the sum of squared errors. In some statistics textbooks, however, SSE can refer to the explained sum of squares (the exact opposite). So for now, suppose SSE refers to the sum of squared errors.

With this definition, the adjusted R² is approximately 1 − SSE/SST, with SST referring to the total sum of squares.

I do not want to dive deeper into the math behind this. What I want to show you is that the adjusted R2 is computed with the SSE. So the SSE usually does not give you any additional information.
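For readers who do want to see the relationship written out, the standard textbook definitions are below. The symbols n (number of observations) and p (number of input variables, so p = 1 for a univariate model) are notation introduced here, not the article's.

```latex
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
                   = 1 - \frac{\mathrm{SSE}/(n - p - 1)}{\mathrm{SST}/(n - 1)}
```

For large n the factor (n − 1)/(n − p − 1) is close to one, which is why the adjusted R² is approximately 1 − SSE/SST.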


Furthermore, the adjusted R² is normalized: it never exceeds one, and it only drops slightly below zero for models that explain almost nothing. So it is easier for you and others to interpret an unfamiliar model with an adjusted R² of 75% than one with an SSE of 394, even though both figures might describe the same model.

Have a look at the residuals or error terms!

What is often ignored are error terms or so-called residuals. They often tell you more than what you might think.


The residuals are the differences between the actual values and your predicted values (actual minus predicted).

Their benefit is that they can show you both the magnitude and the direction of your errors. Let’s have a look at an example:

Here, I tried to predict a polynomial dataset with a linear function. Analyzing the residuals shows that there are areas where the model has an upward or downward bias.


For 50 < x < 100, the residuals are above zero. So in this area, the actual values have been higher than the predicted values — our model has a downward bias.


For 100 < x < 150, however, the residuals are below zero. Thus, the actual values have been lower than the predicted values: the model has an upward bias.

It is always good to know whether your model suggests values that are too high or too low. But you usually do not want to have patterns like this.
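The kind of pattern described above can be reproduced with a small sketch. It fits a straight line to made-up curved data (y = x², not the dataset from the screenshots) and averages the residuals, taken as actual minus predicted, over three ranges of x. The helper names `fit_line` and `mean` are introduced here for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean(values):
    return sum(values) / len(values)


# Made-up curved data: a straight line cannot follow it everywhere.
xs = list(range(151))
ys = [float(x * x) for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]  # actual - predicted

low = mean([r for x, r in zip(xs, residuals) if x < 50])
mid = mean([r for x, r in zip(xs, residuals) if 50 <= x <= 100])
high = mean([r for x, r in zip(xs, residuals) if x > 100])
print(f"overall mean residual: {mean(residuals):.6f}")
print(f"mean residual for x < 50: {low:.1f}, 50..100: {mid:.1f}, x > 100: {high:.1f}")
```

The overall mean of the residuals is zero by construction, yet the regional averages are positive at both ends and negative in the middle. A systematic sign pattern like this suggests that the chosen functional form is missing something.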


The residuals should be zero on average (as indicated by the mean), and they should be evenly distributed around zero. Predicting the same dataset with a polynomial function of degree 3 suggests a much better fit:

In addition, you can observe whether the variance of your errors increases. In statistics, this is called heteroscedasticity. You can account for it easily with robust standard errors. Otherwise, your hypothesis tests are likely to be wrong.
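As a rough illustration of what robust standard errors do, the sketch below computes the classical and a heteroscedasticity-robust standard error for the slope of a univariate model. It assumes the HC0 (White) variant, which is only one of several in common use, and the function name `slope_standard_errors` and the data are made up for this example.

```python
import math


def slope_standard_errors(xs, ys):
    """Return (classical_se, hc0_robust_se) for the slope of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    # Classical formula: assumes every error term has the same variance.
    sse = sum(r ** 2 for r in residuals)
    classical = math.sqrt(sse / (n - 2) / sxx)

    # HC0 (White) formula: each squared residual is weighted by the squared
    # distance of its x value from the mean of x.
    weighted = sum((x - mean_x) ** 2 * r ** 2 for x, r in zip(xs, residuals))
    robust = math.sqrt(weighted / sxx ** 2)
    return classical, robust


# Made-up data whose scatter around the line y = 2x grows with x.
xs = list(range(1, 11))
ys = [2 * x + (0.3 * x if x % 2 else -0.3 * x) for x in xs]
classical_se, robust_se = slope_standard_errors(xs, ys)
print(f"classical SE: {classical_se:.3f}, robust (HC0) SE: {robust_se:.3f}")
```

The fitted slope is the same in both cases; only the uncertainty attached to it changes, which is what matters for hypothesis tests.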


Histogram of residuals

Finally, the histogram summarizes the magnitude of your error terms. It shows the bandwidth of the errors and indicates how often errors of each size occurred.

The above screenshots show two models for the same dataset. In the left histogram, errors range from -338 to 520.

In the right histogram, errors range from -293 to 401. So the most extreme errors are considerably smaller. Furthermore, most errors in the model of the right histogram are closer to zero. So I would favor the model on the right.
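A histogram like the ones described can be sketched in a few lines. The residual values below are invented for illustration (only the two endpoints echo the range quoted for the left histogram), and `residual_histogram` is a hypothetical helper, not part of the ShinyApp.

```python
import math
from collections import Counter


def residual_histogram(residuals, bin_width):
    """Count residuals per bin; each key is the lower edge of a bin."""
    counts = Counter(math.floor(r / bin_width) * bin_width for r in residuals)
    return dict(sorted(counts.items()))


# Invented residuals for illustration.
residuals = [-338, -120, -45, -10, 5, 30, 80, 210, 520]
bin_width = 100
hist = residual_histogram(residuals, bin_width)

print(f"bandwidth: {min(residuals)} to {max(residuals)}")
for lower_edge, count in hist.items():
    # One '#' per residual falling into [lower_edge, lower_edge + bin_width).
    print(f"{lower_edge:>5} to {lower_edge + bin_width:>5}: {'#' * count}")
```

A model whose bars cluster tightly around the bin containing zero, with short tails, is the one to prefer.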


Summary

When choosing a linear model, these are factors to keep in mind:


  • Only compare linear models for the same dataset.
  • Find a model with a high adjusted R².
  • Make sure this model has residuals that are evenly distributed around zero.
  • Make sure the errors of this model are within a small bandwidth.

If you have any questions, write a comment below or contact me. I appreciate your feedback.


Translated from: https://www.freecodecamp.org/news/learn-how-to-select-the-best-performing-linear-regression-for-univariate-models-e9d429c40581/
