

Understanding Regression: A First Step Towards Machine Learning


Regression, as fancy as it sounds, can be thought of as a "relationship" between any two things. For example, imagine you are on the ground and the temperature is 30 ℃. You start climbing a hill, and as you climb, you realize that you are feeling colder and the temperature is dropping. When you reach the hilltop, you see that the temperature has decreased to 20 ℃ and it is much colder. From this data, we can conclude that there is a relationship between height and temperature. This is termed "regression" in statistics.


Regression analysis is a form of predictive modeling that investigates the relationship between a dependent variable (target) and one or more independent variables (predictors). In a regression problem, we try to map input variables to a continuous function.


In the example above, the temperature depends on height and is therefore the "dependent" variable, whereas height is the "independent" variable. Various other factors may influence the temperature, such as humidity, pressure, and even air pollution levels. All such factors have a relationship with the temperature that can be written as a mathematical equation. We use this mathematical equation (the cost function) to train a machine learning model on a given dataset, so that the model can later predict the temperature under given conditions.


How does Regression work?

Regression is a form of Supervised Machine Learning. We first split the dataset into a training set and a test set. Regression involves a mathematical equation known as the Cost Function, which captures the relationship between the independent and dependent variables. The objective of regression is to minimize this cost function, which is achieved using optimization algorithms such as Gradient Descent. The regression model is then trained on the training set to find the 'Line of Best Fit'. The illustration below shows how the line of best fit is found in linear regression.


Notice the minimization of the cost function in the illustration.

Once trained and optimized, the model predicts outputs on the test set, which are compared with the observed outputs to assess accuracy.

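As a minimal end-to-end sketch of this workflow, assuming scikit-learn and a made-up height/temperature dataset (neither is from the original article):

```python
# Minimal sketch of the regression workflow described above.
# Assumptions: scikit-learn is available; the height/temperature data
# below is synthetic, invented purely for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
height_m = rng.uniform(0, 2000, size=200).reshape(-1, 1)          # independent variable
temp_c = 30 - 0.005 * height_m.ravel() + rng.normal(0, 1, 200)    # dependent variable

# Split the dataset into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    height_m, temp_c, test_size=0.2, random_state=0)

# Fit the model: this finds the "line of best fit" by minimizing the
# squared-error cost (scikit-learn solves it in closed form; gradient
# descent is an alternative way to minimize the same cost).
model = LinearRegression().fit(X_train, y_train)

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("R^2 on test set:", model.score(X_test, y_test))
```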

Types of Regression

Linear Regression

This is the most fundamental regression model, and it needs to be understood to grasp the basics of regression analysis. When we have one predictor variable 'x' for one dependent (response) variable 'y', and the two are linearly related, the model is called a Simple Linear Regression model. When more than one predictor is present (multiple input variables), the model is called a Multiple Linear Regression model. The relationship is defined by the equation:

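In standard notation, consistent with the parameters b0, b1 and the error term e used below, the simple linear regression equation is:

$$ y = b_0 + b_1 x + e $$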

The line that best fits the model is determined by the values of the parameters b0 and b1. The difference between the observed outcome Y and the predicted outcome y is known as the prediction error. Hence, the values of b0 and b1 should be chosen so that they minimize the sum of the squares of the prediction errors.


The error term e is the prediction error; the sum of squared prediction errors Q should be minimized.
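Written out with the definitions above, the quantity to minimize is:

$$ Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - (b_0 + b_1 x_i) \right)^2 $$

As a small sketch, the values of b0 and b1 that minimize Q have a closed-form solution that can be computed directly with NumPy; the height/temperature numbers here are made up for illustration:

```python
# Closed-form least-squares solution for simple linear regression.
# The data points are illustrative, not from the article.
import numpy as np

x = np.array([0.0, 500.0, 1000.0, 1500.0, 2000.0])   # heights in metres
Y = np.array([30.1, 27.4, 25.2, 22.8, 20.3])         # observed temperatures

b1 = np.sum((x - x.mean()) * (Y - Y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = Y.mean() - b1 * x.mean()

y_pred = b0 + b1 * x                  # predicted outcomes
Q = np.sum((Y - y_pred) ** 2)         # sum of squared prediction errors
print(f"b0={b0:.3f}, b1={b1:.5f}, Q={Q:.4f}")
```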

Linear Regression often performs poorly on large, complex datasets because it assumes a linear relationship between the dependent and independent variables; that is, it assumes there is a straight-line relationship between them. It also assumes independence between attributes.


Polynomial Regression

Polynomial regression is similar to Multiple Linear Regression. However, in this type of regression, the relationship between the X and Y variables is defined by taking an nth-degree polynomial in X. Polynomial regression fits a non-linear curve to the data, but as an estimator it is a linear model, since it is linear in its coefficients.


Polynomial regression models are analyzed for accuracy in the same way as linear regression models, but they are slightly more difficult to interpret because the input variables are highly correlated. The estimated value of the dependent variable Y is modeled by the following equation (for an nth-order polynomial):

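In standard notation, the nth-order polynomial model is:

$$ y = b_0 + b_1 x + b_2 x^2 + \dots + b_n x^n + e $$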

The line that passes through the points will not be straight but curved, depending on the powers of X. High-degree polynomials are observed to induce more oscillations in the fitted curve and have poor interpolation properties. The sketch below shows how polynomial regression can be fit in practice.

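A minimal sketch using scikit-learn's PolynomialFeatures; the degree-3 choice and the synthetic cubic data are illustrative assumptions, not from the original article:

```python
# Polynomial regression sketch: expand x into polynomial features, then
# fit an ordinary linear model on the expanded features. The model is
# non-linear in x but linear in its coefficients.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * X.ravel() ** 3 - X.ravel() + rng.normal(0, 1, 100)   # cubic + noise

# PolynomialFeatures expands X into [1, x, x^2, x^3].
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X, y)
print("R^2:", model.score(X, y))
```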

Ridge Regression

A standard linear or polynomial regression will fail when there is high collinearity among the feature variables. Collinearity is the existence of near-linear relationships among the independent variables.


We can first look at the optimization function of a standard linear regression to gain some insight into how ridge regression can help:

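In matrix notation, a standard way to write this objective is:

$$ \min_{w} \; \lVert Xw - y \rVert_2^2 $$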

Where X represents the feature/input variables, w represents the weights, and y represents the observed output.


Ridge Regression is a measure taken to reduce collinearity among regression predictor variables in a model. If the feature variables are correlated, the final regression model is restricted and rigid in its approximation, i.e., it has high variance and will result in overfitting.


The image shows underfitting, a robust fit, and overfitting.

Overfitting refers to a model that models the training data too well. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.


To solve this issue, Ridge Regression adds a small squared bias term to the cost function:

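A standard way to write the resulting objective, with λ controlling the strength of the penalty, is:

$$ \min_{w} \; \lVert Xw - y \rVert_2^2 + \lambda \lVert w \rVert_2^2 $$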

The added λ‖w‖² term is the squared bias.

Such a squared bias factor pulls the feature-variable coefficients away from this rigidness, introducing a small amount of bias into the model but greatly reducing the variance.

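A short sketch contrasting plain least squares with ridge regression on nearly collinear features; alpha is scikit-learn's name for λ, and the data is synthetic:

```python
# Demonstrate ridge's effect on collinear predictors. Assumptions:
# scikit-learn is available; the data and alpha value are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)     # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=100)

print("OLS coefficients:  ", LinearRegression().fit(X, y).coef_)
print("Ridge coefficients:", Ridge(alpha=1.0).fit(X, y).coef_)
# OLS tends to spread large, opposite-signed weights across the collinear
# pair; ridge shrinks both toward small, stable values.
```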

Lasso Regression

Lasso Regression is quite similar to Ridge Regression; the only difference is that it penalizes the absolute size of the regression coefficients instead of the squared bias.

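In the same notation, the lasso objective replaces the squared penalty with the absolute (L1) one:

$$ \min_{w} \; \lVert Xw - y \rVert_2^2 + \lambda \lVert w \rVert_1 $$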

The added λ‖w‖₁ term is the absolute bias.

By penalizing the absolute values, the estimated coefficients shrink further, all the way to zero, which is not possible with ridge regression. This makes the method useful for feature selection, where a subset of variables and parameters is picked for model construction. LASSO keeps the relevant features and zeroes out the irrelevant ones, which avoids overfitting and also makes learning faster.

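A minimal sketch of this feature-selection effect; the data and alpha value are illustrative assumptions:

```python
# Lasso drives the coefficients of irrelevant features exactly to zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
# Only the first two features actually influence y.
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
print("coefficients:", np.round(lasso.coef_, 3))
# Expect the last three coefficients to be exactly 0.0.
```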

Conclusion

Regression Analysis is a very interesting machine learning technique that can be applied in many areas to predict numerical values, such as the price of a product or house, the number of goals a soccer player scores in a season, or a person's BMI. I have covered four basic regression models in this blog; the rest require more intensive mathematical knowledge to understand. Hope you like this blog and find it informative!


Translated from: https://medium.com/analytics-vidhya/understanding-regression-first-step-towards-machine-learning-9b5728ac65d3
