
Oversimplified Machine Learning (1): Multiple Regression


The term machine learning may sound provocative. Machines do not learn like humans do. However, when given identifiable tasks to complete, machines excel.


Algorithms answer questions. Once trained, they take in relevant information and offer an educated guess at the answer.

Let’s pretend you work for an advertising company that wants to identify a target audience for luxury items. The company won’t bother advertising to most consumers, so it decides to estimate salaries in Topeka, Kansas to find suitable targets for its marketing campaign. The firm finds a dataset online about local residents, made up of hours worked per month, age, and monthly salary. The firm hopes to use this dataset to predict the salaries of consumers who are not in it.

Monthly Salary in Topeka, Kansas

These first five examples pair descriptive information with monthly salaries.

It seems like there is a pattern to this data. Intuitively, if someone works more hours they are paid more. It also makes sense that older workers earn more because of their experience.


Let’s take a closer look at the entire dataset. A pattern emerges: both Age and Hours Worked increase salary.


For each hour worked, salary on average increases by a certain amount. For every year of age, salary on average increases by some other amount. The formula for this relationship looks like:

A + (Hours Worked * B) + (Age * C) = Salary


Where A, B, and C are unknown variables.

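For instance, with hypothetical values A = 1, B = 0.01, and C = 0.05 (salary in thousands), a 30-year-old working 160 hours a month would be predicted to earn 1 + (160 * 0.01) + (30 * 0.05) = 4.1, or $4,100 per month.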

For every A, B, and C there exists a unique plane.


Let’s take the list <A, B, C> and call it θ. Then let’s pick a random number for each entry. This creates a plane.

This plane is like the bolded formula: it takes in two numbers and returns Z. Depending on the three numbers in θ, the plane takes a different shape.

Let’s use this randomly generated plane as an equation for the salary data. For every row in the dataset we create a single prediction.

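A minimal NumPy sketch of this step follows. The numbers, variable names, and column order are made up for illustration (the article’s table is not reproduced here); the leading column of ones lets θ[0] play the role of A.

```python
import numpy as np

# Hypothetical data: hours worked per month, age, and monthly salary in thousands.
# These values are invented for illustration only.
hours_worked = np.array([160.0, 170.0, 150.0, 180.0, 165.0])
age = np.array([25.0, 32.0, 41.0, 29.0, 50.0])
salary = np.array([3.1, 4.0, 4.4, 4.2, 5.3])

# Design matrix with a leading column of ones, so theta = <A, B, C>
# reproduces A + (Hours Worked * B) + (Age * C).
X = np.column_stack([np.ones_like(hours_worked), hours_worked, age])

# Pick a random theta; this chooses one particular plane.
rng = np.random.default_rng(0)
theta = rng.normal(size=3)

# One prediction per row of the dataset.
predictions = X @ theta
print(predictions)
```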

Now we will establish a benchmark for the performance of the predictions. Let’s take the difference between the predicted and actual salary columns, square every number, and sum them up:

Cost(Salary_in_Thousands, Predicted_Salary) = Sum((Salary_in_Thousands - Predicted_Salary)^2)


This benchmark for performance is called a cost function or loss function. The larger the sum, the larger the distance between the predictions and actual salaries.

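Continuing the sketch above, the cost is a few lines of NumPy; this is an illustration of the formula, not the article’s own code.

```python
def cost(theta, X, salary):
    """Sum of squared differences between predicted and actual salaries."""
    predictions = X @ theta
    return np.sum((salary - predictions) ** 2)

print(cost(theta, X, salary))  # large for a randomly chosen theta
```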

Depending on the location of θ the model will be more or less accurate. In order to minimize the cost function we will need to adjust the values of θ.



The question remains: in which direction do we move θ?

Gradient in Blue

Contour maps show elevation using contour lines. Along these lines, elevation is constant. Perpendicular to them is the direction of steepest slope, called the gradient. Moving in this direction increases the Z value the fastest; moving in the opposite direction decreases it the fastest.

We will use the same idea to decrease error in our salary model. First, we calculate error as a function of θ.


Then we update θ using the following formula:


θ = θ - ∇Cost() … where ∇Cost() is the gradient

By subtracting the gradient we decrease the overall cost and find a better θ. Because we are subtracting the gradient, this method is called Gradient Descent.
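For the squared-error cost above, the gradient works out to 2 * X^T (Xθ - y). Here is a sketch of one update step, reusing the X, salary, and theta from the earlier snippets; the learning rate is not in the article’s formula and is added here only so the step stays small.

```python
def gradient(theta, X, salary):
    """Gradient of the squared-error cost: 2 * X^T (X theta - salary)."""
    return 2.0 * X.T @ (X @ theta - salary)

# One descent step. The article writes theta = theta - gradient; in practice
# the gradient is scaled by a small learning rate so the step does not overshoot.
learning_rate = 1e-6
theta = theta - learning_rate * gradient(theta, X, salary)
```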

With our new θ we calculate the new predictions. We then calculate the new cost and new gradient. This leads to the following loop.


Once the cost is low enough we keep θ as our final model. For example, if the cost does not decrease for three straight loops, we can stop the process.

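Putting the pieces together, here is one way to sketch that loop, again assuming the toy data and the cost and gradient helpers from the snippets above. Training stops when the cost has not decreased for a few consecutive iterations, mirroring the stopping rule just described.

```python
def gradient_descent(X, salary, learning_rate=1e-6, max_iters=100_000, patience=3):
    """Repeatedly step theta against the gradient until the cost stops improving."""
    theta = np.zeros(X.shape[1])
    best_cost = cost(theta, X, salary)
    stale = 0  # consecutive iterations without a lower cost
    for _ in range(max_iters):
        theta = theta - learning_rate * gradient(theta, X, salary)
        current = cost(theta, X, salary)
        if current < best_cost:
            best_cost = current
            stale = 0
        else:
            stale += 1
            if stale >= patience:  # e.g. three straight loops with no decrease
                break
    return theta

theta = gradient_descent(X, salary)
print(theta, cost(theta, X, salary))
```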

We will use the final model to predict the salaries of consumers in Topeka, Kansas from their ages and hours worked. If our model predicts a high enough salary, we will mark them as part of the target audience for our luxury brand ads.

Here is our final model:

Considering the original data, it is a strong approximation of the underlying pattern.


This is a graph representing our model.


θ[0] is the first variable of θ.

The left represents the inputs to the model: age and hours worked.


The data moves from left to right through three channels. The wider the channel, indicated by θ[i], the larger the output to the right. The top line represents 1 * θ[0].

The right-hand side is the output of the model. It is the sum of each input multiplied by its θ.

The math is written as <1, age, hours worked>*θ = Salary Prediction

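As a last sketch, predicting a new consumer’s salary is just this dot product, using the θ fit above and the <1, hours worked, age> column order from the earlier snippets (whichever order you use must match training); the salary threshold below is purely illustrative.

```python
def predict_salary(theta, hours_worked, age):
    """Dot product of <1, hours worked, age> with theta (same column order as training)."""
    return np.dot([1.0, hours_worked, age], theta)

# Mark a consumer as a target if the predicted monthly salary (in thousands)
# clears a threshold; the 4.5 here is an illustrative cutoff, not from the article.
is_target = predict_salary(theta, hours_worked=170.0, age=35.0) > 4.5
print(is_target)
```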

Check out part 2 here:


Translated from: https://medium.com/data-for-associates/oversimplified-ml-1-multiple-regression-eaf49fee7aa1
