Andrew Ng Machine Learning Programming Exercise 5 (Octave)

Published: 2024/4/17

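Problem description: given the water level in a reservoir, use a regularized linear regression model to predict the amount of water flowing out of the dam, then debug the learning algorithm and discuss how bias and variance affect this linear regression model.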

① Visualizing the dataset

The dataset for this exercise is split into three parts:

- Training set: the sample matrix X and the label vector y

- Cross validation set: Xval and yval, used to choose the regularization parameter

- Test set: used for evaluating performance; its examples never appear in the training set

After loading the data into Octave, the training set contains 12 training examples, each with a single feature. The hypothesis is therefore h_θ(x) = θ₀·x₀ + θ₁·x₁, or in vector form h_θ(x) = θ^T·x

In general, x₀ is the bias unit and is fixed at x₀ = 1

Visualizing the data:

② Cost function of the regularized linear regression model

The cost function is given by the following formula:
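The formula image did not survive in this copy. Reconstructed from the Octave code in this section (so treat it as a reading of the code rather than the original figure), the regularized cost is presumably:

$$
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
$$

Here m is the number of training examples and n the number of features; the second sum starts at j = 1, so θ₀ is not regularized.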

The Octave implementation is shown below. The cost function is computed with vector (matrix) multiplication.

reg = (lambda / (2*m)) * ( ( theta( 2:length(theta) ) )' * theta(2:length(theta)) );
J = sum((X*theta-y).^2)/(2*m) + reg;

Note: θ₀ does not take part in the regularization term, which is why the indexing above starts at 2 (Octave/Matlab arrays are indexed from 1, so θ₀ is the first element of theta).

③ Gradient of regularized linear regression

The gradient is computed with the following formula:
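The formula image is missing from this copy. Reconstructed from the gradient code in this section (an inference from the code, not the original figure), the partial derivatives are presumably:

$$
\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}
$$

$$
\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad (j \ge 1)
$$

The only difference between the two cases is the extra (λ/m)·θ_j term, which is absent for θ₀.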

Here, the unregularized part of the gradient can be written in vector form as [X^T·(X·θ - y)]/m, which in Matlab is: X'*(X*theta-y) / m

The Octave implementation of the gradient is as follows:

grad_tmp = X'*(X*theta-y) / m;
grad = [ grad_tmp(1:1); grad_tmp(2:end) + (lambda/m)*theta(2:end) ];

function [J, grad] = linearRegCostFunction(X, y, theta, lambda)
%LINEARREGCOSTFUNCTION Compute cost and gradient for regularized linear
%regression with multiple variables
%   [J, grad] = LINEARREGCOSTFUNCTION(X, y, theta, lambda) computes the
%   cost of using theta as the parameter for linear regression to fit the
%   data points in X and y. Returns the cost in J and the gradient in grad

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost and gradient of regularized linear
%               regression for a particular choice of theta.
%
%               You should set J to the cost and grad to the gradient.
%

reg = (lambda / (2*m)) * ( ( theta( 2:length(theta) ) )' * theta(2:length(theta)) );
J = sum((X*theta-y).^2)/(2*m) + reg;
grad_tmp = X'*(X*theta-y) / m;
grad = [ grad_tmp(1:1); grad_tmp(2:end) + (lambda/m)*theta(2:end) ];

% =========================================================================

grad = grad(:);

end
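One way to gain confidence in the cost and gradient expressions above is a numerical gradient check. The sketch below is not part of the exercise files: the toy data and the names costFn, num_grad and epsilon are made up for illustration. It re-states the same cost and gradient expressions inline and compares the analytic gradient with a central-difference approximation.

```octave
% Toy data (hypothetical): 3 examples, a bias column plus one feature.
X = [1 1; 1 2; 1 3];
y = [1; 2; 4];
theta = [0.5; 1];
lambda = 1;
m = size(X, 1);

% Same cost expression as in linearRegCostFunction, as a function of theta.
costFn = @(t) sum((X*t - y).^2)/(2*m) + (lambda/(2*m)) * (t(2:end)' * t(2:end));

% Same analytic gradient as in linearRegCostFunction.
grad_tmp = X' * (X*theta - y) / m;
grad = [grad_tmp(1); grad_tmp(2:end) + (lambda/m) * theta(2:end)];

% Central-difference approximation of each partial derivative.
epsilon = 1e-4;
num_grad = zeros(size(theta));
for j = 1:numel(theta)
    step = zeros(size(theta));
    step(j) = epsilon;
    num_grad(j) = (costFn(theta + step) - costFn(theta - step)) / (2*epsilon);
end

% The two columns should agree to many decimal places.
disp([grad num_grad]);
```

If the two columns differ noticeably, the usual suspects are a regularization term that accidentally includes theta(1) or a missing division by m.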

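④ Train the linear regression model with the fmincg function (fmincg.m is supplied with the exercise files; it is not an Octave built-in) to obtain the model parameters.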

function [theta] = trainLinearReg(X, y, lambda)
%TRAINLINEARREG Trains linear regression given a dataset (X, y) and a
%regularization parameter lambda
%   [theta] = TRAINLINEARREG (X, y, lambda) trains linear regression using
%   the dataset (X, y) and regularization parameter lambda. Returns the
%   trained parameters theta.
%

% Initialize Theta
initial_theta = zeros(size(X, 2), 1);

% Create "short hand" for the cost function to be minimized
costFunction = @(t) linearRegCostFunction(X, y, t, lambda);

% Now, costFunction is a function that takes in only one argument
options = optimset('MaxIter', 200, 'GradObj', 'on');

% Minimize using fmincg
theta = fmincg(costFunction, initial_theta, options);

end
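Because this cost is quadratic in theta, its minimizer can also be written in closed form. The sketch below uses the regularized normal equation as a cross-check; this is a swapped-in technique, not what trainLinearReg does, and the toy data and the names L and theta_closed are assumptions made for illustration.

```octave
% Toy data (hypothetical): a bias column plus one feature.
X = [1 1; 1 2; 1 3; 1 4];
y = [2; 3; 5; 4];
lambda = 1;
m = size(X, 1);
n = size(X, 2);

% Identity matrix with a zero in the top-left corner, so that
% theta_0 is left out of the regularization term.
L = eye(n);
L(1, 1) = 0;

% Setting the regularized gradient to zero gives
% (X'X + lambda*L) * theta = X'y.
theta_closed = (X' * X + lambda * L) \ (X' * y);

% The gradient from linearRegCostFunction should vanish at this theta.
grad_tmp = X' * (X*theta_closed - y) / m;
grad = [grad_tmp(1); grad_tmp(2:end) + (lambda/m) * theta_closed(2:end)];
disp(norm(grad));
```

On a well-conditioned problem, the theta returned by fmincg should be close to theta_closed, which makes this a cheap test of the training code.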

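⑤ Plotting the linear regression model

The model parameters have now been obtained with fmincg. How well does the resulting model fit the data? See the figure below:

The figure shows that the data follow a nonlinear pattern, yet we are fitting them with a straight line, so there is a clear underfitting problem.

The model is easy to plot here because the training data have very few features (a single dimension). When the training data have many feature variables, plotting becomes difficult (anything beyond three dimensions is hard to show directly). In that case, "learning curves" are used to check whether the trained model fits the data well.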

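⑥ The bias-variance trade-off

High bias: underfitting (underfit)

High variance: overfitting (overfit)

A learning curve can be used to diagnose bias-variance problems. Its x axis is the training set size, and its y axis shows the cross validation error and the training error.

The training error is defined as follows: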
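The formula image is missing from this copy. Based on the note that follows (the training error is the cost with the regularization term dropped), it is presumably:

$$
J_{\text{train}}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
$$

The cross validation error has the same form, with the sum taken over the cross validation examples instead.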

Note: the training error J_train(θ) has no regularization term, so linearRegCostFunction is called with lambda = 0 when computing it. The Octave implementation (learningCurve.m) is as follows:

function [error_train, error_val] = ...
    learningCurve(X, y, Xval, yval, lambda)
%LEARNINGCURVE Generates the train and cross validation set errors needed
%to plot a learning curve
%   [error_train, error_val] = ...
%       LEARNINGCURVE(X, y, Xval, yval, lambda) returns the train and
%       cross validation set errors for a learning curve. In particular,
%       it returns two vectors of the same length - error_train and
%       error_val. Then, error_train(i) contains the training error for
%       i examples (and similarly for error_val(i)).
%
%   In this function, you will compute the train and test errors for
%   dataset sizes from 1 up to m. In practice, when working with larger
%   datasets, you might want to do this in larger intervals.
%

% Number of training examples
m = size(X, 1);

% You need to return these values correctly
error_train = zeros(m, 1);
error_val   = zeros(m, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Fill in this function to return training errors in
%               error_train and the cross validation errors in error_val.
%               i.e., error_train(i) and
%               error_val(i) should give you the errors
%               obtained after training on i examples.
%
% Note: You should evaluate the training error on the first i training
%       examples (i.e., X(1:i, :) and y(1:i)).
%
%       For the cross-validation error, you should instead evaluate on
%       the _entire_ cross validation set (Xval and yval).
%
% Note: If you are using your cost function (linearRegCostFunction)
%       to compute the training and cross validation error, you should
%       call the function with the lambda argument set to 0.
%       Do note that you will still need to use lambda when running
%       the training to obtain the theta parameters.
%
% Hint: You can loop over the examples with the following:
%
%       for i = 1:m
%           % Compute train/cross validation errors using training examples
%           % X(1:i, :) and y(1:i), storing the result in
%           % error_train(i) and error_val(i)
%           ....
%
%       end
%

% ---------------------- Sample Solution ----------------------

for i = 1:m
    % Train on the first i examples, using the requested lambda
    theta = trainLinearReg(X(1:i, :), y(1:i), lambda);
    % Evaluate both errors without regularization (lambda = 0)
    error_train(i) = linearRegCostFunction(X(1:i, :), y(1:i), theta, 0);
    error_val(i) = linearRegCostFunction(Xval, yval, theta, 0);
end

% -------------------------------------------------------------

% =========================================================================

end

The learning curve is shown below. In the underfitting case, when there are only a few training examples the model can still fit that handful of points, so the training error is relatively small. The cross validation error, however, is computed on data the model has not seen, and an underfit model can hardly fit unseen data, so the cross validation error is very large.

As the number of training examples grows, a straight line can no longer pass close to all of them, so the training error increases. The cross validation error gradually approaches the training error and both curves flatten out. At that point, adding more training examples has little effect on the model: when the model underfits, a larger training set does not reduce the error any further.
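The high-bias pattern described above can be reproduced on synthetic data. The sketch below is only an illustration under stated assumptions: the data are made up (a quadratic target fitted with a straight line), and pinv is used for an unregularized least-squares fit in place of trainLinearReg so that the block runs without the exercise files.

```octave
% Made-up data: a deliberately nonlinear target, fitted with a line.
x = (1:8)';
y = x.^2;
X = [ones(8, 1) x];          % bias column + one feature

xval = [1.5; 4.5; 7.5];
Xval = [ones(3, 1) xval];
yval = xval.^2;

m = size(X, 1);
error_train = zeros(m, 1);
error_val   = zeros(m, 1);

for i = 1:m
    % Unregularized least squares on the first i examples (lambda = 0).
    % pinv also handles i = 1, where X'X is singular.
    theta = pinv(X(1:i, :)) * y(1:i);
    % Both errors are plain squared-error costs without regularization.
    error_train(i) = sum((X(1:i, :)*theta - y(1:i)).^2) / (2*i);
    error_val(i)   = sum((Xval*theta - yval).^2) / (2*size(Xval, 1));
end

% Columns: training set size, training error, cross validation error.
disp([(1:m)' error_train error_val]);
```

With one or two examples a line fits the training points exactly, so the training error starts at zero and then climbs as more points are added, which is the shape of the training curve discussed above.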

⑦ Polynomial regression

The learning curve above shows an underfitting problem. It can be addressed by adding more features, that is, by using a higher-degree polynomial as the hypothesis to fit the data.

The hypothesis of the polynomial regression model is as follows:
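The formula image is missing from this copy. Judging from the feature expansion used in this section (column i of the expanded matrix holds the i-th power of the original feature), the hypothesis is presumably:

$$
h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_p x^p
$$

Here x is the water level and p is the chosen polynomial degree; the model is still linear in θ, so the same cost function and gradient apply.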

The features are "expanded" to add more of them. The code (polyFeatures.m) is as follows:

for i = 1:p
    X_poly(:,i) = X.^i;
end
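The loop above can also be written without an explicit loop. The sketch below is a minimal alternative, assuming X is a column vector; the sample values and the names X_poly_loop and X_poly_vec are made up for illustration. It checks that the vectorized form matches the loop.

```octave
% Made-up input: a column vector of feature values and a degree p.
X = [1; 2; 3];
p = 4;

% Loop version, with the output preallocated.
X_poly_loop = zeros(numel(X), p);
for i = 1:p
    X_poly_loop(:, i) = X .^ i;
end

% Vectorized version: raise the column X to each power in the row 1:p.
X_poly_vec = bsxfun(@power, X, 1:p);

% Largest difference between the two results (should be zero).
disp(max(max(abs(X_poly_loop - X_poly_vec))));
```

Both forms produce a matrix with one row per example and one column per power of the original feature.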

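Once the features have been expanded, the model becomes a polynomial regression. However, the ranges of the polynomial features differ widely (some features take very small values while others take very large ones), so normalization is needed. The normalization code is as follows: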

function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

mu = mean(X);
X_norm = bsxfun(@minus, X, mu);

sigma = std(X_norm);
X_norm = bsxfun(@rdivide, X_norm, sigma);

% ============================================================

end
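One detail worth spelling out: the cross validation and test features are usually scaled with the mu and sigma computed on the training set, not with their own statistics. The sketch below illustrates that convention on made-up matrices; X_train, X_val and X_val_norm are hypothetical names, and the normalization steps are copied inline from featureNormalize so the block runs on its own.

```octave
% Made-up polynomial features (columns: x and x^2).
X_train = [1 1; 2 4; 3 9; 4 16];
X_val   = [1.5 2.25; 2.5 6.25];

% Same steps as featureNormalize, applied to the training set.
mu = mean(X_train);
X_norm = bsxfun(@minus, X_train, mu);
sigma = std(X_norm);
X_norm = bsxfun(@rdivide, X_norm, sigma);

% Reuse the training mu and sigma for the validation set.
X_val_norm = bsxfun(@rdivide, bsxfun(@minus, X_val, mu), sigma);

disp(mean(X_norm));   % close to 0 for every column
disp(std(X_norm));    % close to 1 for every column
disp(X_val_norm);
```

Reusing the training statistics keeps all three datasets on the same scale, so a theta learned on the training set remains meaningful on the other two.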

The original linearRegCostFunction.m is reused to compute the cost and gradient for polynomial regression. The resulting hypothesis is plotted below (note: lambda = 0, so no regularization is applied):

The plot shows that the polynomial model fits almost every training example very well. This indicates an overfitting problem: high variance.

The learning curve of the polynomial regression model is shown below:

In this learning curve the training error is almost 0 (it hugs the x axis). This is a consequence of overfitting: the model passes almost perfectly through every point in the training set, so the training error is very small.

The cross validation error is very large at first (with 2 training examples), then decreases as the number of training examples grows (from 2 to 5). When the number of training examples increases further (11 or more), the cross validation error rises again, because overfitting reduces the model's ability to generalize.

⑧ Using regularization to address overfitting in the polynomial regression model

With the regularization parameter set to lambda = 1 (λ = 1), the hypothesis is plotted below:

The fitted curve is no longer as wiggly as it was with lambda = 0, and it no longer passes exactly through every point; it is comparatively smooth. This is the effect of regularization.

The learning curve for lambda = 1 (λ = 1) is shown below:

The learning curve for lambda = 1 indicates that the model generalizes well and predicts unseen data well, because its cross validation error and training error are very close to each other and both very small. (A small training error shows that the model fits the training data well, but that alone could also mean overfitting, in which case predictions on unseen data would be poor. Here the cross validation error is small too, which shows the model also predicts unseen data well.)

Finally, consider the polynomial regression model with lambda = 100 (λ = 100), where underfitting (high bias) appears:

The hypothesis curve is shown below:

The learning curve is shown below:

⑨ How can a suitable regularization parameter lambda (λ) be selected automatically?

Section ⑧ shows that lambda (λ) = 0 leads to overfitting, lambda (λ) = 100 leads to underfitting, and lambda (λ) = 1 is about right.

So how can a suitable lambda be chosen automatically during training?

The cross validation set can be used: lambda is chosen according to the cross validation error.

Concretely, you will use a cross validation set to evaluate how good each lambda value is.
After selecting the best lambda value using the cross validation set,
we can then evaluate the model on the test set to estimate how well the model will perform on actual unseen data.

The selection procedure is as follows:

Start with a list of candidate lambda (λ) values. In this exercise they are stored in a vector lambda_vec (10 values in total):

lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10]'

Next, train 10 regularized models on the training set, one for each of the 10 lambda values. For each trained model, compute its cross validation error, and pick the lambda (λ) of the model with the smallest cross validation error as the best λ. (Note: the training error and cross validation error are computed without the regularization term, which is equivalent to lambda = 0.)

for i = 1:length(lambda_vec)
    theta = trainLinearReg(X, y, lambda_vec(i));  % train the parameters theta for each lambda
    % Compute the training and cross validation errors without
    % regularization: the last argument (lambda) is zero.
    error_train(i) = linearRegCostFunction(X, y, theta, 0);      % training error
    error_val(i) = linearRegCostFunction(Xval, yval, theta, 0);  % cross validation error
end

For these 10 lambda values, the computed training and cross validation errors are:

lambda       Train Error   Validation Error
0.000000     0.173616      22.066602
0.001000     0.156653      18.597638
0.003000     0.190298      19.981503
0.010000     0.221975      16.969087
0.030000     0.281852      12.829003
0.100000     0.459318      7.587013
0.300000     0.921760      4.636833
1.000000     2.076188      4.260625
3.000000     4.901351      3.822907
10.000000    16.092213     9.945508
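The final selection step, picking the lambda with the smallest cross validation error, can be sketched as follows. The error values are copied from the results listed above; the names min_err, idx and best_lambda are made up for illustration.

```octave
% Candidate lambdas and the cross validation errors reported above.
lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10]';
error_val = [22.066602 18.597638 19.981503 16.969087 12.829003 ...
             7.587013 4.636833 4.260625 3.822907 9.945508]';

% min returns both the smallest value and its position in the vector.
[min_err, idx] = min(error_val);
best_lambda = lambda_vec(idx);

fprintf('best lambda = %g (validation error %f)\n', best_lambda, min_err);
```

The training error is deliberately ignored here: it is smallest for the least regularized models, so it says little about how well a model generalizes.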

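The relationship between the training error, the cross validation error and lambda is plotted below:

Beyond lambda = 3 the cross validation error starts to rise; increasing lambda further is likely to cause underfitting.

As shown above, the cross validation error is smallest at lambda = 3. The fitted curve for lambda = 3 is shown below (compare it with the fitted curve and learning curve for lambda = 1 to see how they differ):

The learning curve is shown below: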

Reposted from: https://www.cnblogs.com/abella/p/10348460.html