
Andrew Ng Deep Learning Notes (4): Regularization


For more detail on regularization, see:

Andrew Ng Machine Learning Notes (3): Regularization

Machine Learning in Action study notes, Chapter 5: Logistic Regression


Main contents:

I. No regularization

II. L2 regularization

III. Dropout regularization


I. No regularization

The training function for the model is shown below (it supports three modes: no regularization, L2 regularization, and dropout regularization):

def model(X, Y, learning_rate = 0.3, num_iterations = 30000, print_cost = True, lambd = 0, keep_prob = 1):
    """
    Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (input size, number of examples)
    Y -- true "label" vector (1 for blue dot / 0 for red dot), of shape (output size, number of examples)
    learning_rate -- learning rate of the optimization
    num_iterations -- number of iterations of the optimization loop
    print_cost -- If True, print the cost every 10000 iterations
    lambd -- regularization hyperparameter, scalar
    keep_prob - probability of keeping a neuron active during drop-out, scalar.

    Returns:
    parameters -- parameters learned by the model. They can then be used to predict.
    """

    grads = {}
    costs = []                            # to keep track of the cost
    m = X.shape[1]                        # number of examples
    layers_dims = [X.shape[0], 20, 3, 1]

    # Initialize parameters dictionary.
    parameters = initialize_parameters(layers_dims)

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.
        if keep_prob == 1:
            a3, cache = forward_propagation(X, parameters)
        elif keep_prob < 1:
            a3, cache = forward_propagation_with_dropout(X, parameters, keep_prob)

        # Cost function
        if lambd == 0:
            cost = compute_cost(a3, Y)
        else:
            cost = compute_cost_with_regularization(a3, Y, parameters, lambd)

        # Backward propagation.
        assert(lambd == 0 or keep_prob == 1)   # it is possible to use both L2 regularization and dropout,
                                               # but this assignment will only explore one at a time
        if lambd == 0 and keep_prob == 1:
            grads = backward_propagation(X, Y, cache)
        elif lambd != 0:
            grads = backward_propagation_with_regularization(X, Y, cache, lambd)
        elif keep_prob < 1:
            grads = backward_propagation_with_dropout(X, Y, cache, keep_prob)

        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print the loss every 10000 iterations
        if print_cost and i % 10000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
        if print_cost and i % 1000 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (x1,000)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters

For the unregularized model, simply call model with the required arguments. The test results are as follows:

parameters = model(train_X, train_Y)
print ("On the training set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

plt.title("Model without regularization") axes = plt.gca() axes.set_xlim([-0.75,0.40]) axes.set_ylim([-0.75,0.65]) plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y[0])

The plot shows that the unregularized network suffers from overfitting, even though its test accuracy is already fairly high.


II. L2 regularization

Because a regularization term is added to the cost, both the cost computation and the backward propagation need to be modified accordingly:
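For reference, the L2-regularized cost (the "formula (2)" mentioned in the code below) has the standard form of the cross-entropy cost plus a penalty on the squared weights; the bias terms are not penalized:

J_{regularized} = \underbrace{-\frac{1}{m}\sum_{i=1}^{m}\Big(y^{(i)}\log a^{[3](i)} + (1-y^{(i)})\log\big(1-a^{[3](i)}\big)\Big)}_{\text{cross-entropy cost}} + \underbrace{\frac{\lambda}{2m}\sum_{l}\sum_{k}\sum_{j}\big(W_{k,j}^{[l]}\big)^{2}}_{\text{L2 regularization cost}}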

Cost function:

# GRADED FUNCTION: compute_cost_with_regularization

def compute_cost_with_regularization(A3, Y, parameters, lambd):
    """
    Implement the cost function with L2 regularization. See formula (2) above.

    Arguments:
    A3 -- post-activation, output of forward propagation, of shape (output size, number of examples)
    Y -- "true" labels vector, of shape (output size, number of examples)
    parameters -- python dictionary containing parameters of the model
    lambd -- regularization hyperparameter, scalar

    Returns:
    cost - value of the regularized loss function (formula (2))
    """
    m = Y.shape[1]
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    W3 = parameters["W3"]

    cross_entropy_cost = compute_cost(A3, Y)   # This gives you the cross-entropy part of the cost

    ### START CODE HERE ### (approx. 1 line)
    L2_regularization_cost = lambd/(2*m) * (np.sum(W1**2) + np.sum(W2**2) + np.sum(W3**2))
    ### END CODE HERE ###

    cost = cross_entropy_cost + L2_regularization_cost

    return cost

Backward propagation:

# GRADED FUNCTION: backward_propagation_with_regularization

def backward_propagation_with_regularization(X, Y, cache, lambd):
    """
    Implements the backward propagation of our baseline model to which we added an L2 regularization.

    Arguments:
    X -- input dataset, of shape (input size, number of examples)
    Y -- "true" labels vector, of shape (output size, number of examples)
    cache -- cache output from forward_propagation()
    lambd -- regularization hyperparameter, scalar

    Returns:
    gradients -- A dictionary with the gradients with respect to each parameter, activation and pre-activation variables
    """

    m = X.shape[1]
    (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache

    dZ3 = A3 - Y

    ### START CODE HERE ### (approx. 1 line)
    dW3 = 1./m * np.dot(dZ3, A2.T) + lambd/m * W3
    ### END CODE HERE ###
    db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True)

    dA2 = np.dot(W3.T, dZ3)
    dZ2 = np.multiply(dA2, np.int64(A2 > 0))
    ### START CODE HERE ### (approx. 1 line)
    dW2 = 1./m * np.dot(dZ2, A1.T) + lambd/m * W2
    ### END CODE HERE ###
    db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True)

    dA1 = np.dot(W2.T, dZ2)
    dZ1 = np.multiply(dA1, np.int64(A1 > 0))
    ### START CODE HERE ### (approx. 1 line)
    dW1 = 1./m * np.dot(dZ1, X.T) + lambd/m * W1
    ### END CODE HERE ###
    db1 = 1./m * np.sum(dZ1, axis=1, keepdims = True)

    gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3, "dA2": dA2,
                 "dZ2": dZ2, "dW2": dW2, "db2": db2, "dA1": dA1,
                 "dZ1": dZ1, "dW1": dW1, "db1": db1}

    return gradients
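The extra lambd/m * W terms in dW1, dW2 and dW3 above come from differentiating the penalty: for each weight matrix,

\frac{\partial}{\partial W^{[l]}}\left(\frac{\lambda}{2m}\lVert W^{[l]}\rVert_F^{2}\right) = \frac{\lambda}{m} W^{[l]},

so every dW^[l] is the usual backpropagation gradient plus (lambda/m) * W^[l], while the db^[l] terms are unchanged because the biases are not regularized.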

With the regularization coefficient lambd set to 0.7, the test results are as follows:

parameters = model(train_X, train_Y, lambd = 0.7)
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

plt.title("Model with L2-regularization") axes = plt.gca() axes.set_xlim([-0.75,0.40]) axes.set_ylim([-0.75,0.65]) plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y[0])

The decision boundary shows that the L2-regularized network no longer overfits, and its test accuracy is higher than that of the unregularized model. This demonstrates that appropriate regularization improves the model's ability to generalize to new data.


III. Dropout regularization

(Question: in forward propagation, A[l] is divided by keep_prob to keep the scale of its values. In backward propagation, following the usual intuition of reversing the forward step, why is dA[l] also divided by keep_prob instead of being multiplied by it? One way to see it: the backward pass is not the inverse of the forward computation but its derivative, and since the forward step itself divides by keep_prob, the chain rule scales the gradient flowing back through that step by the same 1/keep_prob factor, as the sketch below illustrates.)
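A small self-contained check of this point (a sketch assuming only numpy; the names x, mask and upstream are illustrative and not part of the assignment code): differentiating a = x * mask / keep_prob with respect to x gives exactly mask / keep_prob, which a finite-difference estimate confirms.

import numpy as np

np.random.seed(0)
keep_prob = 0.8
x = np.random.randn(4, 3)                         # stand-in for some activations A[l]
mask = np.random.rand(*x.shape) < keep_prob       # dropout mask D[l], entries 0 or 1

def forward(x):
    # inverted dropout: shut off neurons, then rescale to preserve the expected value
    return x * mask / keep_prob

upstream = np.random.randn(*x.shape)              # arbitrary gradient arriving from the next layer

# chain rule: d/dx of sum(upstream * forward(x)) applies the same mask and the same 1/keep_prob factor
analytic = upstream * mask / keep_prob

# finite-difference check on a single entry
eps = 1e-6
i, j = 1, 2
x_plus, x_minus = x.copy(), x.copy()
x_plus[i, j] += eps
x_minus[i, j] -= eps
numeric = (np.sum(upstream * forward(x_plus)) - np.sum(upstream * forward(x_minus))) / (2 * eps)

print(analytic[i, j], numeric)                    # the two values agree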

1. Unlike L2 regularization, dropout does not regularize by adding a mathematical penalty term to the cost function to shrink the parameters; it is a more intuitive, empirical form of regularization. Its approach: in each iteration, randomly shut off some neurons and run forward propagation, backward propagation and the parameter update on the remaining network. The set of neurons shut off can differ from one iteration to the next.

2. Why does randomly shutting off neurons like this act as a regularizer?

Because the model actually being trained differs from iteration to iteration: each time we only use a random subset of the original network. Since any given neuron may be shut off in a given iteration, each neuron becomes less sensitive to (less dependent on) the activation of any other particular neuron. I haven't fully grasped this point myself, so here is the explanation from the original assignment: with dropout you effectively train a different model at every iteration, one that uses only a subset of your neurons, so neurons become less sensitive to the activation of any one specific other neuron, because that other neuron might be shut down at any time.


3. In forward propagation, dropout is applied to a layer's activations A[l] in four steps: (1) create a random matrix D[l] with np.random.rand; (2) threshold it against keep_prob so that its entries become 0 or 1; (3) multiply A[l] elementwise by the mask D[l] to shut off some neurons; (4) divide A[l] by keep_prob so that the expected value of the activations stays the same ("inverted dropout"). These are the Steps 1-4 referenced in the code comments below.

Code:

# GRADED FUNCTION: forward_propagation_with_dropout

def forward_propagation_with_dropout(X, parameters, keep_prob = 0.5):
    """
    Implements the forward propagation: LINEAR -> RELU + DROPOUT -> LINEAR -> RELU + DROPOUT -> LINEAR -> SIGMOID.

    Arguments:
    X -- input dataset, of shape (2, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3":
                    W1 -- weight matrix of shape (20, 2)
                    b1 -- bias vector of shape (20, 1)
                    W2 -- weight matrix of shape (3, 20)
                    b2 -- bias vector of shape (3, 1)
                    W3 -- weight matrix of shape (1, 3)
                    b3 -- bias vector of shape (1, 1)
    keep_prob - probability of keeping a neuron active during drop-out, scalar

    Returns:
    A3 -- last activation value, output of the forward propagation, of shape (1,1)
    cache -- tuple, information stored for computing the backward propagation
    """

    np.random.seed(1)

    # retrieve parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]

    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    ### START CODE HERE ### (approx. 4 lines)   # Steps 1-4 below correspond to the Steps 1-4 described above.
    D1 = np.random.rand(A1.shape[0], 1)         # Step 1: initialize matrix D1 = np.random.rand(..., ...)
    D1 = D1 < keep_prob                         # Step 2: convert entries of D1 to 0 or 1 (using keep_prob as the threshold)
    A1 = A1 * D1                                # Step 3: shut down some neurons of A1
    A1 = A1 / keep_prob                         # Step 4: scale the value of neurons that haven't been shut down
    ### END CODE HERE ###
    Z2 = np.dot(W2, A1) + b2
    A2 = relu(Z2)
    ### START CODE HERE ### (approx. 4 lines)
    D2 = np.random.rand(A2.shape[0], 1)         # Step 1: initialize matrix D2 = np.random.rand(..., ...)
    D2 = D2 < keep_prob                         # Step 2: convert entries of D2 to 0 or 1 (using keep_prob as the threshold)
    A2 = A2 * D2                                # Step 3: shut down some neurons of A2
    A2 = A2 / keep_prob                         # Step 4: scale the value of neurons that haven't been shut down
    ### END CODE HERE ###
    Z3 = np.dot(W3, A2) + b3
    A3 = sigmoid(Z3)

    cache = (Z1, D1, A1, W1, b1, Z2, D2, A2, W2, b2, Z3, A3, W3, b3)

    return A3, cache
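To see why Step 4 (dividing by keep_prob) keeps the scale of the activations, here is a small standalone sketch with toy values, assuming only numpy and not part of the assignment code: averaged over many random masks, the dropped-and-rescaled activations come out close to the originals, so later layers see inputs of the same expected magnitude whether or not dropout is active.

import numpy as np

np.random.seed(2)
keep_prob = 0.5
A = np.abs(np.random.randn(3, 1)) + 1.0           # a toy activation column, like one column of A1

# average the dropped-and-rescaled activations over many random masks
n_trials = 100000
total = np.zeros_like(A)
for _ in range(n_trials):
    D = np.random.rand(*A.shape) < keep_prob      # random 0/1 mask, like D1/D2 above
    total += A * D / keep_prob                    # Steps 3 and 4: shut off, then rescale

print(A.ravel())                                  # original activations
print((total / n_trials).ravel())                 # empirical mean after dropout, close to the original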


4. In backward propagation, dropout takes two steps for each layer it was applied to: (1) apply the same mask D[l] saved in the cache during forward propagation to dA[l], so that exactly the same neurons are shut off in the gradient; (2) divide dA[l] by keep_prob, mirroring the scaling done in the forward pass. These are the Steps 1-2 referenced in the code comments below.

Code:

# GRADED FUNCTION: backward_propagation_with_dropout

def backward_propagation_with_dropout(X, Y, cache, keep_prob):
    """
    Implements the backward propagation of our baseline model to which we added dropout.

    Arguments:
    X -- input dataset, of shape (2, number of examples)
    Y -- "true" labels vector, of shape (output size, number of examples)
    cache -- cache output from forward_propagation_with_dropout()
    keep_prob - probability of keeping a neuron active during drop-out, scalar

    Returns:
    gradients -- A dictionary with the gradients with respect to each parameter, activation and pre-activation variables
    """

    m = X.shape[1]
    (Z1, D1, A1, W1, b1, Z2, D2, A2, W2, b2, Z3, A3, W3, b3) = cache

    dZ3 = A3 - Y
    dW3 = 1./m * np.dot(dZ3, A2.T)
    db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True)
    dA2 = np.dot(W3.T, dZ3)
    ### START CODE HERE ### (≈ 2 lines of code)
    dA2 = dA2 * D2              # Step 1: Apply mask D2 to shut down the same neurons as during the forward propagation
    dA2 = dA2 / keep_prob       # Step 2: Scale the value of neurons that haven't been shut down
    ### END CODE HERE ###
    dZ2 = np.multiply(dA2, np.int64(A2 > 0))
    dW2 = 1./m * np.dot(dZ2, A1.T)
    db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True)

    dA1 = np.dot(W2.T, dZ2)
    ### START CODE HERE ### (≈ 2 lines of code)
    dA1 = dA1 * D1              # Step 1: Apply mask D1 to shut down the same neurons as during the forward propagation
    dA1 = dA1 / keep_prob       # Step 2: Scale the value of neurons that haven't been shut down
    ### END CODE HERE ###
    dZ1 = np.multiply(dA1, np.int64(A1 > 0))
    dW1 = 1./m * np.dot(dZ1, X.T)
    db1 = 1./m * np.sum(dZ1, axis=1, keepdims = True)

    gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3, "dA2": dA2,
                 "dZ2": dZ2, "dW2": dW2, "db2": db2, "dA1": dA1,
                 "dZ1": dZ1, "dW1": dW1, "db1": db1}

    return gradients


5. Test results:

parameters = model(train_X, train_Y, keep_prob = 0.86, learning_rate = 0.3)
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

Of the three models, the dropout version reaches the highest test accuracy, which shows that dropout regularization works. (Note that dropout is only applied during training; when predicting, the full network is used, i.e. keep_prob is effectively 1.)


Reposted from: https://www.cnblogs.com/DOLFAMINGO/p/9737325.html
