
Deep Learning, Part 1 【Introduction to Neural Networks】

Published: 2023/12/20 · Category: pytorch · Author: 豆豆


Perceptron equations

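Linear classification prediction in two dimensions, written in vector form (W = (W1, W2), x = (x1, x2), so W * x = W1 * x1 + W2 * x2):
y hat = W * x + b

In three dimensions:
y hat = W1 * x1 + W2 * x2 + W3 * x3 + b

In n dimensions:
y hat = W1 * x1 + W2 * x2 + ... + Wn * xn + b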

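A perceptron computes this linear combination (a matrix multiplication plus the bias) and then applies an activation function to the result: the step function in the classic perceptron, or the S-shaped sigmoid function in the continuous version introduced later.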
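A minimal NumPy sketch of that computation for n inputs, using the step function as the activation. The weights, bias and inputs are made-up values, and the helper names `step` and `perceptron` are illustrative.

```python
import numpy as np

def step(t):
    """Step activation: 1 if t >= 0, else 0 (works element-wise on arrays)."""
    return (np.asarray(t) >= 0).astype(int)

def perceptron(x, W, b):
    """y hat = step(W1*x1 + ... + Wn*xn + b) for one or more input rows."""
    return step(np.dot(x, W) + b)

W = np.array([2.0, -1.0, 0.5])   # hypothetical weights W1..W3
b = -1.0                         # hypothetical bias
X = np.array([[1.0, 0.0, 0.0],   # 2 - 1          = 1.0  -> 1
              [0.0, 1.0, 1.0],   # -1 + 0.5 - 1   = -1.5 -> 0
              [1.0, 1.0, 0.0]])  # 2 - 1 - 1      = 0.0  -> 1
print(perceptron(X, W, b))       # [1 0 1]
```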

1. Why is the perceptron called a "neural network"?

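Because the structure of a perceptron closely resembles that of a neuron in the brain.
A perceptron applies an equation to its inputs and decides whether to return 1 or 0.
A brain neuron receives inputs (nerve impulses) through its many dendrites; its job is to process those impulses and decide whether to send an impulse out through its axon.

A neural network is therefore built by feeding the output of one neuron into another neuron as its input.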

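2. Logical operators as perceptrons

AND, OR, NOT and XOR

Interestingly, some logical operators can be represented as perceptrons. For example, how does a logical AND work?

AND perceptron: true AND true is true; true AND false, false AND true and false AND false are all false.
OR perceptron: very similar to the AND perceptron. Starting from AND, we get OR by increasing the weights or by decreasing the magnitude of the (negative) bias.
NOT perceptron: returns 0 if the input is 1, and 1 if the input is 0.
XOR perceptron: exclusive OR is true when exactly one input is true and the other is false. Its outputs cannot be separated by a single straight line, so XOR needs several perceptrons combined into layers rather than a single perceptron.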

Quiz

Set the weights (weight1, weight2) and the bias to the correct values so that the perceptron computes the AND operation shown above.

```python
import pandas as pd

# TODO: Set weight1, weight2, and bias
weight1 = 0.0
weight2 = 0.0
bias = 0.0

# DON'T CHANGE ANYTHING BELOW
# Inputs and expected outputs
test_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
correct_outputs = [False, False, False, True]
outputs = []

# Evaluate the perceptron on every input combination
for test_input, correct_output in zip(test_inputs, correct_outputs):
    linear_combination = weight1 * test_input[0] + weight2 * test_input[1] + bias
    output = int(linear_combination >= 0)  # step activation
    is_correct_string = 'Yes' if output == correct_output else 'No'
    outputs.append([test_input[0], test_input[1], linear_combination, output, is_correct_string])

# Print the results
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', 'Input 2', 'Linear Combination', 'Activation Output', 'Is Correct'])
if not num_wrong:
    print('Great! You have figured out how to set the weights and bias.\n')
else:
    print('You got {} wrong. Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))
```

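Answer:

```python
weight1 = 1.0
weight2 = 1.0
bias = -2.0
```

Only the input (1, 1) gives 1 + 1 - 2 = 0 >= 0, so the step function returns 1; every other input gives a negative sum and returns 0.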
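The same check can be extended to the other operators. In the sketch below the OR and NOT weights are one possible choice (any weights that place the boundary correctly work), and XOR is assembled from two layers of perceptrons as XOR(a, b) = AND(OR(a, b), NOT(AND(a, b))), since no single perceptron can produce it. The helper names are illustrative.

```python
def perceptron(weights, bias, inputs):
    """Step-activated perceptron: 1 if w.x + b >= 0, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return int(total >= 0)

def AND(a, b):
    return perceptron([1.0, 1.0], -2.0, [a, b])   # weights from the answer above

def OR(a, b):
    return perceptron([1.0, 1.0], -1.0, [a, b])   # smaller bias magnitude than AND

def NOT(a):
    return perceptron([-1.0], 0.5, [a])           # 1 -> 0, 0 -> 1

def XOR(a, b):
    # Layer 1: OR and NAND (NOT of AND); layer 2: AND of the two
    return AND(OR(a, b), NOT(AND(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b), "XOR:", XOR(a, b))
```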

Perceptron algorithm video

Reference video: https://www.youtube.com/embed/M9c9bN5nJ3U

```python
import numpy as np

# Setting the random seed; feel free to change it and see different solutions.
np.random.seed(12)

def stepFunction(t):
    if t >= 0:
        return 1
    return 0

def prediction(X, W, b):
    return stepFunction((np.matmul(X, W) + b)[0])

# The perceptron trick: receives the data X, the labels y,
# the weights W (as an array) and the bias b,
# updates W and b according to the perceptron algorithm,
# and returns W and b.
def perceptronStep(X, y, W, b, learn_rate=0.01):
    for i in range(len(X)):
        y_hat = prediction(X[i], W, b)
        if y[i] - y_hat == 1:
            # Positive point predicted as negative: move the line toward it
            W[0] += X[i][0] * learn_rate
            W[1] += X[i][1] * learn_rate
            b += learn_rate
        elif y[i] - y_hat == -1:
            # Negative point predicted as positive: move the line the other way
            W[0] -= X[i][0] * learn_rate
            W[1] -= X[i][1] * learn_rate
            b -= learn_rate
    return W, b

# This function runs the perceptron algorithm repeatedly on the dataset,
# and returns a few of the boundary lines obtained in the iterations,
# for plotting purposes.
# Feel free to play with the learning rate and the num_epochs.
def trainPerceptronAlgorithm(X, y, learn_rate=0.01, num_epochs=25):
    x_min, x_max = min(X.T[0]), max(X.T[0])
    y_min, y_max = min(X.T[1]), max(X.T[1])
    W = np.array(np.random.rand(2, 1))
    b = np.random.rand(1)[0] + x_max
    # Boundary lines collected for plotting
    boundary_lines = []
    for i in range(num_epochs):
        # In each epoch, we apply the perceptron step.
        W, b = perceptronStep(X, y, W, b, learn_rate)
        # Slope and intercept of the line W[0]*x1 + W[1]*x2 + b = 0
        boundary_lines.append((-W[0] / W[1], -b / W[1]))
    return boundary_lines
```

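3. Non-linear data

Data that a straight line can separate perfectly is linear (linearly separable).
Data that no straight line can separate perfectly is non-linear.

For non-linear data we rely on an error function: the smaller the error, the closer we are to the goal.

For optimization, a continuous error function works better than a discrete one: with a continuous error, a small change in the weights produces a small change in the error, so gradient descent can tell which direction improves the model, whereas a discrete error (such as a count of misclassified points) stays flat between jumps.

So how do we turn a discrete error function into a continuous one?
- 1. A discrete prediction is either 0 or 1: y = 1 if x >= 0 else 0
- 2. A continuous prediction can be expressed as a probability with the sigmoid function: y = 1 / (1 + exp(-x))
- 3. For a discrete activation we use the step function: step(Wx + b)
- 4. For a continuous activation we use the sigmoid (S-shaped) function: sigmoid(Wx + b)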
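A short comparison of the two activations on the same scores (the score values are arbitrary examples of Wx + b): the step function jumps straight from 0 to 1, while the sigmoid changes smoothly, which is what makes a continuous error function possible.

```python
import numpy as np

def step(x):
    return np.where(x >= 0, 1.0, 0.0)    # discrete: only 0 or 1

def sigmoid(x):
    return 1 / (1 + np.exp(-x))          # continuous: any value in (0, 1)

scores = np.array([-4.0, -0.5, 0.0, 0.5, 4.0])   # example values of Wx + b
print(step(scores))      # [0. 0. 1. 1. 1.]
print(sigmoid(scores))   # approx [0.018 0.378 0.5 0.622 0.982]
```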

Reference video: https://www.youtube.com/embed/Rm2KxFaPiJg

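4. Multi-class classification and softmax

In the binary classification problems so far, the result was either 1 or 0. But what if we want several classes? For example: is it yellow, green or blue? A cat, a dog or a tiger?

The exponential function exp(x) = e^x is positive for every real x, including negative ones, so exponentiating the scores always gives positive numbers.

If the classes have different scores, say 1, 2 and 3, how do we turn them into probabilities that add up to 1?
The formula is:
Probability 1 = exp(1) / (exp(1) + exp(2) + exp(3))
Probability 2 = exp(2) / (exp(1) + exp(2) + exp(3))
Probability 3 = exp(3) / (exp(1) + exp(2) + exp(3))
In general, for scores Z1, ..., Zn: Probability i = exp(Zi) / (exp(Z1) + ... + exp(Zn))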

The softmax function in code:

```python
import numpy as np

def softmax(L):
    expL = np.exp(L)       # exponentiate every element of L
    sumExpL = sum(expL)    # sum of the exponentials
    result = []
    for i in expL:
        result.append(i * 1.0 / sumExpL)  # probability of each element
    return result
```

Wikipedia gives a shorter version; see https://en.wikipedia.org/wiki/Softmax_function

```python
>>> import numpy as np
>>> z = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
>>> softmax = lambda x : np.exp(x) / np.sum(np.exp(x))
>>> softmax(z)
```
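One practical refinement not covered above: np.exp overflows for large scores (np.exp(1000) is inf), so implementations commonly subtract the largest score before exponentiating. Subtracting the same constant from every score does not change the softmax probabilities, so the result is identical. The function name `stable_softmax` is illustrative.

```python
import numpy as np

def stable_softmax(scores):
    """Softmax that subtracts the max score first to avoid overflow in exp."""
    z = np.asarray(scores, dtype=float)
    shifted = z - np.max(z)      # largest entry becomes 0, so every exp(...) <= 1
    exps = np.exp(shifted)
    return exps / np.sum(exps)

probs = stable_softmax([1.0, 2.0, 3.0])
print(probs)          # approx [0.090 0.245 0.665]
print(probs.sum())    # 1.0, up to floating-point rounding

# Scores this large would overflow a naive np.exp; the shifted version is fine
print(stable_softmax([1000.0, 1001.0, 1002.0]))   # same probabilities as above
```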

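5. One-Hot Encoding

What is one-hot encoding?

In digital circuits, one-hot refers to a group of bits in which the only valid combinations are those with a single high bit (1) and all the other bits low (0).
Reference: One-hot

Why use one-hot encoding?

In machine learning, categorical inputs are converted to one-hot encoding so that algorithms can compute with them: every input becomes a 1 or a 0, which makes two categories easy to represent.

For example: if you received a gift, the value is 1; if not, it is 0.
But what if there are several categories, such as duck, walrus and beaver? As shown in the figure, one-hot encoding gives each category its own column, forming a matrix: each row has a 1 in the column of its own category and 0 in all the others.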
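A minimal sketch of one-hot encoding the duck/walrus/beaver example. The column order and the helper name `one_hot` are illustrative choices; libraries such as pandas offer a ready-made equivalent (`pd.get_dummies`).

```python
import numpy as np

def one_hot(labels, classes):
    """Return a (len(labels), len(classes)) matrix with a single 1 per row."""
    index = {c: i for i, c in enumerate(classes)}   # class name -> column number
    encoded = np.zeros((len(labels), len(classes)), dtype=int)
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1
    return encoded

classes = ["duck", "walrus", "beaver"]
print(one_hot(["duck", "beaver", "walrus", "duck"], classes))
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 0]]
```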

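6. Maximum Likelihood

Probability is very important in deep learning.
A good model assigns high probability to the labels that actually occurred; maximizing this probability is equivalent to minimizing the error function, which brings the predictions closer to the targets.

When working with probabilities, logarithms are usually a very good choice, because log(ab) = log(a) + log(b): the product of many probabilities (a number that quickly becomes tiny) turns into a sum.

Here we use the logarithm with base e (the natural logarithm), not base 10.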

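7. Cross Entropy

The logarithm of a probability (a number between 0 and 1) is negative, so we take its negative to get a positive number; summing these negatives gives the cross entropy.
Put simply: the cross entropy is the sum of the negative logarithms of the probabilities the model assigns to the actual outcomes.

For example, the cross entropies of model A and model B:
Model A: 0.51 + 1.61 + 2.3 + 0.36 = 4.78
Model B: 0.36 + 0.1 + 0.22 + 0.51 = 1.19
Model B has the smaller cross entropy. A more accurate model gets a lower cross entropy.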
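The two sums can be reproduced in a few lines. The per-point probabilities are an assumption: the figure with the actual points is not in the text, but 0.6, 0.2, 0.1, 0.7 (model A) and 0.7, 0.9, 0.8, 0.6 (model B) give -ln values that match the terms above after rounding (with unrounded logs, model B comes to about 1.20). The sketch also shows that the product of the probabilities (the likelihood) ranks the models the same way, because -ln of a product equals the sum of the -ln terms.

```python
import math

# Hypothetical probabilities of the correct label for each point,
# chosen so that -ln(p) matches the terms listed in the text
model_a = [0.6, 0.2, 0.1, 0.7]
model_b = [0.7, 0.9, 0.8, 0.6]

def likelihood(probs):
    """Product of the probabilities: higher is better."""
    return math.prod(probs)

def cross_entropy_from_probs(probs):
    """Sum of -ln(p): lower is better."""
    return sum(-math.log(p) for p in probs)

for name, probs in [("A", model_a), ("B", model_b)]:
    terms = [round(-math.log(p), 2) for p in probs]
    print(name, terms, round(cross_entropy_from_probs(probs), 2), round(likelihood(probs), 4))
```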

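A model with larger errors gets a higher cross entropy; a model with smaller errors gets a lower cross entropy (and is the better model).
This is because a good model assigns high probabilities to what actually happened, and vice versa.
So cross entropy tells us how good or bad a model is.

Our goal therefore changes from maximizing the probability to minimizing the cross entropy.

The pattern is that probability and the error function are closely linked, and that link is the cross entropy.

Cross-entropy formula (two classes, labels y in {0, 1}, predicted probabilities p):
Cross entropy = -Σ [ y * ln(p) + (1 - y) * ln(1 - p) ]

```python
import numpy as np

# Takes as input two lists Y (labels, 0 or 1) and P (predicted probabilities)
# and returns their cross entropy as a float.
def cross_entropy(Y, P):
    # np.asarray(..., dtype=float) replaces np.float_, which was removed in NumPy 2.0
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    return -np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))
```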

Multi-Class Cross Entropy
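The formula for this heading is missing from the text. Under the usual definition, with Y a one-hot label matrix (one row per sample, one column per class) and P the predicted class probabilities, the multi-class cross entropy is -Σ_i Σ_j Y_ij ln(P_ij). A NumPy sketch assuming that definition, with made-up numbers:

```python
import numpy as np

def multiclass_cross_entropy(Y, P):
    """Y: one-hot labels (samples x classes); P: predicted probabilities, same shape."""
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    # Because Y is one-hot, only the probability of the true class counts in each row
    return -np.sum(Y * np.log(P))

Y = [[1, 0, 0],
     [0, 1, 0]]
P = [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2]]
print(multiclass_cross_entropy(Y, P))   # -(ln 0.7 + ln 0.5), approx 1.05
```

With two classes and one-hot labels, this reduces to the binary formula given earlier.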

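8. Logistic Regression

One of the most useful and most popular cornerstone algorithms in machine learning is logistic regression.
It basically works like this:

Get the data
Pick a random model
Compute the error
Minimize the error to obtain a better model
Done

* Note *
- 1. Linear regression tries to get as close as possible to all the points
- 2. Logistic regression tries to separate the two kinds of points as well as possible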
9. Gradient Descent

```python
import numpy as np

# Activation (sigmoid) function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Output (prediction) formula
def output_formula(features, weights, bias):
    return sigmoid(np.matmul(features, weights) + bias)

# Error (log-loss) formula
def error_formula(y, output):
    return - y * np.log(output) - (1 - y) * np.log(1 - output)

# Gradient descent step: update the weights and bias for one point
def update_weights(x, y, weights, bias, learnrate):
    output = output_formula(x, weights, bias)
    d_error = -(y - output)
    weights -= learnrate * d_error * x
    bias -= learnrate * d_error
    return weights, bias
```
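To see these functions working together, here is a minimal training loop. It repeats the definitions above so it runs on its own; the four-point dataset, the learning rate of 0.5 and the 1000 epochs are made-up choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def output_formula(features, weights, bias):
    return sigmoid(np.matmul(features, weights) + bias)

def error_formula(y, output):
    return -y * np.log(output) - (1 - y) * np.log(1 - output)

def update_weights(x, y, weights, bias, learnrate):
    output = output_formula(x, weights, bias)
    d_error = -(y - output)
    weights -= learnrate * d_error * x
    bias -= learnrate * d_error
    return weights, bias

# Hypothetical, linearly separable toy data: label 1 when x1 + x2 is large
features = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
targets = np.array([0, 0, 1, 1])

weights = np.zeros(2)
bias = 0.0
losses = []
for epoch in range(1000):
    for x, y in zip(features, targets):
        weights, bias = update_weights(x, y, weights, bias, learnrate=0.5)
    # Mean log-loss over the whole dataset after this epoch
    losses.append(np.mean(error_formula(targets, output_formula(features, weights, bias))))

predictions = (output_formula(features, weights, bias) >= 0.5).astype(int)
print("first loss:", losses[0], "last loss:", losses[-1])
print("predictions:", predictions)
```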

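Differences between the perceptron algorithm and gradient descent:
1. In the perceptron algorithm, not every point changes the weights; only misclassified points do.
2. In the perceptron algorithm, y hat can only be 1 or 0; in gradient descent, y hat can be any value between 0 and 1.
3. In gradient descent, every point changes the weights: a correctly classified point pushes the boundary line farther away from itself, and a misclassified point pulls the line closer to itself.

The figure shows how the line moves during training with epochs = 100.

See the GitHub code demo.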

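10. Neural Network Architecture

Now we can put these building blocks together and build powerful neural networks. (A neural network is also called a multi-layer perceptron.)

Everything so far has been a linear classifier. Now we take linear combinations of existing linear models, each passed through a sigmoid, to get a more complex model that is non-linear, as shown in the figure:

When several linear models are added together, the result for a point can be greater than 1, but a probability must lie between 0 and 1. So we convert the combined result back into a value between 0 and 1 with the sigmoid function introduced above.

In general, we combine linear models into non-linear models, then combine those non-linear models into further non-linear models, and so on; stacking more and more of these combinations gives a deep neural network. In a deep neural network, the large intermediate non-linear models are the hidden layers.

A neural network splits the whole n-dimensional space with highly non-linear boundaries.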

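Multi-class classification with neural networks

The networks so far predicted a single value. What if we need to predict several?
The answer is simple: add more nodes to the output layer, one per label. Each node gives the score of its label, and the softmax function turns these scores into a probability for each class. This is how neural networks perform multi-class classification. (Softmax is the function that computes the class probabilities in multi-class classification.)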

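11. Feedforward

Feedforward is the process a neural network uses to turn its input into an output: the output of each layer becomes the input of the next layer, and so on, until the final layer produces the output.

Training a neural network really means finding the weights on the edges that model the data well.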
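A minimal feedforward sketch in NumPy: each layer computes sigmoid(input @ W + b), and its output becomes the next layer's input. The layer sizes (2 inputs, 3 hidden units, 1 output), the random weights and the function name `feedforward` are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def feedforward(x, weight_matrices, biases):
    """Pass x through each layer in turn: activation = sigmoid(activation @ W + b)."""
    activation = np.asarray(x, dtype=float)
    for W, b in zip(weight_matrices, biases):
        # The output of this layer becomes the input of the next one
        activation = sigmoid(activation @ W + b)
    return activation

rng = np.random.default_rng(0)
# Hypothetical shapes: 2 inputs -> 3 hidden units -> 1 output
weights = [rng.normal(size=(2, 3)), rng.normal(size=(3, 1))]
biases = [np.zeros(3), np.zeros(1)]

y_hat = feedforward([0.5, -0.2], weights, biases)
print(y_hat)   # a single value between 0 and 1
```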

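12. Backpropagation

To train a neural network we use backpropagation. The process is:

1. Run a feedforward pass
2. Compare the model's output with the desired output
3. Compute the error
4. Run the feedforward operation backwards (backpropagation) to spread the error to each weight
5. Update the weights to get a better model
6. Repeat until the model is good

```python
# Error term of a sigmoid output unit: (y - y hat) times the sigmoid's derivative,
# which can be written in terms of the output as output * (1 - output)
def error_term_formula(y, output):
    return (y - output) * output * (1 - output)
```

* Backpropagation goes through the network in the opposite direction of feedforward, from the output back to the input *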

13. Mean Squared Error

The mean squared error is the average of the squared differences between the predicted values and the label values.

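Learning the weights

The perceptrons used to build AND, OR, NOT or XOR had weights set by hand. But what if you want to predict university admissions and don't know what the weights should be? Then you need to learn the weights from examples and use them to make predictions.

Measuring predictions

How bad, or how good, are the model's predictions? We can measure this with an error metric. A common one is the sum of the squared errors (SSE).

The SSE can be used to measure how well a neural network predicts; the lower the value, the better.

SSE = Σ (y - y hat)^2, summed over all data points (and output units), where y hat is the predicted value and y is the true value.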
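A sketch computing both error measures on made-up labels and predictions. Some texts multiply the SSE by 1/2 so that its derivative is simply (y - y hat); that variant differs only by a constant factor.

```python
import numpy as np

def sse(y, y_hat):
    """Sum of squared errors: the sum of (y - y_hat)**2."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.sum((y - y_hat) ** 2)

def mse(y, y_hat):
    """Mean squared error: the SSE divided by the number of values."""
    return sse(y, y_hat) / np.size(y)

y_true = [1.0, 0.0, 2.0, 3.0]   # hypothetical labels
y_pred = [0.5, 0.0, 2.5, 2.0]   # hypothetical predictions
print(sse(y_true, y_pred))      # 0.25 + 0 + 0.25 + 1 = 1.5
print(mse(y_true, y_pred))      # 1.5 / 4 = 0.375
```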

The gradient is another name for the rate of change, or slope. If you need a refresher on this concept, see Khan Academy's explanation.

Now assume there is only one output unit. The code below walks through the steps, again using the sigmoid as the activation function.

```python
import numpy as np

# Sigmoid activation function, used to compute the unit's output
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid function, used in the gradient descent step
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Input data
x = np.array([0.1, 0.3])
# Target
y = 0.2
# Input-to-output weights
weights = np.array([-0.8, 0.5])

# The learning rate, eta in the weight step equation
learnrate = 0.5

# The linear combination performed by the node (h in f(h) and f'(h))
h = x[0] * weights[0] + x[1] * weights[1]
# or h = np.dot(x, weights)

# The neural network output (y-hat)
nn_output = sigmoid(h)

# Output error (y - y-hat)
error = y - nn_output

# Output gradient (f'(h))
output_grad = sigmoid_prime(h)

# Error term (lowercase delta)
error_term = error * output_grad

# Gradient descent step
del_w = [learnrate * error_term * x[0],
         learnrate * error_term * x[1]]
# or del_w = learnrate * error_term * x

print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)
```

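14. Two-layer neural network

Take a simple two-layer network as an example and compute how its weights are updated. Suppose the network has two input values, one hidden node and one output node, and both the hidden and the output layer use the sigmoid activation, as shown in the figure below. (Note: the nodes at the bottom of the figure are the input values, and y hat at the top is the output. The input layer is not counted, which is why this structure is called a two-layer network.)

If your network has many layers, sigmoid activations quickly shrink the weight steps of the layers near the input to tiny values, because the sigmoid's derivative is at most 0.25 and backpropagation multiplies one such factor per layer. This problem is called the vanishing gradient.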
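The effect can be checked numerically. The sketch below finds the maximum of the sigmoid derivative and shows how quickly that best-case factor shrinks when it is multiplied once per layer; real gradients also include the weights, which this illustration ignores.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

# The derivative peaks at x = 0
xs = np.linspace(-10, 10, 2001)
print(sigmoid_prime(xs).max())   # 0.25, the maximum, at x = 0

# Best case: every layer contributes the maximum factor 0.25
for layers in (1, 5, 10):
    print(layers, "layers:", 0.25 ** layers)
```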

Code example: forward pass and backward pass

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])
weights_hidden_output = np.array([0.1, -0.3])

## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

## Backward pass
## Compute the network's output error
error = target - output

# Error term of the output layer
output_error_term = error * output * (1 - output)

# Backpropagate to get the error term of the hidden layer
hidden_error_term = np.dot(output_error_term, weights_hidden_output) * \
                    hidden_layer_output * (1 - hidden_layer_output)

# Weight updates from the hidden layer to the output layer
delta_w_h_o = learnrate * output_error_term * hidden_layer_output

# Weight updates from the input layer to the hidden layer
delta_w_i_h = learnrate * hidden_error_term * x[:, None]

print('Change in weights for hidden layer to output layer:')
print(delta_w_h_o)
print('Change in weights for input layer to hidden layer:')
print(delta_w_i_h)
```

Backpropagation, Andrej Karpathy: "Yes you should understand backprop"
Backpropagation, Andrej Karpathy: a lecture video from Stanford's CS231n course

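15. Training neural networks

Overfitting and underfitting

The two figures below illustrate overfitting and underfitting:

We are always looking for the model in the middle, but in practice the model usually either underfits or overfits. What can we do? The usual approach is to choose a somewhat more complex model and then use a few techniques to keep it from overfitting.

Techniques to prevent overfitting:

1. Early stopping

The number of epochs is the number of passes over the training data. As the number of epochs grows, the training error keeps decreasing, while the test error first decreases and then starts to increase once the model begins to overfit. The best point is in between, where the test error is lowest; stopping training at that epoch is called early stopping.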
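A minimal sketch of picking the stopping epoch from a list of per-epoch errors on held-out data (commonly a separate validation set). The `patience` rule, which stops once the error has not improved for a few epochs, is a common practical convention rather than something from the text above, and the error values are invented.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return the (0-based) epoch with the lowest error, stopping the scan once
    the error has not improved for `patience` consecutive epochs."""
    best_epoch, best_error = 0, float("inf")
    for epoch, err in enumerate(validation_errors):
        if err < best_error:
            best_epoch, best_error = epoch, err
        elif epoch - best_epoch >= patience:
            break   # no improvement for `patience` epochs: stop training here
    return best_epoch

# Made-up errors: they fall, then rise as the model starts to overfit
errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.61]
print(early_stopping_epoch(errors))   # 3
```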

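2. Regularization

L1 regularization: favors sparse vectors; small weights tend to go to exactly 0.
- So if you want to reduce the number of weights and end up with a small set of them, L1 is a good choice; this also helps with feature selection.
L2 regularization: does not favor sparse vectors; it keeps all the weights uniformly small.
- It generally gives better results when training models.

Why does L1 regularization produce vectors of sparse weights, while L2 produces vectors of small, evenly sized weights?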
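One common way to answer this question is to compare the penalties directly. The sketch below (hypothetical weight vectors) shows two things: L1 charges the sparse vector (1, 0) and the spread-out vector (0.5, 0.5) the same, while L2 charges the sparse one twice as much, so L2 prefers many small weights; and the size of L1's gradient stays at 1 as a weight approaches 0, so it keeps pushing small weights all the way to 0, whereas L2's gradient 2w fades away.

```python
import numpy as np

def l1_penalty(w):
    return np.sum(np.abs(w))      # sum of absolute values

def l2_penalty(w):
    return np.sum(np.square(w))   # sum of squares

sparse = np.array([1.0, 0.0])
spread = np.array([0.5, 0.5])

print(l1_penalty(sparse), l1_penalty(spread))   # 1.0 1.0
print(l2_penalty(sparse), l2_penalty(spread))   # 1.0 0.5

# Gradient of each penalty for a small weight w = 0.01
w = 0.01
print(np.sign(w), 2 * w)   # 1.0 0.02
```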

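3. Dropout

Dropout is a very common regularization technique for reducing overfitting.
During training, some randomly chosen nodes are temporarily turned off on each pass; on the next pass they are turned back on and a different random set is turned off, and so on. Repeating this throughout training means no single node can dominate, and every node gets trained.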
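A sketch of one common implementation, the "inverted dropout" variant (an assumption; the text does not specify an implementation). Each unit is dropped with probability `drop_prob` on every call during training, and the surviving activations are scaled by 1 / (1 - drop_prob) so their expected value stays the same, which lets the network run unchanged at inference time.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_prob during
    training and rescale the survivors so the expected value is unchanged."""
    if not training or drop_prob == 0.0:
        return activations                      # at inference time every unit is kept
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob   # True = keep this unit
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
hidden = np.ones(10)
print(dropout(hidden, drop_prob=0.2, rng=rng))                   # mix of 0.0 and 1.25
print(dropout(hidden, drop_prob=0.2, rng=rng, training=False))   # all ones
```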

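4. Vanishing gradient

Why does the gradient vanish?
Far from 0, the sigmoid curve is almost flat, so its derivative there is close to 0 (and it is never larger than 0.25). Backpropagation multiplies these derivatives layer by layer, so the gradient that reaches the early layers becomes tiny and their weights barely change. (Getting stuck in a local minimum, a low point that is not the lowest one, is a different problem; sections 4.1 and 4.2 address it.)

The solution is to use other activation functions instead of the sigmoid:

tanh (hyperbolic tangent): similar in shape to the sigmoid, but its range is from -1 to 1
relu (rectified linear unit): returns the input unchanged if it is positive, and 0 if it is negative

The final output unit, however, still uses the sigmoid, so that the output is a probability.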
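The activation functions above, with their derivatives, in NumPy. The derivatives are what backpropagation multiplies layer by layer: tanh's peaks at 1 and relu's is 1 for every positive input, compared with the sigmoid's maximum of 0.25. Treating relu's derivative at exactly 0 as 0 is a convention chosen here.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)                  # range (-1, 1)

def relu(x):
    return np.maximum(0, x)            # x if x > 0, else 0

# Derivatives, which backpropagation multiplies together layer by layer
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))   # at most 0.25

def tanh_prime(x):
    return 1 - np.tanh(x) ** 2             # at most 1

def relu_prime(x):
    return (x > 0).astype(float)           # 1 for positive inputs, 0 otherwise

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))   # [0. 0. 2.]
print(sigmoid_prime(x), tanh_prime(x), relu_prime(x))
```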

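4.1 Stochastic Gradient Descent

In gradient descent, an epoch is one complete pass over the training data; the size of each step is set by the learning rate.

The idea of stochastic gradient descent: take a small subset of the data, run it through the whole network, compute the gradient of the error function on those points, and move one step in that direction.

In practice, suppose we have 24 data points to train on. We split them into 4 batches of 6 points each. Each batch gives one gradient step, so one epoch (every one of the 24 points going through the network once) produces 4 weight updates instead of 1.

Each step is somewhat less accurate this way, but in practice taking many slightly inaccurate steps is much better than taking one very accurate step.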
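A minimal mini-batch SGD sketch using the numbers from the example: 24 points, batches of 6, so 4 updates per epoch. To stay short it fits a linear model with a squared error instead of a neural network, and the data come from a made-up rule y = 3*x1 - 2*x2; the batching and update loop are the same idea. The function names are illustrative.

```python
import numpy as np

def minibatches(n_points, batch_size, rng):
    """Shuffle the indices and yield them in chunks of batch_size."""
    order = rng.permutation(n_points)
    for start in range(0, n_points, batch_size):
        yield order[start:start + batch_size]

def sgd_linear_regression(X, y, batch_size=6, epochs=50, lr=0.1, seed=0):
    """Mini-batch SGD on the mean squared error of a linear model y ~ X @ w."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    updates = 0
    for _ in range(epochs):
        for idx in minibatches(len(X), batch_size, rng):
            Xb, yb = X[idx], y[idx]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)   # gradient of the batch MSE
            w -= lr * grad                                # one step per batch
            updates += 1
    return w, updates

# Hypothetical data: 24 points generated from y = 3*x1 - 2*x2
rng = np.random.default_rng(1)
X = rng.normal(size=(24, 2))
y = X @ np.array([3.0, -2.0])
w, updates = sgd_linear_regression(X, y)
print(w)         # close to [3, -2]
print(updates)   # 50 epochs * 4 batches = 200 weight updates
```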

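Rules of thumb:
- 1. A large learning rate means taking large steps (fast, but they may overshoot the minimum)
- 2. A small learning rate means taking small, steady steps, which are more likely to reach the lowest point, but training becomes slower
- 3. A good rule of thumb: if the model is not working, decrease the learning rate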

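4.2 Momentum

Momentum is a constant beta between 0 and 1.

Besides stochastic gradient descent, another way to get past local minima is to move fast and with determination, using momentum: each step adds in the previous steps, weighted by beta for the previous one, beta^2 for the one before that, and so on, so the speed built up on the way down can carry the search over a small hump.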
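A sketch of the update rule in the common "velocity" form, where velocity = beta * velocity + gradient, which unrolls to g_n + beta * g_(n-1) + beta^2 * g_(n-2) + .... The one-dimensional error function E(w) = (w - 3)^2 and the parameter values are made up; this bowl-shaped function has no local minima, so the sketch only demonstrates the update rule, not the escape from a local minimum.

```python
def momentum_descent(grad, w0, lr=0.1, beta=0.9, steps=300):
    """Gradient descent with momentum: each step is the learning rate times a
    decaying sum of past gradients (weights 1, beta, beta**2, ...)."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(w)   # running, decaying sum of gradients
        w = w - lr * velocity
    return w

# Hypothetical error function E(w) = (w - 3)**2, whose gradient is 2 * (w - 3)
w_final = momentum_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_final)   # close to 3
```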

Projects

【Bike Sharing - Neural Network】

Sample code:

```python
import numpy as np

class NeuralNetwork(object):
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_input_to_hidden = np.random.normal(
            0.0, self.hidden_nodes**-0.5, (self.hidden_nodes, self.input_nodes))
        self.weights_hidden_to_output = np.random.normal(
            0.0, self.output_nodes**-0.5, (self.output_nodes, self.hidden_nodes))
        self.lr = learning_rate

        # Activation function is the sigmoid function
        self.activation_function = self.sigmoid

    def train(self, inputs_list, targets_list):
        # Convert inputs list to 2d array, column vector
        inputs = np.array(inputs_list, ndmin=2).T
        targets = np.array(targets_list, ndmin=2).T

        #### Forward pass ####
        # Hidden layer
        hidden_inputs = np.dot(self.weights_input_to_hidden, inputs)
        hidden_outputs = self.activation_function(hidden_inputs)

        # Output layer: identity activation, since the output is a continuous value
        final_inputs = np.dot(self.weights_hidden_to_output, hidden_outputs)
        final_outputs = final_inputs

        #### Backward pass ####
        # 1 is f'(x) for the output activation f(x) = x
        output_delta = (targets - final_outputs) * 1
        hidden_delta = np.dot(self.weights_hidden_to_output.T, output_delta) * \
            hidden_outputs * (1 - hidden_outputs)

        # Update the weights
        self.weights_hidden_to_output += self.lr * np.dot(output_delta, hidden_outputs.T)
        self.weights_input_to_hidden += self.lr * np.dot(hidden_delta, inputs.T)

    def run(self, inputs_list):
        # Predict: run a forward pass through the network
        inputs = np.array(inputs_list, ndmin=2).T

        # Hidden layer
        hidden_inputs = np.dot(self.weights_input_to_hidden, inputs)
        hidden_outputs = self.activation_function(hidden_inputs)

        # Output layer
        final_inputs = np.dot(self.weights_hidden_to_output, hidden_outputs)
        final_outputs = final_inputs
        return final_outputs
```

【Project - Sentiment Analysis - Building a Complete Neural Network from Scratch】

Reducing noise speeds up training: noise reduction removes data that does not affect the training result, so the network has less to process.

References:

Perceptron algorithm: https://machinelearningmastery.com/implement-perceptron-algorithm-scratch-python/
Non-linear models: https://www.youtube.com/embed/Boy3zHVrWB4
https://www.youtube.com/embed/au-Wxkr_skM
Multi-layer neural network architecture: https://www.youtube.com/embed/pg99FkXYK0M
Multi-class classification with neural networks: https://www.youtube.com/embed/uNTtvxwfox0
Feedforward: https://www.youtube.com/embed/Ioe3bwMgjAM
Backpropagation:
Multivariable calculus (Khan Academy): https://www.khanacademy.org/math/multivariable-calculus
Backpropagation, Andrej Karpathy: "Yes you should understand backprop"
Backpropagation, Andrej Karpathy: a lecture video from Stanford's CS231n course
Overfitting and underfitting: https://www.youtube.com/embed/SVqEgaT1lXU
Momentum: https://www.youtube.com/embed/r-rYz_PEWC8
