Nonlinear Regression Models (Part 1): Neural Networks
Study notes, for reference only; corrections are welcome.
Nonlinear Regression Models
Neural Networks
Neural networks (Bishop 1995; Ripley 1996; Titterington 2010) are powerful nonlinear regression techniques inspired by theories about how the brain works.
The outcome is modeled by an intermediary set of unobserved variables (called hidden variables or hidden units here).
These hidden units are linear combinations of the original predictors, but they are not estimated in a hierarchical fashion.
As previously stated, each hidden unit is a linear combination of some or all of the predictor variables. However, this linear combination is typically transformed by a nonlinear function $g(\cdot)$, such as the logistic function:
$$h_k(x) = g\left(\beta_{0k} + \sum_{j=1}^{P} x_j \beta_{jk}\right), \qquad g(u) = \frac{1}{1 + e^{-u}}$$
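As a concrete illustration, here is a minimal NumPy sketch of a single hidden unit; the predictor values and coefficients are made up for the example:

```python
import numpy as np

def logistic(u):
    # g(u) = 1 / (1 + e^{-u}), the logistic nonlinearity
    return 1.0 / (1.0 + np.exp(-u))

def hidden_unit(x, beta_0k, beta_k):
    # h_k(x): a linear combination of the predictors passed through g
    return logistic(beta_0k + x @ beta_k)

# Made-up example with P = 3 predictors
x = np.array([0.5, -1.2, 2.0])
beta_0k = 0.1                        # intercept for hidden unit k
beta_k = np.array([0.4, -0.3, 0.8])  # one coefficient per predictor
print(hidden_unit(x, beta_0k, beta_k))  # a value in (0, 1)
```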
The $\beta$ coefficients are similar to regression coefficients; coefficient $\beta_{jk}$ is the effect of the $j$th predictor on the $k$th hidden unit. A neural network model usually involves multiple hidden units to model the outcome.
There are no constraints that help define these linear combinations. Because of this, there is little likelihood that the coefficients in each unit represent some coherent piece of information.
Once the number of hidden units is defined, each unit must be related to the outcome. Another linear combination connects the hidden units to the outcome:
$$f(x) = \gamma_0 + \sum_{k=1}^{H} \gamma_k h_k$$
For this type of network model and $P$ predictors, there are a total of $H(P+1) + H + 1$ parameters being estimated, which quickly becomes large as $P$ increases.
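A minimal sketch of the full model with random placeholder weights, including a check of the $H(P+1) + H + 1$ parameter count (for example, $P = 10$ and $H = 5$ give 61 parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
P, H = 10, 5                      # predictors and hidden units

# Each hidden unit has an intercept plus P slopes; the output layer
# adds an intercept plus H slopes.
beta0 = rng.normal(size=H)        # hidden-unit intercepts
beta = rng.normal(size=(P, H))    # beta[j, k]: effect of predictor j on unit k
gamma0 = rng.normal()             # output intercept
gamma = rng.normal(size=H)        # output-layer coefficients

def predict(x):
    h = 1.0 / (1.0 + np.exp(-(beta0 + x @ beta)))  # H hidden-unit outputs
    return gamma0 + h @ gamma                      # f(x)

print(H * (P + 1) + H + 1)        # 61 parameters for P = 10, H = 5
print(predict(rng.normal(size=P)))
```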
Treating this model as a nonlinear regression model, the parameters are usually optimized to minimize the sum of the squared residuals.
This can be a challenging numerical optimization problem (recall that there are no constraints on the parameters of this complex nonlinear model).
The parameters are usually initialized to random values and then specialized algorithms for solving the equations are used. The back-propagation algorithm is a highly efficient methodology that works with derivatives to find the optimal parameters. However, it is common that a solution to this equation is not a global solution, meaning that we cannot guarantee that the resulting set of parameters are uniformly better than any other set.
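As a rough sketch of this optimization, the loop below fits the single-hidden-layer model above by full-batch gradient descent on the sum of squared residuals; the gradient expressions are what back-propagation computes for this architecture, while the synthetic data, learning rate, and epoch count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, P, H = 200, 3, 4
X = rng.normal(size=(n, P))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)   # synthetic target

# Random initialization, as described in the text
beta0 = rng.normal(size=H) * 0.5
beta = rng.normal(size=(P, H)) * 0.5
gamma0 = 0.0
gamma = rng.normal(size=H) * 0.5

lr = 0.01
for epoch in range(2000):
    # Forward pass
    z = beta0 + X @ beta               # (n, H) pre-activations
    h = 1.0 / (1.0 + np.exp(-z))       # hidden-unit outputs
    resid = gamma0 + h @ gamma - y     # residuals f(x_i) - y_i

    # Backward pass: gradients of the sum of squared residuals
    d_gamma = 2.0 * h.T @ resid
    d_gamma0 = 2.0 * resid.sum()
    d_z = 2.0 * np.outer(resid, gamma) * h * (1.0 - h)  # chain rule through g
    d_beta = X.T @ d_z
    d_beta0 = d_z.sum(axis=0)

    # Gradient-descent update on the averaged gradients
    gamma -= lr * d_gamma / n
    gamma0 -= lr * d_gamma0 / n
    beta -= lr * d_beta / n
    beta0 -= lr * d_beta0 / n

h = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta)))
print(np.sum((gamma0 + h @ gamma - y) ** 2))   # final training SSE
```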
Also, neural networks have a tendency to over-fit the relationship between the predictors and the response due to the large number of regression coefficients.
To combat this issue, several different approaches have been proposed.
First, the iterative algorithms for solving the regression equations can be prematurely halted. This approach is referred to as early stopping and would stop the optimization procedure when some estimate of the error rate starts to increase.
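A sketch of early stopping grafted onto the same kind of gradient-descent loop: part of the data is held out to estimate the error rate, and training halts once that estimate starts to increase (the split, tolerance, and learning rate here are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

# Hold out data so an estimate of the error rate can be monitored
X_tr, y_tr, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

P, H, lr = 3, 4, 0.01
beta0, beta = rng.normal(size=H) * 0.5, rng.normal(size=(P, H)) * 0.5
gamma0, gamma = 0.0, rng.normal(size=H) * 0.5

best = np.inf
for epoch in range(5000):
    # One full-batch gradient step on the training set (as before)
    h = 1.0 / (1.0 + np.exp(-(beta0 + X_tr @ beta)))
    resid = gamma0 + h @ gamma - y_tr
    d_z = 2.0 * np.outer(resid, gamma) * h * (1.0 - h)
    gamma -= lr * 2.0 * h.T @ resid / len(y_tr)
    gamma0 -= lr * 2.0 * resid.sum() / len(y_tr)
    beta -= lr * X_tr.T @ d_z / len(y_tr)
    beta0 -= lr * d_z.sum(axis=0) / len(y_tr)

    # Early stopping: halt when the held-out error starts to increase
    h_val = 1.0 / (1.0 + np.exp(-(beta0 + X_val @ beta)))
    err = np.sum((gamma0 + h_val @ gamma - y_val) ** 2)
    if err > best * 1.001:             # small tolerance before halting
        print(f"stopped at epoch {epoch}, validation SSE {err:.3f}")
        break
    best = min(best, err)
```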
Another approach to moderating over-fitting is to use weight decay, a penalization method to regularize the model that is similar to ridge regression.
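A minimal sketch of the weight-decay idea: the objective becomes the sum of squared residuals plus a ridge-like penalty on the squared coefficients, which simply adds `2 * lam * theta` to each coefficient's gradient (`lam` is a hypothetical tuning value that would normally be chosen by resampling):

```python
import numpy as np

def penalized_sse(resid, beta, gamma, lam):
    # Sum of squared residuals plus a ridge-like weight-decay penalty
    penalty = lam * (np.sum(beta ** 2) + np.sum(gamma ** 2))
    return np.sum(resid ** 2) + penalty

# In gradient form the penalty just adds 2 * lam * theta to each
# coefficient's gradient, e.g. inside the training loop above:
#   d_beta += 2.0 * lam * beta
#   d_gamma += 2.0 * lam * gamma

rng = np.random.default_rng(3)
print(penalized_sse(rng.normal(size=50), rng.normal(size=(3, 4)),
                    rng.normal(size=4), lam=0.01))
```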
The structure of the model described here is the simplest neural network architecture: a single-layer feed-forward network. There are many other kinds, such as models with more than one layer of hidden units (i.e., a layer of hidden units that models the other hidden units). Also, other model architectures have loops going in both directions between layers.
Given the challenge of estimating a large number of parameters, the fitted model finds parameter estimates that are locally optimal; that is, the algorithm converges, but the resulting parameter estimates are unlikely to be the globally optimal estimates.
Very often, different locally optimal solutions can produce models that are very different but have nearly equivalent performance.
This model instability can sometimes hinder the use of this model.
As an alternative, several models can be created using different starting values, and the results of these models can be averaged to produce a more stable prediction.
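A sketch of this averaging idea, reusing the same toy network: several fits from different random starting values, with their predictions averaged (all settings are illustrative):

```python
import numpy as np

def fit_net(X, y, H=4, lr=0.01, epochs=2000, seed=0):
    # Fit the single-layer network from a random start; return a predictor
    rng = np.random.default_rng(seed)
    P = X.shape[1]
    beta0, beta = rng.normal(size=H) * 0.5, rng.normal(size=(P, H)) * 0.5
    gamma0, gamma = 0.0, rng.normal(size=H) * 0.5
    for _ in range(epochs):
        h = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta)))
        resid = gamma0 + h @ gamma - y
        d_z = 2.0 * np.outer(resid, gamma) * h * (1.0 - h)
        gamma = gamma - lr * 2.0 * h.T @ resid / len(y)
        gamma0 = gamma0 - lr * 2.0 * resid.sum() / len(y)
        beta = beta - lr * X.T @ d_z / len(y)
        beta0 = beta0 - lr * d_z.sum(axis=0) / len(y)
    return lambda Xn: gamma0 + (1.0 / (1.0 + np.exp(-(beta0 + Xn @ beta)))) @ gamma

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Average predictions from several random starting values
nets = [fit_net(X, y, seed=s) for s in range(5)]
print(np.mean([net(X[:3]) for net in nets], axis=0))
```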
These models are often adversely affected by high correlation among the predictor variables.
Two approaches can mitigate this issue. First, the predictors can be pre-filtered to remove those that are associated with high correlations. Alternatively, a feature extraction technique, such as principal component analysis (PCA), can be used prior to modeling to eliminate correlations.
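A sketch of the PCA option in plain NumPy, on synthetic data with two nearly duplicate predictors; the component scores are uncorrelated and can replace the original predictors as model inputs (in practice a library routine would typically be used):

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical predictors: the first two columns are nearly duplicates
z = rng.normal(size=(100, 1))
X = np.hstack([z + 0.01 * rng.normal(size=(100, 1)),
               z + 0.01 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 2))])

# PCA via SVD of the centered predictor matrix
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T            # principal component scores

# The scores are uncorrelated, so they remove the collinearity
print(np.round(np.corrcoef(scores, rowvar=False), 3))
```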