
Nonlinear Regression Models (Part 2) -- Support Vector Machines


Study notes, for reference only; any errors will be corrected.

PS: This blog uses a mixed Chinese-English style.


Nonlinear Regression Models


Support Vector Machines


SVMs are a class of powerful, highly flexible modeling techniques.

For regression, we follow Smola (1996) and Drucker et al. (1997) and motivate this technique in the framework of robust regression, where we seek to minimize the effect of outliers on the regression equations.

Also, there are several flavors of support vector regression, and we focus on one particular technique called $\epsilon$-insensitive regression.

Recall that linear regression seeks to find parameter estimates that minimize SSE. One drawback of minimizing SSE is that the parameter estimates can be influenced by just one observation that falls far from the overall trend in the data.

To alleviate this problem, SVMs let the user set a threshold $\epsilon$: points whose residuals fall within the threshold contribute nothing to the regression model, while points whose absolute residuals exceed the threshold contribute an amount proportional to the residual.

There are several consequences to this approach.

First, since the squared residuals are not used, large outliers have a limited effect on the regression equation.

Second, samples that the model fits well (the residuals are small) have no effect on the regression equation.

In fact, if the threshold is set to a relatively large value, then the outliers are the only points that define the regression line!

This is somewhat counterintuitive: the poorly predicted points define the line. However, this approach has been shown to be very effective in defining the model.

To estimate the model parameters, the SVM uses the $\epsilon$-insensitive loss function (with the residual on the horizontal axis and its contribution to the objective on the vertical axis), together with an added penalty term. The SVM regression coefficients minimize:
$$\text{Cost}\sum_{i=1}^n L_{\epsilon}(y_i-\hat{y}_i) + \sum_{j=1}^P \beta_j^2$$

where $L_{\epsilon}(\cdot)$ is the $\epsilon$-insensitive function and Cost is the cost penalty set by the user, which penalizes large residuals.
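To make the loss and the objective concrete, here is a minimal NumPy sketch (the residuals, coefficients, cost, and the value $\epsilon = 0.1$ are made-up illustrations, not values from the text):

```python
import numpy as np

def epsilon_insensitive_loss(residuals, epsilon=0.1):
    """L_epsilon(r): zero inside the +/- epsilon band, linear outside it."""
    return np.maximum(np.abs(residuals) - epsilon, 0.0)

def svm_objective(residuals, betas, cost, epsilon=0.1):
    """Cost * sum of epsilon-insensitive losses plus the squared-coefficient penalty."""
    return cost * epsilon_insensitive_loss(residuals, epsilon).sum() + np.sum(np.asarray(betas) ** 2)

residuals = np.array([-0.30, -0.05, 0.00, 0.08, 0.45])
print(epsilon_insensitive_loss(residuals))            # [0.2  0.   0.   0.   0.35]
print(svm_objective(residuals, betas=[0.5, -1.2], cost=1.0))   # 2.24
```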

Recall that the simple linear regression model predicted new samples using linear combinations of the data and parameters. For a new sample, $u$, the prediction equation is:
$$\hat{y}=\beta_0 + \beta_1 u_1 + \ldots + \beta_P u_P$$
The linear support vector machine prediction function is very similar. The parameter estimates can be written as functions of a set of unknown parameters ($\alpha_i$) and the training set data points, so that:
$$\hat{y}=\beta_0 + \beta_1 u_1 + \ldots + \beta_P u_P = \beta_0 + \sum_{j=1}^P \sum_{i=1}^n \alpha_i x_{ij} u_j = \beta_0 + \sum_{i=1}^n \alpha_i \left(\sum_{j=1}^P x_{ij} u_j\right)$$
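The following sketch (with a made-up training set and $\alpha$ values, purely for illustration) evaluates this prediction function both as a sum over training points and through explicit $\beta$ coefficients, showing that the two forms agree:

```python
import numpy as np

# Hypothetical training set (n = 4 samples, P = 2 predictors) and alpha values;
# samples with alpha exactly zero do not influence the prediction.
X = np.array([[ 1.0,  2.0],
              [ 0.5, -1.0],
              [ 2.0,  0.0],
              [-1.0,  1.5]])
alpha = np.array([0.7, 0.0, -0.3, 0.1])
beta0 = 0.5
u = np.array([1.5, -0.5])                 # a new sample to predict

# Form 1: beta0 + sum_i alpha_i * (x_i . u)
pred_alpha = beta0 + np.sum(alpha * (X @ u))

# Form 2: fold the alphas into explicit coefficients beta_j = sum_i alpha_i * x_ij
beta = X.T @ alpha
pred_beta = beta0 + beta @ u

print(pred_alpha, pred_beta)              # the two forms give the same value
```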

There are several aspects of this equation worth pointing out.

First, there are as many $\alpha$ parameters as there are data points. From the standpoint of classical regression modeling, this model would be considered overparameterized; typically, it is better to estimate fewer parameters than data points.

However, the use of the cost value effectively regularizes the model to help alleviate this problem.

Second, the individual training set data points (the $x_{ij}$) are required for new predictions. When the training set is large, this makes the prediction equation less compact than other techniques. However, for some percentage of the training set samples, the $\alpha_i$ parameters will be exactly zero, indicating that they have no impact on the prediction equation. The data points associated with an $\alpha_i$ parameter of zero are the training set samples that are within $\pm\epsilon$ of the regression line (i.e., within the "funnel" or "tube" around the regression line). As a consequence, only the subset of training set data points where $\alpha_i \neq 0$ is needed for prediction.

Since the regression line is determined using these samples, they are called the support vectors, as they support the regression line.

Note that new samples enter the prediction function as a sum of cross products with the existing data points; in matrix algebra this corresponds to a dot product (i.e., $x'u$). This is an important feature, because the regression equation can be rewritten in a more general form:
$$f(u)=\beta_0 + \sum_{i=1}^n \alpha_i K(x_i, u)$$
where $K(\cdot)$ is called the kernel function. When the predictors enter the model linearly, the kernel function reduces to a simple sum of cross products:
$$K(x_i, u)=\sum_{j=1}^P x_{ij} u_j = x_i'u$$
However, many other kernel functions can be used to expand the regression model and introduce nonlinear functions of the predictors:
$$\text{polynomial} = (\phi(x'u) + 1)^{degree}$$
$$\text{radial basis function} = \exp(-\sigma \|x - u\|^2)$$
$$\text{hyperbolic tangent} = \tanh(\phi(x'u) + 1)$$

where $\phi$ and $\sigma$ are scale parameters. Because these functions of the predictors generate nonlinear models, this generalization is often called the "kernel trick".
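These kernels translate directly into code. A minimal NumPy sketch (the default values for $\phi$, $\sigma$, and the polynomial degree are arbitrary assumptions, not values from the text):

```python
import numpy as np

def polynomial_kernel(x, u, phi=1.0, degree=2):
    """(phi * x'u + 1)^degree"""
    return (phi * np.dot(x, u) + 1.0) ** degree

def radial_basis_kernel(x, u, sigma=1.0):
    """exp(-sigma * ||x - u||^2)"""
    return np.exp(-sigma * np.sum((x - u) ** 2))

def hyperbolic_tangent_kernel(x, u, phi=1.0):
    """tanh(phi * x'u + 1)"""
    return np.tanh(phi * np.dot(x, u) + 1.0)

def svm_predict(u, X_train, alpha, beta0, kernel=radial_basis_kernel, **kernel_args):
    """f(u) = beta0 + sum_i alpha_i * K(x_i, u)"""
    return beta0 + sum(a * kernel(x, u, **kernel_args) for a, x in zip(alpha, X_train))
```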

Which kernel function should be used?

This depends on the problem.

When the regression line is truly linear, the linear kernel function will be a better choice.

Note that some of the kernel functions have extra parameters. For example, the polynomial degree in the polynomial kernel must be specified. Similarly, the radial basis function has a parameter ($\sigma$) that controls the scale. These parameters, along with the cost value, constitute the tuning parameters for the model.

In the case of the radial basis function, there is a possible computational shortcut to estimating the kernel parameter. Caputo et al. (2002) suggested that the parameter can be estimated using combinations of the training set points to calculate the distribution of $\|x - x'\|^2$, then using the 10th and 90th percentiles as a range for $\sigma$.

Instead of tuning this parameter over a grid of candidate values, we can use the midpoint of these two percentiles.
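A minimal sketch of this shortcut (it follows the procedure as described above, not any particular library's implementation, and the toy training matrix is made up):

```python
import numpy as np

def estimate_sigma(X, low=10, high=90):
    """Estimate the RBF scale parameter from the distribution of pairwise
    squared distances, using the midpoint of the 10th and 90th percentiles."""
    n = X.shape[0]
    sq_dists = [np.sum((X[i] - X[j]) ** 2)
                for i in range(n) for j in range(i + 1, n)]
    lo, hi = np.percentile(sq_dists, [low, high])
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))    # stand-in for a real training set
print(estimate_sigma(X_train))
```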

The cost parameter is the main tool for adjusting the complexity of the model.

When the cost is large, the model becomes very flexible since the effect of errors is amplified. When the cost is small, the model will "stiffen" and become less likely to over-fit (but more likely to underfit) because the contribution of the squared parameters is proportionally large in the modified error function.

One could also tune the model over the size of the funnel ($\epsilon$). However, there is a relationship between $\epsilon$ and the cost parameter. In our experience, we have found that the cost parameter provides more flexibility for tuning the model. So we suggest fixing a value for $\epsilon$ and tuning over the other kernel parameters.

Since the predictors enter into the model as a sum of cross products, differences in the predictor scales can affect the model. Therefore, we recommend centering and scaling the predictors prior to building an SVM model.
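Putting the last two recommendations together, here is a hedged sketch using scikit-learn (the library choice and the toy data are my own assumptions; the text does not name an implementation): the predictors are centered and scaled inside a Pipeline, $\epsilon$ is held fixed, and only the cost parameter C is tuned by cross-validation. Note that scikit-learn parameterizes the RBF kernel with gamma, which plays the role of $\sigma$ above.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Toy regression data standing in for a real training set.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

# Center and scale the predictors, then fit an RBF-kernel SVR with epsilon held fixed.
model = Pipeline([
    ("scale", StandardScaler()),
    ("svr", SVR(kernel="rbf", epsilon=0.1)),
])

# Tune only the cost parameter C over a grid, by 5-fold cross-validation.
grid = GridSearchCV(model, {"svr__C": [0.25, 0.5, 1, 2, 4, 8, 16]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```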
