當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

U25%(1,16) and U25%(1,168)on《C4.5:programs for machine learning》

發(fā)布時間：2023/12/20 编程问答 35 豆豆

生活随笔收集整理的這篇文章主要介紹了 U25%(1,16) and U25%(1,168)on《C4.5:programs for machine learning》小編覺得挺不錯的,現(xiàn)在分享給大家,幫大家做個參考.

when calculating
$U_{CF}$ (e,N)
CF: Confidence Level(here is 25%)
e:misclassifying counts of current subtree we focus on
N:counts of sub-datasets relevant to current subtree who is under judgment whether to be pruned or not.

“Pr” is short for “Pessimistic error rate”.

$U_{CF}$ (e,N)=Pr= $e+0.5+Coeff22+Coeff2?{(e+0.5）[1?e+0.5N]+Coeff24}N+Coeff2①\frac{e+0.5+ \frac{Coeff^2}{2}+ \sqrt{ Coeff^2·\{ (e+0.5）[1- \frac{e+0.5}{N} ]+ \frac{Coeff^2}{4} \} } } {N+Coeff^2}①$

The method to get $Coeff\ Coeff$ is as follows:

$Coeff?Deviation[i?1]Deviation[i]?Deviation[i?1]=ConfidenceLevel?val[i?1]val[i]?val[i?1]②\frac{Coeff-Deviation[i-1]}{Deviation[i]-Deviation[i-1]}=\frac{Confidence \ Level-val[i-1]}{val[i]-val[i-1]}②$

Now Let’s analyse the meaning of ② and how to get “i” in formula ②.

What is Deviation and Val?
Val[] = { 0, 0.001, 0.005, 0.01, 0.05, 0.10, 0.20, 0.40, 1.00},
Deviation[] = {4.0, 3.09, 2.58, 2.33, 1.65, 1.28, 0.84, 0.25, 0.00};

for Normal Distribution：
P{X>Deviation[i]}=val[i],
for example:
P{X>2.58}=0.005
--------------------

Now let’s analyse the whole meaning of formula ②

the red line is $P(X>x)=∫x+∞12πe?x22dxP(X>x)=\int_x^{+∞}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx$

the blue line is $f(x)=12πe?x22f(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$
and the two points are:
(deviation[i],val[i])=(0.25,0.4)
(deviation[i-1,val[i-1])=(0.84,0.2)

Let’s amplify the above figure to the following picture:

so tan C= $Val[i]?Val[i?1]Deviation[i?1]?Deviation[i]=ConfidenceLevel?Val[i?1]Deviation[i?1]?Coeff\frac{Val[i]-Val[i-1]}{Deviation[i-1]-Deviation[i]}=\frac{Confidence\ Level-Val[i-1]}{Deviation[i-1]-Coeff}$

then we can get ②and compute $C o e f f$ and $Coeff^2$

but how to get “i” in above formula?
when $\ level≤Val[i]$
is satisfied,
we get “i”.
when Confidence Level=0.25
i=7
val[i-1]=0.2
val[i]=0.4
deviation[i-1]=0.84
deviation[i]=0.25
substitute the above values in ②
we get Coeff=0.6925
then,
$Coeff^2=0.479556$
--------------------

**Now Let’s calculate **
$U_{0.25}(1,16)$
CF=0.25
e=1
N=16
$Coeff^2=0.479556$
We’ll get :
Pr=

$1+0.5+0.4795562+0.479556?{(1+0.5）[1?1+0.516]+0.4795564}16+0.479556\frac{1+0.5+ \frac{0.479556}{2}+ \sqrt{ 0.479556·\{ (1+0.5）[1- \frac{1+0.5}{16} ]+ \frac{0.479556}{4} \} } } {16+0.479556}$ =0.156681

N·Pr=16×0.156681=2.507
Compared with Quinlan’s book page 42th:

Now Let’s calculate
$U_{0.25}(1,168)$
CF=0.25
e=1
N=168

Pr= $1+0.5+0.4795562+0.479556?{(1+0.5）[1?1+0.5168]+0.4795564}168+0.479556=0.015536\frac{1+0.5+ \frac{0.479556}{2}+ \sqrt{ 0.479556·\{ (1+0.5）[1- \frac{1+0.5}{168} ]+ \frac{0.479556}{4} \} } } {168+0.479556}=0.015536$

$N·U_{0.25}(1,168)=N·Pr=168·0.015536=2.61$
Compared with Quinlan’s Book on Page 42th:

The code to veriy the above value:

/*************************************************************************/ /* */ /* Statistical routines for C4.5 */ /* ----------------------------- */ /* */ /*************************************************************************/ #include <stdio.h> #include <math.h>/*************************************************************************/ /* */ /* Compute the additional errors if the error rate increases to the */ /* upper limit of the confidence level. The coefficient is the */ /* square of the number of standard deviations corresponding to the */ /* selected confidence level. (Taken from Documenta Geigy Scientific */ /* Tables (Sixth Edition), p185 (with modifications).) */ /* */ /*************************************************************************/ float CF=0.25;float Val[] = { 0, 0.001, 0.005, 0.01, 0.05, 0.10, 0.20, 0.40, 1.00},//概率Dev[] = {4.0, 3.09, 2.58, 2.33, 1.65, 1.28, 0.84, 0.25, 0.00};//這個就是分位點// P{X>Dev[i]}=val[i],// according to standardized normal distribution N(0,1)// for example:// P{X>2.58}=0.005,float AddErrs(float N, float e)//葉子的悲觀錯誤 {static float Coeff=0;float Val0, Pr;if ( ! Coeff )//compute only once when pruning trees{/* Compute and retain the coefficient value, interpolating fromthe values in Val and Dev */int i;i = 0;while ( CF > Val[i] ) i++;//這里可以看到，是根據(jù)置信度對悲觀錯誤數(shù)量進行計算的。//但是呢，也沒有像理論中直接計算“悲觀錯誤率”Coeff = Dev[i-1] +(Dev[i] - Dev[i-1]) * (CF - Val[i-1]) /(Val[i] - Val[i-1]);Coeff = Coeff * Coeff;printf("Coeff=%f\n",Coeff);//CF定下來以后，Coeff就是定的}if ( e < 1E-6 ){return N * (1 - exp(log(CF) / N));}elseif ( e < 0.9999 ){Val0 = N * (1 - exp(log(CF) / N));return Val0 + e * (AddErrs(N, 1.0) - Val0);//這里在進行遞歸調(diào)用}elseif ( e + 0.5 >= N ){return 0.67 * (N - e);}else{float fenzi;fenzi=(e + 0.5 + Coeff/2+ sqrt( Coeff * ((e + 0.5) * (1 - (e + 0.5)/N) + Coeff/4)) );printf("fenzi=%f\n",fenzi);Pr = (e + 0.5 + Coeff/2+ sqrt( Coeff * ((e + 0.5) * (1 - (e + 0.5)/N) + Coeff/4)) )/ (N + Coeff);printf("Pr=%f\n",Pr);return (N * Pr - e);} }int main() {float result=AddErrs(16, 1); printf("result=%f\n",result);exit(0);}

running method:

gcc -o stats stats.c -lm
./stats

the result is:
Coeff=0.479556
fenzi=2.582031
Pr=0.156681
result=1.506894

總結(jié)

以上是生活随笔為你收集整理的U25%(1,16) and U25%(1,168)on《C4.5:programs for machine learning》的全部內(nèi)容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯，歡迎將生活随笔推薦給好友。

上一篇： Earliest PEP Algorit
下一篇： Error Based Pruning剪