

【StatLearn】KNN Algorithm Experiment in Statistical Learning (2)


This post continues from KNN Algorithm Experiment in Statistical Learning (1).

Problem:


  • Explore the data before classification using summary statistics or visualization
  • Pre-process the data (such as denoising, normalization, feature selection, ...)
  • Try other distance metrics or distance-based voting
  • Try other dimensionality reduction methods
  • How to set the k value, if not using cross validation? Verify your idea
We use a parallel coordinates plot to visualize the data. The data are first normalized so that every value lies in the range [0,1]; note that the normalization is applied to each feature independently.
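The corresponding fragment of the full script (listed at the end of this post) normalizes every feature column independently and then draws the parallel coordinates plots, for example:

data = load('wine.data.txt');      % column 1 is the class label
uniData = [];
for i = 2:size(data,2)             % min-max normalize each feature to [0,1]
    uniData = cat(2, uniData, (data(:,i)-min(data(:,i)))/(max(data(:,i))-min(data(:,i))));
end
figure();
parallelcoords(uniData(:,1:4), 'group', data(:,1), ...
    'labels', {'1)Alcohol','2)Malic acid','3)Ash','4)Alcalinity of ash'});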




By inspecting the plots carefully, we pick out the features whose class distributions overlap the least; feature selection essentially keeps the attributes on which the classes are easiest to tell apart as input to the next classification step. We select the features numbered (1), (2), (5), (6), (7) and (10). In my view, feature selection is a simple, even crude, form of dimensionality reduction and denoising, but it can work very well.
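As a quick sketch, keeping these features amounts to picking the matching columns of the normalized data matrix. Since column 1 of uniData holds the class label, feature f sits in column f+1, which is exactly the index vector used in the full script below:

uniData = [data(:,1), uniData];          % prepend the class labels
FSData  = uniData(:, [1 2 3 6 7 8 11]);  % label + features 1, 2, 5, 6, 7, 10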

Following the previous step, the parallel coordinates plots suggest that features (1), (2), (5), (6), (7) and (10) are well suited for classification. Running KNN on these selected features gives the following results:


When k=1 the accuracy reaches 85.38%; compared with plain KNN or PCA+KNN, the combination of normalization and feature selection improves the accuracy substantially. Other feature combinations can also be tried experimentally to obtain better results.


MaxAccuracy = 0.8834 when k=17 (Normalization + FeatureSelection + KNN)


In the experiment we tried two different feature selection strategies. Using fewer features does affect the classification accuracy: even the features that look poorly separated in the parallel coordinates plots still contribute something to classification. For small values of k, feature selection outperforms using all of the features, which matches the intuition that on relatively clean data a small k already gives good results. We then try further preprocessing of the data, namely denoising. The denoising step removes the rows carrying the extreme minimum and maximum values of each feature from the training data: for a reasonable feature, the values should fall within a sensible range, and values that are far too large or too small are treated as outliers.

The denoising code is as follows:


function [DNData] = DataDenoising(InputData, KillRange)
% For every feature, drop the rows holding the KillRange smallest and
% KillRange largest values (column 1 is assumed to be the class label).
DNData = InputData;
%MedianData = median(DNData);
for i = 2:size(InputData,2)
    [temp, DNIndex] = sort(DNData(:,i));
    DNData = DNData(DNIndex(1+KillRange:end-KillRange), :);
end
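In the main script below, denoising is applied only to the training fold, after the 5-fold split, for example:

Trainning = setdiff(FSData, Test, 'rows');   % training fold
Trainning = DataDenoising(Trainning, 2);     % drop the 2 extreme rows per feature at each end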





    采用LLE作為降維的手段,通過和以上的幾種方案作對比,如下:



MaxAccuracy = 0.9376 when k=23 (LLE dimensionality reduction to 2)

    關于LLE算法,參見這篇論文


    • Nonlinear dimensionality reduction by locally linear embedding. Sam Roweis & Lawrence Saul. Science, vol. 290, no. 5500, Dec. 22, 2000, pp. 2323-2326.
    Project page: http://www.cs.nyu.edu/~roweis/lle/
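In the script below, the lle function from that project page is applied to the transposed, normalized data with K = 5 neighbours and a 2-dimensional output embedding:

% lle expects a D-by-N matrix (features in rows, samples in columns)
LLEData = lle(uniData(:,2:end)', 5, 2);   % K = 5 neighbours, reduce to 2 dimensions
LLEData = [data(:,1), LLEData'];          % transpose back and re-attach the class labels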



Source code:

    StatLearnProj.m


clear;
data = load('wine.data.txt');

%5-fold cross-validated KNN on the raw data
Accuracy = [];
for i = 1:5
    Test = data(i:5:end,:);
    TestData = Test(:,2:end);
    TestLabel = Test(:,1);
    Trainning = setdiff(data, Test, 'rows');
    TrainningData = Trainning(:,2:end);
    TrainningLabel = Trainning(:,1);
    Accuracy = cat(1, Accuracy, CalcAccuracy(TestData, TestLabel, TrainningData, TrainningLabel));
end
AccuracyKNN = mean(Accuracy,1);

%PCA + KNN
Accuracy = [];
[Coeff, Score, Latent] = princomp(data(:,2:end));
dataPCA = [data(:,1), Score(:,1:6)];
Latent                                   % display the PCA eigenvalues
for i = 1:5
    Test = dataPCA(i:5:end,:);
    TestData = Test(:,2:end);
    TestLabel = Test(:,1);
    Trainning = setdiff(dataPCA, Test, 'rows');
    TrainningData = Trainning(:,2:end);
    TrainningLabel = Trainning(:,1);
    Accuracy = cat(1, Accuracy, CalcAccuracy(TestData, TestLabel, TrainningData, TrainningLabel));
end
AccuracyPCA = mean(Accuracy,1);

BarData = [AccuracyKNN; AccuracyPCA];
bar(1:2:51, BarData');
[D,I] = sort(AccuracyKNN,'descend');  D(1), I(1)
[D,I] = sort(AccuracyPCA,'descend');  D(1), I(1)

%pre-processing: per-feature min-max normalization
labs1 = {'1)Alcohol','2)Malic acid','3)Ash','4)Alcalinity of ash'};
labs2 = {'5)Magnesium','6)Total phenols','7)Flavanoids','8)Nonflavanoid phenols'};
labs3 = {'9)Proanthocyanins','10)Color intensity','11)Hue','12)OD280/OD315','13)Proline'};
uniData = [];
for i = 2:size(data,2)
    uniData = cat(2, uniData, (data(:,i)-min(data(:,i)))/(max(data(:,i))-min(data(:,i))));
end
figure(); parallelcoords(uniData(:,1:4), 'group', data(:,1), 'labels', labs1);
figure(); parallelcoords(uniData(:,5:8), 'group', data(:,1), 'labels', labs2);
figure(); parallelcoords(uniData(:,9:13), 'group', data(:,1), 'labels', labs3);

%Normalization, all features
uniData = [data(:,1), uniData];
Accuracy = [];
for i = 1:5
    Test = uniData(i:5:end,:);
    TestData = Test(:,2:end);
    TestLabel = Test(:,1);
    Trainning = setdiff(uniData, Test, 'rows');
    TrainningData = Trainning(:,2:end);
    TrainningLabel = Trainning(:,1);
    Accuracy = cat(1, Accuracy, CalcAccuracy(TestData, TestLabel, TrainningData, TrainningLabel));
end
AccuracyNorm = mean(Accuracy,1);

%KNN vs PCA vs Normalization
BarData = [AccuracyKNN; AccuracyPCA; AccuracyNorm];
bar(1:2:51, BarData');

%Normalization + feature selection: features 1 2 5 6 7 10
FSData = uniData(:,[1 2 3 6 7 8 11]);
size(FSData)
Accuracy = [];
for i = 1:5
    Test = FSData(i:5:end,:);
    Trainning = setdiff(FSData, Test, 'rows');
    TestData = Test(:,2:end);
    TestLabel = Test(:,1);
    TrainningData = Trainning(:,2:end);
    TrainningLabel = Trainning(:,1);
    Accuracy = cat(1, Accuracy, CalcAccuracy(TestData, TestLabel, TrainningData, TrainningLabel));
end
AccuracyNormFS1 = mean(Accuracy,1);

%Normalization + feature selection: features 1 6 7
FSData = uniData(:,[1 2 7 8]);
Accuracy = [];
for i = 1:5
    Test = FSData(i:5:end,:);
    Trainning = setdiff(FSData, Test, 'rows');
    TestData = Test(:,2:end);
    TestLabel = Test(:,1);
    TrainningData = Trainning(:,2:end);
    TrainningLabel = Trainning(:,1);
    Accuracy = cat(1, Accuracy, CalcAccuracy(TestData, TestLabel, TrainningData, TrainningLabel));
end
AccuracyNormFS2 = mean(Accuracy,1);

figure();
BarData = [AccuracyNorm; AccuracyNormFS1; AccuracyNormFS2];
bar(1:2:51, BarData');
[D,I] = sort(AccuracyNorm,'descend');    D(1), I(1)
[D,I] = sort(AccuracyNormFS1,'descend'); D(1), I(1)
[D,I] = sort(AccuracyNormFS2,'descend'); D(1), I(1)

%denoising + Normalization + feature selection (1 6 7)
FSData = uniData(:,[1 2 7 8]);
Accuracy = [];
for i = 1:5
    Test = FSData(i:5:end,:);
    Trainning = setdiff(FSData, Test, 'rows');
    Trainning = DataDenoising(Trainning, 2);
    TestData = Test(:,2:end);
    TestLabel = Test(:,1);
    TrainningData = Trainning(:,2:end);
    TrainningLabel = Trainning(:,1);
    Accuracy = cat(1, Accuracy, CalcAccuracy(TestData, TestLabel, TrainningData, TrainningLabel));
end
AccuracyNormFSDN = mean(Accuracy,1);
figure(); hold on
plot(1:2:51, AccuracyNormFSDN);
plot(1:2:51, AccuracyNormFS2, 'r');

%other distance metrics: city-block distance
Dist = 'cityblock';
Accuracy = [];
for i = 1:5
    Test = uniData(i:5:end,:);
    TestData = Test(:,2:end);
    TestLabel = Test(:,1);
    Trainning = setdiff(uniData, Test, 'rows');
    TrainningData = Trainning(:,2:end);
    TrainningLabel = Trainning(:,1);
    Accuracy = cat(1, Accuracy, CalcAccuracyPlus(TestData, TestLabel, TrainningData, TrainningLabel, Dist));
end
AccuracyNormCity = mean(Accuracy,1);

BarData = [AccuracyNorm; AccuracyNormCity];
figure();
bar(1:2:51, BarData');
[D,I] = sort(AccuracyNormCity,'descend'); D(1), I(1)

%city-block distance + denoising + feature selection
FSData = uniData(:,[1 2 7 8]);
Dist = 'cityblock';
Accuracy = [];
for i = 1:5
    Test = FSData(i:5:end,:);
    TestData = Test(:,2:end);
    TestLabel = Test(:,1);
    Trainning = setdiff(FSData, Test, 'rows');
    Trainning = DataDenoising(Trainning, 3);
    TrainningData = Trainning(:,2:end);
    TrainningLabel = Trainning(:,1);
    Accuracy = cat(1, Accuracy, CalcAccuracyPlus(TestData, TestLabel, TrainningData, TrainningLabel, Dist));
end
AccuracyNormCityDN = mean(Accuracy,1);
figure(); hold on
plot(1:2:51, AccuracyNormCityDN);
plot(1:2:51, AccuracyNormCity, 'r');

%LLE dimensionality reduction
data = load('wine.data.txt');
uniData = [];
for i = 2:size(data,2)
    uniData = cat(2, uniData, (data(:,i)-min(data(:,i)))/(max(data(:,i))-min(data(:,i))));
end
uniData = [data(:,1), uniData];
LLEData = lle(uniData(:,2:end)', 5, 2);
%size(LLEData)
LLEData = LLEData';
LLEData = [data(:,1), LLEData];

Accuracy = [];
for i = 1:5
    Test = LLEData(i:5:end,:);
    TestData = Test(:,2:end);
    TestLabel = Test(:,1);
    Trainning = setdiff(LLEData, Test, 'rows');
    Trainning = DataDenoising(Trainning, 2);
    TrainningData = Trainning(:,2:end);
    TrainningLabel = Trainning(:,1);
    Accuracy = cat(1, Accuracy, CalcAccuracyPlus(TestData, TestLabel, TrainningData, TrainningLabel, 'cityblock'));
end
AccuracyLLE = mean(Accuracy,1);
[D,I] = sort(AccuracyLLE,'descend'); D(1), I(1)

BarData = [AccuracyNorm; AccuracyNormFS2; AccuracyNormFSDN; AccuracyLLE];
figure();
bar(1:2:51, BarData');

save('ProcessingData.mat');

    CalcAccuracy.m


function Accuracy = CalcAccuracy(TestData, TestLabel, TrainningData, TrainningLabel)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Calculate the classification accuracy for k = 1,3,...,51.
% TestData:       M*D matrix, D is the dimension, M the number of test samples
% TrainningData:  T*D matrix
% TestLabel:      labels of TestData
% TrainningLabel: labels of TrainningData
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
CompareResult = [];
for k = 1:2:51
    ClassResult = knnclassify(TestData, TrainningData, TrainningLabel, k);
    CompareResult = cat(2, CompareResult, (ClassResult == TestLabel));
end
SumCompareResult = sum(CompareResult,1);
Accuracy = SumCompareResult/length(CompareResult(:,1));   % row vector, one entry per k

    CalcAccuracyPlus.m

function Accuracy = CalcAccuracyPlus(TestData, TestLabel, TrainningData, TrainningLabel, Dist)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Same as CalcAccuracy, but with a configurable distance metric.
% Calculate the classification accuracy for k = 1,3,...,51.
% TestData:       M*D matrix, D is the dimension, M the number of test samples
% TrainningData:  T*D matrix
% TestLabel:      labels of TestData
% TrainningLabel: labels of TrainningData
% Dist:           distance metric passed to knnclassify (e.g. 'cityblock')
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
CompareResult = [];
for k = 1:2:51
    ClassResult = knnclassify(TestData, TrainningData, TrainningLabel, k, Dist);
    CompareResult = cat(2, CompareResult, (ClassResult == TestLabel));
end
SumCompareResult = sum(CompareResult,1);
Accuracy = SumCompareResult/length(CompareResult(:,1));   % row vector, one entry per k
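A typical call from the main script, using the city-block (L1) distance:

Accuracy = CalcAccuracyPlus(TestData, TestLabel, TrainningData, TrainningLabel, 'cityblock');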





Reposted from: https://www.cnblogs.com/pangblog/p/3402651.html
