K-Means and K-Medoid Clustering Algorithms: A MATLAB Implementation
K-Means and K-Medoid are among the most widely used clustering algorithms. This article explains how they work and gives an implementation in MATLAB.
1. Goal:
       Find a partition of the data that minimizes the sum of squared distances from each point to the center of its cluster.
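Written out, the goal above corresponds to the usual k-means objective. The notation is introduced here for illustration and is not from the original post: k is the number of clusters, C_j is the set of points in cluster j, c_j is its center, and x_i is a data point.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^{2}
```

For k-means, c_j is the mean of C_j; for K-medoid, c_j is constrained to be one of the data points.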
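2. K-Means algorithm:
       1. Partition the data into k non-empty subsets.
       2. Compute the center of each cluster (k-means uses the mean of all points in the cluster; K-medoid uses the data point closest to that mean).
       3. Assign each object to the nearest center.
       4. Go back to step 2; stop when the clustering result no longer changes.
    Complexity:
       O(kndt)
       - computing the distance between two points: O(d), where d is the number of features
       - assigning every point to a cluster: O(kn) distance computations, where k is the number of clusters and n is the number of samples
       - upper bound on the number of iterations: t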
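The steps and the O(kndt) cost above can be sketched on a toy data set. This is a minimal sketch, not the implementation given later in this post: the data, the fixed initial centers, and the variable names (X, C, delta, new_label) are assumptions chosen so the run is reproducible.

```matlab
% Toy data: two well-separated groups in 2-D (illustrative values only).
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];
K = 2;
C = X([1 4], :);            % initial centers, fixed here instead of random
n = size(X, 1);
label = zeros(n, 1);
for iter = 1:100            % t: upper bound on the number of iterations
    % Assignment step: n*K distance evaluations, each costing O(d).
    D = zeros(n, K);
    for j = 1:K
        delta = X - repmat(C(j, :), n, 1);
        D(:, j) = sum(delta .^ 2, 2);   % squared Euclidean distance
    end
    [~, new_label] = min(D, [], 2);
    if isequal(new_label, label)
        break;              % assignments unchanged: stop
    end
    label = new_label;
    % Update step: each center becomes the mean of its members.
    for j = 1:K
        members = X(label == j, :);
        if ~isempty(members)            % leave the center alone if the cluster is empty
            C(j, :) = mean(members, 1);
        end
    end
end
```

On this data the loop stops in the second iteration, with the first three points in one cluster and the last three in the other.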
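3. K-Medoids algorithm:
       1. Randomly select k points as the initial medoids.
       2. Assign each object to the nearest medoid.
       3. Update the medoid of each cluster and compute the objective function.
       4. Select the best parameters (the medoids that give the best objective value).
       5. Go back to step 2; stop when the medoid of every cluster no longer changes.
    Complexity:
       O((n^2)d)
       - computing the pairwise distances between all points: O((n^2)d)
       - assigning every point to a cluster: O(kn), where k is the number of clusters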
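The O((n^2)d) term comes from building the pairwise distance matrix once and reusing it. The sketch below shows one common way to do the medoid update with that matrix: inside each cluster, pick the member with the smallest total distance to the other members. This is an assumption for illustration and differs from the code later in this post, which picks the point closest to the cluster mean. The data, the initial medoid indices, and all variable names are hypothetical.

```matlab
% Toy data (illustrative values only).
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];
K = 2;
n = size(X, 1);
% Pairwise Euclidean distances, computed once: O(n^2 * d).
P = zeros(n);
for i = 1:n
    delta = X - repmat(X(i, :), n, 1);
    P(:, i) = sqrt(sum(delta .^ 2, 2));
end
medoids = [2 5];            % indices of the initial medoids, fixed here instead of random
for iter = 1:100
    % Assignment: look up distances to the current medoids, O(kn).
    [~, label] = min(P(:, medoids), [], 2);
    new_medoids = medoids;
    for j = 1:K
        members = find(label == j);
        % Cost of each candidate = total distance to the members of its cluster.
        cost = sum(P(members, members), 1);
        [~, best] = min(cost);
        new_medoids(j) = members(best);
    end
    if isequal(new_medoids, medoids)
        break;              % no medoid changed: stop
    end
    medoids = new_medoids;
end
```

Because every medoid is at distance 0 from itself, no cluster becomes empty here as long as the data points are distinct.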
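4. Properties:
       - The clustering result depends on the initial points (the procedure is a steepest descent from a random initial starting point).
       - The result is a local optimum.
       - In practice, run the algorithm from several randomly chosen sets of initial points and keep the run with the lowest TSD (Total Squared Distance).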
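The last point can be sketched as a restart loop that records the TSD of every run and keeps the best one. This is a sketch under stated assumptions: the toy data, the number of restarts, and the names tsd_runs, best_tsd and best_label are hypothetical. Note that the implementation later in this post selects the run by accuracy against the ground truth rather than by TSD.

```matlab
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];   % toy data, illustrative only
K = 2;
n_restarts = 10;
n = size(X, 1);
tsd_runs = zeros(1, n_restarts);
best_tsd = Inf;
best_label = [];
for r = 1:n_restarts
    C = X(randperm(n, K), :);        % random initial centers drawn from the data
    label = zeros(n, 1);
    for iter = 1:100
        D = zeros(n, K);
        for j = 1:K
            delta = X - repmat(C(j, :), n, 1);
            D(:, j) = sum(delta .^ 2, 2);
        end
        [~, new_label] = min(D, [], 2);
        if isequal(new_label, label)
            break;
        end
        label = new_label;
        for j = 1:K
            if any(label == j)
                C(j, :) = mean(X(label == j, :), 1);
            end
        end
    end
    % TSD of this run: squared distance of every point to its own center.
    tsd = 0;
    for j = 1:K
        members = X(label == j, :);
        delta = members - repmat(C(j, :), size(members, 1), 1);
        tsd = tsd + sum(delta(:) .^ 2);
    end
    tsd_runs(r) = tsd;
    if tsd < best_tsd                % keep the run with the lowest TSD
        best_tsd = tsd;
        best_label = label;
    end
end
```

Selecting by TSD needs no ground-truth labels, so it can also be used when gnd is not available.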
Kmeans KMedoid Implementation with matlab:
===================
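Below is my implementation in MATLAB:
Notes: fea is the training sample data and gnd holds the sample labels. The algorithm follows exactly the ideas described above. For the final accuracy evaluation, clustering differs from classification: it only produces a set of clusters, without knowing which label each cluster should be given. Since our purpose is to measure the performance of the clustering algorithm, we simply assume that this step achieves the optimal correspondence and map each cluster to one class. One approach is to enumerate all possible mappings and pick the best one; for this kind of problem we can also use the Hungarian algorithm. I have put the Hungarian code in my resources, and the way to call it is already written in the function below. The main function for Kmeans & Kmedoid follows.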
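The enumeration approach mentioned above can be sketched in a few lines. This is a minimal sketch, not the bestMap routine used by the main function: the toy label vectors and the names P, best_map and mapped_label are hypothetical, and it assumes the number of clusters equals the number of classes.

```matlab
gnd   = [1 1 1 2 2 3 3 3];       % ground-truth classes (toy values)
label = [2 2 2 3 3 1 1 2];       % cluster indices from some clustering run (toy values)
K = 3;
P = perms(1:K);                  % every possible cluster -> class mapping, K! rows
best_acc = 0;
best_map = 1:K;
for p = 1:size(P, 1)
    mapped = P(p, label);        % relabel cluster c as class P(p, c)
    acc = mean(mapped == gnd);   % fraction of samples that now match gnd
    if acc > best_acc
        best_acc = acc;
        best_map = P(p, :);
    end
end
mapped_label = best_map(label);  % cluster labels translated to class labels
```

Here the best mapping sends cluster 2 to class 1, cluster 3 to class 2 and cluster 1 to class 3, giving an accuracy of 7/8. The number of mappings grows as K!, which is why the Hungarian algorithm is preferable for larger K.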
KMeans.m function:
```matlab
function [accuracy, MIhat] = KMeans(K, mode)

% Artificial Intelligence & Data Mining - KMeans & K-Medoids Clustering
% Author: Rachel Zhang @ ZJU
% CreateTime: 2012-11-18
% Function: Clustering
%  -K: number of clusters
%  -mode:
%   1: use kmeans cluster algorithm in matlab
%   2: k_medoid algorithm: use data points as k centers
%   3: k_means algorithm: use average as k centers

global N_features;
global N_samples;
global fea;
global gnd;

switch (mode)
    case 1 % call system function kmeans
        label = kmeans(fea, K);
        [label, accuracy] = cal_accuracy(gnd, label);

    case 2 % use kmedoid method
        for testcase = 1:10 % do 10 times to get rid of the influence from Initial_center
            K_center = Initial_center(fea, K); % select initial points randomly
            changed_label = N_samples;
            label = zeros(1, N_samples);
            iteration_times = 0;
            while changed_label ~= 0
                cls_label = cell(1, K);
                for i = 1:N_samples
                    for j = 1:K
                        D(j) = dis(fea(i,:), K_center(j,:));
                    end
                    [~, label(i)] = min(D);
                    cls_label{label(i)} = [cls_label{label(i)} i];
                end
                changed_label = 0;
                cls_center = zeros(K, N_features);
                for i = 1:K
                    % mean over rows (dim 1), so a single-sample cluster stays a row vector
                    cls_center(i,:) = mean(fea(cls_label{i},:), 1);
                    D1 = [];
                    for j = 1:size(cls_label{i}, 2) % number of samples clustered in i-th class
                        D1(j) = dis(cls_center(i,:), fea(cls_label{i}(j),:));
                    end
                    [~, min_ind] = min(D1);
                    if ~isequal(K_center(i,:), fea(cls_label{i}(min_ind),:))
                        K_center(i,:) = fea(cls_label{i}(min_ind),:);
                        changed_label = changed_label + 1;
                    end
                end
                iteration_times = iteration_times + 1;
            end
            [label, acc(testcase)] = cal_accuracy(gnd, label);
        end
        accuracy = max(acc);

    case 3 % use k-means method
        for testcase = 1:10 % do 10 times to get rid of the influence from Initial_center
            K_center = Initial_center(fea, K); % select initial points randomly
            changed_label = N_samples;
            label = zeros(1, N_samples);
            label_new = zeros(1, N_samples);
            while changed_label ~= 0
                cls_label = cell(1, K);
                changed_label = 0;
                for i = 1:N_samples
                    for j = 1:K
                        D(j) = dis(fea(i,:), K_center(j,:));
                    end
                    [~, label_new(i)] = min(D);
                    if (label_new(i) ~= label(i))
                        changed_label = changed_label + 1;
                    end
                    cls_label{label_new(i)} = [cls_label{label_new(i)} i];
                end
                label = label_new;

                for i = 1:K % recalculate k centroid
                    K_center(i,:) = mean(fea(cls_label{i},:), 1);
                end
            end
            [label, acc(testcase)] = cal_accuracy(gnd, label);
        end
        accuracy = max(acc);
end

MIhat = MutualInfo(gnd, label);


    function center = Initial_center(X, K)
        rnd_Idx = randperm(N_samples, K);
        center = X(rnd_Idx,:);
    end

    function res = dis(X1, X2)
        res = norm(X1 - X2);
    end

    function [res, acc] = cal_accuracy(gnd, estimate_label)
        res = bestMap(gnd, estimate_label);
        acc = length(find(gnd == res)) / length(gnd);
    end
end
```
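Analysis of the experimental results:
The accuracies obtained above are plotted below. The x-axis is the 10 data sets and the y-axis is the clustering accuracy on each of them.
Here, auto refers to MATLAB's built-in kmeans function.
Plotting: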
```matlab
function [] = Plot(A, B, C)
%PLOT Plot the clustering accuracy of the three methods over 10 data sets
%   A: accuracy of the built-in kmeans ('auto'), red line
%   B: accuracy of k-medoid, blue line
%   C: accuracy of k-means, green line
figure;
k = 1:10;
plot(k, A, '-r', k, B, '-b', k, C, '-g');
legend('auto', 'medoid', 'means');

end
```
Results:
Clustering into 5 classes: [figure]
Clustering into 7 classes: [figure]
from: http://blog.csdn.net/abcjennifer/article/details/8197072