K-Means and K-Medoids Clustering: A MATLAB Implementation

Published: 2025/3/21

K-Means and K-Medoids are two of the most widely used clustering algorithms. This post explains how they work and gives a MATLAB implementation.

1. Goal:

    Find a partition of the data that minimizes the sum of squared distances from each point to the center of its cluster.
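
Written out as a formula, the goal reads as follows. The notation is an assumption made for this sketch and does not come from the original post: C_j is the j-th cluster, c_j its center, and x_i a data point.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2
```

Both algorithms below try to make J small; they differ only in how the center c_j is chosen.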

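2. The K-Means algorithm:

    1. Partition the data into k non-empty subsets.
    2. Compute the center of each cluster. (K-means uses the mean of all points in the cluster; K-medoids uses the data point closest to that mean.)
    3. Assign each object to its nearest center.
    4. Go back to step 2; stop when the cluster assignments no longer change.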

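    Complexity: O(kndt)

    - computing the distance between two points: O(d)
    - assigning points to clusters: O(kn) distance computations, where k is the number of clusters and n the number of points
    - upper bound on the number of iterations: t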
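
The steps and the cost estimate above can be sketched in a few lines of MATLAB. This is a minimal illustration and not the author's code: the function name kmeans_sketch and its arguments are made up for this example, it starts from K random samples instead of a random partition, and it relies on implicit expansion (MATLAB R2016b or later).

```matlab
function [label, centers, tsd] = kmeans_sketch(X, K, max_iter)
% KMEANS_SKETCH  Minimal k-means loop (illustrative sketch, hypothetical name).
%   X        : n-by-d data matrix, one sample per row
%   K        : number of clusters
%   max_iter : upper bound t on the number of iterations (at least 1)
    n = size(X, 1);
    centers = X(randperm(n, K), :);     % K distinct samples as initial centers
    label = zeros(n, 1);
    for iter = 1:max_iter
        % Assignment step: n-by-K squared distances, O(k*n*d) work
        D = zeros(n, K);
        for j = 1:K
            delta = X - centers(j, :);  % implicit expansion over the rows
            D(:, j) = sum(delta .^ 2, 2);
        end
        [~, new_label] = min(D, [], 2);
        if isequal(new_label, label)
            break;                      % assignments unchanged: stop
        end
        label = new_label;
        % Update step: each center becomes the mean of its members
        for j = 1:K
            members = (label == j);
            if any(members)             % an empty cluster keeps its old center
                centers(j, :) = mean(X(members, :), 1);
            end
        end
    end
    % Total squared distance of every point to its assigned center
    delta = X - centers(label, :);
    tsd = sum(delta(:) .^ 2);
end
```

The inner loop over j touches every sample once per cluster, which is where the O(kn) distance computations per iteration come from.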

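3. The K-Medoids algorithm:

    1. Randomly choose k points as the initial medoids.
    2. Assign each object to its nearest medoid.
    3. Update the medoid of each cluster and compute the objective function.
    4. Select the best parameters, that is, keep the medoids that give the lowest objective value.
    5. Go back to step 2; stop when the medoids no longer change.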

    Complexity: O(n^2 d)

    - computing all pairwise distances between points: O(n^2 d)
    - assigning points to clusters: O(kn), where k is the number of clusters
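
To see where the O(n^2 d) term comes from, here is a hedged sketch of a medoid update that works from the full pairwise distance matrix. It uses the usual textbook definition of a medoid (the member with the smallest total distance to the other members of its cluster), which is not the same as the nearest-to-the-mean rule described in section 2. The function name update_medoids is hypothetical, and the code assumes implicit expansion (MATLAB R2016b or later).

```matlab
function medoid_idx = update_medoids(X, label, K)
% UPDATE_MEDOIDS  Pick, for each cluster, the member with the smallest total
% distance to the other members (illustrative sketch, hypothetical name).
%   X     : n-by-d data matrix, one sample per row
%   label : cluster index in 1..K for every sample
%   K     : number of clusters
    % Pairwise Euclidean distances, computed once: O(n^2 * d)
    sq = sum(X .^ 2, 2);
    D = sqrt(max(sq + sq' - 2 * (X * X'), 0));  % max(...,0) guards round-off
    medoid_idx = zeros(K, 1);                   % 0 marks an empty cluster
    for j = 1:K
        members = find(label == j);
        if isempty(members)
            continue;
        end
        cost = sum(D(members, members), 2);     % total distance inside cluster j
        [~, best] = min(cost);
        medoid_idx(j) = members(best);          % row index of the new medoid
    end
end
```

Once D is available, every later iteration only reads from it, so the pairwise distance computation dominates the total cost.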

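4. Properties:

    - The clustering result depends on the initial points (the algorithm performs steepest descent from a random initial starting point).
    - The result is a local optimum, not necessarily the global one.
    - In practice, run the algorithm from several randomly chosen sets of initial points and keep the run with the lowest TSD (Total Squared Distance).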
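
With MATLAB's built-in kmeans (Statistics and Machine Learning Toolbox), the restart strategy from the last point is available through the 'Replicates' option. The snippet below is a small sketch; the data X and the value of K are placeholders invented for the example.

```matlab
% Placeholder data: two well-separated 2-D blobs, 50 samples each
X = [randn(50, 2); randn(50, 2) + 5];
K = 2;

% 'Replicates' reruns kmeans from 10 random initializations and returns
% the solution with the lowest total within-cluster sum of distances.
[idx, C, sumd] = kmeans(X, K, 'Replicates', 10);

% With the default squared Euclidean distance, the sum over clusters is the TSD.
tsd = sum(sumd);
```

Here idx holds the cluster index of each sample, C the K centroids, and sumd the within-cluster sums of distances.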

K-Means and K-Medoids implementation in MATLAB:

===================

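Below is my implementation in MATLAB.

Notes: fea holds the training sample data and gnd holds the sample labels. The algorithm follows exactly the steps described above. Measuring accuracy at the end needs some care: clustering, unlike classification, only produces a set of clusters and does not tell us which label each cluster should carry. Since the goal here is to measure the performance of the clustering algorithm, we simply assume that this step achieves the optimal correspondence and map each cluster to one class. One way to do that is to enumerate all possible mappings and pick the best one; the same problem can also be solved with the Hungarian algorithm. I have put the Hungarian code in the resources, and the way to call it is already written into the function below. The main K-Means and K-Medoids function follows.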
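
The enumeration approach mentioned above can be sketched as follows. This is not the bestMap function the post relies on; best_map_bruteforce is a hypothetical helper written for illustration, and it assumes that both label vectors use the integers 1..K.

```matlab
function [mapped, acc] = best_map_bruteforce(gnd, label)
% BEST_MAP_BRUTEFORCE  Try every one-to-one relabeling of the clusters and keep
% the one that agrees most with the ground truth (hypothetical helper).
% Assumes both vectors use labels 1..K; the cost grows as K!, so keep K small.
    gnd = gnd(:);
    label = label(:);
    K = max([gnd; label]);
    P = perms(1:K);                 % row r maps cluster j to class P(r, j)
    acc = -1;
    mapped = label;
    for r = 1:size(P, 1)
        candidate = P(r, label)';   % relabel every sample under permutation r
        a = mean(candidate == gnd); % fraction of samples that now match gnd
        if a > acc
            acc = a;
            mapped = candidate;
        end
    end
end
```

Because the number of permutations is K!, this is only practical for a handful of clusters, which is why the Hungarian algorithm is the better choice for larger K.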

The Kmeans.m function:

function [ accuracy,MIhat ] = KMeans( K,mode )
% Artificial Intelligence & Data Mining - KMeans & K-Medoids Clustering
% Author: Rachel Zhang @ ZJU
% CreateTime: 2012-11-18
% Function: Clustering
%  -K: number of clusters
%  -mode:
%   1: use kmeans cluster algorithm in matlab
%   2: k_medoid algorithm: use data points as k centers
%   3: k_means algorithm: use average as k centers
% The caller must set these globals first: fea (N_samples-by-N_features data),
% gnd (sample labels), N_samples and N_features.
% bestMap and MutualInfo are external helpers and are not defined in this file.
global N_features;
global N_samples;
global fea;
global gnd;
switch (mode)
    case 1 % call system function kmeans
        label = kmeans(fea,K);
        [label,accuracy] = cal_accuracy(gnd,label);

    case 2 % use k-medoid method
        for testcase = 1:10 % do 10 times to get rid of the influence from Initial_center
            K_center = Initial_center(fea,K); % select initial points randomly
            changed_label = N_samples;
            label = zeros(1,N_samples);
            iteration_times = 0;
            while changed_label~=0
                % assign every sample to its nearest medoid
                cls_label = cell(1,K);
                for i = 1: N_samples
                    for j = 1 : K
                        D(j) = dis(fea(i,:),K_center(j,:));
                    end
                    [~,label(i)] = min(D);
                    cls_label{label(i)} = [cls_label{label(i)} i];
                end
                changed_label = 0;
                cls_center = zeros(K,N_features);
                for i = 1 : K
                    % mean(...,1) averages over rows even when the cluster has one sample
                    cls_center(i,:) = mean(fea(cls_label{i},:),1);
                    D1 = [];
                    for j = 1:size(cls_label{i},2) % number of samples clustered in i-th class
                        D1(j) = dis(cls_center(i,:),fea(cls_label{i}(j),:));
                    end
                    % the new medoid is the sample closest to the cluster mean
                    [~,min_ind] = min(D1);
                    if ~isequal(K_center(i,:),fea(cls_label{i}(min_ind),:))
                        K_center(i,:) = fea(cls_label{i}(min_ind),:);
                        changed_label = changed_label+1;
                    end
                end
                iteration_times = iteration_times+1;
            end
            [label,acc(testcase)] = cal_accuracy(gnd,label);
        end
        accuracy = max(acc);

    case 3 % use k-means method
        for testcase = 1:10 % do 10 times to get rid of the influence from Initial_center
            K_center = Initial_center(fea,K); % select initial points randomly
            changed_label = N_samples;
            label = zeros(1,N_samples);
            label_new = zeros(1,N_samples);
            while changed_label~=0
                cls_label = cell(1,K);
                changed_label = 0;
                for i = 1: N_samples
                    for j = 1 : K
                        D(j) = dis(fea(i,:),K_center(j,:));
                    end
                    [~,label_new(i)] = min(D);
                    if(label_new(i)~=label(i))
                        changed_label = changed_label+1;
                    end
                    cls_label{label_new(i)} = [cls_label{label_new(i)} i];
                end
                label = label_new;

                for i = 1 : K  % recalculate k centroids
                    if ~isempty(cls_label{i}) % an empty cluster keeps its old center
                        K_center(i,:) = mean(fea(cls_label{i},:),1);
                    end
                end
            end
            [label,acc(testcase)] = cal_accuracy(gnd,label);
        end
        accuracy = max(acc);
end
% Note: in modes 2 and 3, label comes from the last of the 10 runs,
% which is not necessarily the run with the highest accuracy.
MIhat = MutualInfo(gnd,label);

    function center = Initial_center(X,K)
        % pick K distinct samples at random as the initial centers
        rnd_Idx = randperm(N_samples,K);
        center = X(rnd_Idx,:);
    end

    function res = dis(X1,X2)
        % Euclidean distance between two row vectors
        res = norm(X1-X2);
    end

    function [res,acc] = cal_accuracy(gnd,estimate_label)
        % map the cluster labels onto gnd, then count the matches
        res = bestMap(gnd,estimate_label);
        acc = length(find(gnd == res))/length(gnd);
    end
end

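Experimental results:

Plot the accuracies obtained above. The x-axis shows the 10 datasets and the y-axis the clustering accuracy on each of them.

Here, auto is MATLAB's built-in kmeans function.

Plotting: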

function Plot( A,B,C )
% PLOT  Plot the clustering accuracy on the 10 datasets.
%   A, B, C: accuracies of the built-in kmeans, k-medoid and k-means,
%   10 values each, one per dataset
figure;
k = 1:10;
plot(k,A,'-r',k,B,'-b',k,C,'-g');
legend('auto','medoid','means');
end

Results:

Clustering into 5 classes: (figure not preserved)

Clustering into 7 classes: (figure not preserved)

Source: http://blog.csdn.net/abcjennifer/article/details/8197072
