
Spectral clustering: explanation and implementation

Published: 2025/4/16  Programming Q&A  18  豆豆

This article on spectral clustering (explanation and implementation) was collected and compiled by 生活随笔 and is shared here for reference.

Overview

https://github.com/Sean16SYSU/MachineLearningImplement

I wrote this code after reading the Wikipedia article on spectral clustering.

For a version I implemented after reading the paper, see: [Paper reading and implementation] On Spectral Clustering: Analysis and an algorithm [Python implementation]

In multivariate statistics and the clustering of data, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset.¹

Given an enumerated set of data points, the similarity matrix may be defined as a symmetric matrix $A$, where $A_{ij} \geq 0$ represents a measure of the similarity between the data points with indices $i$ and $j$.
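To make the definition concrete, here is a minimal sketch of one common way to build such a matrix, assuming a fully connected Gaussian (RBF) kernel. The function name `gaussian_similarity` and the bandwidth `sigma` are illustrative choices, not part of the original post, and leaving the diagonal at zero is just one convention.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Fully connected similarity matrix: A[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances via broadcasting, shape (N, N)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dist / (2 * sigma ** 2))
    # Illustrative convention: no self-similarity on the diagonal
    np.fill_diagonal(A, 0.0)
    return A


X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
A = gaussian_similarity(X, sigma=1.0)
print(np.allclose(A, A.T), bool((A >= 0).all()))  # True True
```

The two nearby points (rows 0 and 1) get a similarity close to 1, while the distant point 2 gets a similarity close to 0. The result is symmetric with non-negative entries, as the definition requires.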

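In general, spectral clustering applies a standard clustering method (such as k-means) to the relevant eigenvectors of a Laplacian matrix of $A$. There are many ways to define a Laplacian, each with its own mathematical interpretation, so the resulting clustering can be interpreted in different ways as well.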

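The relevant eigenvectors are those corresponding to the several smallest eigenvalues of the Laplacian, excluding the smallest eigenvalue, which is 0. For computational efficiency, these eigenvectors are often computed as the eigenvectors corresponding to the several largest eigenvalues of a function of the Laplacian.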
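A small numerical check of both claims, assuming the symmetric normalized Laplacian $L^{\text{norm}}$ (defined later in this section) as the Laplacian and $M = D^{-1/2} A D^{-1/2} = I - L^{\text{norm}}$ as the "function of the Laplacian". This is one common choice, not the only one. Since every eigenvalue satisfies $\lambda = 1 - \mu$, the eigenvectors for the largest $\mu$ of $M$ are exactly the eigenvectors for the smallest $\lambda$ of $L^{\text{norm}}$. Iterative solvers such as `scipy.sparse.linalg.eigsh` are often pointed at the largest eigenvalues of such a matrix. The toy adjacency matrix is made up for illustration.

```python
import numpy as np

# Small connected weighted graph: symmetric, non-negative, zero diagonal (made-up values)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0.2],
              [0, 0, 0.2, 0]], dtype=float)
d = A.sum(axis=1)                       # node degrees
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
M = D_inv_sqrt @ A @ D_inv_sqrt         # M = I - L_norm
L_norm = np.eye(len(A)) - M

lam, _ = np.linalg.eigh(L_norm)         # eigenvalues of L_norm, ascending
mu, _ = np.linalg.eigh(M)               # eigenvalues of M, ascending

# For a connected graph, the smallest eigenvalue of the Laplacian is 0
print(np.isclose(lam[0], 0.0))
# lambda = 1 - mu: the largest eigenvalues of M map to the smallest of L_norm
print(np.allclose(np.sort(1 - mu), lam))
```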

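The Laplacian matrix of a graph is defined as

$L := D - A$
where $D$ is the diagonal degree matrix, whose diagonal entries are the degrees of the corresponding nodes, $D_{ii} = \sum_j A_{ij}$.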
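A minimal sketch of this definition on a made-up 3-node weighted graph. It checks two standard properties: every row of $L$ sums to zero, so the all-ones vector is an eigenvector with eigenvalue 0, and the quadratic form satisfies $x^\top L x = \frac{1}{2}\sum_{i,j} A_{ij}(x_i - x_j)^2$. The second identity is why eigenvectors with small eigenvalues change little across strongly connected nodes.

```python
import numpy as np

# Made-up symmetric weighted adjacency matrix
A = np.array([[0, 2, 1],
              [2, 0, 0],
              [1, 0, 0]], dtype=float)
D = np.diag(A.sum(axis=1))   # degree matrix: D_ii = sum_j A_ij
L = D - A                    # unnormalized graph Laplacian

ones = np.ones(len(A))
print(L @ ones)              # every row sums to zero

x = np.array([1.0, -2.0, 0.5])
quad = x @ L @ x
# Same value written as a weighted sum of squared differences over all ordered pairs
pairwise = 0.5 * np.sum(A * (x[:, None] - x[None, :]) ** 2)
print(np.isclose(quad, pairwise))  # True
```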

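A well-known related spectral clustering technique is the normalized cuts algorithm, which is widely used for image segmentation. It partitions the points into two sets based on the eigenvector corresponding to the second-smallest eigenvalue of the symmetric normalized Laplacian, defined as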

$L^{\text{norm}} := I - D^{-1/2} A D^{-1/2}$
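A sketch of the two-way split described above: compute $L^{\text{norm}}$, take the eigenvector of its second-smallest eigenvalue, and assign each node by its sign. Splitting at zero is a simplifying assumption; the published normalized-cuts algorithm may choose the threshold differently. The helper names and the toy graph (two triangles joined by one weak edge) are hypothetical.

```python
import numpy as np


def normalized_laplacian(A):
    """L_norm = I - D^{-1/2} A D^{-1/2} (assumes every node has positive degree)."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt


def two_way_split(A):
    """Split nodes by the sign of the eigenvector of the 2nd-smallest eigenvalue."""
    _, vecs = np.linalg.eigh(normalized_laplacian(A))  # eigenvalues in ascending order
    second = vecs[:, 1]
    return (second > 0).astype(int)


# Two triangles {0,1,2} and {3,4,5} joined by one weak edge (2-3)
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.1

labels = two_way_split(A)
print(labels)  # [0 0 0 1 1 1] or [1 1 1 0 0 0]; an eigenvector's sign is arbitrary
```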

Algorithm

To perform spectral clustering, we need three main steps:²

1. Create a similarity graph between the N objects to cluster.
2. Compute the first k eigenvectors of its Laplacian matrix to define a feature vector for each object.
3. Run k-means on these features to separate the objects into k classes.

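Applied to a graph, the procedure is:

1. Build the links (edges) of the graph.
2. Compute the Laplacian matrix.
3. Perform an eigendecomposition and use the resulting eigenvectors as feature vectors.
4. Run an ordinary clustering method on these feature vectors.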
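The four steps can be sketched on a graph that is given directly as an adjacency matrix. This sketch assumes the unnormalized Laplacian $L = D - A$ and keeps the eigenvectors of the k smallest eigenvalues (including the zero one), as in the three-step description. The toy graph (three triangles chained by weak edges) is made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Step 1: the graph's links, given directly as a weighted adjacency matrix.
# Three triangles {0,1,2}, {3,4,5}, {6,7,8} chained together by weak edges.
A = np.zeros((9, 9))
for base in (0, 3, 6):
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        A[base + i, base + j] = A[base + j, base + i] = 1.0
A[2, 3] = A[3, 2] = 0.05
A[5, 6] = A[6, 5] = 0.05

# Step 2: Laplacian L = D - A
L = np.diag(A.sum(axis=1)) - A

# Step 3: eigendecomposition; the eigenvectors of the k smallest eigenvalues
# give each node a k-dimensional feature vector (one row per node)
k = 3
_, vecs = np.linalg.eigh(L)
features = vecs[:, :k]

# Step 4: an ordinary clustering method (k-means) on those features
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
print(labels)
```

The cluster ids themselves are arbitrary; what matters is that each triangle ends up in its own cluster.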

As before, for encapsulation I simply nested the helper functions inside the main function.

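```python
import numpy as np
from sklearn.cluster import KMeans


def spectral_cluster(X, n_clusters=3, sigma=1, k=5, n_eigen=10):
    def graph_building_KNN(X, k=5, sigma=1):
        N = len(X)
        # S[i, j]: Euclidean distance between points i and j
        S = np.zeros((N, N))
        for i, x in enumerate(X):
            S[i] = np.array([np.linalg.norm(x - xi) for xi in X])
            S[i][i] = 0
        graph = np.zeros((N, N))
        for i, x in enumerate(X):
            # Position 0 of the argsort is the point itself, so take positions 1..k
            distance_top_n = np.argsort(S[i])[1: k + 1]
            for nid in distance_top_n:
                # Edge weight decays exponentially with distance
                graph[i][nid] = np.exp(-S[i][nid] / (2 * sigma ** 2))
        # kNN is not a symmetric relation; symmetrize so the similarity matrix is symmetric
        return np.maximum(graph, graph.T)

    graph = graph_building_KNN(X, k=k, sigma=sigma)

    def laplacianMatrix(A):
        # Symmetric normalized Laplacian D^{-1/2} (D - A) D^{-1/2}
        dm = np.sum(A, axis=1)
        D = np.diag(dm)
        L = D - A
        D_inv_sqrt = np.diag(1.0 / (dm ** 0.5))
        return np.dot(np.dot(D_inv_sqrt, L), D_inv_sqrt)

    L = laplacianMatrix(graph)

    def smallNeigen(L, n_eigen):
        # L is symmetric, so eigh returns real eigenvalues in ascending order;
        # skip the smallest (0) and keep the eigenvectors of the next n_eigen
        eigval, eigvec = np.linalg.eigh(L)
        return eigvec[:, 1:n_eigen + 1]

    H = smallNeigen(L, n_eigen)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(H)
    return kmeans.labels_
```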
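The distance loop in `graph_building_KNN` runs in pure Python and is $O(N^2)$. A vectorized alternative, assuming scikit-learn's `sklearn.neighbors.kneighbors_graph` is acceptable, is sketched below. The name `knn_gaussian_graph` is hypothetical. Note that it applies a Gaussian weight to the *squared* distance; switch to the unsquared distance if you want an exponentially decaying weight instead.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph


def knn_gaussian_graph(X, k=5, sigma=1.0):
    """Dense symmetric kNN similarity graph with Gaussian edge weights."""
    # Sparse CSR matrix whose stored entries are distances to the k nearest neighbours
    dist = kneighbors_graph(X, n_neighbors=k, mode="distance", include_self=False)
    W = dist.copy()
    # Turn each stored distance into a similarity weight
    W.data = np.exp(-W.data ** 2 / (2 * sigma ** 2))
    W = W.toarray()
    # Keep an edge if either endpoint lists the other as a neighbour
    return np.maximum(W, W.T)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
G = knn_gaussian_graph(X, k=5)
print(G.shape, np.allclose(G, G.T))  # (20, 20) True
```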

Test

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
# Project the 4-D iris features to 2-D so the clusters can be plotted
X_reduced = PCA(n_components=2).fit_transform(iris.data)
y = spectral_cluster(X_reduced, k=10, n_clusters=3)
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap=plt.cm.Set1)
plt.show()
```

Result of this implementation (figure)

Some points are not clustered very well. The sklearn implementation, which is actually an improved version built on spectral embedding, performs much better.
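For comparison, here is a sketch of the scikit-learn route on the same PCA-reduced iris data. Per its documentation, `sklearn.cluster.SpectralClustering` embeds the affinity matrix with a spectral embedding and, by default, assigns labels with k-means. The parameter choices (`affinity="nearest_neighbors"`, `n_neighbors=10`, `random_state=0`) are assumptions meant to roughly mirror `k=10` in the test. The adjusted Rand index against the true species labels is one way to put a number on "much better".

```python
from sklearn import datasets
from sklearn.cluster import SpectralClustering
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

iris = datasets.load_iris()
X_reduced = PCA(n_components=2).fit_transform(iris.data)

# k-nearest-neighbour affinity graph, then spectral embedding + k-means inside sklearn
model = SpectralClustering(n_clusters=3, affinity="nearest_neighbors",
                           n_neighbors=10, random_state=0)
y_sk = model.fit_predict(X_reduced)

# Agreement with the true species labels (1.0 = perfect, about 0 = random)
ari = adjusted_rand_score(iris.target, y_sk)
print(ari)
```

Computing the same score for the labels returned by `spectral_cluster` gives a like-for-like comparison.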

Result of sklearn's spectral clustering (figure)

¹ https://en.wikipedia.org/wiki/Spectral_clustering

² https://towardsdatascience.com/spectral-clustering-for-beginners-d08b7d25b4d8
