Data Mining: Unsupervised Learning (Clustering)

Published: 2025/3/21


  • 1. K-means
    • 1.1 Generating random data with specific shapes
    • 1.2 Clustering
    • 1.3 Results
  • 2. Hierarchical clustering
    • 2.1 Code
    • 2.2 Results
  • 3. DBSCAN
    • 3.1 Parameter selection
    • 3.2 Code
    • 3.3 Results

1. K-means

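K-Means is a partition-based clustering algorithm: it splits the data into k groups by assigning every point to its nearest cluster center.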
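To make the partitioning idea concrete, here is a minimal NumPy sketch of the classic assign-then-update loop. It is an illustration only, not scikit-learn's implementation; the function name `kmeans_sketch`, its parameters, and the demo data are all hypothetical.

```python
import numpy as np


def kmeans_sketch(X, k, init=None, n_iter=100, seed=0):
    """Plain assign/update iteration; illustrative only."""
    rng = np.random.default_rng(seed)
    if init is None:
        # Start from k distinct data points chosen at random
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    else:
        centers = np.asarray(init, dtype=float)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    return labels, centers


# Two well-separated groups of 50 points each (made-up demo data)
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])

# One starting center from each group keeps the demo deterministic
labels_demo, centers_demo = kmeans_sketch(X_demo, k=2, init=X_demo[[0, 99]])
```

With random starting centers the loop can settle in a poor local optimum, which is one reason library implementations restart from several initializations.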

1.1 Generating random data with specific shapes

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_circles, make_moons, make_blobs

    # ---------- Generate random data sets with specific shapes ----------
    n_samples = 1000

    # Ring-shaped data
    # n_samples: total number of sample points
    # factor: scale factor between the inner and the outer circle (between 0 and 1)
    circles = make_circles(n_samples=n_samples, factor=0.5, noise=0.05)

    # Crescent-shaped data
    moons = make_moons(n_samples=n_samples, noise=0.05)

    # Blob-shaped data
    # random_state: random seed, so that every run generates the same data
    # center_box: bounding box in which the cluster centers are drawn, default (-10, 10)
    # cluster_std: standard deviation of each cluster, controls how compact it is, default 1.0
    # centers: number of cluster centers (3 when left unset and n_samples is an integer)
    blobs = make_blobs(n_samples=n_samples, random_state=100, center_box=(-10, 10), cluster_std=1, centers=3)

    # Uniform random data without any structure; every point gets label 0
    random_data = np.random.rand(n_samples, 2), np.zeros(n_samples, dtype=int)

    datasets = [circles, moons, blobs, random_data]
    fig = plt.figure(figsize=(20, 8))

1.2 Clustering

    colors = "rgbykcm"
    n_cols = len(datasets)
    for index, data in enumerate(datasets):
        X = data[0]
        Y_old = data[1]  # labels produced by the data generator

        km_cluster = KMeans(n_clusters=2)
        km_cluster.fit(X)
        Y_new = km_cluster.labels_  # labels assigned by K-means

        # Top row: original labels
        fig.add_subplot(2, n_cols, index + 1)
        plt.scatter(X[:, 0], X[:, 1], c=[colors[y] for y in Y_old])

        # Bottom row: K-means labels
        fig.add_subplot(2, n_cols, index + 1 + n_cols)
        plt.scatter(X[:, 0], X[:, 1], c=[colors[y] for y in Y_new])

    plt.show()

1.3 Results
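The scatter plots compare the original and the K-means labels visually. As a numeric complement, one option (an addition here, not part of the original post) is the adjusted Rand index from `sklearn.metrics`, which scores the agreement between two labelings regardless of how the clusters are numbered. The data sets, centers, and variable names below are assumptions chosen for the sketch.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_moons
from sklearn.metrics import adjusted_rand_score

# Two clearly separated blobs: a shape that suits K-means
X_blobs, y_blobs = make_blobs(n_samples=500, centers=[(-5, -5), (5, 5)], cluster_std=1.0, random_state=0)
# Two interleaved crescents: a shape that K-means cannot separate with a straight boundary
X_moons, y_moons = make_moons(n_samples=500, noise=0.05, random_state=0)

km_blobs = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_blobs)
km_moons = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_moons)

# 1.0 means the two labelings agree perfectly; values near 0 mean chance-level agreement
ari_blobs = adjusted_rand_score(y_blobs, km_blobs.labels_)
ari_moons = adjusted_rand_score(y_moons, km_moons.labels_)
print(f"blobs ARI: {ari_blobs:.3f}, moons ARI: {ari_moons:.3f}")
```

The score needs reference labels, so it only applies to synthetic or otherwise labeled data such as the sets generated above.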

2. Hierarchical clustering

2.1 Code

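AgglomerativeClustering(n_clusters, affinity, linkage)
  • affinity: the distance metric used between samples. scikit-learn 1.2 renamed this parameter to metric, and affinity was removed in 1.4.
    • "euclidean": Euclidean distance
    • "l1", "l2"
    • "manhattan": Manhattan distance
    • "cosine": cosine distance
    • "precomputed": a precomputed distance matrix is passed to fit as the input instead of the feature matrix
  • linkage: {"ward", "complete", "average", "single"}, default="ward". "ward" only works with Euclidean distance.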
    from sklearn.datasets import make_circles, make_blobs, make_moons
    from sklearn.cluster import AgglomerativeClustering
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    # Prepare the data
    n_samples = int(1e3)
    circles = make_circles(n_samples=n_samples, noise=0.05, factor=0.5, random_state=10)
    moons = make_moons(n_samples=n_samples, noise=0.05, random_state=10)
    blobs = make_blobs(n_samples=n_samples, centers=4, cluster_std=0.1, center_box=(-1, 1), random_state=10)
    np.random.seed(10)
    # np.int was removed from NumPy; the built-in int is used instead
    random_data = (np.random.rand(n_samples, 2), np.zeros(n_samples, dtype=int))

    datasets = [circles, moons, blobs, random_data]
    n_cols = len(datasets)
    fig = plt.figure(figsize=(20, 8), dpi=72)
    colors = "rgbk"
    for index, data in enumerate(datasets):
        X = data[0]
        Y = data[1]

        # On scikit-learn < 1.2 the `metric` parameter is named `affinity`
        agg_cluster = AgglomerativeClustering(n_clusters=2, metric="euclidean", linkage="average")
        Y_predict = agg_cluster.fit(X).labels_

        # Top row: original labels
        fig.add_subplot(2, n_cols, index + 1)
        plt.scatter(X[:, 0], X[:, 1], c=[colors[y] for y in Y])

        # Bottom row: labels from agglomerative clustering
        fig.add_subplot(2, n_cols, index + 1 + n_cols)
        plt.scatter(X[:, 0], X[:, 1], c=[colors[y] for y in Y_predict])

    plt.show()
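The "precomputed" option in the parameter list is the least obvious one, so here is a small sketch of how it might be used: the model is fitted on a square distance matrix instead of the feature matrix. The blob centers, the choice of Manhattan distance, and the `inspect`-based lookup of the keyword name (to cope with the affinity/metric rename) are assumptions made for this example.

```python
import inspect

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, pairwise_distances

# The keyword is `metric` in recent scikit-learn releases and `affinity` in older ones
params = inspect.signature(AgglomerativeClustering.__init__).parameters
metric_kw = "metric" if "metric" in params else "affinity"

X, y = make_blobs(n_samples=200, centers=[(-3, -3), (3, 3)], cluster_std=0.5, random_state=10)

# Square matrix of pairwise Manhattan distances, shape (n_samples, n_samples)
D = pairwise_distances(X, metric="manhattan")

# "ward" only supports Euclidean distance, so a different linkage is used here
precomputed_model = AgglomerativeClustering(n_clusters=2, linkage="average", **{metric_kw: "precomputed"})
labels_precomputed = precomputed_model.fit_predict(D)  # fit on distances, not on X

# Same clustering, letting the estimator compute the distances itself
direct_model = AgglomerativeClustering(n_clusters=2, linkage="average", **{metric_kw: "manhattan"})
labels_direct = direct_model.fit_predict(X)

# Agreement between the two labelings, independent of cluster numbering
print(adjusted_rand_score(labels_precomputed, labels_direct))
```

A precomputed matrix is mainly useful when the distance is expensive or custom, since it grows quadratically with the number of samples.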

2.2 Results

3. DBSCAN

3.1 Parameter selection

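  • Radius (eps): the k-distance helps to set the radius, which comes down to finding a jump point.
    For a given point, compute its distance to every other point and sort the distances
    in ascending order; the k-th value is that point's k-distance. Then sort the k-distances
    of all points and look for the place where the value jumps abruptly.
    Expect to run many experiments and inspect the results.
  • MinPts (min_samples): start with a fairly small value, then try several settings.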
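The k-distance procedure described above can be sketched with `NearestNeighbors` from scikit-learn. The data set, the value of `k`, and the variable names are assumptions for illustration; the jump point still has to be read off the curve by eye.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, _ = make_moons(n_samples=1000, noise=0.05, random_state=10)

k = 20  # hypothetical choice, tied to the MinPts value one intends to try

# When the training data itself is queried, column 0 is each point's distance
# to itself (0), so ask for k + 1 neighbors and keep the last column
neighbors = NearestNeighbors(n_neighbors=k + 1).fit(X)
distances, _ = neighbors.kneighbors(X)
k_distances = np.sort(distances[:, -1])

# Plot k_distances (for example with plt.plot) and read a candidate eps
# off the y-axis where the curve bends sharply upward.
# A rough numeric summary of where the curve sits:
print(np.percentile(k_distances, [50, 90, 99]))
```

Repeating this for a few values of `k` shows how sensitive the candidate radius is to the MinPts setting.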
3.2 Code

    # Import libraries and build the data sets to cluster
    from sklearn.datasets import make_circles, make_moons, make_blobs
    from sklearn.cluster import DBSCAN
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    n_samples = 1000
    circles = make_circles(n_samples=n_samples, noise=0.05, factor=0.5, random_state=10)
    moons = make_moons(n_samples=n_samples, noise=0.05, random_state=10)
    blobs = make_blobs(n_samples=n_samples, centers=3, cluster_std=0.1, center_box=(-1, 1), random_state=10)
    np.random.seed(10)
    # np.int was removed from NumPy; the built-in int is used instead
    random_data = (np.random.rand(n_samples, 2), np.zeros(n_samples, dtype=int))

    datasets = [circles, moons, blobs, random_data]
    n_cols = len(datasets)
    fig = plt.figure(figsize=(20, 8), dpi=72)
    colors = "rgbky"
    for index, data in enumerate(datasets):
        X = data[0]
        Y_old = data[1]

        dbscan_model = DBSCAN(eps=0.1, min_samples=20)
        dbscan_model.fit(X)
        Y_new = dbscan_model.labels_  # noise points get the label -1

        # Top row: original labels
        fig.add_subplot(2, n_cols, index + 1)
        plt.scatter(X[:, 0], X[:, 1], c=[colors[y] for y in Y_old])
        plt.title("original labels")

        # Bottom row: DBSCAN labels; the modulo maps -1 (noise) to the last color
        # and avoids an IndexError when DBSCAN finds more clusters than colors
        fig.add_subplot(2, n_cols, index + 1 + n_cols)
        plt.scatter(X[:, 0], X[:, 1], c=[colors[y % len(colors)] for y in Y_new])
        plt.title("DBSCAN clustering")

    plt.show()
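Unlike K-means, DBSCAN is not told how many clusters to find, and it labels points it considers noise with -1. A short sketch of how the number of clusters and noise points might be read from `labels_` follows; the blob centers and the extra isolated point are made up for the example, while `eps` and `min_samples` match the values used above.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Three tight, well-separated blobs plus one far-away isolated point
X, _ = make_blobs(n_samples=300, centers=[(-1, -1), (0, 1), (1, -1)], cluster_std=0.05, random_state=10)
X = np.vstack([X, [[5.0, 5.0]]])

labels = DBSCAN(eps=0.1, min_samples=20).fit(X).labels_

# -1 marks noise and is not a cluster, so exclude it from the count
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"clusters: {n_clusters}, noise points: {n_noise}")
```

Checking these two numbers after each run is a quick way to see the effect of changing eps or MinPts before looking at the plots.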

3.3 Results

by CyrusMay 2022 04 05
