Implementing the KNN algorithm on the iris dataset in Python: learning Python by classifying iris flowers with KNN

Published: 2025/3/15 python 20 豆豆


# K-nearest neighbors (KNN) algorithm

# Import the required libraries

import numpy as np

import matplotlib.pyplot as plt

#import pandas as pd

from sklearn import neighbors, datasets

# Load the dataset; it ships with sklearn, and each row of X corresponds to one entry of y

dataset = datasets.load_iris()

# Use the first two columns of the iris data, sepal length and sepal width (sepal_length, sepal_width), as X

X = dataset.data[:, :2]

# Use the iris species as y
# 2 means Iris-virginica, 1 means Iris-versicolor, 0 means Iris-setosa

y = dataset.target

# No feature scaling is applied here: both columns of X already fall within a similarly small range, so scaling is unnecessary (inspecting the data first is what tells you this)

attributes_dict = {0:"sepal_length",1:"sepal_width"}

for attribute in attributes_dict:
    print("{} max: {}".format(attributes_dict[attribute], np.max(X[:, attribute])))
    print("{} min: {}".format(attributes_dict[attribute], np.min(X[:, attribute])))
    # round() rounds the float to one decimal place
    print("{} mean: {}".format(attributes_dict[attribute], round(np.average(X[:, attribute]), 1)))
    print("-------------------------------------")
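The comment above skips feature scaling because both columns already span a similar, small range. As a hedged sketch of what scaling would look like if the ranges did differ, the snippet below standardizes the same two columns with sklearn's StandardScaler. The names X_demo, scaler and X_scaled are illustrative and not part of the original script.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Same two columns the script uses: sepal length and sepal width.
X_demo = datasets.load_iris().data[:, :2]

# Standardize each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_demo)

print("column ranges before:", np.ptp(X_demo, axis=0))
print("column means after:", X_scaled.mean(axis=0).round(6))
print("column stds after:", X_scaled.std(axis=0).round(6))
```

In a real pipeline the scaler would normally be fitted on the training split only and then applied to the test split.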

# Split the data into a training set and a test set
from sklearn.model_selection import train_test_split
"""
train_test_split(train_data, train_target, test_size=0.4, random_state=0, stratify=y_train)

Parameters:
    train_data: the sample features to split
    train_target: the sample labels to split
    test_size: a float is the proportion of samples placed in the test set; an integer is the number of test samples
    random_state: the seed for the random number generator. Passing the same integer
        on every call (0 included) reproduces the same split when the other arguments
        are unchanged, which is what you want when an experiment must be repeatable.
        Leaving it unset (None) gives a different split on each run.
"""

# train_test_split returns four arrays

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
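A quick way to check how random_state behaves is to split the same data twice with the same seed and compare the results. The sketch below does that with seed 0; the names X_demo, y_demo, split_a and split_b are illustrative only.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_demo, y_demo = iris.data[:, :2], iris.target

# Two calls with the same integer seed (0 included) give the same split.
split_a = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
split_b = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("identical splits with random_state=0:", same_split)

# Sizes of the training and test feature arrays for test_size=0.25.
print("train/test sizes:", len(split_a[0]), len(split_a[1]))
```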

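# Train KNN on the training set
from sklearn.neighbors import KNeighborsClassifier
'''
class KNeighborsClassifier(NeighborsBase, KNeighborsMixin, SupervisedIntegerMixin, ClassifierMixin):

Parameters:
    n_neighbors: number of neighbors to use (default 5)
    weights: weighting scheme, one of
        uniform: uniform weights; every point in a neighborhood counts equally
        distance: each point is weighted by the inverse of its distance, so closer
            neighbors have more influence on the prediction
        [callable]: a user-defined function that takes an array of distances and
            returns an array of weights with the same shape
    algorithm: algorithm used to find the nearest neighbors, one of
        ball_tree: use BallTree
        kd_tree: use KDTree
        brute: use brute-force search
        auto: pick the most suitable algorithm based on the values passed to fit
    p: power parameter of the Minkowski metric (p=1 is Manhattan distance, p=2 is Euclidean distance)
    metric: the distance metric to use
    metric_params: additional keyword arguments for the metric
    n_jobs: number of parallel jobs to run for the neighbor search
'''
# p=2 selects the Euclidean distance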

classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)

classifier.fit(X_train, y_train) # KNN has no real training phase; fit mainly stores the training data in memory (and may build a search tree)

# Predict the test set results

y_pred = classifier.predict(X_test)
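To make the fit and predict steps above less of a black box, here is a minimal hand-written sketch of the same idea: Euclidean distance to every stored point, then a majority vote among the k nearest. It is not sklearn's implementation, and its tie-breaking (lowest label wins) is an assumption of this sketch. The function knn_predict and the toy arrays are hypothetical names used only for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Predict labels by majority vote among the k nearest training points."""
    predictions = []
    for point in X_query:
        # Euclidean distance (Minkowski with p=2) to every training point.
        distances = np.sqrt(((X_train - point) ** 2).sum(axis=1))
        # Indices of the k closest training points.
        nearest = np.argsort(distances)[:k]
        # Count votes per class; argmax picks the lowest label on a tie.
        votes = np.bincount(y_train[nearest])
        predictions.append(np.argmax(votes))
    return np.array(predictions)

# Tiny hand-made data set: two well-separated clusters.
toy_X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
toy_y = np.array([0, 0, 0, 1, 1, 1])
toy_pred = knn_predict(toy_X, toy_y, np.array([[0.05, 0.05], [5.0, 5.1]]), k=3)
print(toy_pred)
```

The loop recomputes every distance for every query, which is what the brute option of the algorithm parameter amounts to; the tree-based options exist to avoid that cost on larger data.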

# Build the confusion matrix
from sklearn.metrics import confusion_matrix
"""
def confusion_matrix(y_true, y_pred, labels=None, sample_weight=None):

Parameters:
    y_true: true class labels of the samples
    y_pred: predicted class labels of the samples
    labels: the list of class labels used to index the matrix
    sample_weight: sample weights
"""
# Every correct prediction lies on the diagonal; off-diagonal values count the misclassified samples

cm = confusion_matrix(y_test, y_pred)

print('cm',cm)
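Because correct predictions sit on the diagonal, the overall accuracy can be read straight off the matrix as the diagonal sum divided by the total. The sketch below shows this on made-up labels; true_labels and pred_labels are hypothetical values for illustration, not results from the iris run.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only, not results from the iris run.
true_labels = np.array([0, 0, 1, 1, 1, 2, 2, 2])
pred_labels = np.array([0, 0, 1, 2, 1, 2, 1, 2])

cm_demo = confusion_matrix(true_labels, pred_labels)
# Correct predictions sit on the diagonal, so accuracy = trace / total.
accuracy = np.trace(cm_demo) / cm_demo.sum()
print(cm_demo)
print("accuracy from matrix:", accuracy)
print("accuracy_score:", accuracy_score(true_labels, pred_labels))
```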

# Visualize the training set results

from matplotlib.colors import ListedColormap

X_set, y_set = X_train, y_train

# meshgrid builds a grid over the plane from points on the two coordinate axes.
# X1 and X2 are the coordinate matrices used to draw the grid
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

# Draw filled 2-D contours
# The predicted class supplies the height value at each grid point
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green', 'blue')))

plt.xlim(X1.min(), X1.max())

plt.ylim(X2.min(), X2.max())
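The grid construction and the ravel/transpose step inside the contourf call above are easier to follow on a tiny grid. This sketch uses three x values and two y values; the names xs, ys, G1, G2 and grid_points are illustrative only.

```python
import numpy as np

# Three x values and two y values give a grid with 2 rows and 3 columns.
xs = np.arange(0.0, 3.0, 1.0)    # 0, 1, 2
ys = np.arange(10.0, 12.0, 1.0)  # 10, 11
G1, G2 = np.meshgrid(xs, ys)

# Flatten both coordinate matrices and pair them up: one (x, y) row per grid point.
grid_points = np.array([G1.ravel(), G2.ravel()]).T
print(G1.shape, G2.shape)
print(grid_points)
```

Each row of grid_points is one point a classifier can label; reshaping the predictions back to G1.shape restores the grid layout that contourf expects.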

for i, j in enumerate(np.unique(y_set)):
    # Draw the scatter plot
    # Since matplotlib 3.0.3, scatter's c parameter accepts a 2-D numpy array
    # There are three classes of points, marked in red, green, and blue
    # Each entry of color_list is an RGB triple
    color_list = [[[1,0,0],[0,1,0],[0,0,1]][i]]
    # Use a boolean mask to count the data points in class j (0, 1, or 2)
    count = int(np.sum(y_set == j))
    # Use the same mask to pull the x and y coordinates of class j out of X_set
    # plt.scatter(x, y, c, marker, cmap, alpha, linewidths, edgecolors):
    # Parameters:
    #     x, y: coordinates of the data points
    #     c: a color or a sequence of colors
    #     marker: shape of the data points; the default is a dot
    #     cmap: a matplotlib.colors.Colormap, a built-in color sequence
    #     alpha: opacity of the data points in the range [0, 1], from fully transparent (0) to fully opaque (1)
    #     linewidths: width of the marker edges
    #     edgecolors: color of the marker edges
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = color_list*count, label = j)

plt.title('K-NN (Training set)')

plt.xlabel('Sepal Length')

plt.ylabel('Sepal Width')

plt.legend()

plt.show()

# Visualize the test set results