
[scikit-learn Machine Learning] 3. K-Nearest Neighbors: Classification and Regression

Published: 2024/7/5

Contents

    • 1. The KNN model
    • 2. KNN classification
    • 3. KNN classification with sklearn
    • 4. KNN regression

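These are study notes for the book "scikit-learn 机器学习 (第2版)" (scikit-learn Machine Learning, 2nd edition).

The K-nearest neighbors method (K-Nearest Neighbor, K-NN) is commonly used in search and recommender systems.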

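1. The KNN model

  • Choose a distance metric (for example, Euclidean distance)
  • Take the K nearest neighbor samples by distance and apply a chosen strategy to make the prediction
  • Model assumption: samples that are close to each other have similar response values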
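
The three bullets above can be sketched as one small function. This is only a minimal illustration in plain Python, not the book's code and not how scikit-learn implements it. The name `knn_predict` and its `strategy` argument are hypothetical, and the sketch assumes Euclidean distance with ties broken by training order.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=3, strategy="vote"):
    """Hypothetical helper: predict the response for x from its k nearest samples.

    strategy="vote" returns the majority label (classification);
    strategy="mean" returns the average response (regression).
    """
    # Step 1: distance metric (Euclidean) from x to every training sample.
    distances = [math.dist(row, x) for row in X_train]
    # Step 2: indices of the k smallest distances (sorted() is stable,
    # so ties keep their training order).
    nearest = sorted(range(len(X_train)), key=lambda i: distances[i])[:k]
    neighbors = [y_train[i] for i in nearest]
    # Step 3: prediction strategy.
    if strategy == "vote":
        return Counter(neighbors).most_common(1)[0][0]
    return sum(neighbors) / k


# Toy data: two clusters on a line.
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
label = knn_predict(X, ["a", "a", "a", "b", "b", "b"], [1.5], k=3)
value = knn_predict(X, [1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [11.0], k=3, strategy="mean")
print(label)  # a
print(value)  # 11.0
```

Both strategies share the same neighbor search; only the last step differs, which is why one model family covers classification and regression.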

2. KNN classification

Classify sex from height and weight

    import numpy as np
    import matplotlib.pyplot as plt

    # Training data: [height in cm, weight in kg]
    X_train = np.array([
        [158, 64],
        [170, 86],
        [183, 84],
        [191, 80],
        [155, 49],
        [163, 59],
        [180, 67],
        [158, 54],
        [170, 67]
    ])
    y_train = ['male', 'male', 'male', 'male', 'female', 'female', 'female', 'female', 'female']

    plt.figure()
    plt.title('Human Heights and Weights by Sex')
    plt.xlabel('Height in cm')
    plt.ylabel('Weight in kg')

    for i, x in enumerate(X_train):
        if y_train[i] == 'male':
            c1 = plt.scatter(x[0], x[1], c='k', marker='x')
        else:
            c2 = plt.scatter(x[0], x[1], c='r', marker='o')
    plt.grid(True)
    plt.legend((c1, c2), ('male', 'female'), loc='lower right')
    # plt.show()

  • Predict the sex of a person who is 155 cm tall and weighs 70 kg
  • Set k = 3 for the KNN model

Compute the distances

    x = np.array([[155, 70]])
    dis = np.sqrt(np.sum((X_train - x)**2, axis=1))
    dis

Select the k nearest

    nearest_k_neighbor = dis.argsort()[0:3]
    k_genders = [y_train[i] for i in nearest_k_neighbor]
    k_genders
    # ['male', 'female', 'female']

Count the labels of the k nearest neighbors

    from collections import Counter
    # b = Counter(np.take(y_train, dis.argsort()[0:3]))
    b = Counter(k_genders)
    b
    # Counter({'female': 2, 'male': 1}); female is the majority

    # help(Counter.most_common)
    # most_common(self, n=None)
    #     List the n most common elements and their counts from the most
    #     common to the least.  If n is None, then list all element counts.
    b.most_common(2)
    # [('female', 2), ('male', 1)]
    b.most_common(1)[0][0]
    # 'female'

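3. KNN classification with sklearn

Convert the labels to numbers. LabelBinarizer orders the classes alphabetically, so female becomes 0 and male becomes 1.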

    from sklearn.preprocessing import LabelBinarizer
    from sklearn.neighbors import KNeighborsClassifier

    lb = LabelBinarizer()
    y_train_lb = lb.fit_transform(y_train)
    y_train_lb
    # array([[1],[1],[1],[1],[0],[0],[0],[0],[0]])

Predict the sex for the earlier example (155 cm, 70 kg)

    K = 3
    clf = KNeighborsClassifier(n_neighbors=K)
    clf.fit(X_train, y_train_lb.ravel())   # ravel() flattens the (n, 1) label column to shape (n,)
    pred_gender = clf.predict(x)
    pred_gender
    # array([0])
    pred_label_gender = lb.inverse_transform(pred_gender)
    pred_label_gender
    # array(['female'], dtype='<U6')
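
To check that the fitted classifier relies on the same three neighbors as the manual computation in section 2, the estimator's `kneighbors` method can be queried. The sketch below is self-contained and repeats the training data. As a simplification it fits on the string labels directly, which `KNeighborsClassifier` accepts, so the `LabelBinarizer` step is skipped here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Same training data as above: [height in cm, weight in kg]
X_train = np.array([
    [158, 64], [170, 86], [183, 84], [191, 80], [155, 49],
    [163, 59], [180, 67], [158, 54], [170, 67],
])
y_train = ['male', 'male', 'male', 'male',
           'female', 'female', 'female', 'female', 'female']

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

x = np.array([[155, 70]])
# kneighbors returns (distances, indices), each of shape (n_queries, n_neighbors),
# ordered from nearest to farthest.
distances, indices = clf.kneighbors(x)
neighbor_labels = [y_train[i] for i in indices[0]]
prediction = clf.predict(x)

print(indices[0])        # [0 5 8]
print(neighbor_labels)   # ['male', 'female', 'female']
print(prediction)        # ['female']
```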

Validate on the test set

    X_test = np.array([
        [168, 65],
        [180, 96],
        [160, 52],
        [169, 67]
    ])
    y_test = ['male', 'male', 'female', 'female']
    y_test_lb = lb.transform(y_test)

    pred_lb = clf.predict(X_test)
    print('Predicted labels: %s' % lb.inverse_transform(pred_lb))
    # Predicted labels: ['female' 'male' 'female' 'female']

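Compute the evaluation metrics

Accuracy: the fraction of correct predictions, here 3/4

    from sklearn.metrics import accuracy_score
    accuracy_score(y_test_lb, pred_lb)
    # 0.75

Precision: with male as the positive class, males predicted male / (males predicted male + females predicted male)

    from sklearn.metrics import precision_score
    precision_score(y_test_lb, pred_lb)
    # 1.0

Recall: males predicted male / (males predicted male + males predicted female)

    from sklearn.metrics import recall_score
    recall_score(y_test_lb, pred_lb)
    # 0.5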
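
The three scores can be reproduced by hand from the confusion counts. The sketch below hard-codes the binarized test labels and predictions shown above (1 = male, 0 = female) and uses plain Python only. The variable names are illustrative.

```python
# True and predicted labels for the four test samples (1 = male, 0 = female).
y_true = [1, 1, 0, 0]
y_pred = [0, 1, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # males predicted male
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # females predicted male
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # males predicted female
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # females predicted female

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(tp, fp, fn, tn)               # 1 0 1 2
print(accuracy, precision, recall)  # 0.75 1.0 0.5
```

Precision is perfect because no female was predicted male, while recall is only 0.5 because one of the two males was missed.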

F1 score

The F1 score is the harmonic mean of precision and recall, so it balances the two

    from sklearn.metrics import f1_score
    f1_score(y_test_lb, pred_lb)
    # 0.6667

Classification report

    from sklearn.metrics import classification_report
    # help(classification_report)
    # classification_report(y_true, y_pred, labels=None, target_names=None,
    #     sample_weight=None, digits=2, output_dict=False, zero_division='warn')
    print(classification_report(y_test_lb, pred_lb, target_names=['male', 'female'], labels=[1, 0]))
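
As a worked check of the F1 value, plugging the precision (1.0) and recall (0.5) obtained above into the harmonic-mean definition gives:

```latex
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
    = \frac{2 \cdot 1.0 \cdot 0.5}{1.0 + 0.5}
    = \frac{1.0}{1.5}
    \approx 0.6667
```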

4. KNN regression

Predict weight from height and sex

    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    # Features: [height in cm, sex (1 = male, 0 = female)]; target: weight in kg
    X_train = np.array([
        [158, 1],
        [170, 1],
        [183, 1],
        [191, 1],
        [155, 0],
        [163, 0],
        [180, 0],
        [158, 0],
        [170, 0]
    ])
    y_train = [64, 86, 84, 80, 49, 59, 67, 54, 67]

    X_test = np.array([
        [168, 1],
        [180, 1],
        [160, 0],
        [169, 0]
    ])
    y_test = [65, 96, 52, 67]

    K = 3
    clf = KNeighborsRegressor(n_neighbors=K)
    clf.fit(X_train, y_train)
    predictions = clf.predict(np.array(X_test))
    predictions
    # array([70.66666667, 79.        , 59.        , 70.66666667])

    # help(r2_score)
    # R^2 (coefficient of determination)
    r2_score(y_test, predictions)
    # 0.6290565226735438

    # Mean absolute error
    mean_absolute_error(y_test, predictions)
    # 8.333333333333336

    # Mean squared error
    mean_squared_error(y_test, predictions)
    # 95.8888888888889
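
The three regression metrics above can also be recomputed by hand. This is a plain-Python sketch that hard-codes the test targets and the predictions printed above (70.67 is written as 212/3, the mean of the neighbor weights 86, 67 and 59). The variable names are illustrative.

```python
y_test = [65, 96, 52, 67]
predictions = [212 / 3, 79.0, 59.0, 212 / 3]

n = len(y_test)
residuals = [t - p for t, p in zip(y_test, predictions)]

# Mean absolute error and mean squared error
mae = sum(abs(r) for r in residuals) / n
mse = sum(r * r for r in residuals) / n

# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
mean_y = sum(y_test) / n
ss_res = sum(r * r for r in residuals)
ss_tot = sum((t - mean_y) ** 2 for t in y_test)
r2 = 1 - ss_res / ss_tot

print(round(mae, 4), round(mse, 4), round(r2, 4))  # 8.3333 95.8889 0.6291
```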
  • The effect of not standardizing the data

    from scipy.spatial.distance import euclidean
    # help(euclidean)  # Euclidean distance between two 1-D vectors

    # Heights in millimeters. Separate names keep the regression X_train / X_test intact.
    X_mm = np.array([[1700, 1], [1600, 0]])
    x_mm = np.array([1640, 1])
    print(euclidean(X_mm[0, :], x_mm))
    print(euclidean(X_mm[1, :], x_mm))
    # 60.0
    # 40.01249804748511

    # The same heights in meters
    X_m = np.array([[1.7, 1], [1.6, 0]])
    x_m = np.array([1.64, 1])
    print(euclidean(X_m[0, :], x_m))
    print(euclidean(X_m[1, :], x_m))
    # 0.06000000000000005
    # 1.0007996802557444

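The nearest neighbor depends on the unit: with heights in millimeters the second training sample is closer (40.01 vs 60.0), while in meters the first one is closer (0.06 vs 1.0).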

  • Standardize the data

    from sklearn.preprocessing import StandardScaler

    # X_train is the 9-row regression training set from the start of this section
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)

    print(X_train)
    print(X_train_scaled)

Output:

    [[158   1]
     [170   1]
     [183   1]
     [191   1]
     [155   0]
     [163   0]
     [180   0]
     [158   0]
     [170   0]]
    [[-0.9908706   1.11803399]
     [ 0.01869567  1.11803399]
     [ 1.11239246  1.11803399]
     [ 1.78543664  1.11803399]
     [-1.24326216 -0.89442719]
     [-0.57021798 -0.89442719]
     [ 0.86000089 -0.89442719]
     [-0.9908706  -0.89442719]
     [ 0.01869567 -0.89442719]]
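
What the scaler does to one column can be sketched by hand: subtract the column mean and divide by the population standard deviation (dividing by n, not n - 1). The example below applies this to the height column only, using the standard library. The variable names are illustrative.

```python
import statistics

# Height column of the regression training set, in cm.
heights = [158, 170, 183, 191, 155, 163, 180, 158, 170]

mean = statistics.fmean(heights)
std = statistics.pstdev(heights)  # population standard deviation (divides by n)

# z-score: each value expressed in standard deviations from the mean
scaled = [(h - mean) / std for h in heights]

print([round(z, 4) for z in scaled[:3]])  # [-0.9909, 0.0187, 1.1124]
```

These values match the first column of the scaled array printed above. After scaling, both features have unit variance, so neither the height nor the sex column dominates the distance merely because of its unit.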
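  • After standardizing the features, the model error is lower

    # X_train, y_train, X_test, y_test are the regression data from the start of this section
    X_test_scaled = scaler.transform(X_test)   # reuse the scaler fitted on the training set
    clf.fit(X_train_scaled, y_train)           # refit the regressor on the scaled features
    pred = clf.predict(X_test_scaled)
    pred
    # array([78.        , 83.33333333, 54.        , 64.33333333])

    # R^2 (coefficient of determination)
    r2_score(y_test, pred)
    # 0.6706425961745109

    # Mean absolute error
    mean_absolute_error(y_test, pred)
    # 7.583333333333336

    # Mean squared error
    mean_squared_error(y_test, pred)
    # 85.13888888888893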
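
One way to avoid forgetting to scale the test set or to refit the model is to chain the scaler and the regressor with `make_pipeline`. This is an alternative arrangement, not the code from these notes. It is a self-contained sketch that repeats the regression data. No expected output is shown for the last test sample, because two of its candidate neighbors (180 cm and 158 cm) are equally far from 169 cm, so its prediction depends on how the tie is resolved.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor

# Features: [height in cm, sex (1 = male, 0 = female)]; target: weight in kg
X_train = np.array([
    [158, 1], [170, 1], [183, 1], [191, 1], [155, 0],
    [163, 0], [180, 0], [158, 0], [170, 0],
])
y_train = [64, 86, 84, 80, 49, 59, 67, 54, 67]
X_test = np.array([[168, 1], [180, 1], [160, 0], [169, 0]])

# fit() standardizes X_train and trains the regressor on the scaled features;
# predict() applies the same training-set scaling to X_test first.
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=3))
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(pred[:3])  # approximately 78.0, 83.33, 54.0
```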
