Machine Learning | Grid Search and Visualization

Published: 2025/3/15

Contents

1. Grid Search
- 1.1 Simple Grid Search
- 1.2 The Danger of Overfitting the Parameters and the Validation Set
- 1.3 Grid Search with Cross-Validation
  - 1.3.1 Python Implementation
  - 1.3.2 Sklearn Implementation
- 1.4 Visualizing Grid Search
  - 1.4.1 Searching over a Grid
    - 1.4.1.1 Poorly Chosen Parameter Ranges and Their Visualization
  - 1.4.2 Searching over Spaces That Are Not Grids
References

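Related articles:

Machine Learning | Table of Contents

Supervised Learning | Grid Search for Decision Trees

Supervised Learning | SVM: Principles of the Linear Support Vector Machine

Supervised Learning | SVM: Principles of the Nonlinear Support Vector Machine

Supervised Learning | SVM: Support Vector Machines in Sklearn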

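1. Grid Search

Grid search: a hyperparameter tuning method based on exhaustive search. It loops over every candidate parameter combination, tries each one, and keeps the best-performing combination as the final result. In principle it is the same as finding the maximum value in an array. (Why is it called grid search? Take a model with two parameters: if parameter a has 3 candidate values and parameter b has 4, all the combinations can be laid out as a $3 \times 4$ table. Each cell of the table is one grid point, and the loop effectively visits and searches every cell, hence the name.) ^[1]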
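To make the grid picture concrete, here is a minimal sketch of the enumeration for the two-parameter example. The candidate values and the `toy_score` function are hypothetical stand-ins that do not come from the original article; a real search would train and evaluate a model in place of `toy_score`.

```python
from itertools import product

# Hypothetical candidate values: 3 choices for a, 4 choices for b.
a_values = [1, 2, 3]
b_values = [10, 20, 30, 40]

# Every (a, b) pair is one cell of the 3 x 4 grid.
grid = list(product(a_values, b_values))


def toy_score(a, b):
    # Stand-in for "train a model with (a, b) and measure its accuracy".
    return -(a - 2) ** 2 - (b - 30) ** 2


# Grid search is then just "find the maximum" over all cells.
best_cell = max(grid, key=lambda cell: toy_score(*cell))

print("number of cells:", len(grid))
print("best cell:", best_cell)
```

The grid has 3 x 4 = 12 cells, and the search cost grows multiplicatively with every parameter that is added.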

1.1 Simple Grid Search

Consider a kernel SVM with an RBF (radial basis function) kernel.

We can implement a simple grid search in Python with for loops over the 2 parameters, training and evaluating a classifier for each parameter combination:

```python
# naive grid search implementation
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
print("Size of training set: {} size of test set: {}".format(
    X_train.shape[0], X_test.shape[0]))

best_score = 0

for gamma in [0.001, 0.01, 0.1, 1, 10, 100]:
    for C in [0.001, 0.01, 0.1, 1, 10, 100]:
        # for each combination of parameters, train an SVC
        svm = SVC(gamma=gamma, C=C)
        svm.fit(X_train, y_train)
        # evaluate the SVC on the test set
        score = svm.score(X_test, y_test)
        # if we got a better score, store the score and parameters
        if score > best_score:
            best_score = score
            best_parameters = {'C': C, 'gamma': gamma}

print("Best score: {:.2f}".format(best_score))
print("Best parameters: {}".format(best_parameters))
```

```
Size of training set: 112 size of test set: 38
Best score: 0.97
Best parameters: {'C': 100, 'gamma': 0.001}
```

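1.2 The Danger of Overfitting the Parameters and the Validation Set

Given this result, does it mean we have found a model that reaches 97% accuracy on this dataset? The answer is no, for the following reason:

We tried many different parameters and picked the one with the highest accuracy on the test set, but this accuracy will not necessarily carry over to new data. Because we used the test data to tune the parameters, we can no longer use it to assess how good the model is. In other words, the score obtained while tuning cannot serve as the final score. This is the same reason we had to split the data into a training set and a test set in the first place: evaluation requires an independent dataset, one that was not used to create the model.

One way to solve this problem is to split the data again, so that we have 3 datasets: the training set, used to build the model; the validation set, used to select the model's parameters; and the test set, used to evaluate the performance of the selected parameters.

After selecting the best parameters with the validation set, we can rebuild a model with those parameter settings, but this time train it on both the training data and the validation data, so that as much data as possible is used to build the model. The implementation is as follows: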

```python
from sklearn.svm import SVC

# split data into train+validation set and test set
X_trainval, X_test, y_trainval, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
# split train+validation set into training and validation sets
X_train, X_valid, y_train, y_valid = train_test_split(
    X_trainval, y_trainval, random_state=1)
print("Size of training set: {} size of validation set: {} size of test set:"
      " {}\n".format(X_train.shape[0], X_valid.shape[0], X_test.shape[0]))

best_score = 0

for gamma in [0.001, 0.01, 0.1, 1, 10, 100]:
    for C in [0.001, 0.01, 0.1, 1, 10, 100]:
        # for each combination of parameters, train an SVC
        svm = SVC(gamma=gamma, C=C)
        svm.fit(X_train, y_train)
        # evaluate the SVC on the validation set
        score = svm.score(X_valid, y_valid)
        # if we got a better score, store the score and parameters
        if score > best_score:
            best_score = score
            best_parameters = {'C': C, 'gamma': gamma}

# rebuild a model on the combined training and validation set,
# and evaluate it on the test set
svm = SVC(**best_parameters)
svm.fit(X_trainval, y_trainval)
test_score = svm.score(X_test, y_test)
print("Best score on validation set: {:.2f}".format(best_score))
print("Best parameters: ", best_parameters)
print("Test set score with best parameters: {:.2f}".format(test_score))
```

```
Size of training set: 84 size of validation set: 28 size of test set: 38

Best score on validation set: 0.96
Best parameters:  {'C': 10, 'gamma': 0.001}
Test set score with best parameters: 0.92
```

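The best score on the validation set is 96%, slightly lower than before, probably because we used less data to train the model (X_train is now smaller because we split the dataset twice). The score on the test set, which is the score that actually reflects how well the model generalizes, is lower still at 92%. We can therefore only claim to classify 92% of new data correctly, not the 97% we believed earlier!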
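The set sizes printed in this section can be checked with a little arithmetic. The sketch below assumes that each split holds out 25% of the samples and rounds the held-out count up, which is consistent with the sizes reported here; `split_sizes` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, held_out_fraction=0.25):
    # Assumption: the held-out part is rounded up; the rest is kept for training.
    n_held_out = math.ceil(n_samples * held_out_fraction)
    return n_samples - n_held_out, n_held_out


# First split: all 150 iris samples -> train+validation / test.
n_trainval, n_test = split_sizes(150)
# Second split: train+validation -> training / validation.
n_train, n_valid = split_sizes(n_trainval)

print("train+validation:", n_trainval, "test:", n_test)
print("training:", n_train, "validation:", n_valid)
```

Only 84 of the 150 samples are left for fitting during the search, which is one plausible reason the validation score is a little lower than the earlier 97%.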

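1.3 Grid Search with Cross-Validation

Although splitting the data into training, validation, and test sets (as described above) is workable and reasonably practical, it is quite sensitive to how exactly the data is split. To get a better estimate of generalization performance, we can use cross-validation (see Machine Learning | Model Selection) to evaluate each parameter combination instead of relying on a single split into a training and a validation set. The whole procedure is shown below:

1.3.1 Python Implementation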

```python
import numpy as np
from sklearn.model_selection import cross_val_score

# reset the running best before starting a new search
best_score = 0

for gamma in [0.001, 0.01, 0.1, 1, 10, 100]:
    for C in [0.001, 0.01, 0.1, 1, 10, 100]:
        # for each combination of parameters, train an SVC
        svm = SVC(gamma=gamma, C=C)
        # perform cross-validation
        scores = cross_val_score(svm, X_trainval, y_trainval, cv=5)
        # compute the mean cross-validation accuracy
        score = np.mean(scores)
        # if we got a better score, store the score and parameters
        if score > best_score:
            best_score = score
            best_parameters = {'C': C, 'gamma': gamma}

# rebuild a model with the best parameters on the combined
# training and validation set
svm = SVC(**best_parameters)
svm.fit(X_trainval, y_trainval)
```

```
SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
```

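The best parameters are selected as follows: for each parameter combination, the accuracy is averaged over the 5 cross-validation splits, and the combination with the highest mean accuracy is kept.

Cross-validation is a way of evaluating a given algorithm on a specific dataset, but it is often used together with parameter search methods such as grid search. For this reason, many people use the term cross-validation colloquially to refer to grid search with cross-validation.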
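For readers who want to see what a 5-fold split looks like, the following is a simplified sketch that partitions sample indices into folds. It is a plain, unshuffled, non-stratified k-fold written for illustration; `k_fold_indices` is a hypothetical helper, and scikit-learn's own splitters for classifiers additionally keep the class proportions similar in every fold.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, valid_idx) pairs for a plain, unshuffled k-fold split."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size


# 112 is the size of the train+validation set used in this article.
folds = list(k_fold_indices(112, n_folds=5))
fold_sizes = [len(valid_idx) for _, valid_idx in folds]
print("validation fold sizes:", fold_sizes)
```

Every sample is used for validation exactly once and for training in the other four rounds, which is why the averaged score depends less on one lucky or unlucky split.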

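1.3.2 Sklearn Implementation

Because grid search with cross-validation is such a commonly used tuning method, scikit-learn provides the GridSearchCV class, which implements it in the form of an estimator. To use the GridSearchCV class, you first specify the parameters to search over with a dictionary; GridSearchCV then performs all the necessary model fits.

sklearn.model_selection.GridSearchCV: (Sklearn official documentation)

Creating the grid searcher: GridSearchCV(estimator, param_grid, cv, return_train_score=False)

Here estimator is the model to train, param_grid is the dictionary of parameters to try, and cv is the number of cross-validation folds.

Methods and attributes of GridSearchCV:

fit, predict, score: fit the model, make predictions, and compute the generalization score, respectively

best_params_, best_score_, best_estimator_: the best parameters, the corresponding mean cross-validation score, and the corresponding best model

cv_results_: a dictionary containing the results of the grid search

The keys of the parameter dictionary are the names of the parameters we want to tune, and the values are the settings we want to try. For example, to try the values 0.001, 0.01, 0.1, 1, 10, and 100 for both C and gamma, we can write the following dictionary: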

```python
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}
print("Parameter grid:\n{}".format(param_grid))
```

```
Parameter grid:
{'C': [0.001, 0.01, 0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}
```

We can now instantiate the GridSearchCV class with the model (SVC), the parameter grid to search (param_grid), and the cross-validation strategy to use (for example, 5-fold stratified cross-validation):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

grid_search = GridSearchCV(SVC(), param_grid, cv=5, return_train_score=True)
```

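GridSearchCV uses cross-validation in place of the split into a training and a validation set that we used earlier. However, we still need to split the data into a training set and a test set to avoid overfitting the parameters: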

```python
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
```

The grid_search object we created behaves just like a classifier: we can call the standard fit, predict, and score methods on it. When we call fit, however, it runs cross-validation for every parameter combination specified in param_grid:

```python
grid_search.fit(X_train, y_train)
```

```
GridSearchCV(cv=5, error_score='raise-deprecating',
             estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='auto_deprecated', kernel='rbf', max_iter=-1,
                           probability=False, random_state=None, shrinking=True,
                           tol=0.001, verbose=False),
             iid='warn', n_jobs=None,
             param_grid={'C': [0.001, 0.01, 0.1, 1, 10, 100],
                         'gamma': [0.001, 0.01, 0.1, 1, 10, 100]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
             scoring=None, verbose=0)
```

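Fitting the GridSearchCV object not only searches for the best parameters, but also automatically fits a new model on the whole training dataset using the parameters that gave the best cross-validation performance. What fit does is therefore equivalent to the result of the code in 1.3.1.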
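As a rough way to gauge the cost of this call, the sketch below counts how many models are trained. It assumes one fit per parameter combination per fold plus one final refit on the whole training set, as described above; the variable names are chosen for illustration only.

```python
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}
n_folds = 5

# Number of parameter combinations = product of the candidate list lengths.
n_combinations = 1
for candidate_values in param_grid.values():
    n_combinations *= len(candidate_values)

# One model per combination and fold, plus the final refit with the best setting.
n_cv_fits = n_combinations * n_folds
n_total_fits = n_cv_fits + 1

print("combinations:", n_combinations)
print("cross-validation fits:", n_cv_fits)
print("total fits including refit:", n_total_fits)
```

For this 6 x 6 grid that is 36 combinations and 180 cross-validation fits, which explains why starting with a small grid is usually advisable.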

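The GridSearchCV class provides a very convenient interface: the retrained model can be accessed through the predict and score methods. To evaluate how well the best parameters found generalize, we can call score on the test set: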

```python
print("Test set score: {:.2f}".format(grid_search.score(X_test, y_test)))
```

```
Test set score: 0.97
```

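The result shows that, using parameters selected by cross-validation, we found a model with 97% accuracy on the test set. What matters is that we did not use the test set to choose the parameters. The parameters that were found are stored in the best_params_ attribute, and the best cross-validation accuracy (the mean accuracy over the different splits for this parameter setting) is stored in best_score_: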

```python
print("Best parameters: {}".format(grid_search.best_params_))
print("Best cross-validation score: {:.2f}".format(grid_search.best_score_))
```

```
Best parameters: {'C': 100, 'gamma': 0.01}
Best cross-validation score: 0.97
```

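Again, be careful not to confuse best_score_ with the generalization performance computed by calling the score method on the test set. The score method (or evaluating the output of the predict method) uses a model trained on the whole training set, whereas the best_score_ attribute stores the mean cross-validation accuracy obtained by cross-validating on the training set.

Being able to access the model that was actually found is sometimes helpful, for example to look at coefficients or feature importances. The best_estimator_ attribute gives access to the model corresponding to the best parameters, trained on the whole training set: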

```python
print("Best estimator:\n{}".format(grid_search.best_estimator_))
```

```
Best estimator:
SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
```

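1.4 Visualizing Grid Search

1.4.1 Searching over a Grid

Visualizing the results of cross-validation often helps in understanding how the model's generalization depends on the parameters being searched. Because running a grid search is computationally quite expensive, it is usually best to start with a relatively coarse and small grid. We can then inspect the results of the cross-validated grid search and possibly extend the search range.

The results of a grid search can be found in the cv_results_ attribute, a dictionary that stores everything about the search.

It is easiest to inspect after converting it to a DataFrame: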

```python
import pandas as pd

# convert to DataFrame
results = pd.DataFrame(grid_search.cv_results_)
# show the first 5 rows
display(results.head())
```

	mean_fit_time	std_fit_time	mean_score_time	std_score_time	param_C	param_gamma	params	split0_test_score	split1_test_score	split2_test_score	...	mean_test_score	std_test_score	rank_test_score	split0_train_score	split1_train_score	split2_train_score	split3_train_score	split4_train_score	mean_train_score	std_train_score
0	0.001317	0.000458	0.001943	0.001243	0.001	0.001	{'C': 0.001, 'gamma': 0.001}	0.375	0.347826	0.363636	...	0.366071	0.011371	22	0.363636	0.370787	0.366667	0.366667	0.362637	0.366079	0.002852
1	0.001284	0.000543	0.001329	0.001086	0.001	0.01	{'C': 0.001, 'gamma': 0.01}	0.375	0.347826	0.363636	...	0.366071	0.011371	22	0.363636	0.370787	0.366667	0.366667	0.362637	0.366079	0.002852
2	0.000582	0.000024	0.000272	0.000020	0.001	0.1	{'C': 0.001, 'gamma': 0.1}	0.375	0.347826	0.363636	...	0.366071	0.011371	22	0.363636	0.370787	0.366667	0.366667	0.362637	0.366079	0.002852
3	0.000606	0.000021	0.000279	0.000012	0.001	1	{'C': 0.001, 'gamma': 1}	0.375	0.347826	0.363636	...	0.366071	0.011371	22	0.363636	0.370787	0.366667	0.366667	0.362637	0.366079	0.002852
4	0.000661	0.000032	0.000294	0.000033	0.001	10	{'C': 0.001, 'gamma': 10}	0.375	0.347826	0.363636	...	0.366071	0.011371	22	0.363636	0.370787	0.366667	0.366667	0.362637	0.366079	0.002852

5 rows × 22 columns

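Each row of results corresponds to one particular parameter setting (results['params']). For each setting, the results of all cross-validation splits are recorded, together with the mean and standard deviation over all splits. Because we searched a two-dimensional parameter grid (C and gamma), a heat map is the most suitable visualization. We first extract the mean validation scores and then reshape the score array so that its axes correspond to C and gamma: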

```python
import mglearn
import numpy as np

scores = np.array(results.mean_test_score).reshape(6, 6)

# plot the mean cross-validation scores
mglearn.tools.heatmap(scores, xlabel='gamma', xticklabels=param_grid['gamma'],
                      ylabel='C', yticklabels=param_grid['C'], cmap="viridis")
```

```
<matplotlib.collections.PolyCollection at 0x1c1cc8aeb8>
```

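Each point in the heat map corresponds to one run of cross-validation with one particular parameter setting. The color encodes the cross-validation accuracy: light colors mean high accuracy, dark colors mean low accuracy.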
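The reshape to 6 x 6 works because of the order in which the settings appear in the results table: C stays fixed while gamma runs through all of its values. The sketch below reproduces that ordering with plain lists to show which row and column each setting lands in. It is an illustration under that ordering assumption and does not touch cv_results_ itself.

```python
C_values = [0.001, 0.01, 0.1, 1, 10, 100]
gamma_values = [0.001, 0.01, 0.1, 1, 10, 100]

# Flat list in the same order as the rows of the results table:
# C is the slow index, gamma is the fast index.
flat_settings = [(C, gamma) for C in C_values for gamma in gamma_values]

# Cut the flat list into rows of len(gamma_values) entries each,
# which is what reshape(6, 6) does to the score array.
n_cols = len(gamma_values)
grid = [flat_settings[r * n_cols:(r + 1) * n_cols]
        for r in range(len(C_values))]

# Row index selects C, column index selects gamma.
flat_index = flat_settings.index((100, 0.01))
row, col = divmod(flat_index, n_cols)
print("flat index:", flat_index, "-> row (C):", row, "column (gamma):", col)
print("grid[row][col]:", grid[row][col])
```

If the parameters were iterated in a different order, the rows and columns of the heat map would have to be labeled accordingly.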

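We can see that SVC is very sensitive to the parameter settings. For many settings the accuracy is around 40%, which is very poor; for other settings it is around 96%.

Two things can be read off this plot:

First, the parameters we tuned matter a great deal for obtaining good performance. Both parameters (C and gamma) are important, because adjusting them can raise the accuracy from 40% to 96%.

Second, the output changes noticeably within the parameter ranges we chose. It is also important that the ranges be wide enough: the best value of each parameter should not lie on the edge of the plot.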
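The boundary rule can also be checked programmatically rather than by eye. The following is a minimal sketch; `on_boundary` is a hypothetical helper, and the best parameters are the ones reported earlier in this article.

```python
def on_boundary(candidates, best_value):
    """Return True if best_value is the smallest or largest candidate."""
    ordered = sorted(candidates)
    return best_value in (ordered[0], ordered[-1])


param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}
# Best parameters reported by the grid search in section 1.3.2.
best_params = {'C': 100, 'gamma': 0.01}

flags = {name: on_boundary(param_grid[name], value)
         for name, value in best_params.items()}
print(flags)
```

A flagged parameter does not prove that a better value exists outside the range, but it is a hint that extending the range in that direction may be worth a try.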

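1.4.1.1 Poorly Chosen Parameter Ranges and Their Visualization

Let us now look at a few plots whose results are less ideal because the search ranges were poorly chosen: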

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(13, 5))

param_grid_linear = {'C': np.linspace(1, 2, 6),
                     'gamma': np.linspace(1, 2, 6)}

param_grid_one_log = {'C': np.linspace(1, 2, 6),
                      'gamma': np.logspace(-3, 2, 6)}

param_grid_range = {'C': np.logspace(-3, 2, 6),
                    'gamma': np.logspace(-7, -2, 6)}

for param_grid, ax in zip([param_grid_linear, param_grid_one_log,
                           param_grid_range], axes):
    grid_search = GridSearchCV(SVC(), param_grid, cv=5)
    grid_search.fit(X_train, y_train)
    scores = grid_search.cv_results_['mean_test_score'].reshape(6, 6)

    # plot the mean cross-validation scores
    scores_image = mglearn.tools.heatmap(
        scores, xlabel='gamma', ylabel='C', xticklabels=param_grid['gamma'],
        yticklabels=param_grid['C'], cmap="viridis", ax=ax)

plt.colorbar(scores_image, ax=axes.tolist())
```

```
<matplotlib.colorbar.Colorbar at 0x1c1d0a64a8>
```

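The first plot shows no variation at all: the whole parameter grid has the same color. Here this is caused by improper scaling and an improper range for the parameters C and gamma. However, if no change in accuracy is visible across different parameter settings, it may also be that the parameter simply does not matter. It is usually best to try very extreme values at first to see whether changing the parameter changes the accuracy at all.

The second plot shows a pattern of vertical stripes. This indicates that only the setting of gamma affects the accuracy. It may mean that the gamma parameter is being searched over an interesting range while the C parameter is not, or it may mean that the C parameter is not important.

In the third plot the accuracy changes with both C and gamma. However, nothing interesting happens in the entire lower-left part of the plot, so we can probably exclude the very small values from later grid searches. The best parameter setting is in the upper-right corner. Since the optimum lies on the edge of the plot, we can expect that there may be even better values beyond this boundary, and we may want to change the search range to include more parameters from that region.

Tuning the parameter grid based on cross-validation scores is perfectly fine, and it is a good way to explore the importance of different parameters. However, different parameter ranges should not be tested on the final test set: as discussed earlier, the test set should be evaluated only once we know exactly which model we want to use.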
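The grids in this subsection differ mainly in how their candidate values are spaced. The sketch below re-implements the two spacing schemes with plain Python to show how much ground each one covers. The helpers `linspace` and `logspace` are simplified stand-ins written for illustration and are not the NumPy functions themselves.

```python
def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop (inclusive)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def logspace(start_exp, stop_exp, num):
    """Return num values whose base-10 exponents are evenly spaced."""
    return [10.0 ** e for e in linspace(start_exp, stop_exp, num)]


linear_grid = linspace(1, 2, 6)   # same endpoints as np.linspace(1, 2, 6)
log_grid = logspace(-3, 2, 6)     # same endpoints as np.logspace(-3, 2, 6)

# Ratio between the largest and smallest candidate value.
linear_ratio = max(linear_grid) / min(linear_grid)
log_ratio = max(log_grid) / min(log_grid)

print([round(v, 3) for v in linear_grid], "ratio:", round(linear_ratio, 3))
print(log_grid, "ratio:", log_ratio)
```

The linear grid only doubles the parameter, while the logarithmic grid spans five orders of magnitude with the same 6 candidates, which is one way to read why the first plot shows no variation.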

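1.4.2 Searching over Spaces That Are Not Grids

In some cases, trying all possible combinations of all parameters (as GridSearchCV does) is not a good idea.

For example, SVC has a kernel parameter, and which other parameters are relevant depends on the kernel chosen. If kernel='linear', the model is linear and only the C parameter is used. If kernel='rbf', both C and gamma are used (but not other parameters such as degree). In this case, searching over all possible combinations of C, gamma, and kernel makes no sense: if kernel='linear', gamma is not used, and trying different values of gamma would be a waste of time. To handle such "conditional" parameters, the param_grid of GridSearchCV can be a list of dictionaries. Each dictionary in the list is expanded into an independent grid. A grid search involving the kernel and the parameters might look like this: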

```python
param_grid = [{'kernel': ['rbf'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100],
               'gamma': [0.001, 0.01, 0.1, 1, 10, 100]},
              {'kernel': ['linear'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100]}]
print("List of grids:\n{}".format(param_grid))
```

```
List of grids:
[{'kernel': ['rbf'], 'C': [0.001, 0.01, 0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}, {'kernel': ['linear'], 'C': [0.001, 0.01, 0.1, 1, 10, 100]}]
```

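In the first grid, the kernel parameter is always 'rbf' (note that the entry for kernel is a list of length 1), while both C and gamma are varied. In the second grid, the kernel parameter is always 'linear' and only C is varied. Now let us apply this more complex parameter search: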

```python
grid_search = GridSearchCV(SVC(), param_grid, cv=5, return_train_score=True)
grid_search.fit(X_train, y_train)
print("Best parameters: {}".format(grid_search.best_params_))
print("Best cross-validation score: {:.2f}".format(grid_search.best_score_))
```

```
Best parameters: {'C': 100, 'gamma': 0.01, 'kernel': 'rbf'}
Best cross-validation score: 0.97

//anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_search.py:813: DeprecationWarning: The default of the `iid` parameter will change from True to False in version 0.22 and will be removed in 0.24. This will change numeric results when test-set sizes are unequal.
  DeprecationWarning)
```
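To see how many settings a list of dictionaries expands into, the sketch below enumerates them with the standard library. `expand` is a hypothetical helper written for illustration; it mimics the idea of treating each dictionary as an independent grid and is not scikit-learn's implementation.

```python
from itertools import product

param_grid = [{'kernel': ['rbf'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100],
               'gamma': [0.001, 0.01, 0.1, 1, 10, 100]},
              {'kernel': ['linear'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100]}]


def expand(grids):
    """Expand each dictionary into its own grid and concatenate the results."""
    settings = []
    for grid in grids:
        names = sorted(grid)
        for values in product(*(grid[name] for name in names)):
            settings.append(dict(zip(names, values)))
    return settings


settings = expand(param_grid)
n_rbf = sum(1 for s in settings if s['kernel'] == 'rbf')
n_linear = sum(1 for s in settings if s['kernel'] == 'linear')
print("rbf settings:", n_rbf)
print("linear settings:", n_linear)
print("total settings:", len(settings))
```

The total is 36 + 6 = 42 settings, whereas a single grid over kernel, C, and gamma would contain 2 x 6 x 6 = 72, of which 30 would only repeat linear-kernel fits with an unused gamma.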

Let us look at cv_results_ again. As expected, when kernel is 'linear', only C is varied:

```python
results = pd.DataFrame(grid_search.cv_results_)
display(results.T)
```

	0	1	2	3	4	5	6	7	8	9	...	32	33	34	35	36	37	38	39	40	41
mean_fit_time	0.00216327	0.000975418	0.000895834	0.000586128	0.00068078	0.000671005	0.000685596	0.000640059	0.000607777	0.000593805	...	0.000373602	0.000664568	0.00153198	0.000837708	0.000766277	0.000468493	0.000435066	0.000450134	0.000438309	0.000494576
std_fit_time	0.00140655	0.000492319	0.000402465	7.89536e-06	9.49278e-05	6.87857e-05	0.000189589	2.10245e-05	4.78872e-05	3.52042e-05	...	1.16059e-05	0.000257769	0.000392493	1.99011e-05	0.000322841	6.2482e-06	5.08608e-05	6.50786e-05	5.02317e-05	9.95536e-05
mean_score_time	0.00120344	0.00066967	0.000573587	0.000267696	0.000307798	0.000287628	0.000324154	0.000369263	0.00028758	0.000271845	...	0.000237846	0.000820303	0.000544834	0.000293589	0.000356293	0.000244951	0.000246334	0.00025301	0.000280857	0.000261641
std_score_time	0.000699682	0.000710382	0.000403534	6.3643e-06	6.30153e-05	2.14702e-05	6.90788e-05	0.000185553	3.8449e-05	1.27472e-05	...	1.7688e-06	0.00111757	0.000110992	4.59797e-05	0.000161759	1.7688e-06	8.52646e-06	3.69834e-05	8.52978e-05	2.57724e-05
param_C	0.001	0.001	0.001	0.001	0.001	0.001	0.01	0.01	0.01	0.01	...	100	100	100	100	0.001	0.01	0.1	1	10	100
param_gamma	0.001	0.01	0.1	1	10	100	0.001	0.01	0.1	1	...	0.1	1	10	100	NaN	NaN	NaN	NaN	NaN	NaN
param_kernel	rbf	rbf	rbf	rbf	rbf	rbf	rbf	rbf	rbf	rbf	...	rbf	rbf	rbf	rbf	linear	linear	linear	linear	linear	linear
params	{'C': 0.001, 'gamma': 0.001, 'kernel': 'rbf'}	{'C': 0.001, 'gamma': 0.01, 'kernel': 'rbf'}	{'C': 0.001, 'gamma': 0.1, 'kernel': 'rbf'}	{'C': 0.001, 'gamma': 1, 'kernel': 'rbf'}	{'C': 0.001, 'gamma': 10, 'kernel': 'rbf'}	{'C': 0.001, 'gamma': 100, 'kernel': 'rbf'}	{'C': 0.01, 'gamma': 0.001, 'kernel': 'rbf'}	{'C': 0.01, 'gamma': 0.01, 'kernel': 'rbf'}	{'C': 0.01, 'gamma': 0.1, 'kernel': 'rbf'}	{'C': 0.01, 'gamma': 1, 'kernel': 'rbf'}	...	{'C': 100, 'gamma': 0.1, 'kernel': 'rbf'}	{'C': 100, 'gamma': 1, 'kernel': 'rbf'}	{'C': 100, 'gamma': 10, 'kernel': 'rbf'}	{'C': 100, 'gamma': 100, 'kernel': 'rbf'}	{'C': 0.001, 'kernel': 'linear'}	{'C': 0.01, 'kernel': 'linear'}	{'C': 0.1, 'kernel': 'linear'}	{'C': 1, 'kernel': 'linear'}	{'C': 10, 'kernel': 'linear'}	{'C': 100, 'kernel': 'linear'}
split0_test_score	0.375	0.375	0.375	0.375	0.375	0.375	0.375	0.375	0.375	0.375	...	0.958333	0.916667	0.875	0.541667	0.375	0.916667	0.958333	1	0.958333	0.958333
split1_test_score	0.347826	0.347826	0.347826	0.347826	0.347826	0.347826	0.347826	0.347826	0.347826	0.347826	...	1	1	0.956522	0.521739	0.347826	0.826087	0.913043	0.956522	1	1
split2_test_score	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	...	1	1	1	0.590909	0.363636	0.818182	1	1	1	1
split3_test_score	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	...	0.863636	0.863636	0.818182	0.590909	0.363636	0.772727	0.909091	0.954545	0.909091	0.909091
split4_test_score	0.380952	0.380952	0.380952	0.380952	0.380952	0.380952	0.380952	0.380952	0.380952	0.380952	...	0.952381	0.952381	0.952381	0.619048	0.380952	0.904762	0.952381	0.952381	0.952381	0.952381
mean_test_score	0.366071	0.366071	0.366071	0.366071	0.366071	0.366071	0.366071	0.366071	0.366071	0.366071	...	0.955357	0.946429	0.919643	0.571429	0.366071	0.848214	0.946429	0.973214	0.964286	0.964286
std_test_score	0.0113708	0.0113708	0.0113708	0.0113708	0.0113708	0.0113708	0.0113708	0.0113708	0.0113708	0.0113708	...	0.0495662	0.0519227	0.0647906	0.0356525	0.0113708	0.0547783	0.0332185	0.0223995	0.0338387	0.0338387
rank_test_score	27	27	27	27	27	27	27	27	27	27	...	9	11	17	24	27	21	11	1	3	3
split0_train_score	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	0.363636	...	0.988636	1	1	1	0.363636	0.886364	0.965909	0.988636	0.988636	0.988636
split1_train_score	0.370787	0.370787	0.370787	0.370787	0.370787	0.370787	0.370787	0.370787	0.370787	0.370787	...	0.977528	1	1	1	0.370787	0.88764	0.977528	0.977528	0.988764	0.988764
split2_train_score	0.366667	0.366667	0.366667	0.366667	0.366667	0.366667	0.366667	0.366667	0.366667	0.366667	...	0.977778	1	1	1	0.366667	0.866667	0.944444	0.977778	0.977778	0.988889
split3_train_score	0.366667	0.366667	0.366667	0.366667	0.366667	0.366667	0.366667	0.366667	0.366667	0.366667	...	1	1	1	1	0.366667	0.755556	0.977778	0.988889	0.988889	1
split4_train_score	0.362637	0.362637	0.362637	0.362637	0.362637	0.362637	0.362637	0.362637	0.362637	0.362637	...	1	1	1	1	0.362637	0.879121	0.967033	0.989011	1	1
mean_train_score	0.366079	0.366079	0.366079	0.366079	0.366079	0.366079	0.366079	0.366079	0.366079	0.366079	...	0.988788	1	1	1	0.366079	0.855069	0.966538	0.984368	0.988813	0.993258
std_train_score	0.00285176	0.00285176	0.00285176	0.00285176	0.00285176	0.00285176	0.00285176	0.00285176	0.00285176	0.00285176	...	0.00999451	0	0	0	0.00285176	0.0503114	0.0121316	0.00548507	0.00702801	0.00550551

23 rows × 42 columns

References

[1] April15. 调参必备—GridSearch网格搜索 (A Tuning Essential: GridSearch) [EB/OL]. https://www.cnblogs.com/ysugyl/p/8711205.html, 2018-04-03.

[2] Andreas C. Müller, Sarah Guido; translated by Zhang Liang. Python 机器学习基础教程 (Introduction to Machine Learning with Python) [M]. Beijing: Posts & Telecom Press, 2018: 200-212.
