
How to Use Hyperopt to Automatically Tune XGBoost

Published: 2025/3/21


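This tutorial focuses on how to use Hyperopt to tune xgboost automatically. The code is also one of the templates I use regularly, so it is easy to adapt to other datasets.

Because xgboost, lightgbm, and catboost are called in very similar ways, by the end of this section you should be able to use hyperopt with all three libraries. Note that this article does not cover cross-validation; see other tutorials for that.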

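What is Hyperopt?

Hyperopt is a Python library for "distributed asynchronous algorithm configuration / hyperparameter optimization". It frees us from tedious manual hyperparameter tuning and searches for good hyperparameters automatically. Broadly speaking, a model's score as a function of its hyperparameters can be treated as a non-convex function, so a systematic search such as hyperopt's usually finds a more reasonable configuration than tuning by hand. For models that are complicated to tune, it typically gets there much faster than manual tuning and often reaches better final performance as well.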

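Documentation

The Chinese documentation was translated by me (FontTian) in 2017, but the hyperopt documentation itself is honestly not very well written, which is why this tutorial exists. The source code can be downloaded from the tutorial's GitHub repository.

Chinese documentation
FontTian's blog
Official Hyperopt documentation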

Tutorial

Getting the data

Here we use the UCI wine quality dataset (red wine), to which I have added two extra features.

    from hyperopt import fmin, tpe, hp, partial
    import numpy as np
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.metrics import mean_squared_error, zero_one_loss
    import xgboost as xgb
    import pandas as pd


    def GetNewDataByPandas():
        wine = pd.read_csv("../data/wine.csv")
        # The two extra features
        wine['alcohol**2'] = pow(wine["alcohol"], 2)
        wine['volatileAcidity*alcohol'] = wine["alcohol"] * wine['volatile acidity']
        y = np.array(wine.quality)
        X = np.array(wine.drop("quality", axis=1))

        columns = np.array(wine.columns)

        return X, y, columns

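Splitting and converting the data

First we split the data into three parts. One part is held out for prediction; the remaining training data is split again into two parts, which are passed to the evallist parameter.

To speed things up and reduce memory usage, we also convert the data to xgboost's native DMatrix format.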

    # Read wine quality data from file
    X, y, wineNames = GetNewDataByPandas()

    # Split data as [[0.8, 0.2], 0.1]: hold out 10% for prediction,
    # then split the remaining 90% into 80% train and 20% eval
    x_train_all, x_predict, y_train_all, y_predict = train_test_split(X, y, test_size=0.10, random_state=100)

    x_train, x_test, y_train, y_test = train_test_split(x_train_all, y_train_all, test_size=0.2, random_state=100)

    dtrain = xgb.DMatrix(data=x_train, label=y_train, missing=-999.0)
    dtest = xgb.DMatrix(data=x_test, label=y_test, missing=-999.0)

    # The last entry ('train') is the one used for early stopping,
    # as the training log further down reports
    evallist = [(dtest, 'eval'), (dtrain, 'train')]
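To make the resulting proportions explicit, here is a small sketch of the arithmetic behind the two nested splits. The helper `split_fractions` is hypothetical and not part of the tutorial's code; the actual row counts also depend on how `train_test_split` rounds.

```python
def split_fractions(predict_size=0.10, eval_size=0.2):
    """Return the (train, eval, predict) fractions of the full dataset."""
    remaining = 1.0 - predict_size      # what is left after the first split
    eval_frac = remaining * eval_size   # second split is taken from the remainder
    train_frac = remaining - eval_frac
    return train_frac, eval_frac, predict_size


train_frac, eval_frac, predict_frac = split_fractions()
print(round(train_frac, 2), round(eval_frac, 2), round(predict_frac, 2))  # 0.72 0.18 0.1
```

So roughly 72% of the rows are used for training, 18% for the eval set in evallist, and 10% for the final prediction score.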

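Defining the parameter space

We define the parameter space with hyperopt's built-in functions. Because its randint() method generates integers starting from 0, I also define a conversion function that maps the raw parameter space to the values we actually want.

For the functions hyperopt provides for defining parameter ranges, see:

Chinese version
English version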

    # Custom hyperopt parameter space
    space = {"max_depth": hp.randint("max_depth", 15),
             "n_estimators": hp.randint("n_estimators", 300),
             'learning_rate': hp.uniform('learning_rate', 1e-3, 5e-1),
             "subsample": hp.randint("subsample", 5),
             "min_child_weight": hp.randint("min_child_weight", 6),
             }


    def argsDict_tranform(argsDict, isPrint=False):
        # Note: this modifies argsDict in place and returns the same dict
        argsDict["max_depth"] = argsDict["max_depth"] + 5
        argsDict['n_estimators'] = argsDict['n_estimators'] + 150
        argsDict["learning_rate"] = argsDict["learning_rate"] * 0.02 + 0.05
        argsDict["subsample"] = argsDict["subsample"] * 0.1 + 0.5
        argsDict["min_child_weight"] = argsDict["min_child_weight"] + 1
        if isPrint:
            print(argsDict)

        return argsDict
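To see which values the model can actually receive, the sketch below pushes the smallest and largest raw samples through the same arithmetic. The helper `transform` is a hypothetical stand-in that returns a new dict; it assumes, as hyperopt documents, that `hp.randint(label, upper)` draws integers in [0, upper).

```python
def transform(raw):
    # Same arithmetic as the tutorial's conversion, but returns a new dict
    return {
        "max_depth": raw["max_depth"] + 5,
        "n_estimators": raw["n_estimators"] + 150,
        "learning_rate": raw["learning_rate"] * 0.02 + 0.05,
        "subsample": raw["subsample"] * 0.1 + 0.5,
        "min_child_weight": raw["min_child_weight"] + 1,
    }


# Smallest and largest raw values the search space can produce
lowest = transform({"max_depth": 0, "n_estimators": 0, "learning_rate": 1e-3,
                    "subsample": 0, "min_child_weight": 0})
highest = transform({"max_depth": 14, "n_estimators": 299, "learning_rate": 5e-1,
                     "subsample": 4, "min_child_weight": 5})

for name in lowest:
    print(name, lowest[name], "to", highest[name])
```

In other words, max_depth ranges over 5 to 19, n_estimators over 150 to 449, subsample over 0.5 to 0.9 in steps of 0.1, min_child_weight over 1 to 6, and the learning rate is squeezed into roughly 0.05 to 0.06.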

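Creating the model factory and the score getter

The xgboost model factory produces the model we need, while the score getter exists for decoupling. This makes the code easier to reuse and modify in real testing work.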

    def xgboost_factory(argsDict):
        argsDict = argsDict_tranform(argsDict)

        params = {'nthread': -1,  # number of threads
                  'max_depth': argsDict['max_depth'],  # maximum tree depth
                  'n_estimators': argsDict['n_estimators'],  # number of trees
                  'eta': argsDict['learning_rate'],  # learning rate
                  'subsample': argsDict['subsample'],  # row subsampling ratio
                  'min_child_weight': argsDict['min_child_weight'],  # minimum sum of instance weights in a leaf
                  'objective': 'reg:linear',  # called 'reg:squarederror' in newer XGBoost releases
                  'silent': 0,  # whether to print messages
                  'gamma': 0,  # minimum loss reduction for a split (post-pruning)
                  'colsample_bytree': 0.7,  # column subsampling ratio per tree
                  'alpha': 0,  # L1 regularization
                  'lambda': 0,  # L2 regularization
                  'scale_pos_weight': 0,  # values > 0 help convergence on imbalanced data
                  'seed': 100,  # random seed
                  'missing': -999,  # value used for missing data
                  }
        params['eval_metric'] = ['rmse']

        xrf = xgb.train(params, dtrain, params['n_estimators'], evallist, early_stopping_rounds=100)

        return get_tranformer_score(xrf)


    def get_tranformer_score(tranformer):
        xrf = tranformer
        dpredict = xgb.DMatrix(x_predict)
        # ntree_limit / best_ntree_limit belong to the older XGBoost API used here
        prediction = xrf.predict(dpredict, ntree_limit=xrf.best_ntree_limit)

        # Returns the MSE (not the RMSE) on the held-out prediction set
        return mean_squared_error(y_predict, prediction)
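Note that the score getter returns the mean squared error, while the training log reports rmse, so a square root is needed before comparing the two. The sketch below shows the relationship on made-up numbers; the pure-Python `mse` helper and the sample values are assumptions for illustration and stand in for sklearn's `mean_squared_error`.

```python
import math


def mse(y_true, y_pred):
    # Mean of the squared differences between targets and predictions
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up quality scores and predictions, for illustration only
y_true = [5, 6, 7, 5]
y_pred = [5.5, 6.0, 6.0, 5.5]

score = mse(y_true, y_pred)
rmse = math.sqrt(score)
print(score)  # 0.375
print(rmse)
```

Minimizing the MSE and minimizing the RMSE select the same parameters, since the square root is monotonic; the distinction only matters when reporting the number.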

Running Hyperopt

After that we simply call hyperopt to tune automatically, and obtain the best parameters found from its return value.

    # Start automatic tuning with hyperopt
    algo = partial(tpe.suggest, n_startup_jobs=1)
    best = fmin(xgboost_factory, space, algo=algo, max_evals=20, pass_expr_memo_ctrl=None)

    [15:23:32] /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 142 extra nodes, 0 pruned nodes, max_depth=10
    [0] eval-rmse:5.03273 train-rmse:4.90203
    Multiple eval metrics have been passed: 'train-rmse' will be used for early stopping.

    Will train until train-rmse hasn't improved in 100 rounds.
    [15:23:32] /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 172 extra nodes, 0 pruned nodes, max_depth=10
    [1] eval-rmse:4.77384 train-rmse:4.64767
    ...
    [15:24:04] /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 192 extra nodes, 0 pruned nodes, max_depth=15
    [299] eval-rmse:0.570382 train-rmse:0.000749

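Showing the results

We display the best parameters found and the final performance of that model on the held-out prediction set. If you want to use cross-validation, see other tutorials.

Note that argsDict_tranform modifies its argument in place, so the call xgboost_factory(best) already transforms best. The dictionary printed as 'best' in the output therefore holds the parameters the model was actually trained with, while the 'best param after transform' line shows the transform applied a second time.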

    # xgboost_factory returns the mean squared error
    mse = xgboost_factory(best)
    print('best :', best)
    print('best param after transform :')
    argsDict_tranform(best, isPrint=True)
    print('rmse of the best xgboost:', np.sqrt(mse))

    [15:24:52] /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 428 extra nodes, 0 pruned nodes, max_depth=14
    [0] eval-rmse:5.02286 train-rmse:4.89385
    Multiple eval metrics have been passed: 'train-rmse' will be used for early stopping.

    Will train until train-rmse hasn't improved in 100 rounds.
    [15:24:52] /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 680 extra nodes, 0 pruned nodes, max_depth=14
    [1] eval-rmse:4.75938 train-rmse:4.63251
    ...
    [298] eval-rmse:0.583923 train-rmse:0.000705
    [15:24:54] /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 22 extra nodes, 0 pruned nodes, max_depth=7
    [299] eval-rmse:0.583926 train-rmse:0.000704
    best : {'learning_rate': 0.05385158551863543, 'max_depth': 14, 'min_child_weight': 2, 'n_estimators': 173, 'subsample': 0.8}
    best param after transform :
    {'learning_rate': 0.051077031710372714, 'max_depth': 19, 'min_child_weight': 3, 'n_estimators': 323, 'subsample': 0.5800000000000001}
    rmse of the best xgboost: 0.5240080946197716
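In the output above, 'best' already contains converted values (subsample is 0.8 rather than an integer index), because the conversion function mutates the dictionary it receives and has already run once inside xgboost_factory. The sketch below reproduces that effect on a single key and shows one possible way to avoid it by copying first. Both helper names are hypothetical and only max_depth is modeled.

```python
def transform_in_place(args):
    # Mirrors the tutorial's pattern: mutates and returns the same dict
    args["max_depth"] = args["max_depth"] + 5
    return args


def transform_copy(args):
    # Alternative: work on a shallow copy so the caller's dict is untouched
    out = dict(args)
    out["max_depth"] = out["max_depth"] + 5
    return out


best = {"max_depth": 9}
transform_in_place(best)
transform_in_place(best)
print(best["max_depth"])  # 19: the offset was applied twice

best = {"max_depth": 9}
first = transform_copy(best)
second = transform_copy(best)
print(best["max_depth"], first["max_depth"], second["max_depth"])  # 9 14 14
```

With the copying variant, calling the conversion more than once on the dict returned by fmin always yields the same result.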
