

5x Faster Scikit-Learn Parameter Tuning in 5 Lines of Code


By Michael Chau, Anthony Yu, Richard Liaw

Everyone knows about Scikit-Learn — it’s a staple for data scientists, offering dozens of easy-to-use machine learning algorithms. It also provides two out-of-the-box techniques to address hyperparameter tuning: Grid Search (GridSearchCV) and Random Search (RandomizedSearchCV).

Though effective, both techniques are brute-force approaches to finding the right hyperparameter configurations, which is an expensive and time-consuming process!

Image by author

What if you wanted to speed up this process?

In this blog post, we introduce tune-sklearn, which makes it easier to leverage these new algorithms while staying in the Scikit-Learn API. Tune-sklearn is a drop-in replacement for Scikit-Learn’s model selection module with cutting edge hyperparameter tuning techniques (bayesian optimization, early stopping, distributed execution) — these techniques provide significant speedups over grid search and random search!

Here’s what tune-sklearn has to offer:

  • Consistency with Scikit-Learn API: tune-sklearn is a drop-in replacement for GridSearchCV and RandomizedSearchCV, so you only need to change less than 5 lines in a standard Scikit-Learn script to use the API.

  • Modern hyperparameter tuning techniques: tune-sklearn allows you to easily leverage Bayesian Optimization, HyperBand, and other optimization techniques by simply toggling a few parameters.

  • Framework support: tune-sklearn is used primarily for tuning Scikit-Learn models, but it also supports and provides examples for many other frameworks with Scikit-Learn wrappers such as Skorch (Pytorch), KerasClassifiers (Keras), and XGBoostClassifiers (XGBoost).

  • Scale up: Tune-sklearn leverages Ray Tune, a library for distributed hyperparameter tuning, to efficiently and transparently parallelize cross validation on multiple cores and even multiple machines.

A sample of the frameworks supported by tune-sklearn.

Tune-sklearn is also fast. To see this, we benchmark tune-sklearn (with early stopping enabled) against native Scikit-Learn on a standard hyperparameter sweep. In our benchmarks we see significant performance differences on both an average laptop and a large 48-core workstation.

For the larger benchmark on the 48-core machine, Scikit-Learn took 20 minutes to search 75 hyperparameter sets on a 40,000-sample dataset. Tune-sklearn took a mere 3 and a half minutes, sacrificing minimal accuracy.*

On left: a personal dual-core i5 laptop with 8 GB RAM, using a parameter grid of 6 configurations. On right: a large 48-core machine with 250 GB RAM, using a parameter grid of 75 configurations.

* Note: For smaller datasets (10,000 or fewer data points), there may be a sacrifice in accuracy when attempting to fit with early stopping. We don’t anticipate this to make a difference for users as the library is intended to speed up large training tasks with large datasets.

Simple 60 Second Walkthrough

Let’s take a look at how it all works.

Run pip install tune-sklearn ray[tune] or pip install tune-sklearn "ray[tune]" to get started with our example code below.

Hyperparameter set 2 is a set of unpromising hyperparameters that would be detected by Tune's early stopping mechanisms and stopped early to avoid wasting training time and resources.

TuneGridSearchCV Example

To start out, it’s as easy as changing our import statement to get Tune’s grid search cross validation interface:

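A minimal sketch of that change, swapping GridSearchCV for its tune-sklearn counterpart:

```python
# Before: from sklearn.model_selection import GridSearchCV
from tune_sklearn import TuneGridSearchCV
```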

And from there, we proceed just as we would with Scikit-Learn's interface! Let's use a "dummy" custom classification dataset and an SGDClassifier to classify the data.

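A sketch of that setup follows; the dataset dimensions and the parameter values are illustrative choices, not ones mandated by tune-sklearn:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Create a "dummy" classification dataset and hold out a validation split
X, y = make_classification(
    n_samples=11000, n_features=1000, n_informative=50,
    n_redundant=0, n_classes=10, class_sep=2.5,
)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=1000)

# An example grid of SGDClassifier hyperparameters (6 configurations)
parameter_grid = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}
```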

We choose the SGDClassifier because it has a partial_fit API, which makes it possible to stop fitting partway through for a given hyperparameter configuration. If the estimator does not support early stopping, we fall back to a parallel grid search.

As you can see, the setup here is exactly how you would do it for Scikit-Learn! Now, let’s try fitting a model.

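A sketch of the fit, continuing from the setup above; the scheduler name and iteration cap shown here are reasonable choices rather than required values:

```python
from tune_sklearn import TuneGridSearchCV

tune_search = TuneGridSearchCV(
    SGDClassifier(),
    parameter_grid,
    early_stopping="MedianStoppingRule",  # stop unpromising configurations early
    max_iters=10,  # cap on partial_fit iterations per configuration
)
tune_search.fit(x_train, y_train)
print(tune_search.best_params_)
```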

Note the slight differences we introduced above:

  • a new early_stopping variable, and

  • a specification of the max_iters parameter

The early_stopping parameter determines when to stop early. MedianStoppingRule is a great default, but see Tune's documentation on schedulers for a full list to choose from. max_iters is the maximum number of iterations a given hyperparameter set can run for; it may run for fewer iterations if it is stopped early.

Try running this and compare it against the GridSearchCV equivalent.

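For comparison, a sketch of the plain Scikit-Learn version, which runs the same sweep with no early stopping:

```python
from sklearn.model_selection import GridSearchCV

# Same estimator and grid as above, searched exhaustively
sklearn_search = GridSearchCV(SGDClassifier(), parameter_grid)
sklearn_search.fit(x_train, y_train)
print(sklearn_search.best_params_)
```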

TuneSearchCV Bayesian Optimization Example

Other than the grid search interface, tune-sklearn also provides an interface, TuneSearchCV, for sampling from distributions of hyperparameters.

In addition, you can easily enable Bayesian optimization over the distributions in TuneSearchCV with only a few lines of code changed.

Run pip install scikit-optimize to try out this example:

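A sketch of what that can look like, reusing the training data from the walkthrough above; the parameter ranges and trial count are illustrative:

```python
from tune_sklearn import TuneSearchCV
from sklearn.linear_model import SGDClassifier

# Parameter *distributions* (here, ranges to sample from) instead of a fixed grid
param_dists = {
    "alpha": (1e-4, 1e-1),
    "epsilon": (1e-2, 1e-1),
}

tune_search = TuneSearchCV(
    SGDClassifier(),
    param_distributions=param_dists,
    n_trials=2,
    early_stopping=True,
    search_optimization="bayesian",  # requires scikit-optimize
)
tune_search.fit(x_train, y_train)
print(tune_search.best_params_)
```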

Only a handful of lines change to enable Bayesian optimization: the parameter distributions and the search_optimization argument.

As you can see, it's very simple to integrate tune-sklearn into existing code. Check out more detailed examples and get started with tune-sklearn here, and let us know what you think! Also take a look at Ray's replacement for joblib, which allows users to parallelize training over multiple nodes, not just one node, further speeding up training.

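A minimal sketch of that backend swap, assuming Ray is installed (without a cluster configured, Ray falls back to the local machine):

```python
import joblib
from ray.util.joblib import register_ray

# Register Ray as a joblib backend; any joblib-parallel Scikit-Learn
# work inside the context (e.g. the grid search fit above) is then
# distributed across the Ray cluster's nodes
register_ray()
with joblib.parallel_backend("ray"):
    sklearn_search.fit(x_train, y_train)
```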

Documentation and Examples

  • Documentation*

  • Example: Skorch with tune-sklearn

  • Example: Scikit-Learn Pipelines with tune-sklearn

  • Example: XGBoost with tune-sklearn

  • Example: KerasClassifier with tune-sklearn

  • Example: LightGBM with tune-sklearn

*Note: importing from ray.tune as shown in the linked documentation is available only on the nightly Ray wheels, and will be available on pip soon.

Translated from: https://medium.com/@michaelchau_99485/5x-faster-scikit-learn-parameter-tuning-in-5-lines-of-code-be6bdd21833c
