Let’s Build and Deploy an AutoML Solution in 5 Minutes
Practical machine learning used to be hard, and it still is in some specialized areas. Average machine learning tasks are getting easier by the day and are automated to a degree. Today we’ll explore just how easy it is to create and deploy a fully automated machine learning platform in 50 lines of code.
Before we get started, let me make a bold disclaimer first. The solution you’re about to see works only for classification tasks, although we’d need only a couple of minor changes for regression tasks. There are some things that could be improved, such as logging, and you’re free to further work on the code on your own.
Also, we’ll deploy the solution as a REST API. Why? Because we want other tech professionals (non-data-scientists) to be able to use our AutoML tool without breaking a sweat.
Below is a list of requirements for using this solution:
- Prepared dataset: the dataset must be in a machine-learning-ready format, so do the proper data preparation first. Our data is stored as CSV.
- Knowing how to make a POST request: either from tools like Postman or from any programming language (we’ll cover that).
Okay, so without much ado, let’s get started!
Dataset gathering and preparation
We’ll use the Iris dataset for this simple example. The dataset consists of various flower measurements and the target variable which indicates the flower species.
It’s a perfect dataset for demonstration purposes because we don’t want to spend much time cleaning the data. If you’re following along, download the dataset from the provided link and store it somewhere. I have it in a separate folder where the Python scripts will eventually be stored.
Now we have the dataset ready and there are no preparation requirements. Let’s get to the fun part now.
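Even for a clean dataset, a quick sanity check before training can save a failed run. The sketch below is not part of the original solution. It uses only the standard library, and the helper name `check_csv` and the sample column names are hypothetical.

```python
import csv
import io


def check_csv(file_obj, target_column):
    """Return a list of problems found in a CSV file-like object (hypothetical helper)."""
    reader = csv.DictReader(file_obj)
    problems = []
    if target_column not in (reader.fieldnames or []):
        problems.append(f"missing target column: {target_column}")
        return problems
    # Data rows start on line 2 because line 1 holds the header
    for line_number, row in enumerate(reader, start=2):
        for value in row.values():
            # A missing trailing field is None, an empty field is ""
            if value is None or (isinstance(value, str) and not value.strip()):
                problems.append(f"empty value on line {line_number}")
                break
    return problems


# Made-up sample in an Iris-like layout; the real column names may differ
sample = io.StringIO(
    "sepal_length,sepal_width,species\n"
    "5.1,3.5,setosa\n"
    "4.9,,setosa\n"
)
print(check_csv(sample, "species"))
```

An empty list means the file has the target column and no blank cells. Anything else is worth fixing before the file goes to the AutoML script.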
Coding the AutoML script
This is where the fun begins. Inside the directory where your dataset is located (hopefully a new, empty directory) create a Python file and name it as you wish. I’ve named mine automl.py.
Before we get started with the coding, I would just like to mention that the logic behind this solution is based on the PyCaret library, and this post particularly. PyCaret is an amazing library for machine learning and you should definitely learn more about it. You can start here:
PyCaret: Better Machine Learning with Python
Regression with PyCaret: A better machine learning library
Classification with PyCaret: A better machine learning library
PyCaret 2.0 is here — What’s New?
Inside the automl.py file, we’ll import the Pandas library and everything from the PyCaret classification module:
import pandas as pd
from pycaret.classification import *
Next, we’ll declare an AutoML class with a couple of fields: a path to the CSV data file, target column name, and the name of the metric we want to optimize for (such as accuracy, recall…). We’ll also declare a custom field for storing information about the best model:
class AutoML:
    def __init__(self, path_to_data: str, target_column: str, metric):
        self.df = pd.read_csv(path_to_data)
        self.target_column = target_column
        self.metric = metric
        self.best_model = None
Great! We can now declare a function below the __init__ that handles all the machine learning logic. I’ve called mine fit(), but feel free to change it if you wish. This function has the following tasks:
- Perform the initial setup
- Find the best 5 algorithms
- Tune the hyperparameters of these 5 algorithms
- Perform bagging
- Perform blending
- Perform stacking
- Find the best overall model for the specified metric
- Save the model to a file
Sounds like a lot of logic to write, but it’s only 10 lines of code:
    def fit(self):
        # Initialize the PyCaret experiment
        clf = setup(data=self.df, target=self.target_column, session_id=42, html=False, silent=True, verbose=False)
        # Select the five best algorithms, then tune their hyperparameters
        top5_models = compare_models(n_select=5)
        tuned_top5_models = [tune_model(model) for model in top5_models]
        # Build bagged, blended, and stacked ensembles
        bagged_tuned_top5_models = [ensemble_model(model, method='Bagging') for model in tuned_top5_models]
        blended = blend_models(estimator_list=top5_models)
        stacked = stack_models(estimator_list=top5_models[1:], meta_model=top5_models[0])
        # Pick the best model for the chosen metric and save it to a file
        best_model = automl(optimize=self.metric)
        self.best_model = best_model
        save_model(best_model, 'best_model')
And that’s it for this file. Let’s proceed with API development.
Coding the REST API
We’ve got the machine learning portion covered, and now it’s time to make this logic accessible for other software developers. Python makes this step stupidly easy, as we can easily build a simple REST API with libraries like Flask.
Before starting, create a new Python file called app.py. Let’s now import Flask alongside our automl.py and perform the basic setup:
from flask import Flask, jsonify, request
from flask_restful import Api, Resource
from automl import AutoML

app = Flask(__name__)
app.config['JSON_SORT_KEYS'] = False
api = Api(app)
Great! We’re now ready to handle requests sent from users. To do so, we’ll declare a class which inherits from flask_restful.Resource. Inside, we can have various methods, each named after the HTTP method it handles. We’ll name ours post(), as we’ll be making a POST request.
Inside this method, we’ll need to capture the JSON data provided by the user when making the call. The metric parameter is optional and is set to Accuracy if not specified. Remember, we’ll pass the values of these parameters to an instance of the AutoML class.
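The optional-parameter handling described above can also be written with `dict.get`, which returns a default when the key is absent. This is a minimal sketch, not the author’s code: the helper name `read_parameters` is hypothetical, and the file and column names in the example call are assumptions.

```python
def read_parameters(posted_data):
    """Extract the request parameters, defaulting metric to 'Accuracy' (hypothetical helper)."""
    return {
        # Required keys: a missing one raises KeyError
        "path_to_data": posted_data["path_to_data"],
        "target_column": posted_data["target_column"],
        # Optional key: falls back to 'Accuracy' when not supplied
        "metric": posted_data.get("metric", "Accuracy"),
    }


# Assumed example values, for illustration only
print(read_parameters({"path_to_data": "iris.csv", "target_column": "species"}))
```

The result matches what a try/except KeyError block would produce, in a single expression.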
Now we’re able to call the AutoML.fit() method and return the results back to the user. Here’s the code for this class:
class Optimize(Resource):
    @staticmethod
    def post():
        posted_data = request.get_json()
        path_to_data = posted_data['path_to_data']
        target_column = posted_data['target_column']
        try:
            metric = posted_data['metric']
        except KeyError:
            metric = 'Accuracy'

        auto_ml = AutoML(path_to_data=path_to_data, target_column=target_column, metric=metric)
        try:
            auto_ml.fit()
            return jsonify({'Message': 'Success', 'BestModel': str(auto_ml.best_model)})
        except Exception as e:
            return jsonify({'Message': str(e)})
And finally, we need to connect our Optimize class to some actual endpoint and make app.py executable. Here’s how:
api.add_resource(Optimize, '/optimize')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=9000)
And that’s it — that’s the entire code of the fully automated machine learning pipeline for classification tasks!
Here’s the code recap, just in case you got stuck somewhere:
Exactly 50 lines of code
That’s all we need to do. Let’s test the thing now.
Testing
We can use applications like Postman to test if our API is working properly. But first, we need to run the app.py file. To do so, open up a Terminal/CMD window and go to the location of app.py. Execute the following:
python app.py
You should see something like this pop up:
We can now open Postman, change the request type to POST, and enter the URL and the JSON parameters with their respective values:
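The same request can be made from code. Below is a minimal sketch using only the Python standard library. It assumes the API runs on the same machine (so `localhost` is used with port 9000 and the `/optimize` endpoint from app.py), and the file name `iris.csv` and target column `species` are assumptions. The helper name `build_optimize_request` is hypothetical.

```python
import json
import urllib.request


def build_optimize_request(path_to_data, target_column, metric=None,
                           url="http://localhost:9000/optimize"):
    """Build a JSON POST request for the /optimize endpoint (hypothetical helper)."""
    payload = {"path_to_data": path_to_data, "target_column": target_column}
    if metric is not None:
        # metric is optional; the API falls back to Accuracy without it
        payload["metric"] = metric
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed file and column names, for illustration only
req = build_optimize_request("iris.csv", "species", metric="Accuracy")
print(req.get_method(), req.full_url)

# With app.py running, the request can be sent like this:
# with urllib.request.urlopen(req) as response:
#     print(json.loads(response.read()))
```

Note that `path_to_data` is read by the server, so a relative path is resolved from the directory where app.py was started.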
The process took around 4 minutes on my machine even for this simple dataset, but that’s the price of training and optimizing multiple machine learning models. And yeah, once executed you’ll get the best model saved to your PC, so you’re able to make new predictions right away.
Here’s how.
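The author’s original snippet is not available here, so what follows is only a sketch of one way to do it, assuming PyCaret’s `load_model` and `predict_model` functions and the `best_model` name used in `save_model` above. The helper name `predict_with_saved_model` and the file name in the usage comment are hypothetical.

```python
def predict_with_saved_model(path_to_new_data, model_name="best_model"):
    """Load the saved pipeline and predict on a new CSV file (hypothetical helper)."""
    # Imports live inside the function so the module loads even where PyCaret is absent
    import pandas as pd
    from pycaret.classification import load_model, predict_model

    # load_model takes the same name that was passed to save_model
    model = load_model(model_name)
    new_data = pd.read_csv(path_to_new_data)
    # Returns the input rows together with the model's predictions
    return predict_model(model, data=new_data)


# Example call, assuming a CSV with the same feature columns as the training data:
# predictions = predict_with_saved_model("new_flowers.csv")
# print(predictions.head())
```

The names of the prediction columns in the returned DataFrame depend on the PyCaret version, so inspect the result before relying on a specific column.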
And that’s pretty much it. Let’s wrap things up in the next section.
Before you go
Don’t consider yourself to be a machine learning expert if you’ve managed to follow along. This article wasn't meant for machine learning experts, but for regular software developers wanting to implement machine learning in their projects.
I hope you see just how easy it is to fully automate machine learning tasks. If you need something more specific, this may not be enough for you. For the majority of simpler tasks, this code will suit you fine. Data preparation is still king, so that’s where you’ll spend most of your time. Machine learning is easy, at least for most tasks.
Translated from: https://towardsdatascience.com/lets-build-deploy-automl-solution-in-5-minutes-4e5683635caf