TensorFlow Serving + Docker + Tornado: Fast Production-Grade Deployment of Machine Learning Models

Published: 2025/1/21

Reposted from Zhihu: https://zhuanlan.zhihu.com/p/52096200

Justin ho

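This article shows how to deploy machine learning models to production with TensorFlow Serving + Docker + Tornado. Before you read on, promise me you will like and bookmark an article this practical first, OK?

Update 2019-12-12: since TensorFlow moved to 2.0.0, some 1.x APIs are no longer available, and parts of the code in this tutorial will not run on TF 2.0 or later. For an up-to-date tutorial, see my follow-up article.

I. Introduction

Once you have trained a TensorFlow (or Keras) model, you need to turn it into a service so that users can call the model in some way instead of running your code directly (your users may not know how to install it). That means deploying the model to a server. A common approach is to build a server app with a web framework such as Flask, Django or Tornado. Once started, the app stays running in the background and waits for a client to POST a request (for example, the URL of an image). When a request arrives, the app downloads the image at that URL, calls your model, and returns the inference result to the user as JSON.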
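To make that flow concrete, here is a minimal sketch of such a naive service using only Python's standard library. The model and the downloader are hypothetical stand-ins (`fake_model`, `download`), and port 8000 is an arbitrary choice; a real app would load your trained model and actually fetch the image over HTTP.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_model(image_bytes: bytes) -> float:
    """Hypothetical stand-in for a real model: returns a dummy score."""
    return (len(image_bytes) % 100) / 100.0


def download(url: str) -> bytes:
    """Hypothetical downloader; a real app would fetch the image over HTTP."""
    return url.encode("utf-8")


def handle_predict(body: bytes) -> bytes:
    """Parse the POSTed JSON, run the model, and build the JSON response."""
    url = json.loads(body)["url"]
    score = fake_model(download(url))
    return json.dumps({"url": url, "score": score}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        payload = handle_predict(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


# To start the server (this call blocks forever):
# HTTPServer(("0.0.0.0", 8000), PredictHandler).serve_forever()
```

Everything, including the model itself, lives inside this one process, which is exactly what leads to the drawbacks discussed next.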

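For a simple deployment this approach needs little code, and even if you are not familiar with web frameworks you can get it working from almost any template. It does have some obvious drawbacks, though:

1. Every dependency of the project has to be installed again on the server.

2. Under concurrent requests the server may have to start several background processes for inference, which strains resources.

3. Each model needs its own service.

For the first problem, Docker is a natural fit. Here is an analogy that is not entirely accurate but helps with intuition: think of Docker as shipping containers at a port. The computer system is the port and the applications are the cargo. Before containers were invented, cargo was scattered all over the port; when workers needed to pick out some cargo (run a program), they searched everywhere and got in each other's way (dependency conflicts), which hurt efficiency. Once the cargo is packed into containers, sorting the goods inside one container no longer affects any other container.

Docker has two key concepts: the image and the container. An image is like a class in Python, and a container is an instance of that class. After pulling an image to your machine, you start a container from it and can then go inside the container to do whatever you need, such as setting up the environment or storing files. It is a bit like buying a new computer and then turning it on to install software.

For the second problem, if you develop with TensorFlow or Keras, TensorFlow Serving (official site) makes it easy to keep your model running in the background on a server. You only need to write a client that sends requests, and it returns the computed results. The most convenient way to use TensorFlow Serving is through a Docker image that already has TensorFlow Serving built in; all you have to do is run that image.

TensorFlow Serving can also serve several models, or several versions of one model, at the same time. You select a model simply by its name, with no need to write extra copies of the code or run several background services. Its advantages are therefore:

1. New model versions are picked up automatically, without restarting the service.

2. No deployment code needs to be written.

3. Several models can be served at once.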
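Serving several models at once is configured through a model config file, passed with the `--model_config_file` flag that appears in the server's usage text. As a rough sketch, the file is a text-format `ModelServerConfig` with one `config` entry per model. The helper below just renders that text; the second model name and its path are made up for illustration.

```python
def render_model_config(models: dict) -> str:
    """Render a ModelServerConfig text protobuf from a {name: base_path} dict."""
    lines = ["model_config_list {"]
    for name, base_path in models.items():
        lines += [
            "  config {",
            f'    name: "{name}"',
            f'    base_path: "{base_path}"',
            '    model_platform: "tensorflow"',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


config_text = render_model_config({
    "east": "/models/east_model",
    "crnn": "/models/crnn_model",  # hypothetical second model
})
print(config_text)
```

Saving this text to a file that the container can see and pointing `--model_config_file` at it replaces the `--model_name` and `--model_base_path` flags.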

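II. Exporting Your Model

TensorFlow Serving only needs an exported TensorFlow or Keras model file, which defines the model's entire computation graph. So the first step is to export a trained model. An example of TensorFlow export code: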

    import os
    import tensorflow as tf  # TensorFlow 1.x API

    # `model`, `variable_averages` and `FLAGS` come from your own training code.
    with tf.get_default_graph().as_default():
        # Define your inputs, outputs and computation graph
        input_images = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='input_images')
        output_result = model(input_images, is_training=False)  # replace with your actual graph
        saver = tf.train.Saver(variable_averages.variables_to_restore())

        # Restore your trained .ckpt checkpoint
        with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
            ckpt_state = tf.train.get_checkpoint_state(FLAGS.checkpoint_path)
            model_path = os.path.join(FLAGS.checkpoint_path,
                                      os.path.basename(ckpt_state.model_checkpoint_path))
            print('Restore from {}'.format(model_path))
            saver.restore(sess, model_path)

            # Export destination: <export_model_dir>/<model_version>
            export_path_base = FLAGS.export_model_dir
            export_path = os.path.join(
                tf.compat.as_bytes(export_path_base),
                tf.compat.as_bytes(str(FLAGS.model_version)))
            print('Exporting trained model to', export_path)
            builder = tf.saved_model.builder.SavedModelBuilder(export_path)

            # Input tensor info, built from input_images defined above
            tensor_info_input = tf.saved_model.utils.build_tensor_info(input_images)
            # Output tensor info, built from output_result defined above
            tensor_info_output = tf.saved_model.utils.build_tensor_info(output_result)

            # Create the prediction signature
            prediction_signature = (
                tf.saved_model.signature_def_utils.build_signature_def(
                    inputs={'images': tensor_info_input},
                    outputs={'result': tensor_info_output},
                    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))

            builder.add_meta_graph_and_variables(
                sess, [tf.saved_model.tag_constants.SERVING],
                signature_def_map={'predict_images': prediction_signature})

            # Export the model
            builder.save(as_text=True)
            print('Done exporting!')

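The code uses quite a few obscure APIs, but the overall flow is simple. First define the model's computation graph, then load the trained parameters (usually a ckpt file), then create a builder and define the few things the export needs. The most important of these are the inputs and outputs: build_tensor_info records which graph nodes the inputs and outputs correspond to. Finally, the builder exports the model for you. Note that the code above uses the generic predict method (PREDICT_METHOD_NAME); for the classification or regression APIs, see the official documentation on the METHOD_NAME constants in tf.saved_model.signature_constants. If you want to understand every API involved, the TensorFlow articles "Serving a TensorFlow Model" and "Save and Restore" (《保存和恢復》) will help. If you just want something that works right away, the code above should be enough as a reference. The export produces a directory named after the model version, containing the serialized graph (saved_model.pbtxt here, because as_text=True) and a variables folder.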
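The version number in the export path matters: TensorFlow Serving looks for numeric subfolders under the model's base path and, by default, serves the highest one. The sketch below builds a throwaway directory tree that mimics two exported versions and picks the latest. The helper `latest_version` is only an illustration of that rule, not part of any TensorFlow API.

```python
import os
import tempfile


def latest_version(model_base_path: str):
    """Return the highest numeric version subdirectory, or None if there is none."""
    versions = [
        int(name)
        for name in os.listdir(model_base_path)
        if name.isdigit() and os.path.isdir(os.path.join(model_base_path, name))
    ]
    return max(versions) if versions else None


# Mimic the layout <base>/<version>/{saved_model.pbtxt, variables/}
base = tempfile.mkdtemp()
for version in ("1", "2"):
    os.makedirs(os.path.join(base, version, "variables"))
    # as_text=True in the export code produces saved_model.pbtxt
    open(os.path.join(base, version, "saved_model.pbtxt"), "w").close()

print(latest_version(base))  # prints 2
```

Exporting a retrained model into a new, higher-numbered folder is what lets the server switch to it without a restart.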

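The Keras export code is a little simpler; you can refer to the code in the article 《keras、tensorflow serving踩坑記》 (a write-up of pitfalls with Keras and TensorFlow Serving).

If the model's output is not yet the final result and needs further computation, try to write the post-processing with TensorFlow or Keras APIs as nodes in the computation graph, so that the model's prediction is already the final result. Otherwise the web code has to post-process whatever comes back.

As you can see, TensorFlow Serving needs no other environment dependencies. As long as the TensorFlow version matches, the exported model can be used in TensorFlow Serving directly: it takes inputs and returns outputs, with no deployment code to write.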

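III. Docker

1. Installing Docker

Docker is the recommended way to install TensorFlow Serving, so Docker has to be installed first. For the installation commands, see the official website.

If the installation test prints the following, the installation succeeded:

Hello from Docker! This message shows that your installation appears to be working correctly.

2. Installing nvidia-docker

Next, install Docker's NVIDIA plugin. nvidia-docker lets your application run on the GPU; see the official page for installation instructions.

The last command in the installation instructions verifies that nvidia-docker was installed correctly: if you see the GPU information printed by nvidia-smi, it worked. To learn more about Docker basics, read the Docker Documentation or the tutorial Docker -- 從入門到實踐 (Docker: From Beginner to Practice).

3. Pulling the TensorFlow Serving image

TensorFlow Serving publishes Docker images for many TensorFlow versions; you can find the one you need on the TensorFlow Serving Docker Tag page. For example, if your code is based on TensorFlow 1.11.1, you can choose from "1.11.1", "1.11.1-devel", "1.11.1-devel-gpu" and "1.11.1-gpu". The differences are as follows. A tag with only the version number and no "devel" is the CPU version, an officially packaged image that is meant to be used as-is rather than modified. A tag with "devel" is the development version: you can enter a container of that image, change its configuration, and save the changes with Docker's commit command. A tag with "gpu" is the GPU version; likewise, without "devel" its configuration is not meant to be modified.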
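The tag naming scheme described above is regular enough to write down as a tiny helper. This is only a sketch of the convention (`serving_image` is a made-up name); always check the tag page to confirm that a given combination actually exists.

```python
def serving_image(version: str = "latest", devel: bool = False, gpu: bool = False) -> str:
    """Compose a tensorflow/serving image name following the tag convention."""
    parts = [version]
    if devel:
        parts.append("devel")  # development image, can be customized
    if gpu:
        parts.append("gpu")    # GPU build
    return "tensorflow/serving:" + "-".join(parts)


print(serving_image("1.11.1"))                        # CPU, non-devel
print(serving_image("1.11.1", devel=True, gpu=True))  # GPU development image
print(serving_image(gpu=True))                        # latest GPU image
```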

Here we assume the latest non-development GPU version, that is, "latest-gpu". Pull the image to your machine with docker pull:

sudo docker pull tensorflow/serving:latest-gpu

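Docker downloads the required files to your machine. The download speed depends on your bandwidth, and the GPU images take somewhat longer. If you want to speed up the download through an Alibaba Cloud mirror, see "Docker 鏡像加速器-博客-云棲社區-阿里云" (Docker image accelerator, Alibaba Cloud Yunqi community blog). Once the pull finishes, the image is ready to use. If errors prevent the pull from completing, you will have to pull the image on another machine, export it, and import it on this machine; search for Docker's load command for details.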
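The export-and-import workaround uses `docker save` and `docker load`. As a small sketch, the helpers below only build the two command lines (they do not run Docker); the archive name `serving.tar` is an arbitrary example, and copying the archive between machines is up to you.

```python
def save_command(image: str, archive: str) -> list:
    """Command to run on the machine that managed to pull the image."""
    return ["sudo", "docker", "save", "-o", archive, image]


def load_command(archive: str) -> list:
    """Command to run on the target machine after copying the archive over."""
    return ["sudo", "docker", "load", "-i", archive]


print(" ".join(save_command("tensorflow/serving:latest-gpu", "serving.tar")))
print(" ".join(load_command("serving.tar")))
```

Each list can be passed to `subprocess.run(...)` if you prefer to script the transfer instead of typing the commands.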

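IV. Running the TensorFlow Serving Docker Image

1. Starting it directly

The TensorFlow Serving website has a detailed tutorial. This section sums up some out-of-the-box experience; you can read the official tutorial for the details later. Once the image has been pulled, the following command starts TensorFlow Serving: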

    sudo nvidia-docker run -p 8500:8500 \
      --mount type=bind,source=/home/huzhihao/projects/EAST/models,target=/models \
      -t --entrypoint=tensorflow_model_server tensorflow/serving:latest-gpu \
      --port=8500 --per_process_gpu_memory_fraction=0.5 \
      --enable_batching=true --model_name=east --model_base_path=/models/east_model &

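Here is what each parameter means:

-p 8500:8500: publishes the container's port 8500, the gRPC port, on the host.
--mount type=bind,source=/your/local/model,target=/models: mounts the local folder holding your exported models at /models inside the container; TensorFlow Serving looks for your model under /models in the container. Do not put spaces after the commas.
-t --entrypoint=tensorflow_model_server tensorflow/serving:latest-gpu: the non-devel images are meant to be run as-is rather than entered through a bash shell. --entrypoint overrides the image's default entrypoint so that the container runs the tensorflow_model_server command directly, and everything after the image name is passed to it as arguments. The image to use, tensorflow/serving:latest-gpu, comes right after and can be replaced with any version you like.
--port=8500: the gRPC port the server listens on inside the container (this and the following flags only take effect when the entrypoint is set as above).
--per_process_gpu_memory_fraction=0.5: the fraction of GPU memory the model server may use, a value in [0, 1].
--enable_batching: lets the server batch requests for inference, which improves GPU utilization.
--model_name: the name under which the model is served; clients use this name in their requests.
--model_base_path: the model's path inside the container. The mount above already maps your models to /models, so this flag has to point at a specific model folder; for example, /models/east_model means the model under /models/east_model is used.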
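If you start the server from a script, it can help to assemble the same command from parameters. The sketch below mirrors the flags explained above and only returns the argument list; `serving_run_command` and its parameter names are made up for illustration, and nothing is executed.

```python
def serving_run_command(local_model_dir: str, model_name: str, model_subdir: str,
                        image: str = "tensorflow/serving:latest-gpu",
                        grpc_port: int = 8500, gpu_fraction: float = 0.5) -> list:
    """Build the nvidia-docker command line that starts TensorFlow Serving."""
    return [
        "sudo", "nvidia-docker", "run",
        "-p", f"{grpc_port}:{grpc_port}",                        # publish the gRPC port
        "--mount", f"type=bind,source={local_model_dir},target=/models",
        "-t", "--entrypoint=tensorflow_model_server",
        image,
        # Everything after the image name is passed to tensorflow_model_server.
        f"--port={grpc_port}",
        f"--per_process_gpu_memory_fraction={gpu_fraction}",
        "--enable_batching=true",
        f"--model_name={model_name}",
        f"--model_base_path=/models/{model_subdir}",
    ]


cmd = serving_run_command("/your/local/model", "east", "east_model")
print(" ".join(cmd))
```

Note the ordering: Docker options come before the image name and server flags come after it, which is why the image sits in the middle of the list.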

For more tensorflow_model_server flags, see the official usage text:

    usage: tensorflow_model_server
    Flags:
      --port=8500    int32
          Port to listen on for gRPC API
      --rest_api_port=0    int32
          Port to listen on for HTTP/REST API. If set to zero HTTP/REST API will not be exported. This port must be different than the one specified in --port.
      --rest_api_num_threads=160    int32
          Number of threads for HTTP/REST API processing. If not set, will be auto set based on number of CPUs.
      --rest_api_timeout_in_ms=30000    int32
          Timeout for HTTP/REST API calls.
      --enable_batching=false    bool
          enable batching
      --batching_parameters_file=""    string
          If non-empty, read an ascii BatchingParameters protobuf from the supplied file name and use the contained values instead of the defaults.
      --model_config_file=""    string
          If non-empty, read an ascii ModelServerConfig protobuf from the supplied file name, and serve the models in that file. This config file can be used to specify multiple models to serve and other advanced parameters including non-default version policy. (If used, --model_name, --model_base_path are ignored.)
      --model_name="default"    string
          name of model (ignored if --model_config_file flag is set
      --model_base_path=""    string
          path to export (ignored if --model_config_file flag is set, otherwise required)
      --file_system_poll_wait_seconds=1    int32
          interval in seconds between each poll of the file system for new model version
      --flush_filesystem_caches=true    bool
          If true (the default), filesystem caches will be flushed after the initial load of all servables, and after each subsequent individual servable reload (if the number of load threads is 1). This reduces memory consumption of the model server, at the potential cost of cache misses if model files are accessed after servables are loaded.
      --tensorflow_session_parallelism=0    int64
          Number of threads to use for running a Tensorflow session. Auto-configured by default. Note that this option is ignored if --platform_config_file is non-empty.
      --ssl_config_file=""    string
          If non-empty, read an ascii SSLConfig protobuf from the supplied file name and set up a secure gRPC channel
      --platform_config_file=""    string
          If non-empty, read an ascii PlatformConfigMap protobuf from the supplied file name, and use that platform config instead of the Tensorflow platform. (If used, --enable_batching is ignored.)
      --per_process_gpu_memory_fraction=0.000000    float
          Fraction that each process occupies of the GPU memory space the value is between 0.0 and 1.0 (with 0.0 as the default) If 1.0, the server will allocate all the memory when the server starts, If 0.0, Tensorflow will automatically select a value.
      --saved_model_tags="serve"    string
          Comma-separated set of tags corresponding to the meta graph def to load from SavedModel.
      --grpc_channel_arguments=""    string
          A comma separated list of arguments to be passed to the grpc server. (e.g. grpc.max_connection_age_ms=2000)
      --enable_model_warmup=true    bool
          Enables model warmup, which triggers lazy initializations (such as TF optimizations) at load time, to reduce first request latency.
      --version=false    bool
          Display version
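Among these flags, `--rest_api_port` turns on an HTTP/REST endpoint next to gRPC. As a rough sketch, assuming the server was started with `--rest_api_port=8501` and that port is also published from the container, a predict call is a JSON POST to `/v1/models/<model_name>:predict`. The code below only builds the request; the host name and the tiny nested-list "image" are placeholders, while "east", "predict_images" and "images" are the model, signature and input names used elsewhere in this article.

```python
import json
import urllib.request


def build_predict_request(host: str, model_name: str, signature_name: str,
                          image, port: int = 8501) -> urllib.request.Request:
    """Build (but do not send) a REST predict request for TensorFlow Serving."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({
        "signature_name": signature_name,
        # One instance per example; the key is the signature's input name.
        "instances": [{"images": image}],
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")


# A 1x2 RGB "image" as nested lists (placeholder data).
request = build_predict_request(
    "localhost", "east", "predict_images", [[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]])
print(request.full_url)

# Sending it requires a running server:
# with urllib.request.urlopen(request, timeout=5) as resp:
#     predictions = json.loads(resp.read())["predictions"]
```

The REST route is convenient for quick tests; the rest of this article sticks to gRPC.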

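2. Starting from inside a container of the devel image

If you use a devel version and want to set up your own environment from a terminal inside the container, enter the container with: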

sudo nvidia-docker run -it tensorflow/serving:latest-devel-gpu bash

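-it means entering the container interactively, and the `bash` after the image name means starting the container's shell. Once inside, you can use pip, apt and other commands to customize the environment just as in a regular Ubuntu terminal. To start TensorFlow Serving from inside the container, run: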

    tensorflow_model_server --port=8500 --rest_api_port=8501 \
      --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME}

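The flags mean the same as in the "Starting it directly" section above. As you can see, running the container with --entrypoint and the remaining flags, without entering it, has exactly the same effect as entering the container and running tensorflow_model_server there. To copy files from a local folder into a folder inside the container, use docker cp: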

sudo docker cp /your/local/file YOUR_CONTAINER_ID:/your/container/dir

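YOUR_CONTAINER_ID can be read from the prompt of the container's terminal: the string of letters and digits after root@ is your container ID. For example, in root@dc238c481f14: the container ID is "dc238c481f14".

Once everything is set up, be careful: if you simply exit at this point, none of your changes are saved to the image, and a new container started from that image will not have them. Commit the container to a new image first (run the following command in a new terminal on the host, not in the container's shell):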

sudo docker commit $(sudo docker ps --last 1 -q) YOUR_IMAGE_NAME:VERSION

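`YOUR_IMAGE_NAME:VERSION` is the image name and version you want to save it as. When the commit finishes, run `sudo docker images` to see your new image; after that you can type `exit` inside the container to leave it.

Sometimes a TensorFlow Serving process left running in the background is not killed by `exit` or ctrl+c. To kill a background service you no longer need, run `sudo docker ps` to list the running containers, then `sudo docker kill CONTAINER_ID` with the ID (or name) of the container shown there.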

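V. Client

Once TensorFlow Serving is running, we need a client to send prediction requests. Unlike ordinary HTTP requests, TensorFlow Serving uses the gRPC protocol, so the client has to install the gRPC-based API and send requests and receive results in a specific way.

Installation

pip install tensorflow-serving-api

Client Demo

Only the core code is shown here; for complete code, see the official TensorFlow Serving mnist client example.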

    import grpc
    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x
    from tensorflow_serving.apis import predict_pb2
    from tensorflow_serving.apis import prediction_service_pb2_grpc


    def request_server(img_resized, server_url):
        '''
        Request an inference result from the TensorFlow Serving service.
        :param img_resized: preprocessed image to run inference on, numpy array, shape: (h, w, 3)
        :param server_url: TensorFlow Serving address and port, str, e.g. '0.0.0.0:8500'
        :return: result array returned by the model, numpy array
        '''
        # Request.
        channel = grpc.insecure_channel(server_url)
        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
        request = predict_pb2.PredictRequest()
        request.model_spec.name = "east"  # model name (--model_name on the server)
        request.model_spec.signature_name = "predict_images"  # signature name
        # "images" is the input name you set when exporting the model
        request.inputs["images"].CopyFrom(
            tf.contrib.util.make_tensor_proto(img_resized, shape=[1, ] + list(img_resized.shape)))
        response = stub.Predict(request, 5.0)  # 5 secs timeout
        # "score" must match the output name in your model's export signature
        return np.asarray(response.outputs["score"].float_val)

Over gRPC, TensorFlow Serving returns the result in protobuf format, not JSON, so it cannot be parsed as JSON. If you print the `response` variable, it looks roughly like this:

    outputs {
      key: "score"
      value {
        dtype: DT_FLOAT
        tensor_shape {
          dim {
            size: 1
          }
          dim {
            size: 200
          }
          dim {
            size: 200
          }
          dim {
            size: 5
          }
        }
        float_val: 160.14822387695312
        float_val: 112.23966217041016
        float_val: 95.28953552246094
        float_val: 130.53846740722656
        ......

In the example above, `response.outputs["score"].float_val` returns a flat one-dimensional sequence such as `array([160.14822387695312, 112.23966217041016, 95.28953552246094, ......])` and does not keep the real shape. To restore the original shape, the dims shown above, (1, 200, 200, 5), use `tf.make_ndarray()`:

return tf.make_ndarray(response.outputs["score"])
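To see what restoring the shape amounts to, here is a pure-Python sketch of the idea: the flat values are stored in row-major order, so they can be regrouped from the innermost dimension outwards. The `reshape` helper is only an illustration; in real code `tf.make_ndarray()` (or NumPy's reshape) does this for you.

```python
def reshape(flat, dims):
    """Rebuild nested lists of shape `dims` from a flat, row-major sequence."""
    size = 1
    for d in dims:
        size *= d
    if size != len(flat):
        raise ValueError(f"cannot reshape {len(flat)} values into {tuple(dims)}")
    nested = list(flat)
    # Group by the last dimension first, then by each outer dimension in turn.
    for d in reversed(dims[1:]):
        nested = [nested[i:i + d] for i in range(0, len(nested), d)]
    return nested


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(reshape(values, (1, 3, 2)))  # prints [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]
```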

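VI. Tornado Web Service

The inputs of a TensorFlow model's computation graph are usually tensors, so your images, text or other data have to be preprocessed and converted into tensors before they can be fed to the model. This preprocessing is usually not written into the computation graph, so with TensorFlow Serving the client ends up containing a lot of preprocessing code, then sends the tensors to the server over gRPC, and finally receives the result. In practice you cannot ask every user to write all that preprocessing and post-processing code; users should only have to POST a simple request and receive the final result. So the preprocessing and post-processing must be handled by a "middleman", and that middleman is the web service.

We use the Tornado framework to build the web service. Tornado is a high-performance web framework with asynchronous, non-blocking I/O. It can accept concurrent requests from many users, issue concurrent requests to TensorFlow Serving, and take care of all the preprocessing and post-processing in between.

Pseudocode for a typical Tornado app: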

    import json
    import tornado.ioloop
    import tornado.web


    class MainHandler(tornado.web.RequestHandler):
        async def post(self):
            # Parse the image urls from the body POSTed by the client
            urls = self.request.body.decode()
            urls = json.loads(urls)
            img = await fetch_urls(urls)      # async function that downloads the image urls
            img = preprocessing(img)          # image preprocessing function
            result = await inference(img)     # function that calls TF Serving for prediction
            result = postprocessing(result)   # post-processing of the result
            self.finish(result)               # send the response back to the client


    def make_app():
        return tornado.web.Application([(r"/", MainHandler)])


    if __name__ == '__main__':
        app = make_app()
        app.listen(8131)  # port the Tornado server listens on
        tornado.ioloop.IOLoop.current().start()

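The code above relies on some custom functions that are not shown here, but the comments should give you a rough idea of the building blocks of a Tornado app and the overall flow. Most tutorials online are based on versions below 5.1.1 and make heavy use of decorators. That decorator style is slated for deprecation after 5.1.1, replaced by defining asynchronous functions with `async` and `await`.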
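The benefit of `async`/`await` is easier to see in a small standalone sketch. The code below uses only the standard library's asyncio rather than Tornado, and every function in it is a hypothetical stand-in: `blocking_inference` plays the role of a blocking call such as `stub.Predict`, and the pre- and post-processing are placeholders. It handles three "requests" concurrently in one process.

```python
import asyncio
import time


def blocking_inference(image):
    """Hypothetical stand-in for a blocking call such as stub.Predict."""
    time.sleep(0.1)
    return sum(image)


async def inference(image):
    """Run the blocking call in a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_inference, image)


async def handle_request(image):
    preprocessed = [pixel / 255.0 for pixel in image]  # placeholder pre-processing
    raw = await inference(preprocessed)
    return {"score": round(raw, 4)}                    # placeholder post-processing


async def main():
    images = [[255, 0], [0, 0], [255, 255]]
    start = time.perf_counter()
    # All three requests wait for their inference call at the same time.
    results = await asyncio.gather(*(handle_request(img) for img in images))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

While one request is waiting for its result, the event loop is free to start the others, so the total time stays close to a single inference call instead of the sum of all three. A Tornado handler written with `async def` benefits in the same way.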

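If you are new to Tornado, I recommend learning directly from the official documentation: Tornado Web Server - Tornado 5.1.1 documentation. Most tutorials online do not fit the newer API (5.1.1 and above) and will only cause confusion. The asynchronous crawler in the official docs, Queue example - a concurrent web spider - Tornado 5.1.1 documentation, is a practical example. If you are not yet familiar with the concepts of "synchronous vs. asynchronous" and "blocking vs. non-blocking", I suggest reading Morvan's multiprocessing and multithreading tutorials (Threading 多線程教程系列 | 莫煩Python) and Liao Xuefeng's (廖雪峰) chapters on processes and threads and on asynchronous IO, which are clear and easy to follow. There is also an excellent GitHub project that deploys with TensorFlow Serving in a very standard way, with a very clear workflow, and is well worth a look: pakrchen/text-antispam

VII. Summary

With the arrival of frameworks such as TensorFlow Serving and TensorRT Inference Server, deploying and maintaining models keeps getting easier. Engineers can focus more on the models themselves, and the path from development to deployment becomes much shorter.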
