YOLOv4-tiny: pth to ONNX to TensorRT
Converting the YOLOv4-tiny pth model to ONNX
Reference link for the YOLOv4-tiny model
trt inference code (extraction code: ou91)
Load the model and run the conversion
```python
import onnx
import torch

# YoloBody comes from the YOLOv4-tiny project code (see the reference link above)

def pth2onnx(pth_model, input, model_name):
    torch.onnx.export(pth_model,                          # model to convert
                      input,                              # example model input
                      "model_data/%s.onnx" % model_name,  # save path
                      export_params=True,                 # store the trained weights in the file
                      opset_version=11,                   # ONNX opset version
                      input_names=['input'],)
    # Re-save with inferred shapes so every layer's tensor size is visible in Netron
    onnx.save(onnx.shape_inference.infer_shapes(onnx.load("model_data/%s.onnx" % model_name)),
              "model_data/%s.onnx" % model_name)
    print('%s.pth convert to onnx is done' % model_name)

batch_size = 1
model_name = 'Digital_large_crop'
class_path = '../data/%s/classes.txt' % model_name
model_path = 'model_data/%s.pth' % model_name
yolo = YoloBody(anchors_mask=[[3, 4, 5], [1, 2, 3]], num_classes=11, phi=0)
yolo.load_state_dict(torch.load('model_data/%s.pth' % model_name))
x = torch.ones(batch_size, 3, 416, 416, requires_grad=False)
# out1, out2 = yolo(x)
pth2onnx(yolo, x, model_name)
```
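As an optional sanity check (my addition, not in the original post), the exported graph can be validated with onnx.checker before moving on:

```python
import onnx

model = onnx.load("model_data/%s.onnx" % model_name)
onnx.checker.check_model(model)                   # raises if the graph is malformed
print(onnx.helper.printable_graph(model.graph))   # text dump of the network
```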
Model visualization
Load the generated ONNX model at https://netron.app/ to see the entire network structure.
Loading the ONNX model
Load the ONNX model and check whether its inference output matches the pth model's.
```python
import numpy as np
import onnxruntime
import torch
from PIL import Image

# resize_image and preprocess_input are the project's utility functions
# (they are also defined in the TensorRT script further below)

def compare_pth_onnx(model_name):
    session = onnxruntime.InferenceSession('model_data/%s.onnx' % model_name)
    yolo = YoloBody(anchors_mask=[[3, 4, 5], [1, 2, 3]], num_classes=11, phi=0)
    yolo.load_state_dict(torch.load('model_data/%s.pth' % model_name))
    yolo.eval()
    img = Image.open('../data/Digital_crop/JPEGImages/1.jpg')
    image_data = resize_image(img, (416, 416), False)
    # Add the batch_size dimension
    image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)
    pth_out = yolo(torch.from_numpy(image_data))
    onnx_out = session.run([], {"input": image_data})
    # Maximum element-wise difference between the two outputs
    print(torch.max(torch.abs(pth_out[0] - torch.from_numpy(onnx_out[0]))))
```

The inference results essentially match. Next, check whether the model weights are identical as well.
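A quick way to put the weights side by side (my own sketch; ONNX initializer names depend on the exporter, so entries may need matching up by shape):

```python
import onnx
import torch
from onnx import numpy_helper

onnx_model = onnx.load('model_data/Digital_large_crop.onnx')
onnx_weights = {t.name: numpy_helper.to_array(t) for t in onnx_model.graph.initializer}
pth_weights = torch.load('model_data/Digital_large_crop.pth')

# Print the first few entries from each side for manual comparison.
for name in list(pth_weights)[:4]:
    print(name, tuple(pth_weights[name].shape))
for name in list(onnx_weights)[:4]:
    print(name, onnx_weights[name].shape)
```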
The BN weights of the first convolutional layer turned out to differ, so I looked for the cause. The official PyTorch documentation notes that operations such as Dropout and BatchNorm use different parameter values in training versus evaluation mode. The actual reason here is that the Conv and BN layers are fused into a single node during export; adding training=2 to the torch.onnx.export call keeps Conv and BN as separate nodes.
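For concreteness, a sketch of that export call (my reconstruction; in the PyTorch version of that era, 2 is the value of torch.onnx.TrainingMode.TRAINING):

```python
# Same export as before, but without Conv+BN fusion so the BN parameters
# remain visible as separate nodes. The output filename is hypothetical.
torch.onnx.export(yolo, x,
                  "model_data/%s_nofuse.onnx" % model_name,
                  export_params=True,
                  opset_version=11,
                  input_names=['input'],
                  training=2)  # training mode: keeps Conv and BN unfused
```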
With that, the parameters line up. So where did the earlier fused parameters come from? Following the article https://zhuanlan.zhihu.com/p/353697121, the Conv and BN parameters are merged by formula:
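The formula itself appeared as an image in the original post; the standard Conv-BN fusion it refers to is

$$
W_{\text{fused}} = \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}\, W,
\qquad
b_{\text{fused}} = \beta + \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}\,(b - \mu),
$$

where $\mu$ and $\sigma^2$ are the BN running mean and variance, $\gamma$ and $\beta$ the BN affine parameters, $b$ the (often absent) convolution bias, and $\varepsilon$ the stability constant. A quick numerical check along these lines (a sketch; the state_dict key names are guesses and depend on the project's layer naming):

```python
import torch

sd = torch.load('model_data/Digital_large_crop.pth')
# Hypothetical key names for the first Conv/BN pair; adjust to your model.
w     = sd['backbone.conv1.conv.weight']
gamma = sd['backbone.conv1.bn.weight']
beta  = sd['backbone.conv1.bn.bias']
mu    = sd['backbone.conv1.bn.running_mean']
var   = sd['backbone.conv1.bn.running_var']
eps   = 1e-5

scale   = gamma / torch.sqrt(var + eps)
w_fused = w * scale.reshape(-1, 1, 1, 1)  # scale each output channel's kernel
b_fused = beta - mu * scale               # conv has no bias here, so b = 0
print(w_fused.flatten()[:5])
print(b_fused[:5])
```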
The computed values match the fused weights from the first export. This confirms that the pth-to-ONNX conversion is correct.
ONNX to trt
First, set up the environment; my earlier article covers this: https://blog.csdn.net/weixin_44241884/article/details/122084953
```bash
locate trtexec   # find the path of the conversion tool
/yourtrtexecpath/trtexec --onnx=yourmodelname.onnx --saveEngine=yourmodelname.trt
```

```
[01/11/2022-15:34:10] [I] Average on 10 runs - GPU latency: 0.771057 ms - Host latency: 0.976514 ms (end to end 1.41353 ms, enqueue 0.361804 ms)
[01/11/2022-15:34:10] [I] Average on 10 runs - GPU latency: 0.788892 ms - Host latency: 1.04755 ms (end to end 1.46624 ms, enqueue 0.568628 ms)
[01/11/2022-15:34:10] [I] Average on 10 runs - GPU latency: 0.768213 ms - Host latency: 0.977173 ms (end to end 1.40577 ms, enqueue 0.404419 ms)
[01/11/2022-15:34:10] [I] Host Latency
[01/11/2022-15:34:10] [I] min: 0.946533 ms (end to end 0.971191 ms)
[01/11/2022-15:34:10] [I] max: 3.65527 ms (end to end 4.02917 ms)
[01/11/2022-15:34:10] [I] mean: 0.983374 ms (end to end 1.39919 ms)
[01/11/2022-15:34:10] [I] median: 0.972412 ms (end to end 1.39661 ms)
[01/11/2022-15:34:10] [I] percentile: 1.11963 ms at 99% (end to end 1.52612 ms at 99%)
[01/11/2022-15:34:10] [I] throughput: 0 qps
[01/11/2022-15:34:10] [I] walltime: 3.00267 s
[01/11/2022-15:34:10] [I] Enqueue Time
[01/11/2022-15:34:10] [I] min: 0.267456 ms
[01/11/2022-15:34:10] [I] max: 3.63794 ms
[01/11/2022-15:34:10] [I] median: 0.374756 ms
[01/11/2022-15:34:10] [I] GPU Compute
[01/11/2022-15:34:10] [I] min: 0.74646 ms
[01/11/2022-15:34:10] [I] max: 3.43762 ms
[01/11/2022-15:34:10] [I] mean: 0.767578 ms
[01/11/2022-15:34:10] [I] median: 0.764893 ms
[01/11/2022-15:34:10] [I] percentile: 0.811005 ms at 99%
[01/11/2022-15:34:10] [I] total compute time: 2.97513 s
```

With that, the conversion succeeds. Loading the .trt file for inference from Python or C++ still needs to be worked out, and is covered next.
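As an aside (not part of the original run): trtexec can also build a reduced-precision engine with its --fp16 flag, which often lowers GPU latency on hardware with fast FP16; accuracy should be re-validated afterwards.

```bash
# Hedged example: build an FP16 engine from the same ONNX file.
/yourtrtexecpath/trtexec --onnx=yourmodelname.onnx --saveEngine=yourmodelname_fp16.trt --fp16
```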
TensorRT inference in Python
Based on the official tutorial: https://github.com/NVIDIA/TensorRT/blob/main/quickstart/SemanticSegmentation/tutorial-runtime.ipynb
```python
import os

import numpy as np
import pycuda.autoinit  # initializes the CUDA driver context
import pycuda.driver as cuda
import tensorrt as trt
from PIL import Image

TRT_LOGGER = trt.Logger()

def preprocess_input(image):
    image /= 255.0
    return image

def resize_image(image, size, letterbox_image):
    iw, ih = image.size
    w, h = size
    if letterbox_image:
        # Scale with unchanged aspect ratio and pad with gray
        scale = min(w / iw, h / ih)
        nw = int(iw * scale)
        nh = int(ih * scale)
        image = image.resize((nw, nh), Image.BICUBIC)
        new_image = Image.new('RGB', size, (128, 128, 128))
        new_image.paste(image, ((w - nw) // 2, (h - nh) // 2))
    else:
        new_image = image.resize((w, h), Image.BICUBIC)
    return new_image

def load_engine(engine_file_path):
    assert os.path.exists(engine_file_path)
    print("Reading engine from file {}".format(engine_file_path))
    with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

if __name__ == '__main__':
    img = Image.open('../data/Digital_crop/JPEGImages/1.jpg')
    image_data = resize_image(img, (416, 416), False)
    # Add the batch_size dimension
    image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)
    input_image = image_data

    engine = load_engine('model_data/Digital_large_crop.trt')
    with engine.create_execution_context() as context:
        # Set input shape based on image dimensions for inference
        context.set_binding_shape(engine.get_binding_index("input.1"), (1, 3, 416, 416))
        # Allocate host and device buffers
        bindings, output_memory, output_buffer = [], [], []
        for binding in engine:
            binding_idx = engine.get_binding_index(binding)
            size = trt.volume(context.get_binding_shape(binding_idx))
            dtype = trt.nptype(engine.get_binding_dtype(binding))
            if engine.binding_is_input(binding):
                input_buffer = np.ascontiguousarray(input_image)
                input_memory = cuda.mem_alloc(input_image.nbytes)
                bindings.append(int(input_memory))
            else:
                output_buffer.append(cuda.pagelocked_empty(size, dtype))
                output_memory.append(cuda.mem_alloc(output_buffer[-1].nbytes))
                bindings.append(int(output_memory[-1]))

        stream = cuda.Stream()
        # Transfer input data to the GPU.
        cuda.memcpy_htod_async(input_memory, input_buffer, stream)
        # Run inference
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
        # Transfer prediction output from the GPU.
        for i in range(len(output_buffer)):
            cuda.memcpy_dtoh_async(output_buffer[i], output_memory[i], stream)
        # Synchronize the stream before touching the results on the host
        stream.synchronize()
        # 48 channels = 3 anchors x (4 box coords + 1 objectness + 11 classes)
        output_buffer[0] = output_buffer[0].reshape(1, 48, 13, 13)
        output_buffer[1] = output_buffer[1].reshape(1, 48, 26, 26)
        print(output_buffer)
```
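To localize discrepancies like the one described next, one option (my addition, a sketch) is to compare the raw trt outputs with onnxruntime on the same preprocessed input, appended at the end of the script above:

```python
import onnxruntime

# Assumes the image_data/output_buffer variables from the script above and that
# the engine's output order matches the ONNX model's.
session = onnxruntime.InferenceSession('model_data/Digital_large_crop.onnx')
onnx_out = session.run([], {"input": image_data})
for trt_o, onnx_o in zip(output_buffer, onnx_out):
    print(np.abs(trt_o - onnx_o).max())
```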
At first glance, though, the raw inference outputs differed wildly; the cause was left to investigate.
After running the outputs through the post-processing step, however, the final detections came out correct, so the raw inference tensors do not need to match exactly for the pipeline to work.
The left image (in the original post) shows the pth inference result and the right one the trt result: the confidences differ slightly and a few boxes vary. Oddly, though, pth inference here was somewhat faster than trt; the cause remains to be investigated.
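One possible factor (my speculation, not from the original post) is how the timing was taken: a single end-to-end run includes CUDA warm-up and host-to-device copies. A minimal timing sketch, reusing the context and bindings from the script above:

```python
import time

# Warm up so one-time CUDA initialization doesn't pollute the measurement.
for _ in range(10):
    context.execute_v2(bindings=bindings)

start = time.time()
for _ in range(100):
    context.execute_v2(bindings=bindings)  # compute only, no host<->device copies
print('%.3f ms per inference' % ((time.time() - start) / 100 * 1000))
```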
The next step is to implement trt inference in a C++ environment.