YOLOv4-tiny: pth to ONNX to TensorRT
Converting the YOLOv4-tiny pth model to ONNX
Reference link for the YOLOv4-tiny model
trt inference code (extraction code: ou91)
Load the model and run the conversion
```python
import onnx
import torch

# YoloBody comes from the YOLOv4-tiny project code (see the reference link above)

def pth2onnx(pth_model, input, model_name):
    torch.onnx.export(pth_model,                          # model to convert
                      input,                              # example model input
                      "model_data/%s.onnx" % model_name,  # save path
                      export_params=True,                 # store the trained weights in the file
                      opset_version=11,                   # ONNX opset version
                      input_names=['input'],)
    # Re-save with inferred shapes so every layer's tensor size is visible in Netron
    onnx.save(onnx.shape_inference.infer_shapes(onnx.load("model_data/%s.onnx" % model_name)),
              "model_data/%s.onnx" % model_name)
    print('%s.pth convert to onnx is done' % model_name)

batch_size = 1
model_name = 'Digital_large_crop'
class_path = '../data/%s/classes.txt' % model_name
model_path = 'model_data/%s.pth' % model_name
yolo = YoloBody(anchors_mask=[[3, 4, 5], [1, 2, 3]], num_classes=11, phi=0)
yolo.load_state_dict(torch.load('model_data/%s.pth' % model_name))
x = torch.ones(batch_size, 3, 416, 416, requires_grad=False)
# out1, out2 = yolo(x)
pth2onnx(yolo, x, model_name)
```
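As an optional sanity check (my addition, not in the original post), the exported graph can be validated with onnx.checker before moving on:

```python
import onnx

model = onnx.load("model_data/%s.onnx" % model_name)
onnx.checker.check_model(model)                   # raises if the graph is malformed
print(onnx.helper.printable_graph(model.graph))   # text dump of the network
```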
Model visualization
Load the generated ONNX model at https://netron.app/ to see the entire network structure.
Loading the ONNX model
Load the ONNX model and check whether its inference output matches the pth model's.
```python
import numpy as np
import onnxruntime
import torch
from PIL import Image

# resize_image and preprocess_input are the project's utility functions
# (they are also defined in the TensorRT script further below)

def compare_pth_onnx(model_name):
    session = onnxruntime.InferenceSession('model_data/%s.onnx' % model_name)
    yolo = YoloBody(anchors_mask=[[3, 4, 5], [1, 2, 3]], num_classes=11, phi=0)
    yolo.load_state_dict(torch.load('model_data/%s.pth' % model_name))
    yolo.eval()
    img = Image.open('../data/Digital_crop/JPEGImages/1.jpg')
    image_data = resize_image(img, (416, 416), False)
    # Add the batch_size dimension
    image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)
    pth_out = yolo(torch.from_numpy(image_data))
    onnx_out = session.run([], {"input": image_data})
    # Maximum element-wise difference between the two outputs
    print(torch.max(torch.abs(pth_out[0] - torch.from_numpy(onnx_out[0]))))
```

The inference results essentially match. Next, check whether the model weights are identical as well.
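A quick way to put the weights side by side (my own sketch; ONNX initializer names depend on the exporter, so entries may need matching up by shape):

```python
import onnx
import torch
from onnx import numpy_helper

onnx_model = onnx.load('model_data/Digital_large_crop.onnx')
onnx_weights = {t.name: numpy_helper.to_array(t) for t in onnx_model.graph.initializer}
pth_weights = torch.load('model_data/Digital_large_crop.pth')

# Print the first few entries from each side for manual comparison.
for name in list(pth_weights)[:4]:
    print(name, tuple(pth_weights[name].shape))
for name in list(onnx_weights)[:4]:
    print(name, onnx_weights[name].shape)
```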
The BN weights of the first convolutional layer turned out to differ, so I looked for the cause. The official PyTorch documentation notes that operations such as Dropout and BatchNorm use different parameter values in training versus evaluation mode. The actual reason here is that the Conv and BN layers are fused into a single node during export; adding training=2 to the torch.onnx.export call keeps Conv and BN as separate nodes.
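For concreteness, a sketch of that export call (my reconstruction; in the PyTorch version of that era, 2 is the value of torch.onnx.TrainingMode.TRAINING):

```python
# Same export as before, but without Conv+BN fusion so the BN parameters
# remain visible as separate nodes. The output filename is hypothetical.
torch.onnx.export(yolo, x,
                  "model_data/%s_nofuse.onnx" % model_name,
                  export_params=True,
                  opset_version=11,
                  input_names=['input'],
                  training=2)  # training mode: keeps Conv and BN unfused
```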
With that, the parameters line up. So where did the earlier fused parameters come from? Following the article https://zhuanlan.zhihu.com/p/353697121, the Conv and BN parameters are merged by formula:
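The formula itself appeared as an image in the original post; the standard Conv-BN fusion it refers to is

$$
W_{\text{fused}} = \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}\, W,
\qquad
b_{\text{fused}} = \beta + \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}\,(b - \mu),
$$

where $\mu$ and $\sigma^2$ are the BN running mean and variance, $\gamma$ and $\beta$ the BN affine parameters, $b$ the (often absent) convolution bias, and $\varepsilon$ the stability constant. A quick numerical check along these lines (a sketch; the state_dict key names are guesses and depend on the project's layer naming):

```python
import torch

sd = torch.load('model_data/Digital_large_crop.pth')
# Hypothetical key names for the first Conv/BN pair; adjust to your model.
w     = sd['backbone.conv1.conv.weight']
gamma = sd['backbone.conv1.bn.weight']
beta  = sd['backbone.conv1.bn.bias']
mu    = sd['backbone.conv1.bn.running_mean']
var   = sd['backbone.conv1.bn.running_var']
eps   = 1e-5

scale   = gamma / torch.sqrt(var + eps)
w_fused = w * scale.reshape(-1, 1, 1, 1)  # scale each output channel's kernel
b_fused = beta - mu * scale               # conv has no bias here, so b = 0
print(w_fused.flatten()[:5])
print(b_fused[:5])
```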
The computed values match the fused weights from the first export. This confirms that the pth-to-ONNX conversion is correct.
ONNX to trt
First, set up the environment; my earlier article covers this: https://blog.csdn.net/weixin_44241884/article/details/122084953
```bash
locate trtexec   # find the path of the conversion tool
/yourtrtexecpath/trtexec --onnx=yourmodelname.onnx --saveEngine=yourmodelname.trt
```

```
[01/11/2022-15:34:10] [I] Average on 10 runs - GPU latency: 0.771057 ms - Host latency: 0.976514 ms (end to end 1.41353 ms, enqueue 0.361804 ms)
[01/11/2022-15:34:10] [I] Average on 10 runs - GPU latency: 0.788892 ms - Host latency: 1.04755 ms (end to end 1.46624 ms, enqueue 0.568628 ms)
[01/11/2022-15:34:10] [I] Average on 10 runs - GPU latency: 0.768213 ms - Host latency: 0.977173 ms (end to end 1.40577 ms, enqueue 0.404419 ms)
[01/11/2022-15:34:10] [I] Host Latency
[01/11/2022-15:34:10] [I] min: 0.946533 ms (end to end 0.971191 ms)
[01/11/2022-15:34:10] [I] max: 3.65527 ms (end to end 4.02917 ms)
[01/11/2022-15:34:10] [I] mean: 0.983374 ms (end to end 1.39919 ms)
[01/11/2022-15:34:10] [I] median: 0.972412 ms (end to end 1.39661 ms)
[01/11/2022-15:34:10] [I] percentile: 1.11963 ms at 99% (end to end 1.52612 ms at 99%)
[01/11/2022-15:34:10] [I] throughput: 0 qps
[01/11/2022-15:34:10] [I] walltime: 3.00267 s
[01/11/2022-15:34:10] [I] Enqueue Time
[01/11/2022-15:34:10] [I] min: 0.267456 ms
[01/11/2022-15:34:10] [I] max: 3.63794 ms
[01/11/2022-15:34:10] [I] median: 0.374756 ms
[01/11/2022-15:34:10] [I] GPU Compute
[01/11/2022-15:34:10] [I] min: 0.74646 ms
[01/11/2022-15:34:10] [I] max: 3.43762 ms
[01/11/2022-15:34:10] [I] mean: 0.767578 ms
[01/11/2022-15:34:10] [I] median: 0.764893 ms
[01/11/2022-15:34:10] [I] percentile: 0.811005 ms at 99%
[01/11/2022-15:34:10] [I] total compute time: 2.97513 s
```

With that, the conversion succeeds. Loading the .trt file for inference from Python or C++ still needs to be worked out, and is covered next.
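As an aside (not part of the original run): trtexec can also build a reduced-precision engine with its --fp16 flag, which often lowers GPU latency on hardware with fast FP16; accuracy should be re-validated afterwards.

```bash
# Hedged example: build an FP16 engine from the same ONNX file.
/yourtrtexecpath/trtexec --onnx=yourmodelname.onnx --saveEngine=yourmodelname_fp16.trt --fp16
```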
TensorRT inference in Python
Based on the official tutorial: https://github.com/NVIDIA/TensorRT/blob/main/quickstart/SemanticSegmentation/tutorial-runtime.ipynb
```python
import os

import numpy as np
import pycuda.autoinit  # initializes the CUDA driver context
import pycuda.driver as cuda
import tensorrt as trt
from PIL import Image

TRT_LOGGER = trt.Logger()

def preprocess_input(image):
    image /= 255.0
    return image

def resize_image(image, size, letterbox_image):
    iw, ih = image.size
    w, h = size
    if letterbox_image:
        # Scale with unchanged aspect ratio and pad with gray
        scale = min(w / iw, h / ih)
        nw = int(iw * scale)
        nh = int(ih * scale)
        image = image.resize((nw, nh), Image.BICUBIC)
        new_image = Image.new('RGB', size, (128, 128, 128))
        new_image.paste(image, ((w - nw) // 2, (h - nh) // 2))
    else:
        new_image = image.resize((w, h), Image.BICUBIC)
    return new_image

def load_engine(engine_file_path):
    assert os.path.exists(engine_file_path)
    print("Reading engine from file {}".format(engine_file_path))
    with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

if __name__ == '__main__':
    img = Image.open('../data/Digital_crop/JPEGImages/1.jpg')
    image_data = resize_image(img, (416, 416), False)
    # Add the batch_size dimension
    image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)
    input_image = image_data

    engine = load_engine('model_data/Digital_large_crop.trt')
    with engine.create_execution_context() as context:
        # Set input shape based on image dimensions for inference
        context.set_binding_shape(engine.get_binding_index("input.1"), (1, 3, 416, 416))
        # Allocate host and device buffers
        bindings, output_memory, output_buffer = [], [], []
        for binding in engine:
            binding_idx = engine.get_binding_index(binding)
            size = trt.volume(context.get_binding_shape(binding_idx))
            dtype = trt.nptype(engine.get_binding_dtype(binding))
            if engine.binding_is_input(binding):
                input_buffer = np.ascontiguousarray(input_image)
                input_memory = cuda.mem_alloc(input_image.nbytes)
                bindings.append(int(input_memory))
            else:
                output_buffer.append(cuda.pagelocked_empty(size, dtype))
                output_memory.append(cuda.mem_alloc(output_buffer[-1].nbytes))
                bindings.append(int(output_memory[-1]))

        stream = cuda.Stream()
        # Transfer input data to the GPU.
        cuda.memcpy_htod_async(input_memory, input_buffer, stream)
        # Run inference
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
        # Transfer prediction output from the GPU.
        for i in range(len(output_buffer)):
            cuda.memcpy_dtoh_async(output_buffer[i], output_memory[i], stream)
        # Synchronize the stream before touching the results on the host
        stream.synchronize()
        # 48 channels = 3 anchors x (4 box coords + 1 objectness + 11 classes)
        output_buffer[0] = output_buffer[0].reshape(1, 48, 13, 13)
        output_buffer[1] = output_buffer[1].reshape(1, 48, 26, 26)
        print(output_buffer)
```
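To localize discrepancies like the one described next, one option (my addition, a sketch) is to compare the raw trt outputs with onnxruntime on the same preprocessed input, appended at the end of the script above:

```python
import onnxruntime

# Assumes the image_data/output_buffer variables from the script above and that
# the engine's output order matches the ONNX model's.
session = onnxruntime.InferenceSession('model_data/Digital_large_crop.onnx')
onnx_out = session.run([], {"input": image_data})
for trt_o, onnx_o in zip(output_buffer, onnx_out):
    print(np.abs(trt_o - onnx_o).max())
```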
At first glance, though, the raw inference outputs differed wildly; the cause was left to investigate.
After running the outputs through the post-processing step, however, the final detections came out correct, so the raw inference tensors do not need to match exactly for the pipeline to work.
The left image (in the original post) shows the pth inference result and the right one the trt result: the confidences differ slightly and a few boxes vary. Oddly, though, pth inference here was somewhat faster than trt; the cause remains to be investigated.
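One possible factor (my speculation, not from the original post) is how the timing was taken: a single end-to-end run includes CUDA warm-up and host-to-device copies. A minimal timing sketch, reusing the context and bindings from the script above:

```python
import time

# Warm up so one-time CUDA initialization doesn't pollute the measurement.
for _ in range(10):
    context.execute_v2(bindings=bindings)

start = time.time()
for _ in range(100):
    context.execute_v2(bindings=bindings)  # compute only, no host<->device copies
print('%.3f ms per inference' % ((time.time() - start) / 100 * 1000))
```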
The next step is to implement trt inference in a C++ environment.