
OpenVINO Series 16. OpenVINO Handwritten Text OCR

In this example, we perform OCR on handwritten Simplified Chinese and Japanese text. Note that these models can only process a single line of text at a time.

  • handwritten-japanese-recognition-0001
  • handwritten-simplified-chinese-recognition-0001

Environment:

  • Runtime: Windows 10, 10th-generation Intel i5 laptop
  • IDE: VSCode
  • OpenVINO version: 2022.1
  • Code link: 11-OCR

Table of Contents

  • OpenVINO Series 16. OpenVINO Handwritten Text OCR
    • 1 About the Handwriting Recognition Models
      • 1.1 `handwritten-japanese-recognition`
      • 1.2 `handwritten-simplified-chinese`
    • 2 Handwriting Recognition Code
      • 2.1 Select and Load a Handwriting Model
      • 2.2 Load the Image and Resize It to the Model Input Size
      • 2.3 Prepare the Charlist
      • 2.4 Run Inference and Decode the Results


1 About the Handwriting Recognition Models

In this example, we run OCR on handwritten Simplified Chinese and Japanese text; each model can only process a single line of text at a time.

This notebook uses the handwritten-japanese-recognition and handwritten-simplified-chinese models. To decode the model outputs into readable text, the kondate_nakayosi and scut_ept character lists are used. Both models are available from the Open Model Zoo.

1.1 handwritten-japanese-recognition

We will not explain the model's underlying algorithm here; we only describe its inputs and outputs.

Input: [1,1,96,2000], corresponding to [B,C,H,W]: B is the batch size, C the number of channels, H the image height, and W the image width.

Note: the source image is resized to the target height (96) while keeping its aspect ratio; the resized width must not exceed 2000. The image is then padded on the right with edge values up to a width of 2000.

Output: [186,1,4442], corresponding to [W,B,L]: W is the output sequence length, B the batch size, and L the confidence distribution across the symbols supported in Kondate and Nakayosi.

1.2 handwritten-simplified-chinese

Input: [1,1,96,2000], corresponding to [B,C,H,W]: B is the batch size, C the number of channels, H the image height, and W the image width.

Note: the source image is resized to the target height (96) while keeping its aspect ratio; the resized width must not exceed 2000. The image is then padded on the right with edge values up to a width of 2000.

Output: [186,1,4059], corresponding to [W,B,L]: W is the output sequence length, B the batch size, and L the confidence distribution across the symbols supported in SCUT-EPT.
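
Both models therefore share the same resize-and-pad preprocessing rule. Here is a minimal sketch of that rule as a standalone helper (the `preprocess` name is ours, for illustration only; the full walkthrough with the demo image follows in section 2.2):

```python
import cv2
import numpy as np

def preprocess(gray_image, target_h=96, target_w=2000):
    # Resize while keeping the aspect ratio so the height matches the model input.
    h, _ = gray_image.shape
    scale = target_h / h
    resized = cv2.resize(gray_image, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    # Pad on the right with edge values up to the model input width.
    padded = np.pad(resized, ((0, 0), (0, target_w - resized.shape[1])), mode="edge")
    # Add batch and channel dimensions to get [B, C, H, W] = [1, 1, 96, 2000].
    return padded[None, None, :, :]
```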

2 Handwriting Recognition Code

2.1 Select and Load a Handwriting Model

```python
from collections import namedtuple
from itertools import groupby
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import numpy as np
from openvino.runtime import Core

# Directories where data will be placed
model_folder = "model"
data_folder = "data"
charlist_folder = f"{data_folder}/charlists"
# Precision used by model
precision = "FP16"

# To group files, you have to define the collection. In this case, you can use `namedtuple`.
Language = namedtuple(
    typename="Language", field_names=["model_name", "charlist_name", "demo_image_name"]
)
chinese_files = Language(
    model_name="handwritten-simplified-chinese-recognition-0001",
    charlist_name="chinese_charlist.txt",
    demo_image_name="handwritten_chinese_test.jpg",
)
japanese_files = Language(
    model_name="handwritten-japanese-recognition-0001",
    charlist_name="japanese_charlist.txt",
    demo_image_name="handwritten_japanese_test.png",
)

print("1 - Choose a language model to download, either Chinese or Japanese.")
# Select language by using either language='chinese' or language='japanese'
language = "chinese"
languages = {"chinese": chinese_files, "japanese": japanese_files}
selected_language = languages.get(language)

# Download the model
path_to_model_weights = Path(
    f"{model_folder}/intel/{selected_language.model_name}/{precision}/{selected_language.model_name}.bin"
)
if not path_to_model_weights.is_file():
    download_command = (
        f"omz_downloader --name {selected_language.model_name} "
        f"--output_dir {model_folder} --precision {precision}"
    )
    print(download_command)
    ! $download_command
else:
    print("model has been downloaded.")

print("2 - Load the model, and print its input and output")
ie = Core()
path_to_model = path_to_model_weights.with_suffix(".xml")
model = ie.read_model(model=path_to_model)
# Select Device Name
compiled_model = ie.compile_model(model=model, device_name="CPU")
recognition_output_layer = compiled_model.output(0)
recognition_input_layer = compiled_model.input(0)
print("- model input shape: {}".format(recognition_input_layer))
print("- model output shape: {}".format(recognition_output_layer))
```

Terminal output:

```
1 - Choose a language model to download, either Chinese or Japanese.
model has been downloaded.
2 - Load the model, and print its input and output
- model input shape: <ConstOutput: names[actual_input] shape{1,1,96,2000} type: f32>
- model output shape: <ConstOutput: names[output] shape{186,1,4059} type: f32>
```

2.2 Load the Image and Resize It to the Model Input Size

The next step is to load the image. The model expects a single-channel image as input, which is why we read the image in grayscale. After loading the input image, we compute the scale ratio: the ratio between the required input-layer height and the current image height. In the cell below, the image is resized and padded so that the characters keep their proportions while matching the input shape.

print("3 - load image to test.") # Read file name of demo file based on the selected model file_name = selected_language.demo_image_name # Text detection models expects an image in grayscale format # IMPORTANT!!! This model allows to read only one line at time # Read image image = cv2.imread(filename=f"{data_folder}/{file_name}", flags=cv2.IMREAD_GRAYSCALE) # Fetch shape image_height, _ = image.shape print("- Original image shape: {}".format(image.shape)) print("- Image scale needs to be reshaped into: {}".format(recognition_input_layer.shape)) # B,C,H,W = batch size, number of channels, height, width _, _, H, W = recognition_input_layer.shape print("- We need to first resize image then add paddings in order to align with model input size.") # Calculate scale ratio between input shape height and image height to resize image scale_ratio = H / image_height # Resize image to expected input sizes resized_image = cv2.resize(image, None, fx=scale_ratio, fy=scale_ratio, interpolation=cv2.INTER_AREA ) # Pad image to match input size, without changing aspect ratio resized_image = np.pad(resized_image, ((0, 0), (0, W - resized_image.shape[1])), mode="edge" ) # Reshape to network the input shape input_image = resized_image[None, None, :, :]## Visualise Input Image plt.figure() plt.axis("off") plt.imshow(image, cmap="gray", vmin=0, vmax=255); plt.figure(figsize=(20, 1)) plt.axis("off") plt.imshow(resized_image, cmap="gray", vmin=0, vmax=255);

Terminal output:

```
3 - load image to test.
- Original image shape: (115, 1250)
- Image scale needs to be reshaped into: {1, 1, 96, 2000}
- We need to first resize image then add paddings in order to align with model input size.
```
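
For intuition, the resize-and-pad arithmetic for this demo image works out as follows (a quick check based on the shape (115, 1250) printed above; OpenCV's rounding may shift the resized width by a pixel):

```python
# Numbers taken from the printout above (a quick check, not part of the pipeline).
H, W = 96, 2000                        # model input height and width
image_height, image_width = 115, 1250  # demo image shape

scale_ratio = H / image_height                    # 96 / 115 ≈ 0.835
resized_width = round(image_width * scale_ratio)  # ≈ 1043 px after resizing
padding_columns = W - resized_width               # ≈ 957 edge-value columns
print(scale_ratio, resized_width, padding_columns)
```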

The original input image is shown below:

After resizing and padding:

2.3 Prepare the Charlist

Now the model is loaded and the image is ready. The next step is to load the downloaded charlist. Before we can use it, a blank symbol must be added at the beginning of the charlist.

print("4 - Prepare Charlist, which is a ground truth list which we could match with our inference results.") # Get dictionary to encode output, based on model documentation used_charlist = selected_language.charlist_name # With both models, there should be blank symbol added at index 0 of each charlist blank_char = "~" with open(f"{charlist_folder}/{used_charlist}", "r", encoding="utf-8") as charlist:letters = blank_char + "".join(line.strip() for line in charlist)

2.4 Run Inference and Decode the Results

Now run inference. compiled_model() takes a list of inputs in the same order as the model's inputs. We can then read the result from the output tensor.

The model output has the shape W x B x L, where:

  • W: output sequence length
  • B: batch size
  • L: confidence distribution over the supported symbols (Kondate and Nakayosi for the Japanese model, SCUT-EPT for the Chinese model)

To get a more readable result, we pick the symbol with the highest probability at each step. Because of how CTC decoding works, we then collapse consecutive duplicate symbols and remove the blanks (see the toy example below).

The final step is to map the remaining indexes to the corresponding symbols in the charlist.
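
To make this concrete, here is a toy illustration of greedy CTC decoding with made-up index values, where 0 stands for the blank symbol:

```python
from itertools import groupby

# Hypothetical per-timestep argmax indexes; 0 is the CTC blank.
predictions_indexes = [0, 0, 7, 7, 0, 7, 2, 2, 0]

# Step 1: collapse runs of repeated symbols -> [0, 7, 0, 7, 2, 0]
collapsed = [key for key, _ in groupby(predictions_indexes)]

# Step 2: drop the blanks -> [7, 7, 2]
decoded = [idx for idx in collapsed if idx != 0]

print(decoded)  # the second 7 survives because a blank separated the two runs
```

Without the blank symbol, genuinely repeated characters would be merged by the collapsing step; the blank is what keeps them apart.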

```python
# Run inference on the model
predictions = compiled_model([input_image])[recognition_output_layer]
print("5 - Model Inference. Prediction results shape: {}".format(predictions.shape))
# Remove batch dimension
predictions = np.squeeze(predictions)
print("- We first squeeze the inference result into shape: {}".format(predictions.shape))
# Run argmax to pick the symbols with the highest probability
predictions_indexes = np.argmax(predictions, axis=1)
# Use groupby to remove concurrent letters, as required by CTC greedy decoding
output_text_indexes = list(groupby(predictions_indexes))
# Remove grouper objects
output_text_indexes, _ = np.transpose(output_text_indexes, (1, 0))
print("- We find out the highest probability character, and remove concurrent letters and grouper objects into shape: {}".format(output_text_indexes.shape))
# Remove blank symbols
output_text_indexes = output_text_indexes[output_text_indexes != 0]
print("- We remove blank symbols into shape: {}".format(output_text_indexes.shape))
# Assign letters to indexes from output array
output_text = [letters[letter_index] for letter_index in output_text_indexes]
print("- Final results: {}".format(output_text))
# Show the input line alongside the recognised text
plt.figure(figsize=(20, 1))
plt.axis("off")
plt.imshow(resized_image, cmap="gray", vmin=0, vmax=255)
```

Terminal output:

```
5 - Model Inference. Prediction results shape: (186, 1, 4059)
- We first squeeze the inference result into shape: (186, 4059)
- We find out the highest probability character, and remove concurrent letters and grouper objects into shape: (32,)
- We remove blank symbols into shape: (20,)
- Final results: ['人', '有', '悲', '歡', '離', '合', ',', '月', '有', '陰', '睛', '圓', '缺', ',', '此', '事', '古', '難', '全', '。']
```
