
OpenVINO Series 16. OpenVINO Handwritten Text OCR

In this example, we perform OCR on handwritten Simplified Chinese and Japanese text. Note that these models can only process a single line of text at a time.

  • handwritten-japanese-recognition-0001
  • handwritten-simplified-chinese-recognition-0001

Environment:

  • Runtime: Windows 10, 10th-generation Intel i5 laptop
  • IDE: VSCode
  • OpenVINO version: 2022.1
  • Code link: 11-OCR

Table of Contents

  • OpenVINO Series 16. OpenVINO Handwritten Text OCR
    • 1 About the Handwriting Recognition Models
      • 1.1 `handwritten-japanese-recognition`
      • 1.2 `handwritten-simplified-chinese`
    • 2 Handwriting Recognition Code
      • 2.1 Select and Load a Handwriting Model
      • 2.2 Load the Image and Resize It to the Model Input Size
      • 2.3 Prepare the Charlist
      • 2.4 Run Inference and Decode the Results


1 About the Handwriting Recognition Models

In this example, we run OCR on handwritten Simplified Chinese and Japanese text; each model can only process a single line of text at a time.

This notebook uses the handwritten-japanese-recognition and handwritten-simplified-chinese models. To decode the model outputs into readable text, the kondate_nakayosi and scut_ept character lists are used. Both models are available from the Open Model Zoo.

1.1 handwritten-japanese-recognition

We will not explain the model's underlying algorithm here; we only describe its inputs and outputs.

Input: [1,1,96,2000], corresponding to [B,C,H,W]: B is the batch size, C the number of channels, H the image height, and W the image width.

Note: the source image is resized to the target height (96) while keeping its aspect ratio; the resized width must not exceed 2000. The image is then padded on the right with edge values up to a width of 2000.

Output: [186,1,4442], corresponding to [W,B,L]: W is the output sequence length, B the batch size, and L the confidence distribution across the symbols supported in Kondate and Nakayosi.

1.2 handwritten-simplified-chinese

Input: [1,1,96,2000], corresponding to [B,C,H,W]: B is the batch size, C the number of channels, H the image height, and W the image width.

Note: the source image is resized to the target height (96) while keeping its aspect ratio; the resized width must not exceed 2000. The image is then padded on the right with edge values up to a width of 2000.

Output: [186,1,4059], corresponding to [W,B,L]: W is the output sequence length, B the batch size, and L the confidence distribution across the symbols supported in SCUT-EPT.
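
Both models therefore share the same resize-and-pad preprocessing rule. Here is a minimal sketch of that rule as a standalone helper (the `preprocess` name is ours, for illustration only; the full walkthrough with the demo image follows in section 2.2):

```python
import cv2
import numpy as np

def preprocess(gray_image, target_h=96, target_w=2000):
    # Resize while keeping the aspect ratio so the height matches the model input.
    h, _ = gray_image.shape
    scale = target_h / h
    resized = cv2.resize(gray_image, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    # Pad on the right with edge values up to the model input width.
    padded = np.pad(resized, ((0, 0), (0, target_w - resized.shape[1])), mode="edge")
    # Add batch and channel dimensions to get [B, C, H, W] = [1, 1, 96, 2000].
    return padded[None, None, :, :]
```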

2 Handwriting Recognition Code

2.1 Select and Load a Handwriting Model

```python
from collections import namedtuple
from itertools import groupby
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import numpy as np
from openvino.runtime import Core

# Directories where data will be placed
model_folder = "model"
data_folder = "data"
charlist_folder = f"{data_folder}/charlists"
# Precision used by model
precision = "FP16"

# To group files, you have to define the collection. In this case, you can use `namedtuple`.
Language = namedtuple(
    typename="Language", field_names=["model_name", "charlist_name", "demo_image_name"]
)
chinese_files = Language(
    model_name="handwritten-simplified-chinese-recognition-0001",
    charlist_name="chinese_charlist.txt",
    demo_image_name="handwritten_chinese_test.jpg",
)
japanese_files = Language(
    model_name="handwritten-japanese-recognition-0001",
    charlist_name="japanese_charlist.txt",
    demo_image_name="handwritten_japanese_test.png",
)

print("1 - Choose a language model to download, either Chinese or Japanese.")
# Select language by using either language='chinese' or language='japanese'
language = "chinese"
languages = {"chinese": chinese_files, "japanese": japanese_files}
selected_language = languages.get(language)

# Download the model
path_to_model_weights = Path(
    f"{model_folder}/intel/{selected_language.model_name}/{precision}/{selected_language.model_name}.bin"
)
if not path_to_model_weights.is_file():
    download_command = (
        f"omz_downloader --name {selected_language.model_name} "
        f"--output_dir {model_folder} --precision {precision}"
    )
    print(download_command)
    ! $download_command
else:
    print("model has been downloaded.")

print("2 - Load the model, and print its input and output")
ie = Core()
path_to_model = path_to_model_weights.with_suffix(".xml")
model = ie.read_model(model=path_to_model)
# Select Device Name
compiled_model = ie.compile_model(model=model, device_name="CPU")
recognition_output_layer = compiled_model.output(0)
recognition_input_layer = compiled_model.input(0)
print("- model input shape: {}".format(recognition_input_layer))
print("- model output shape: {}".format(recognition_output_layer))
```

Terminal output:

```
1 - Choose a language model to download, either Chinese or Japanese.
model has been downloaded.
2 - Load the model, and print its input and output
- model input shape: <ConstOutput: names[actual_input] shape{1,1,96,2000} type: f32>
- model output shape: <ConstOutput: names[output] shape{186,1,4059} type: f32>
```

2.2 Load the Image and Resize It to the Model Input Size

The next step is to load the image. The model expects a single-channel image as input, which is why we read the image in grayscale. After loading the input image, we compute the scale ratio: the ratio between the required input-layer height and the current image height. In the cell below, the image is resized and padded so that the characters keep their proportions while matching the input shape.

print("3 - load image to test.") # Read file name of demo file based on the selected model file_name = selected_language.demo_image_name # Text detection models expects an image in grayscale format # IMPORTANT!!! This model allows to read only one line at time # Read image image = cv2.imread(filename=f"{data_folder}/{file_name}", flags=cv2.IMREAD_GRAYSCALE) # Fetch shape image_height, _ = image.shape print("- Original image shape: {}".format(image.shape)) print("- Image scale needs to be reshaped into: {}".format(recognition_input_layer.shape)) # B,C,H,W = batch size, number of channels, height, width _, _, H, W = recognition_input_layer.shape print("- We need to first resize image then add paddings in order to align with model input size.") # Calculate scale ratio between input shape height and image height to resize image scale_ratio = H / image_height # Resize image to expected input sizes resized_image = cv2.resize(image, None, fx=scale_ratio, fy=scale_ratio, interpolation=cv2.INTER_AREA ) # Pad image to match input size, without changing aspect ratio resized_image = np.pad(resized_image, ((0, 0), (0, W - resized_image.shape[1])), mode="edge" ) # Reshape to network the input shape input_image = resized_image[None, None, :, :]## Visualise Input Image plt.figure() plt.axis("off") plt.imshow(image, cmap="gray", vmin=0, vmax=255); plt.figure(figsize=(20, 1)) plt.axis("off") plt.imshow(resized_image, cmap="gray", vmin=0, vmax=255);

Terminal output:

```
3 - load image to test.
- Original image shape: (115, 1250)
- Image scale needs to be reshaped into: {1, 1, 96, 2000}
- We need to first resize image then add paddings in order to align with model input size.
```
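
For intuition, the resize-and-pad arithmetic for this demo image works out as follows (a quick check based on the shape (115, 1250) printed above; OpenCV's rounding may shift the resized width by a pixel):

```python
# Numbers taken from the printout above (a quick check, not part of the pipeline).
H, W = 96, 2000                        # model input height and width
image_height, image_width = 115, 1250  # demo image shape

scale_ratio = H / image_height                    # 96 / 115 ≈ 0.835
resized_width = round(image_width * scale_ratio)  # ≈ 1043 px after resizing
padding_columns = W - resized_width               # ≈ 957 edge-value columns
print(scale_ratio, resized_width, padding_columns)
```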

The original input image is shown below:

After resizing and padding:

2.3 Prepare the Charlist

Now the model is loaded and the image is ready. The next step is to load the downloaded charlist. Before we can use it, a blank symbol must be added at the beginning of the charlist.

print("4 - Prepare Charlist, which is a ground truth list which we could match with our inference results.") # Get dictionary to encode output, based on model documentation used_charlist = selected_language.charlist_name # With both models, there should be blank symbol added at index 0 of each charlist blank_char = "~" with open(f"{charlist_folder}/{used_charlist}", "r", encoding="utf-8") as charlist:letters = blank_char + "".join(line.strip() for line in charlist)

2.4 Run Inference and Decode the Results

Now run inference. compiled_model() takes a list of inputs in the same order as the model's inputs. We can then read the result from the output tensor.

The model output has the shape W x B x L, where:

  • W: output sequence length
  • B: batch size
  • L: confidence distribution over the supported symbols (Kondate and Nakayosi for the Japanese model, SCUT-EPT for the Chinese model)

To get a more readable result, we pick the symbol with the highest probability at each step. Because of how CTC decoding works, we then collapse consecutive duplicate symbols and remove the blanks (see the toy example below).

The final step is to map the remaining indexes to the corresponding symbols in the charlist.
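
To make this concrete, here is a toy illustration of greedy CTC decoding with made-up index values, where 0 stands for the blank symbol:

```python
from itertools import groupby

# Hypothetical per-timestep argmax indexes; 0 is the CTC blank.
predictions_indexes = [0, 0, 7, 7, 0, 7, 2, 2, 0]

# Step 1: collapse runs of repeated symbols -> [0, 7, 0, 7, 2, 0]
collapsed = [key for key, _ in groupby(predictions_indexes)]

# Step 2: drop the blanks -> [7, 7, 2]
decoded = [idx for idx in collapsed if idx != 0]

print(decoded)  # the second 7 survives because a blank separated the two runs
```

Without the blank symbol, genuinely repeated characters would be merged by the collapsing step; the blank is what keeps them apart.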

```python
# Run inference on the model
predictions = compiled_model([input_image])[recognition_output_layer]
print("5 - Model Inference. Prediction results shape: {}".format(predictions.shape))
# Remove batch dimension
predictions = np.squeeze(predictions)
print("- We first squeeze the inference result into shape: {}".format(predictions.shape))
# Run argmax to pick the symbols with the highest probability
predictions_indexes = np.argmax(predictions, axis=1)
# Use groupby to remove concurrent letters, as required by CTC greedy decoding
output_text_indexes = list(groupby(predictions_indexes))
# Remove grouper objects
output_text_indexes, _ = np.transpose(output_text_indexes, (1, 0))
print("- We find out the highest probability character, and remove concurrent letters and grouper objects into shape: {}".format(output_text_indexes.shape))
# Remove blank symbols
output_text_indexes = output_text_indexes[output_text_indexes != 0]
print("- We remove blank symbols into shape: {}".format(output_text_indexes.shape))
# Assign letters to indexes from output array
output_text = [letters[letter_index] for letter_index in output_text_indexes]
print("- Final results: {}".format(output_text))
# Show the input line alongside the recognised text
plt.figure(figsize=(20, 1))
plt.axis("off")
plt.imshow(resized_image, cmap="gray", vmin=0, vmax=255)
```

Terminal output:

```
5 - Model Inference. Prediction results shape: (186, 1, 4059)
- We first squeeze the inference result into shape: (186, 4059)
- We find out the highest probability character, and remove concurrent letters and grouper objects into shape: (32,)
- We remove blank symbols into shape: (20,)
- Final results: ['人', '有', '悲', '歡', '離', '合', ',', '月', '有', '陰', '睛', '圓', '缺', ',', '此', '事', '古', '難', '全', '。']
```
