Beginner-Friendly Python OCR Libraries

Published: 2024/8/1 python 44 豆豆

1. pytesseract
2. PaddleOCR
3. easyocr
4. muggle_ocr
5. dddd_ocr
6. Other

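At work and in daily life you often need to extract text from images. This used to mean typing the text out by hand, but as AI has matured over the past few years, more and more OCR products have appeared, and they can handle most ordinary text-extraction needs. When there are many images to process, or when text extraction has to be built into an application, you need code. This post introduces several Python OCR libraries that suit beginners: they are simple and practical, and cover most routine image-text extraction and captcha recognition tasks.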
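For the "many images" case mentioned above, a small batch helper can wrap whichever library you pick. The following is a minimal sketch, not part of any of the libraries below: `ocr_folder`, `IMAGE_EXTS`, and the `recognize` callback are hypothetical names chosen for illustration, and the extension list is an assumption.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to your own files.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}


def ocr_folder(folder, recognize):
    """Run `recognize(path)` on every image in `folder`.

    `recognize` is any callable that takes a file path and returns text,
    for example a thin wrapper around one of the libraries in this post.
    Returns a dict mapping file name to recognized text.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            results[path.name] = recognize(path)
    return results
```

Passing the recognizer in as a callback keeps the loop independent of the OCR engine, so you can swap libraries without touching the batch code.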

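1. pytesseract

pytesseract works together with a local installation of the Tesseract OCR engine (tesseract-ocr.exe on Windows). For installation instructions, see the tutorial "Tesseract Ocr文字识别" (Tesseract OCR text recognition). Be sure to select the Chinese language pack during installation, because only English is supported by default.

Install the Python library with: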

pip install pytesseract

Image to recognize:

Code:

import pytesseract
from PIL import Image

text = pytesseract.image_to_string(Image.open(r"d:\Desktop\39DEE621-40EA-4ad1-90CC-79EB51D39347.png"))
print(text)

Recognition output:

Using Tesseract OCR with Python
# import the necessary packages
from PIL import Image
import pytesseract
import ergperse
import cv2
import os
# construct the argument parse and parse the arguments
ap = argparse.ArgunentParser()
ap.add_argument("-i", "--image", required-True, help="path to input image to be OCR'd")
ap.add_argument("-p", "--preprocess", typesstr, default="thresh", helpe"type of preprocessing to be done")
args = vars (ap.parse_args())
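Because pytesseract depends on the local Tesseract executable and on the language packs chosen at install time, it can help to check for the executable and to request the languages explicitly. The sketch below assumes Tesseract is on your PATH and that the Simplified Chinese pack (`chi_sim`) was installed; `tesseract_available`, `build_lang`, and `ocr_image` are hypothetical helper names, not part of pytesseract.

```python
import shutil


def tesseract_available(binary="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary)


def build_lang(codes):
    """Join Tesseract language codes, e.g. ["chi_sim", "eng"] -> "chi_sim+eng"."""
    return "+".join(codes)


def ocr_image(path, codes=("chi_sim", "eng")):
    """OCR one image with the given languages (assumes the packs are installed)."""
    # Imported lazily so the helpers above work without the OCR stack installed.
    import pytesseract
    from PIL import Image

    if tesseract_available() is None:
        raise RuntimeError("Tesseract executable not found on PATH")
    return pytesseract.image_to_string(Image.open(path), lang=build_lang(codes))
```

If Tesseract is installed somewhere that is not on PATH, the PATH check above will fail even though the engine exists; in that case you would point `pytesseract.pytesseract.tesseract_cmd` at the executable and skip the check.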

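2. PaddleOCR

PaddleOCR is an open-source, deep-learning-based OCR library from Baidu. Its accuracy on Chinese text is quite good, and it can handle most text-extraction needs.

Three packages need to be installed in order, using the commands below. Installing shapely may fail on some systems; for a workaround, see the blog post "百度OCR（文字识别）服务使用入坑指南" (a pitfall guide to using Baidu OCR).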

pip install paddlepaddle
pip install shapely
pip install paddleocr

Image to recognize:

Code:

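from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

# Load the detection, angle-classification, and recognition models for Chinese
ocr = PaddleOCR(use_angle_cls=True, lang="ch")

# Path of the image to recognize
img_path = r"d:\Desktop\4A34A16F-6B12-4ffc-88C6-FC86E4DF6912.png"

# Run OCR; each line holds a text box and a [text, confidence] pair.
# Note: this matches the PaddleOCR version used in this post; some newer
# releases wrap the lines in one list per image (use result[0]).
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# Draw the boxes, texts, and scores onto the image and display it
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores)
im_show = Image.fromarray(im_show)
im_show.show()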

The output is shown below. For each text region it reports the recognition confidence and the coordinates of the region:

Namespace(cls=False, cls_batch_num=30, cls_image_shape='3, 48, 192', cls_model_dir='C:\\Users\\Administrator/.paddleocr/cls', cls_thresh=0.9, det=True, det_algorithm='DB', det_db_box_thresh=0.5, det_db_thresh=0.3, det_db_unclip_ratio=2.0, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_max_side_len=960, det_model_dir='C:\\Users\\Administrator/.paddleocr/det', enable_mkldnn=False, gpu_mem=8000, image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', max_text_length=25, rec=True, rec_algorithm='CRNN', rec_batch_num=30, rec_char_dict_path='./ppocr/utils/ppocr_keys_v1.txt', rec_char_type='ch', rec_image_shape='3, 32, 320', rec_model_dir='C:\\Users\\Administrator/.paddleocr/rec/ch', use_angle_cls=True, use_gpu=True, use_space_char=True, use_tensorrt=False, use_zero_copy_run=False)
dt_boxes num : 16, elapse : 0.04799485206604004
cls num : 16, elapse : 0.1860027313232422
rec_res num : 16, elapse : 0.4859299659729004
[[[6.0, 2.0], [85.0, 2.0], [85.0, 31.0], [6.0, 31.0]], ['帮助文档', 0.99493873]]
[[[309.0, 13.0], [324.0, 13.0], [324.0, 28.0], [309.0, 28.0]], ['X', 0.9667116]]
[[[82.0, 50.0], [120.0, 50.0], [120.0, 71.0], [82.0, 71.0]], ['目录', 0.993418]]
[[[136.0, 50.0], [176.0, 50.0], [176.0, 71.0], [136.0, 71.0]], ['标题', 0.99969745]]
[[[13.0, 53.0], [60.0, 53.0], [60.0, 70.0], [13.0, 70.0]], ['快捷键', 0.9995322]]
[[[191.0, 49.0], [314.0, 49.0], [314.0, 72.0], [191.0, 72.0]], ['文本样式列表', 0.9967863]]
[[[61.0, 84.0], [120.0, 84.0], [120.0, 101.0], [61.0, 101.0]], ['代码片', 0.9997086]]
[[[134.0, 81.0], [181.0, 84.0], [180.0, 104.0], [132.0, 101.0]], ['表格', 0.9891155]]
[[[187.0, 84.0], [232.0, 84.0], [232.0, 101.0], [187.0, 101.0]], ['注脚', 0.99958]]
[[[13.0, 115.0], [90.0, 115.0], [90.0, 135.0], [13.0, 135.0]], ['自定义列表', 0.99823236]]
[[[109.0, 115.0], [219.0, 115.0], [219.0, 135.0], [109.0, 135.0]], ['LaTeX数学公式', 0.98812836]]
[[[237.0, 115.0], [315.0, 115.0], [315.0, 135.0], [237.0, 135.0]], ['插入甘特图', 0.9982792]]
[[[12.0, 148.0], [94.0, 148.0], [94.0, 167.0], [12.0, 167.0]], ['插入UML图', 0.9926085]]
[[[113.0, 148.0], [249.0, 148.0], [249.0, 167.0], [113.0, 167.0]], ['插入Mermaid流程图', 0.996088]]
[[[11.0, 176.0], [153.0, 176.0], [153.0, 200.0], [11.0, 200.0]], ['插入Flowchart流程图', 0.9780351]]
[[[174.0, 179.0], [237.0, 179.0], [237.0, 200.0], [174.0, 200.0]], ['插入类图', 0.9519753]]
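Since every result line carries a box and a confidence score, you can filter out low-confidence regions and order the rest top to bottom. The following is a minimal sketch that works on the list-of-lines structure printed above; `filter_lines` and the 0.98 threshold are illustrative assumptions, not PaddleOCR API.

```python
def filter_lines(result, min_score=0.98):
    """Keep lines whose confidence is at least `min_score`, in reading order.

    `result` is a list of [box, [text, score]] entries, where `box` is four
    [x, y] corner points starting at the top-left corner.
    """
    kept = []
    for box, (text, score) in result:
        if score >= min_score:
            left, top = box[0]
            kept.append((top, left, text, score))
    kept.sort()  # sort by top edge first, then by left edge
    return [(text, score) for _, _, text, score in kept]


# Three lines taken from the output in this post
sample = [
    [[[6.0, 2.0], [85.0, 2.0], [85.0, 31.0], [6.0, 31.0]], ['帮助文档', 0.99493873]],
    [[[309.0, 13.0], [324.0, 13.0], [324.0, 28.0], [309.0, 28.0]], ['X', 0.9667116]],
    [[[82.0, 50.0], [120.0, 50.0], [120.0, 71.0], [82.0, 71.0]], ['目录', 0.993418]],
]

for text, score in filter_lines(sample):
    print(text, score)
```

With the threshold at 0.98 the 'X' entry (confidence about 0.967) is dropped and the other two are kept.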

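3. easyocr

EasyOCR is an open-source OCR project with more than 10,000 stars on GitHub (repository: EasyOCR). It supports more than 80 languages and has very high recognition accuracy.

Install the Python library with: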

pip install easyocr

Image to recognize:

Code:

import easyocr

# Recognize both Simplified Chinese and English
reader = easyocr.Reader(['ch_sim','en'], gpu = False)  # need to run only once to load model into memory
result = reader.readtext(r"d:\Desktop\4A34A16F-6B12-4ffc-88C6-FC86E4DF6912.png", detail = 0)
print(result)

The first run downloads the detection and recognition models, so it is best done on a fast network connection:

Using CPU. Note: This module is much faster with a GPU.
Downloading detection model, please wait. This may take several minutes depending upon your network connection.
Downloading recognition model, please wait. This may take several minutes depending upon your network connection.

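The output is shown below. No text region is skipped, and it even picks up two entries that PaddleOCR missed above, although a few characters are misread (for example 'Mernaid' instead of 'Mermaid'):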

['帮助文档', '快捷键', '目录', '标题', '文本样式', '列表', '链接', '代码片', '表格', '注脚', '注释', '自定义列表', 'LaTex 数学公式', '插入甘犄图', '插入UML图', '插入Mernaid流程图', '插入 Flowchart流程图', '插入类图']
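When comparing two engines on the same image, it is easier to diff their text lists than to compare them by eye. The following is a minimal sketch: `normalize` and `compare_outputs` are hypothetical helpers, the sample lists are a short excerpt of the outputs in this post, and ignoring case and whitespace is an assumption about what counts as "the same" text.

```python
def normalize(text):
    """Lower-case and drop whitespace so spacing and case differences are ignored."""
    return "".join(text.split()).lower()


def compare_outputs(first, second):
    """Return (items only in `first`, items only in `second`) after normalization."""
    first_norm = {normalize(t) for t in first}
    second_norm = {normalize(t) for t in second}
    only_first = [t for t in first if normalize(t) not in second_norm]
    only_second = [t for t in second if normalize(t) not in first_norm]
    return only_first, only_second


# Excerpt of the two outputs shown in this post
paddle_texts = ["目录", "标题", "LaTeX数学公式", "插入Mermaid流程图"]
easyocr_texts = ["目录", "标题", "链接", "LaTex 数学公式", "插入Mernaid流程图"]

only_paddle, only_easyocr = compare_outputs(paddle_texts, easyocr_texts)
print(only_paddle)
print(only_easyocr)
```

An entry that appears on both sides with a different spelling, such as the Mermaid line here, usually points to a misread character rather than a missed region.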

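4. muggle_ocr

muggle_ocr is a lightweight OCR library that, as the name suggests, is designed for "muggles". It is very simple to use, but its main strength is recognizing captchas; it is somewhat weaker at general text extraction.

Install the Python library with: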

pip install muggle_ocr

Captcha to recognize:

Code:

import muggle_ocr

# Initialize the SDK; model_type is either ModelType.OCR or ModelType.Captcha,
# for ordinary images and captchas respectively
sdk = muggle_ocr.SDK(model_type=muggle_ocr.ModelType.Captcha)

with open(r"d:\Desktop\四位验证码.png", "rb") as f:
    img = f.read()

text = sdk.predict(image_bytes=img)
print(text)

Recognition output:

MuggleOCR Session [captcha] Loaded.
3n3d

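5. dddd_ocr

dddd_ocr (imported as ddddocr) is another open-source library for captcha recognition, also known as 带带弟弟OCR. It was developed by sml2h3, a well-known figure in the web-scraping community. It performs very well and is especially effective on ordinary digit and letter captchas.

Install the Python library with: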

pip install ddddocr

Captcha to recognize:

Code:

import ddddocr

ocr = ddddocr.DdddOcr()

# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r"d:\Desktop\四位验证码2.png", 'rb') as f:
    img_bytes = f.read()

res = ocr.classification(img_bytes)
print(res)

The output is shown below. Despite some interfering lines, all four letters are recognized correctly:

jepv
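Captcha recognizers occasionally return a string of the wrong length or with stray characters, so a cheap sanity check before submitting the result can save a failed request. The sketch below assumes four-character captchas made of ASCII letters and digits, like the two examples in this post; `plausible_captcha` is a hypothetical helper, not part of ddddocr or muggle_ocr.

```python
import string

# Assumed alphabet: ASCII letters and digits only
ALLOWED = set(string.ascii_letters + string.digits)


def plausible_captcha(text, length=4):
    """Return True if `text` has the expected length and only allowed characters."""
    return len(text) == length and all(ch in ALLOWED for ch in text)


print(plausible_captcha("jepv"))
print(plausible_captcha("je v"))
```

A result that fails the check is a natural trigger for fetching a fresh captcha and trying again.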

6. Other

There are other excellent OCR libraries as well; this post will be updated with them over time.
