Machine Learning Project Case Study: Simple Automatic Recognition of Numeric CAPTCHAs

Published: 2024/3/13 | Programming Q&A | 豆豆

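This article walks through a case study of recognizing CAPTCHAs.
Basic approach and steps:
1. Write a CAPTCHA generator and use it to build a library of CAPTCHA images.
2. Preprocess the images in the library and split each one into single digits.
3. Train on the resulting data to obtain a model.
4. Use the model to predict unseen CAPTCHA images.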

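CAPTCHAs come in many forms, but the approach to recognizing them is similar, so we start with simple digits. First we need a CAPTCHA generator to build the library.
Each CAPTCHA contains 5 digits, each drawn in a random color, and the image also gets some noise dots and random lines.
The code is as follows: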

import os
import random

from PIL import Image, ImageDraw, ImageFont


def getRandomColor():
    """Return a random (r, g, b) color.

    Any channel that comes out as 255 is set to 0, so the color is never
    the pure-white background.
    """
    color = [random.randint(0, 255) for _ in range(3)]
    return tuple(0 if c == 255 else c for c in color)


def getRandomStr():
    """Return one random digit as a string (its color is chosen when drawing)."""
    return str(random.randint(0, 9))


def generate_captcha(out_dir='captcha_images'):
    # Create an RGB image: 150 px wide, 50 px high, white background
    image = Image.new('RGB', (150, 50), (255, 255, 255))
    # Get a drawing object for the image
    draw = ImageDraw.Draw(image)
    # Load a TrueType font: path to the .ttf file and font size
    # (ARLRDBD.TTF must be installed or present in the working directory)
    font = ImageFont.truetype("ARLRDBD.TTF", size=32)

    label = ""
    for i in range(5):
        random_char = getRandomStr()
        label += random_char
        # Draw the digit: position, text, color, font
        draw.text((10 + i * 30, 0), random_char, getRandomColor(), font=font)

    # Noise lines and dots, drawn within the top 30 rows of the image
    width = 150
    height = 30
    # Lines
    for i in range(3):
        x1 = random.randint(0, width)
        x2 = random.randint(0, width)
        y1 = random.randint(0, height)
        y2 = random.randint(0, height)
        draw.line((x1, y1, x2, y2), fill=(0, 0, 0))
    # Dots and small arcs
    for i in range(5):
        draw.point([random.randint(0, width), random.randint(0, height)], fill=getRandomColor())
        x = random.randint(0, width)
        y = random.randint(0, height)
        draw.arc((x, y, x + 4, y + 4), 0, 90, fill=(0, 0, 0))

    # Save as <label>.png, so the file name doubles as the ground-truth label
    image.save(os.path.join(out_dir, label + '.png'), 'PNG')


if __name__ == '__main__':
    os.makedirs('captcha_images', exist_ok=True)
    # For a prediction sample, call generate_captcha('captcha_predict') instead
    # and rename the file to unknown.png
    for i in range(150):
        generate_captcha()

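Running the program generates 150 CAPTCHA images and saves them to the captcha_images/ folder, which serves as our CAPTCHA library:

[Figure: the folder of generated CAPTCHA images]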
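Because the generator names each file after its label, two CAPTCHAs with the same five digits overwrite each other, so the folder can end up with slightly fewer than 150 images. A quick birthday-problem estimate shows how likely that is. This is a sketch assuming every 5-digit label is equally likely and independent (which `random.randint` approximates); `collision_probability` is a hypothetical helper, not part of the article's code.

```python
def collision_probability(n: int, space: int = 10**5) -> float:
    """Probability that at least two of n uniformly random labels coincide."""
    p_all_unique = 1.0
    for k in range(n):
        # the (k+1)-th label must avoid the k labels drawn before it
        p_all_unique *= (space - k) / space
    return 1.0 - p_all_unique


print(f"{collision_probability(150):.3f}")
```

For 150 labels out of 100,000 possible ones this comes to roughly 0.106, i.e. about a one-in-ten chance that at least one file gets overwritten.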




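After generating the CAPTCHAs, we process the images in the following steps:
1. Binarize each CAPTCHA image: first convert it from three-channel RGB to single-channel grayscale, then turn the grayscale image (values 0 to 255) into a binary image (values 0 and 1).
2. Denoise the binarized image by removing the interfering dots and lines.
3. Split the processed image by pixel position and save each of its 5 digits into the matching folder, 0 to 9.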
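The three steps can be sketched on plain NumPy arrays, without any files. This is a vectorized sketch, not the article's implementation: the grayscale weights follow the ITU-R 601-2 luma formula that Pillow documents for `convert('L')` (L = R*299/1000 + G*587/1000 + B*114/1000), so the float version only approximates Pillow's integer result. The denoising rule matches the article's "fewer than 5 dark pixels in the 3x3 neighborhood" test, but counts are taken from the unmodified image and pixels outside the border count as light. The split cuts the original 150x50 grid at the generator's x = 10 + 30*i offsets, whereas the article crops a saved matplotlib figure. All function names here are hypothetical.

```python
import numpy as np


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """(H, W, 3) RGB -> (H, W) gray, using the ITU-R 601-2 luma weights."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])


def binarize(gray: np.ndarray, threshold: int = 220) -> np.ndarray:
    """Step 1: 0 for dark (ink) pixels, 1 for light (background) pixels."""
    return np.where(gray <= threshold, 0, 1).astype(np.uint8)


def denoise(binary: np.ndarray, min_dark: int = 5) -> np.ndarray:
    """Step 2: whiten dark pixels with fewer than `min_dark` dark pixels in
    their 3x3 neighborhood (the pixel itself included).

    Counts come from the input image (no in-place updates), and pixels
    outside the image count as light.
    """
    dark = (binary == 0).astype(int)
    padded = np.pad(dark, 1)  # pads with zeros, i.e. "not dark"
    h, w = dark.shape
    counts = sum(padded[1 + di:1 + di + h, 1 + dj:1 + dj + w]
                 for di in (-1, 0, 1) for dj in (-1, 0, 1))
    out = binary.copy()
    out[(dark == 1) & (counts < min_dark)] = 1
    return out


def split_digits(binary: np.ndarray, n: int = 5, left: int = 10, step: int = 30) -> list:
    """Step 3: cut n vertical strips starting at x = left + i * step."""
    return [binary[:, left + i * step:left + (i + 1) * step] for i in range(n)]


# Usage on a synthetic 150x50 image
rgb = np.full((50, 150, 3), 255)   # white canvas
rgb[10:30, 12:30] = (200, 0, 0)    # a dark-red "digit" in the first slot
rgb[45, 100] = (0, 0, 0)           # an isolated noise dot
clean = denoise(binarize(to_gray(rgb)))
strips = split_digits(clean)
```

The isolated dot is whitened while the solid block survives (apart from its four corners, which have only 4 dark neighbors, exactly as in the article's rule). The last strip is narrower because the image ends at x = 150.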

The code is as follows:

import os

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image


def binarization(path):
    """Grayscale an image, then map each pixel to 0 (dark, gray <= 220) or 1 (light)."""
    img_gray = np.array(Image.open(path).convert('L'))
    return np.where(img_gray <= 220, 0, 1).astype(np.uint8)


def noiseReduction(img_gray, label):
    """Remove isolated dark pixels, then save the cleaned image via matplotlib."""
    height, width = img_gray.shape
    for x in range(height - 1):
        for y in range(width - 1):
            if img_gray[x, y] == 1:
                continue
            # Count dark pixels in the 3x3 neighborhood (the pixel itself included);
            # indices below 0 are clamped to 0
            cnt = 0
            for i in (-1, 0, 1):
                n = max(x + i, 0)
                for j in (-1, 0, 1):
                    m = max(y + j, 0)
                    if img_gray[n, m] == 0:
                        cnt += 1
            # A dark pixel with few dark neighbors is treated as noise
            if cnt <= 4:
                img_gray[x, y] = 1
    # Save through matplotlib. With the default 6.4x4.8 in figure at 100 dpi the
    # file is 640x480 px; the crop boxes in cutImg assume exactly this size.
    os.makedirs('clean_captcha_img', exist_ok=True)
    plt.figure()
    plt.imshow(img_gray, cmap='gray')
    plt.axis('off')
    plt.savefig(os.path.join('clean_captcha_img', label + '.png'))
    plt.close()  # free the figure; otherwise figures pile up in memory


def img_2_clean():
    """Binarize and denoise every CAPTCHA in captcha_images/."""
    for captcha in os.listdir('captcha_images'):
        label = captcha.split('.')[0]
        im = binarization(os.path.join('captcha_images', captcha))
        noiseReduction(im, label)


def cutImg(label):
    """Cut a cleaned CAPTCHA into 5 digit images, filed under cut_number/<digit>/."""
    img = Image.open(os.path.join('clean_captcha_img', label + '.png'))
    for i in range(5):
        # Each digit sits in a 100x110 px box of the saved 640x480 figure
        pic = img.crop((100 * (1 + i), 170, 100 * (1 + i) + 100, 280))
        seq = get_save_seq(label[i])
        pic.save(os.path.join('cut_number', label[i], str(seq) + '.png'))


def get_save_seq(num):
    """Return the next free sequence number in cut_number/<num>/."""
    numlist = os.listdir(os.path.join('cut_number', num))
    if not numlist:
        return 0
    return max(int(f.split('.')[0]) for f in numlist) + 1


def create_dir():
    for i in range(10):
        os.makedirs(os.path.join('cut_number', str(i)), exist_ok=True)


def clean2cut():
    for img in os.listdir('clean_captcha_img'):
        label = img.split('.')[0]
        cutImg(label)


if __name__ == '__main__':
    img_2_clean()
    create_dir()
    clean2cut()

A CAPTCHA after binarization and denoising:

[Figure: a binarized and denoised CAPTCHA]

The cut images are saved into the folder of the corresponding digit.

For example, a cut-out digit 6:

[Figure: a cut-out digit 6]


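1. Fit a logistic regression model to the data

(1) Pass the cut digit images to LogisticRegression.fit() as X (a 2-D array with one flattened image per row) and y (a 1-D array of digit labels).
Hyperparameters can be tuned with grid search (GridSearchCV).
(2) Save the trained model to disk with the joblib package.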

2. Once the model is trained, test it on a new image
(1) Apply the same preprocessing steps as before to the new image.
(2) Predict each cut digit image independently.
(3) Concatenate the individual predictions into the final result.
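Steps 1 and 2 above can be sketched end to end. This is a sketch on synthetic data: the random 100-pixel templates are a hypothetical stand-in for the flattened cut images that `load_data()` returns, the grid reuses the C and tol values from the article's commented-out search, and it uses scikit-learn's default `LogisticRegression` rather than the article's one-vs-rest "sag" setup. `predict_captcha` is a hypothetical helper that predicts all five digits in one call and joins them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical stand-in for load_data(): 10 random 0/1 "digit" templates of
# 100 pixels each, 20 noisy copies per digit (5% of pixels flipped)
rng = np.random.default_rng(0)
templates = rng.integers(0, 2, size=(10, 100))
X = np.repeat(templates, 20, axis=0)
X = np.where(rng.random(X.shape) < 0.05, 1 - X, X)
Y = np.repeat(np.arange(10), 20)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0, stratify=Y)

# Step 1(1): grid search with 3-fold cross-validation
param_grid = {"C": [0.4, 0.6, 0.8], "tol": [1e-4, 1e-3, 1e-2]}
grid_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=3)
grid_search.fit(X_train, Y_train)
print(grid_search.best_params_, grid_search.score(X_test, Y_test))

# Step 1(2): persist the refitted best model with joblib, then load it back
model_path = os.path.join(tempfile.mkdtemp(), "captcha_model.model")
joblib.dump(grid_search.best_estimator_, model_path)
model = joblib.load(model_path)


# Steps 2(2) and 2(3): predict each cut digit independently, then concatenate
def predict_captcha(clf, pieces) -> str:
    """`pieces` holds the cut digit images (0/1 arrays), left to right."""
    X_new = np.stack([np.asarray(p).ravel() for p in pieces])
    return "".join(str(d) for d in clf.predict(X_new))


# Five "pieces" built from the templates for the digits 4, 0, 2, 1, 3
print(predict_captcha(model, [templates[d] for d in (4, 0, 2, 1, 3)]))
```

Stacking the five pieces into one matrix gives the same per-digit predictions as five separate `predict` calls; each row is still classified independently.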

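Note that this script imports the functions written earlier (binarization and noiseReduction, from the preprocessing script saved as CAPTCHA/captcha_logistic.py).
The code is as follows: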

import os

import joblib  # sklearn.externals.joblib no longer exists in current scikit-learn
import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Functions from the preprocessing script above (CAPTCHA/captcha_logistic.py)
from CAPTCHA.captcha_logistic import binarization, noiseReduction

# Gray level for turning cut digit images into 0/1 features;
# training and prediction must use the same value
THRESHOLD = 240


def img_to_vector(img):
    """Grayscale a cut digit image and flatten it into a 1-D 0/1 feature vector."""
    img_array = np.array(img.convert('L'))
    return np.where(img_array <= THRESHOLD, 0, 1).ravel()


def load_data():
    # Each cut image is a 2-D grid of 0/1 pixels, e.g. a 20x5 block:
    # [[1 1 ... 1 1]
    #  [1 1 ... 1 1]
    #  ...
    #  [1 1 ... 1 1]]
    # which is flattened into one 100-element row of X; its digit goes into Y, e.g. 0
    X, Y = [], []
    for numC in os.listdir('cut_number'):
        num_dir = os.path.join('cut_number', numC)
        for num_file in os.listdir(num_dir):
            X.append(img_to_vector(Image.open(os.path.join(num_dir, num_file))))
            Y.append(int(numC))
    return np.array(X), np.array(Y)


def generate_model(X, Y):
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)
    # One-vs-rest logistic regression, the scheme of LogisticRegression(multi_class='ovr');
    # the multi_class option is deprecated in recent scikit-learn
    log_clf = OneVsRestClassifier(LogisticRegression(solver='sag', max_iter=10000))
    # Hyperparameters can be chosen by cross-validation, e.g.:
    # from sklearn.model_selection import GridSearchCV
    # param_grid = {"estimator__tol": [1e-4, 1e-3, 1e-2],
    #               "estimator__C": [0.4, 0.6, 0.8]}
    # grid_search = GridSearchCV(log_clf, param_grid=param_grid, cv=3)
    # grid_search.fit(X_train, Y_train)
    log_clf.fit(X_train, Y_train)
    # Report accuracy on the held-out 30%
    print('test accuracy:', log_clf.score(X_test, Y_test))
    # Persist the model to disk
    os.makedirs('captcha_model', exist_ok=True)
    joblib.dump(log_clf, 'captcha_model/captcha_model.model')


def get_model():
    return joblib.load('captcha_model/captcha_model.model')


def captcha_predict(path='captcha_predict/unknown.png'):
    # Preprocess the new image exactly as the training images were
    noiseReduction(binarization(path), 'unknown')
    img = Image.open(os.path.join('clean_captcha_img', 'unknown.png'))
    pieces = []
    for i in range(5):
        # Same crop boxes as cutImg
        pic = img.crop((100 * (1 + i), 170, 100 * (1 + i) + 100, 280))
        pic.save(os.path.join('captcha_predict', str(i) + '.png'))
        pieces.append(img_to_vector(pic))
    # Predict each digit independently, then concatenate the results
    model = get_model()
    y_pre = model.predict(np.array(pieces))
    return ''.join(str(d) for d in y_pre)


if __name__ == '__main__':
    X, Y = load_data()
    generate_model(X, Y)
    result = captcha_predict()
    print(result)

The CAPTCHA image to be recognized:

[Figure: the CAPTCHA to recognize]

Final recognition result:

[Figure: the printed recognition result]

The given CAPTCHA image was recognized correctly.
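A single correct prediction shows the pipeline works end to end, but says little about how reliable it is. A sketch for scoring a batch of labeled CAPTCHAs (for example new images from the generator, whose file names are their labels), assuming predictions and labels are 5-digit strings; `captcha_accuracy` is a hypothetical helper. If digit errors were independent, a per-digit accuracy p would give a whole-CAPTCHA accuracy of roughly p^5, e.g. 0.95^5 ≈ 0.77, so both numbers are worth reporting.

```python
def captcha_accuracy(predicted, actual):
    """Return (per-digit accuracy, whole-CAPTCHA accuracy).

    `predicted` and `actual` are equal-length lists of digit strings,
    e.g. model outputs and the labels taken from the file names.
    """
    pairs = list(zip(predicted, actual))
    digit_hits = [p == a for pred, act in pairs for p, a in zip(pred, act)]
    per_digit = sum(digit_hits) / len(digit_hits)
    whole = sum(pred == act for pred, act in pairs) / len(pairs)
    return per_digit, whole


print(captcha_accuracy(["12345", "11111"], ["12345", "11112"]))  # (0.9, 0.5)
```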

總結(jié)

以上是生活随笔為你收集整理的机器学习项目案例 简单的数字验证码自动识别的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯(cuò),歡迎將生活随笔推薦給好友。