Create Your Own "CamScanner" Using Python and OpenCV

Published: 2023/12/15 · python

Thanks to Soham Mhatre for contributing significantly to this article.

Computer Vision and why the buzz?

Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From an engineering perspective, it seeks to understand and automate tasks that the human visual system can do. In short, it is the field that tries to make computers interpret a photo or video the way a human being would.

So why the buzz?

Advances in AI and machine learning have accelerated developments in computer vision. Earlier these were two separate fields, each with its own techniques, programming languages, and academic researchers. Now the gap has narrowed significantly, and more and more data scientists are working in computer vision and vice versa. The reason is the common denominator of both fields: data.

At the end of the day, a computer learns by consuming data. AI helps computers not only process that data but also improve their understanding and interpretation of it by trial and error. So if we can take the data from images and run complex machine learning algorithms on it, what we get is an actual AI.

One modern company that has pioneered computer vision technology is Tesla Motors.

Tesla Motors is known for pioneering the self-driving vehicle revolution. Its cars rely heavily on camera-based computer vision to perceive their surroundings.

What are we going to achieve today?

For this article we will concentrate only on computer vision and leave machine learning for a later time. We will also use just one library, OpenCV, to build the whole thing.

Index

What is OpenCV?

Preprocessing the image using blurring, thresholding, and denoising (Non-Local Means)

Canny edge detection and extraction of the biggest contour

Finally: sharpening and brightness correction

What is OpenCV

OpenCV is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez. The library is cross-platform and free to use under an open-source license (3-clause BSD up to version 4.4, Apache 2 from 4.5.0 onward). It is written in C++ and has bindings for other languages such as Python and Java.

Start with Preprocessing

BLURRING

The goal of blurring is to reduce the noise in the image. It removes high-frequency content (e.g., noise and edges) from the image, which results in blurred edges. There are multiple blurring techniques (filters) in OpenCV; the most common are:

Averaging: takes the average of all the pixels under the kernel area and replaces the central element with this average.

Gaussian Filter: instead of a box filter with equal coefficients, a Gaussian kernel is used, so nearby pixels count more than distant ones.

Median Filter: computes the median of all the pixels under the kernel window and replaces the central pixel with this median value.

Bilateral Filter: an edge-aware extension of Gaussian blurring. It weights neighbours by intensity similarity as well as by distance, so it removes noise while keeping edges sharp.

[Figure: Original vs Gaussian blurred]

THRESHOLDING

In image processing, thresholding is the simplest method of segmenting images. Applied to a grayscale image, it produces a binary image, which separates the pixels cleanly into two groups by intensity (for example, ink versus paper). The most common thresholding techniques in OpenCV are:

Simple Thresholding: if a pixel value is greater than the threshold, it is assigned one value (for example, white); otherwise it is assigned another (for example, black).

Adaptive Thresholding: the algorithm calculates a threshold for each small region of the image. Different regions of the same image therefore get different thresholds, which gives better results for images with varying illumination.

Note: Remember to convert the image to grayscale before thresholding.

[Figure: Grayscale original vs adaptive Gaussian threshold]

DENOISING

There is another kind of denoising that we use: Non-Local Means Denoising. The principle of the earliest denoising methods was to replace the colour of a pixel with an average of the colours of nearby pixels. The variance law in probability theory ensures that if nine pixels with independent noise are averaged, the noise standard deviation of the average is divided by three, which gives us a denoised picture.
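The "divided by three" figure is the square root of nine. A quick simulation with only the standard library can confirm it; the noise level and number of trials below are arbitrary choices for the check.

```python
import random
import statistics

random.seed(0)
sigma = 10.0       # noise standard deviation of a single pixel
n_pixels = 9       # size of the averaging neighbourhood, e.g. a 3x3 window
n_trials = 20_000

# Each trial averages nine independent noise samples, as a 3x3 mean filter
# would do on a perfectly flat region.
averages = [
    statistics.fmean(random.gauss(0.0, sigma) for _ in range(n_pixels))
    for _ in range(n_trials)
]

measured = statistics.pstdev(averages)
expected = sigma / n_pixels ** 0.5
print(f"measured std of the average: {measured:.2f}")
print(f"expected sigma / sqrt(9):    {expected:.2f}")
```

The two numbers should agree closely, as long as the noise in the nine pixels is independent.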

But what if there is an edge or an elongated pattern, where denoising by local averaging won't work because it smears the structure? In that case we need to scan a large portion of the image in search of all the pixels that really resemble the pixel we want to denoise. Denoising is then done by computing the average colour of these most similar pixels. This is called Non-Local Means Denoising.

Use cv2.fastNlMeansDenoising for this (or cv2.fastNlMeansDenoisingColored for colour images).

[Figure: Original vs Gaussian blurred vs Non-Local Means denoised]

Canny Edge detection & Extraction of biggest contour

After blurring and thresholding, the next step is to find the biggest contour (the outline of the document) and crop the image to it. This is done using Canny edge detection, followed by extracting the biggest contour and straightening it with a four-point transformation.

CANNY EDGE

Canny edge detection is a multi-step algorithm for detecting edges. We should pass a denoised image to it so that it detects only the relevant edges.

FIND CONTOURS

After finding the edges, pass the image through cv2.findContours(). It joins all the continuous points along a boundary that have the same colour or intensity. This gives us all the contours in the image: rectangles, circles, and so on.

Use cv2.convexHull() and cv2.approxPolyDP() to find the biggest (approximately) rectangular contour in the photo.

[Figure: Original vs original with the biggest bounding box drawn]

EXTRACTING THE BIGGEST CONTOUR

Although we have found the biggest contour that looks like a rectangle, we still need its corners so that we have exact coordinates for cropping the image.

To do this, first pass the coordinates of the approximate rectangle (the biggest contour) through an order-points transformation, which sorts the four corners into a consistent order (top-left, top-right, bottom-right, bottom-left). The result is the exact (x, y) coordinates of the biggest contour's corners.
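The original post does not show its ordering routine, so the following is a sketch of one widely used heuristic, not necessarily the author's. The function name order_points and the sum/difference trick are assumptions. The idea is that the top-left corner has the smallest x + y, the bottom-right the largest, the top-right the smallest y - x, and the bottom-left the largest y - x.

```python
import numpy as np

def order_points(pts):
    """Return four corners ordered top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(pts, dtype=np.float32).reshape(4, 2)
    sums = pts.sum(axis=1)                  # x + y for each corner
    diffs = np.diff(pts, axis=1).ravel()    # y - x for each corner

    ordered = np.zeros((4, 2), dtype=np.float32)
    ordered[0] = pts[np.argmin(sums)]       # top-left
    ordered[1] = pts[np.argmin(diffs)]      # top-right
    ordered[2] = pts[np.argmax(sums)]       # bottom-right
    ordered[3] = pts[np.argmax(diffs)]      # bottom-left
    return ordered

# Corners in arbitrary order, as a contour approximation might return them.
shuffled = [[310, 260], [80, 40], [60, 230], [340, 70]]
print(order_points(shuffled))
```

This heuristic assumes the page is roughly upright in the photo. For a page rotated by about 45 degrees the sums or differences can tie, and a more careful ordering would be needed.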

Four Point Transformation: using the above (x, y) coordinates, calculate the width and height of the contour. Compute a perspective transform with cv2.getPerspectiveTransform() and pass it to cv2.warpPerspective() to crop the contour. Voila, you have successfully cropped out the relevant data from the input image.

[Figure: Original vs cropped image]

Notice how well the image is cropped, even though it is a poorly lit, poorly shot photo.

Finally: Sharpening & Brightness correction

Now that we have cropped the relevant content (the biggest contour) out of the image, the last step is to brighten and sharpen the picture so that we get a well-illuminated, readable document.

- Brightness: for this we use the hue, saturation, value (h, s, v) representation, where value represents the brightness. You can adjust this value to increase the brightness of the document.

- Kernel Sharpening: a kernel (also called a convolution matrix or mask) is a small matrix used for blurring, sharpening, embossing, edge detection, and more. The effect is produced by convolving the kernel with the image.

Result

[Figure: Original vs final result (cropped, brightened, and sharpened)]

Complete Code

Here is the final code.

The end for now. Have any ideas to improve this, or want me to try something new? Please leave your suggestions in the comments. Adios.

Source: https://levelup.gitconnected.com/create-your-own-camscanner-using-python-opencv-66251212270
