Computer Vision for Busy Developers: Detecting Objects


This article is part of a series introducing developers to Computer Vision. Check out other articles in this series.


In my career, object detection and tracking has been one of the hottest topics in Computer Vision. I wish I could dive right into what makes all of it possible, but I learned that object detection and tracking relies on a whole lot of other concepts — most of which we’ve already covered in previous articles. First, let’s define what these terms mean so that there’s no confusion. Object Detection is the ability to determine if a predetermined object is contained within a given image, and its location within the overall image (2D space). Often, but not always, Object Detection can also determine the position, orientation and scale of the object within the 3D space that is represented by the image. The term pose is used to describe the position, orientation and (sometimes) scale of an object in 3D space.


Image Classification is a similar concept where we are trying to determine if a class of objects is located within an image. The big difference is that Image Classification can detect a much broader set of objects and does so in the 2D space of the image. For example, does this image contain a type of “dog” or “cat” (regardless of breed)? As of the time of this writing, determining 3D pose for image classification is still a hard and unsolved problem. We’ll go into more details of Image Classification when we cover Machine Learning later on.


SIDE NOTE: One of the confusing things about researching computer vision is that the terms are not always consistent and there isn’t always a well-accepted definition for them. Object Detection, Image Classification, and Image Recognition are such terms. I hypothesize that this is a result of the field being relatively new/evolving and, due to the international research community, there are likely some definitions and terms that are lost in translation. Instead of splitting hairs on the terms, I will do my best to describe the desired result of each term.


Object Tracking is where we continuously extract pose information across consecutive frames — such as a video. Object Tracking asks the question: how is my object moving throughout a video? It’s also important to point out that in some use-cases object detection and tracking must happen in real-time, and in other cases it does not. Real-time object detection and tracking happens when our algorithms are able to provide pose information at a rate that is (or feels) comparable to the rate at which the frames change. For example, if the video is played back at a rate of 30 frames per second, the pose information needs to be provided at a similar rate.


Starting with Descriptors

We’re going to kick things off by starting with the basics: detecting flat or planar objects within our image. Think of objects such as the cover of a book, a postcard, a poster, or any other flat, 2D object. Our approach here is going to be reminiscent of what we did when we discussed the Template technique. For these examples, we are going to refer to the image that defines our object as the “Image Target” and we’re going to refer to the image where we are trying to detect the object as our “Search Image”.


SIDE NOTE: In augmented reality and other computer vision systems, the source image which we use to define our object has many names. In ARKit and ARCore they are called “Reference Images”. In Vuforia, Wikitude and Magic Leap, they are referred to as “Image Targets” while SparkAR uses the term “Target Image”. To make things a little bit more confusing, OpenCV and academic research papers often use the term “Image Marker” — typically referring to artificial images like ArUco markers or QR Codes.


While in the Template Technique we defined our object as a list of pixel values, with Image Targets, we define our object as a list of feature descriptors that are invariant to scale, orientation and any other property that is relevant to our use-case. (If you need a refresher on descriptors, check out the Describing Features article)


SIDE NOTE: While we are using SIFT Feature descriptors in this example, there are use-cases where having scale and orientation invariance is not necessary. For example, in lots of Visual FX work, the scale and orientation does not change dramatically and a simple Harris Feature detector is sufficient for object detection. The specific algorithms used in computer vision should always be driven by the use-case.


This leads us to the first step: we need to generate a list of feature descriptors from the image target. The example below shows the SIFT features extracted from our image target.


(Image credit: Creating AR+VR)

Once we have our list of features, we need to determine if lots of these same descriptors appear in the Search Image. The second step is to run the same algorithm we used on the Image Target on the Search Image. The example below shows the SIFT features extracted from our search image.


(Image credit: Vinny DaSilva)
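
To make those two steps concrete, here is a minimal sketch using OpenCV’s Python bindings (the file names are hypothetical placeholders):

```python
import cv2

# Load the Image Target and the Search Image as grayscale
# (the file names here are placeholders).
target = cv2.imread("image_target.jpg", cv2.IMREAD_GRAYSCALE)
search = cv2.imread("search_image.jpg", cv2.IMREAD_GRAYSCALE)

# SIFT detects keypoints and computes a 128-value descriptor for each one.
sift = cv2.SIFT_create()
kp_target, des_target = sift.detectAndCompute(target, None)
kp_search, des_search = sift.detectAndCompute(search, None)
```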

Comparing Two Descriptors

Once we have the descriptors from both the Image Target and the Search Image, we need to compare our feature descriptors to each other. To get things started, let’s learn how to compare just two individual feature descriptors first. Again, we’re going to look back at how we approached Templates. Recall that the way that we compare these data points is by calculating a “distance”. The smaller the distance, the more alike the two data points are. We already know we can calculate the distance between pixels by using the Pythagorean Theorem. We can do this because we treat the individual color values of red, green and blue as if they were the x, y and z values of a 3D point in space. The intuition that I’d like to get across here is that we can treat points in 3D space, color pixels and even feature descriptors in the same way. For instance, here is what a pixel would look like if we were to put it into histogram form.


A different way to look at a pixel is to describe it as a histogram

Another way to think about a pixel is that it represents the distribution of color intensity across the buckets of Red, Green and Blue. When we see a pixel being represented this way, it’s clear that it looks a lot like the data we have for descriptors, a histogram of gradients:


Histograms are a clever way of organizing data in a way that makes it easier to compare

So, if we can treat a pixel’s three colors in the same way that we treat a descriptor’s histogram of gradients, then we should be able to calculate distances of a histogram of gradients in the same way we calculate distances between two colors.


Of course, descriptors have more data than what is shown above. The specifics of the data will be different depending on the descriptor algorithm used (SIFT includes data for 16 descriptor sub-regions, each containing 8 buckets — a total of 128 individual values). While feature descriptors do include a whole lot more data than a simple color pixel, we can calculate distances between descriptors in the same way — we use the Pythagorean Theorem and treat every piece of data as an additional dimension for our descriptor.

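As a sketch of that idea, the distance between two SIFT descriptors is just the Pythagorean Theorem generalized to 128 dimensions (des_target and des_search are the descriptor arrays from the earlier extraction sketch):

```python
import numpy as np

a = des_target[0]  # one 128-dimensional SIFT descriptor
b = des_search[0]  # another 128-dimensional SIFT descriptor

# Euclidean distance: square the per-dimension differences,
# sum them, and take the square root.
distance = np.sqrt(np.sum((a - b) ** 2))
# Equivalently: np.linalg.norm(a - b)
```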

This approach using Pythagorean Theorem to calculate distance is referred to as the Euclidean Distance. This is not the only method of calculating distances. In cases where the descriptors are binary, such as is the case with ORB Descriptors, the Hamming Distance may be more appropriate. Even if the feature descriptors are not binary, there are some situations where the Manhattan Distance may be more appropriate. The specific formula to calculate distance between descriptors should be tailored to your use-case.


SIDE NOTE: As mentioned in the Thresholding article, the last square-root operation in the Euclidean distance formula is expensive and often unnecessary when we are just comparing distances against each other — dropping it results in the squared Euclidean distance.

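For reference, here are sketches of those alternatives; the ORB descriptors below are extracted only to illustrate the Hamming distance on binary data, reusing the images loaded earlier:

```python
# Squared Euclidean distance: skips the costly square root but
# preserves the ordering of distances.
sq_euclidean = np.sum((a - b) ** 2)

# Manhattan distance: the sum of absolute per-dimension differences.
manhattan = np.sum(np.abs(a - b))

# Hamming distance for binary descriptors such as ORB's:
# it counts the number of differing bits.
orb = cv2.ORB_create()
_, orb_target = orb.detectAndCompute(target, None)
_, orb_search = orb.detectAndCompute(search, None)
hamming = cv2.norm(orb_target[0], orb_search[0], cv2.NORM_HAMMING)
```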

Comparing All of the Descriptors

Once we know how to compare two feature descriptors, we are able to compare all of the descriptors. Remember, we’re looking to “match” the descriptors from the Image Target to the appropriate descriptors in the search image. Finding an appropriate pair for a feature descriptor is known as finding its Correspondence. Similar to thresholds, correspondence matches won’t be exact; rather, we’re looking for the descriptors that most resemble the descriptors in the image target — in other words, we’re looking for the Nearest Neighbor matches.


We can certainly check every feature descriptor in one image against every feature descriptor in the other image in a brute force manner, but it’s not a very efficient approach for real-time algorithms. This type of search is actually a fairly common problem in computer science. There are several approaches to solving this problem, including popular algorithms like the K-d Trees algorithm.


K-d Trees is a generic algorithm for quickly searching multidimensional data. I won’t go into a ton of detail here since there are lots of great resources explaining K-d Trees, but the general idea is that we take all of the feature descriptors in our Image Target and reorganize them into a data structure that iteratively splits all of the descriptors in half, each iteration on a different dimension of the data.

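As an illustration of the idea (this uses SciPy’s general-purpose k-d tree, not whatever a given computer vision library does internally), we can index the Image Target’s descriptors once and then query nearest neighbors for every Search Image descriptor:

```python
from scipy.spatial import cKDTree

# Build the tree once over the Image Target's descriptors.
tree = cKDTree(des_target)

# For every descriptor in the Search Image, find its two nearest
# neighbors in the Image Target (the second neighbor is used by
# the ratio check described below).
distances, indices = tree.query(des_search, k=2)
```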

These search/match algorithms are not perfect. This is particularly the case when a specific feature is part of a repeating pattern. In this case, there will be more than one match with very similar distances — in other words, we have some ambiguous matches. Since we cannot confidently determine which match is correct, we reject the match to avoid false matches. In order to filter out ambiguous matches, the Nearest Neighbor Distance Ratio (NNDR) check is performed. We take the two best matches and compare them to each other. If they are very similar, we reject them as an ambiguous match. Ideally, the top two matches are sufficiently different that they are not ambiguous. We want one feature in our Image Target to only match well with one other feature in the search image. The NNDR check is performed by dividing the best match’s distance by the second-best match’s distance and rejecting matches whose ratio is larger than a threshold (the SIFT Paper recommends 0.8).


(Image credit: Vinny DaSilva)

The last bit of info I’d like to share on matching descriptors is FLANN. FLANN stands for Fast Library for Approximate Nearest Neighbors. FLANN contains the most effective algorithms for nearest neighbor searching and attempts to pick the best algorithm and tune it based on the dataset. FLANN is an open source library that came out of the same university as SIFT and is included as part of OpenCV.

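Putting the matching pieces together, here is a sketch of FLANN-based matching in OpenCV with the 0.8 NNDR check (the index and search parameters are common defaults, not tuned values):

```python
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)

flann = cv2.FlannBasedMatcher(index_params, search_params)

# For each Image Target descriptor, find its two nearest neighbors
# among the Search Image descriptors.
matches = flann.knnMatch(des_target, des_search, k=2)

# NNDR check: keep a match only if the best neighbor is clearly
# better than the runner-up (the 0.8 threshold from the SIFT paper).
good_matches = [m for m, n in matches if m.distance < 0.8 * n.distance]
```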

Image Registration

This next section was one of the toughest for me to write. There are a lot of new concepts here along with lots of different terminology. I am going to focus on the concepts and terms which I feel are the most important to detecting a planar object.


The above illustrates the correspondence of SIFT feature descriptors that are matched between the Image Target and the Search Image. The two images were matched using FLANN and a Nearest Neighbor Distance Ratio of 0.8 (as per Lowe’s original paper)

Now that we have our feature descriptors matched across our Image Target and Search Image, we’re ready for the final step — Image Registration. Image Registration is the process of putting two images together so they are in the same frame-of-reference. Image Registration leads us to an estimated 2D or 3D pose of the Image Target within the Search Image. Image Registration works by estimating the projection of the object onto an image frame based on the matched feature descriptors.


The last step is to “project” our Image Target onto the Search Image so that the two images are in the same frame-of-reference.

Let’s take a second to conceptualize what is happening when we are trying to find our Image Target within our Search Image. Assume for a second that we have a magic function that, given the X and Y coordinates of any pixel in our Image Target, returns the coordinates of the corresponding pixel in our Search Image.


var pointInSearchImage = FindCorrespondingPoint(x, y);

If we were to look into this magic function, we would see that the input X and Y values are being manipulated by a 3x3 matrix called the Homography Matrix. The Homography Matrix can convert the position of any pixel in our image target and find the corresponding pixel in the Search Image (it can also do the inverse). With this information, we can determine the 2D (or 3D) pose of our planar object within the search image.

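Here is a sketch of what that function looks like in practice, assuming the 3x3 Homography Matrix H has already been estimated (the next paragraphs cover how we get it):

```python
def find_corresponding_point(H, x, y):
    # cv2.perspectiveTransform expects points shaped (N, 1, 2).
    point = np.float32([[[x, y]]])
    projected = cv2.perspectiveTransform(point, H)
    return projected[0][0]  # the (x, y) position in the Search Image
```

Calling find_corresponding_point(H, 0, 0), for example, would tell us where the top-left pixel of the Image Target lands in the Search Image.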

A magic function isn’t super helpful, though! How do we make this function something we can actually use? The first thing we need to do is to calculate the Homography Matrix. In order to do that, we’re going to need to take 4 of the corresponding pairs of points we got from our K-d Tree search (as mentioned above), and we can calculate the Homography Matrix through a method called Direct Linear Transform (DLT). Fortunately for us, libraries like OpenCV have a function called findHomography which will calculate the homography for us. For those of you that want to get your hands dirty and calculate the homography yourself, you can check out this video by Behnam Asadi which goes through the process step-by-step.

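A sketch of that estimation step with OpenCV, using four of the good matches from the FLANN sketch earlier (method 0 tells findHomography to use all supplied points directly, i.e. the DLT/least-squares path):

```python
# Gather the pixel coordinates of four corresponding keypoints,
# shaped (N, 1, 2) as findHomography expects.
src_pts = np.float32(
    [kp_target[m.queryIdx].pt for m in good_matches[:4]]
).reshape(-1, 1, 2)
dst_pts = np.float32(
    [kp_search[m.trainIdx].pt for m in good_matches[:4]]
).reshape(-1, 1, 2)

# H maps Image Target coordinates to Search Image coordinates.
H, _ = cv2.findHomography(src_pts, dst_pts, 0)
```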

If you are feeling a little lost at this point when it comes to matrices and linear transformations, don’t feel discouraged! I actually ran into similar speed-bumps as I was learning the material. I highly recommend you check out the Essence of Linear Algebra series by 3Blue1Brown. In particular, check out the video on 3D linear transformations. This series helped me immensely on my journey to understanding all of this material!


SIDE NOTE: Some of you might be wondering why we need to pick 4 points to calculate our Homography Matrix. This has to do with the fact that the Homography Matrix is a Perspective Transform and it manipulates 8 Degrees of Freedom in 2D space: 2 Degrees of Freedom for X and Y translation (up/down on the x/y axis), 1 for rotation, 1 for scale, 2 for affine scaling (scaling along x and y independently), and 2 for projective distortion (again on x and y).

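In equation form (a standard way to write it, with the bottom-right entry fixed to 1 to remove the overall scale ambiguity), a single correspondence looks like:

$$
s \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} =
\begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
$$

Each point pair contributes two equations (one for x′ and one for y′), so solving for the 8 unknown entries requires at least 4 correspondences.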

So, at this point we’ve calculated a Homography Matrix, but we’re not quite done. Here’s the thing we need to consider: while we did match our corresponding points earlier, there are still going to be some false matches. This is a consequence of how we use the gradient information around feature descriptors and the threshold we use to determine similarities between descriptors. How do we know if we calculated our Homography Matrix using a bad correspondence match? Without being sure about our matches, the rest of the process is in jeopardy. This dilemma is referred to as the Correspondence Problem and it will lead to one of the coolest algorithms in Computer Science: RANSAC.


If you look back at the correspondence match results, you will see that there are several incorrect matches. We need to exclude these outliers in order to calculate the correct homography matrix.

We have a bunch of matching feature descriptors. We know that some of these will be good matches and some of these will be bad matches. Let’s call the good matches our inliers and the bad matches our outliers. We need to undertake a process of outlier rejection, and for that we will use the Random Sample Consensus (RANSAC) algorithm. RANSAC is a general model-fitting algorithm that can be used in many areas of computer science, but has found many uses in computer vision. The steps are outlined below, with a code sketch after the list.


  • We start off our RANSAC journey by picking (or sampling) 4 random corresponding pairs of feature descriptors — 4 points from the Image Target and 4 points from the Search Image.

  • Next, we calculate a Homography Matrix for those 4 pairs of descriptors.

  • Once we have a Homography Matrix, we’re going to grab some other corresponding descriptors and see how well they fit our model. We do this by running a descriptor position from the Image Target through the Homography Matrix to get a calculated position.

  • We compute the distance of the calculated positions against the corresponding pairs. We can average out the distance to determine how close the homography model was at predicting where the pairs would end up in the Search Image. Think of this like a “score” for how good this homography is at predicting correspondences.

  • We repeat steps 1–4 several more times until we find a Homography Matrix that fits our data the best.

  • Once we have a randomized Homography Matrix that scores well, we then find the four correspondences which best fit this Homography and create a new Homography Matrix which we will use.

  • If the process of running a RANSAC algorithm is still not very clear, I highly recommend you check out this video by Cyrill Stachniss with a quick 5-minute introduction to RANSAC.

SIDE NOTE: The specific number of times we resample our correspondences or the number of times we test any particular Homography Matrix can be determined by the needs of your use-case. In some situations, you may want to take a probabilistic approach and calculate the number of iterations based on the accuracy you need to achieve.
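Here is a minimal sketch of the loop described above, built from the same OpenCV pieces as the earlier sketches. The iteration count and the pixel threshold are arbitrary illustrative choices, and src_pts/dst_pts here are all of the matched point pairs rather than just four:

```python
import random

def ransac_homography(src_pts, dst_pts, iterations=1000, threshold=5.0):
    """src_pts, dst_pts: (N, 1, 2) float32 arrays of corresponding points."""
    best_H, best_inliers = None, 0
    n = len(src_pts)
    for _ in range(iterations):
        # 1. Sample 4 random corresponding pairs.
        idx = random.sample(range(n), 4)
        # 2. Fit a candidate Homography Matrix to just those pairs.
        H, _ = cv2.findHomography(src_pts[idx], dst_pts[idx], 0)
        if H is None:  # degenerate sample (e.g. collinear points)
            continue
        # 3. Project ALL Image Target points through the candidate.
        projected = cv2.perspectiveTransform(src_pts, H)
        # 4. Score it: count pairs that land close to their match.
        errors = np.linalg.norm(projected - dst_pts, axis=2).ravel()
        inliers = int(np.sum(errors < threshold))
        # 5. Keep the best-scoring model seen so far.
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H
```

In practice, passing cv2.RANSAC as the method argument to cv2.findHomography runs this kind of loop for you.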

Once we filter out any matches that do not fit our Homography, you can clearly see the correct correspondences between the Image Target and the Search Image

Congratulations! At this point, we have a Homography Matrix that can accurately take any pixel from our Image Target and find the appropriate pixel in our Search Image. We have successfully detected the entire Image Target within our Search Image, and we can derive the 2D and 3D pose of our object in the Search Image.

Using the Homography Matrix, we’re able to render content on top of our original image based on the frame-of-reference of the Image Target within the Search Image
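
As a closing sketch, the usual way to visualize the detection: project the Image Target’s four corners through H and draw the resulting outline on the Search Image (this mirrors OpenCV’s feature-matching tutorial):

```python
h, w = target.shape

# The four corners of the Image Target in its own frame-of-reference.
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)

# Project the corners into the Search Image's frame-of-reference.
projected = cv2.perspectiveTransform(corners, H)

# Draw the detected outline on a color copy of the Search Image.
outlined = cv2.cvtColor(search, cv2.COLOR_GRAY2BGR)
cv2.polylines(outlined, [np.int32(projected)], True, (0, 255, 0), 3)
```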

TLDR

Feature Descriptors start becoming incredibly useful tools once we are able to compare descriptors to other descriptors. When detecting objects, we compare feature descriptors from one image to feature descriptors found in another image. Once we start thinking of feature descriptors as a set of values having N dimensions, we realize we can use pretty simple tools to compare them to each other — specifically, the Pythagorean Theorem. Comparing descriptors is a great start, but it’s not all we need to properly detect objects. We need to take advantage of common computer science algorithms such as K-d tree searches and RANSAC. We leverage K-d tree searching to find the best possible corresponding pairs of descriptors in two images. We then use RANSAC in order to fine-tune a model which best represents our object. The end result is a homography matrix which we can use to find any pixel of our object within the search image.

Sources and More Info

  • Object and Feature Detection — Lecture Video by Rich Radke

  • K-d Tree in Python — Video Playlist by Tsoding

  • Scalable Nearest Neighbor Algorithms for High Dimensional Data — Academic Paper by Marius Muja

  • FLANN — Code Repository by Marius Muja

  • Euclidean Distance — Wikipedia

  • Geometric Transformation for Image Registration — Video by ADipLearn

  • Homography Matrix — Wikipedia

  • Homography in Computer Vision — Video by Behnam Asadi

  • Three-dimensional linear transformations — Video by 3Blue1Brown

  • Parametric Transformations and Scattered Data Interpolation — Lecture Video by Rich Radke

  • RANSAC — Video by Cyrill Stachniss

  • SIFT Features and RANSAC — Lecture Video by Cyrill Stachniss

  • RANSAC Song — by Daniel Wedge (don’t ask, just check it out)

  • A brief look at Homographies — Blog Post by Mike

  • Homographies Explained through Code — OpenCV Documentation

  • Be sure to check out the sources from the previous article as well

Original article: https://medium.com/@vad710/cv-for-busy-developers-detecting-objects-35081faf1b3d

    面向表開發 面向服務開發

    總結

    以上是生活随笔為你收集整理的面向表开发 面向服务开发_面向繁忙开发人员的计算机视觉的全部內容,希望文章能夠幫你解決所遇到的問題。

    如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。