Detection and Semantic Segmentation: Segmentation and Object Detection - Part 2
FAU Lecture Notes on Deep Learning
These are the lecture notes for FAU's YouTube Lecture "Deep Learning". This is a full transcript of the lecture video and the matching slides. We hope you enjoy this as much as the videos. Of course, this transcript was created largely automatically with deep learning techniques, and only minor manual modifications were performed. Try it yourself! If you spot mistakes, please let us know!
Navigation
Previous Lecture / Watch this Video / Top Level / Next Lecture
U-net for cell segmentation. Image created using gifify. Source: YouTube

Welcome back to deep learning! Today, we want to talk about more advanced methods of image segmentation. Let's look at our slides. This is part two of the lecture video series on image segmentation and object detection.
Image under CC BY 4.0 from the Deep Learning Lecture.

Now, the key idea that we need to know about is how to integrate context knowledge. Just using the encoder-decoder structure that we talked about in the last video will not be enough to get a good segmentation. The key concept is that you somehow have to tell your method what happened where in order to get a good segmentation mask. You need to balance local and global information: the local information is crucial for good pixel accuracy, and the global context is important for figuring out the classes correctly. CNNs typically struggle with this balance. So, we now need some good ideas on how to incorporate this context information.
Image under CC BY 4.0 from the Deep Learning Lecture.

Now, Long et al. showed one of the first approaches to do so. They essentially use upsampling that consists of learnable transposed convolutions. The key idea is to add links that combine the final prediction with earlier, lower layers at finer strides. Additionally, they added 1x1 convolutions after the pooling layers, and the predictions are then summed up to make local predictions with a global structure. So, the network topology is a directed acyclic graph with skip connections from lower to higher layers. This way, you can refine a coarse segmentation.
Image under CC BY 4.0 from the Deep Learning Lecture.

So, let's look at this idea in some more detail. If you look at the ground truth on the bottom right, it has a very high resolution. If you simply used your CNN and upsampled, you would get a very coarse result, as shown on the left-hand side. So, what do Long et al. propose? They use the information from the previous downsampling step, which still has a higher resolution, inside the decoder branch: the upsampled prediction and the higher-resolution scores are combined with a sum to produce a more highly resolved image. Of course, you can then repeat this in the decoder branch. You can see that this way we can upsample the segmentation and reuse the information from the encoder branch in order to produce better, highly resolved results. Once you introduce these skip connections, they produce much better segmentations than if you were just using the decoder to upsample the information.
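To make this fusion concrete, here is a minimal PyTorch sketch of one such refinement step. The class name, the channel counts, and the fixed 2x upsampling are illustrative assumptions, not the exact FCN configuration from the paper:

```python
import torch
import torch.nn as nn

class FCNSkipFusion(nn.Module):
    """One FCN-style refinement step: upsample the coarse class scores
    with a learnable transposed convolution and sum them with 1x1-conv
    scores computed from a higher-resolution encoder feature map."""
    def __init__(self, num_classes: int, encoder_channels: int):
        super().__init__()
        # learnable 2x upsampling of the coarse class scores
        self.upsample = nn.ConvTranspose2d(num_classes, num_classes,
                                           kernel_size=4, stride=2, padding=1)
        # 1x1 convolution turning encoder features into class scores
        self.score_pool = nn.Conv2d(encoder_channels, num_classes, kernel_size=1)

    def forward(self, coarse_scores, encoder_features):
        return self.upsample(coarse_scores) + self.score_pool(encoder_features)

# e.g. fusing 21-class scores with a 512-channel pooling-layer output
fuse = FCNSkipFusion(num_classes=21, encoder_channels=512)
out = fuse(torch.randn(1, 21, 16, 16), torch.randn(1, 512, 32, 32))
print(out.shape)  # torch.Size([1, 21, 32, 32])
```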
Image under CC BY 4.0 from the Deep Learning Lecture.

You see, integrating context knowledge is key. In SegNet, a different approach was taken. You also have a convolutional encoder-decoder structure. Here, the key idea is that in the upsampling steps, you reuse the max-pooling indices from the corresponding downsampling steps, such that you get a better-resolved decoding. This is already a pretty good idea for integrating context knowledge.
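A minimal PyTorch sketch of this pooling-index trick; the tensor shapes are arbitrary and chosen only for illustration:

```python
import torch
import torch.nn as nn

# The encoder's max-pooling records where the maxima were, and the
# decoder reuses those indices so activations are placed back at
# their original spatial locations.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 32, 32)     # an encoder feature map
pooled, indices = pool(x)          # downsample, remember argmax positions
decoded = unpool(pooled, indices)  # sparse upsampling at the stored indices
print(decoded.shape)               # torch.Size([1, 64, 32, 32])
# In SegNet, convolutions after the unpooling densify this sparse map.
```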
Image under CC BY 4.0 from the Deep Learning Lecture.

An even better idea is demonstrated in U-net. Here, the network consists of an encoder branch, a contracting path that captures the context, and a decoder branch that performs a symmetric expansion for the localization. The encoder follows the typical structure of a CNN. Each decoder step consists of an upsampling step and a concatenation with the feature maps of the corresponding encoder level. The training strategy also relies on data augmentation: non-rigid deformations, rotations, and translations were used to give the U-net an additional kick in performance.
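As a rough illustration, here is one U-net-style decoder step in PyTorch. This sketch uses padded convolutions for simplicity, whereas the original U-net uses unpadded ones; the names and channel counts are assumptions for the example:

```python
import torch
import torch.nn as nn

class UNetDecoderStep(nn.Module):
    """One U-net decoder step: upsample, concatenate the encoder
    feature map of the same level, then convolve twice."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # 2x upsampling
        x = torch.cat([x, skip], dim=1)  # skip connection by concatenation
        return self.conv(x)

step = UNetDecoderStep(in_ch=256, skip_ch=128, out_ch=128)
out = step(torch.randn(1, 256, 16, 16), torch.randn(1, 128, 32, 32))
print(out.shape)  # torch.Size([1, 128, 32, 32])
```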
Image under CC BY 4.0 from the Deep Learning Lecture.

You can say that U-net is essentially the state-of-the-art method for image segmentation. This is also the reason for its name: it stems from its shape. You get this U structure because you have a high resolution on the fine levels, then you downsample to lower resolutions, and the decoder branch upsamples everything again. The key ingredient here is the skip connections that connect the respective levels of the encoder and the decoder. This way, you can get very, very good image segmentations. It's quite straightforward to train, and the paper has been cited thousands of times (August 11th, 2020: 16,471 citations). Every day you check the citation count, it has already increased. Olaf Ronneberger was able to publish a very important paper here, and it dominates the entire scene of image segmentation.
Image under CC BY 4.0 from the Deep Learning Lecture.

You can see that there are many additional approaches that can be combined with the U-net, such as dilated convolutions and many more. Many of these very small changes have been suggested, and they may be useful for particular tasks, but for general image segmentation, the U-net has been shown to still outperform such approaches. Still, there are methods using dilated convolutions, there are network stacks that can be very beneficial, and there are multi-scale networks that take the idea of using the image at different scales even further. You can also defer the context modeling to another network, or incorporate recurrent neural networks. Also very nice is the idea of refining the resulting segmentation maps using a conditional random field.
Image under CC BY 4.0 from the Deep Learning Lecture.

Let's look at some of these additional approaches in more detail, so you can see what we are talking about. Dilated convolutions are the idea of using the atrous convolutions that we already talked about. The idea is that dilated convolutions let the receptive field expand exponentially without losing resolution. You introduce a dilation rate L that controls the spacing of the filter taps. If you then stack such layers on top of each other, the receptive field grows exponentially while the number of filter parameters grows only linearly. So, in specific applications where a broad range of magnifications occurs, this can be very useful. It really depends on your application.
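The following short PyTorch sketch illustrates this growth; the channel counts and dilation rates are arbitrary examples:

```python
import torch
import torch.nn as nn

# Stacking 3x3 convolutions with dilation rates 1, 2, 4, 8 grows the
# receptive field exponentially while the parameter count grows only
# linearly, and padding equal to the dilation preserves the resolution.
layers = []
for rate in [1, 2, 4, 8]:
    layers += [nn.Conv2d(64, 64, kernel_size=3, dilation=rate, padding=rate),
               nn.ReLU(inplace=True)]
net = nn.Sequential(*layers)

x = torch.randn(1, 64, 128, 128)
print(net(x).shape)  # torch.Size([1, 64, 128, 128]) -- no loss of resolution
# Receptive field after each layer: 3 -> 7 -> 15 -> 31 pixels,
# with only four 3x3x64x64 weight tensors in total.
```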
Image under CC BY 4.0 from the Deep Learning Lecture.

Examples of this are DeepLab, ENet, and the multi-scale context aggregation module in [28]. The main issue, of course, is that there is no efficient implementation available. So, the benefit is somewhat unclear.
Image under CC BY 4.0 from the Deep Learning Lecture.

Another approach that I would like to show you here is the so-called stacked hourglass network. The idea is that you use something very similar to a U-net, but you put an additional trainable part into the skip connections. That's essentially the main idea.
Image under CC BY 4.0 from the Deep Learning Lecture.

Then, you can take this hourglass module and stack several of them behind each other. So, you essentially have multiple refinement steps, one after the other, and you always return to the original resolution. You can plug in a second network essentially as a kind of artifact correction network. What's really nice about this hourglass approach is that you return to the original resolution, as in the sketch below.
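Here is a toy PyTorch sketch of such stacking with intermediate predictions. The mini_hourglass stand-in and all channel counts are hypothetical simplifications of a real hourglass module:

```python
import torch
import torch.nn as nn

def mini_hourglass(ch):
    """A toy stand-in for one hourglass module: downsample, process,
    and upsample back to the input resolution."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
    )

class StackedHourglass(nn.Module):
    """Every stage returns to full resolution and makes an intermediate
    prediction; features plus remapped predictions feed the next stage,
    which acts as a refinement / artifact-correction network."""
    def __init__(self, num_stages=2, ch=64, num_classes=16):
        super().__init__()
        self.stages = nn.ModuleList([mini_hourglass(ch) for _ in range(num_stages)])
        self.heads = nn.ModuleList([nn.Conv2d(ch, num_classes, 1) for _ in range(num_stages)])
        self.remap = nn.ModuleList([nn.Conv2d(num_classes, ch, 1) for _ in range(num_stages)])

    def forward(self, x):
        outputs = []
        for stage, head, remap in zip(self.stages, self.heads, self.remap):
            feats = stage(x)             # back at the input resolution
            pred = head(feats)           # intermediate prediction at full res
            outputs.append(pred)
            x = x + feats + remap(pred)  # pass context on to the next hourglass
        return outputs                   # a loss can be applied to every stage

net = StackedHourglass()
outs = net(torch.randn(1, 64, 64, 64))
print([o.shape for o in outs])  # two predictions, each at 64x64 resolution
```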
Convolutional pose machines also stack several modules on top of each other to enable pose tracking. This can also be combined with segmentation. Image created using gifify. Source: YouTube

Let's say you're predicting several classes at the same time. Then, you end up with several segmentation masks for the different classes. This idea is picked up in something called a convolutional pose machine. There, you use the area where your hourglasses connect, where one U-net is essentially stacked on top of another U-net. At this layer, you can also use the resulting per-class segmentation maps to inform the classes about each other. So, you can use the context information of other things that have been detected in the image to steer the refinement. In convolutional pose machines, you do that for the pose detection of the joints of a body model. Of course, if you have the left knee joint, the right knee joint, and the other joints of the body, the information about the other joints helps in decoding the correct positions.
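As a rough sketch, one refinement stage of such a network might look like the following in PyTorch; the layer sizes and the class name CPMStage are illustrative assumptions, not the published architecture:

```python
import torch
import torch.nn as nn

class CPMStage(nn.Module):
    """A convolutional-pose-machine-style refinement stage: belief maps
    from the previous stage are concatenated with shared image features,
    so each joint's prediction can use the spatial context of all the
    other joints."""
    def __init__(self, feat_ch: int, num_joints: int):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(feat_ch + num_joints, 128, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, num_joints, kernel_size=1),
        )

    def forward(self, image_features, prev_beliefs):
        x = torch.cat([image_features, prev_beliefs], dim=1)
        return self.refine(x)  # refined belief map per joint

stage = CPMStage(feat_ch=64, num_joints=14)
beliefs = stage(torch.randn(1, 64, 46, 46), torch.randn(1, 14, 46, 46))
print(beliefs.shape)  # torch.Size([1, 14, 46, 46])
```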
X-ray transform invariant landmark detection by Bastian Bier. Image under CC BY 4.0 from the Deep Learning Lecture.

The convolutional pose machine idea has also been used by my colleague Bastian Bier for the detection of anatomical landmarks in the analysis of X-ray projections. I'm showing a small video here. You've already seen it in the introduction, and now you finally have all the context you need to understand the method. The approach behind it is very similar to convolutional pose machines: the landmarks inform each other about their orientations and positions in order to get improved detection results.
Image under CC BY 4.0 from the Deep Learning Lecture.

So, what else? I already hinted at conditional random fields. Here, the idea is that you refine the output using a conditional random field. Each pixel is modeled as a node in a random field. The pairwise terms between the pixels are very interesting because they can capture both long-range dependencies and fine, local information.
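For reference, the fully connected CRF energy commonly used in this line of work (following Krähenbühl and Koltun, as adopted in DeepLab) combines a unary term from the network's class scores with two Gaussian pairwise kernels. This is a reconstruction from the literature, not a formula shown on the slides:

```latex
E(\mathbf{x}) = \sum_i \theta_i(x_i) + \sum_{i<j} \theta_{ij}(x_i, x_j), \quad
\theta_{ij}(x_i, x_j) = \mu(x_i, x_j)\Big[
  w_1 \exp\Big(-\tfrac{\lVert p_i - p_j\rVert^2}{2\sigma_\alpha^2}
               -\tfrac{\lVert I_i - I_j\rVert^2}{2\sigma_\beta^2}\Big)
+ w_2 \exp\Big(-\tfrac{\lVert p_i - p_j\rVert^2}{2\sigma_\gamma^2}\Big)\Big]
```

Here, the unary term θᵢ comes from the CNN's class probabilities, μ is a label compatibility function, and pᵢ and Iᵢ are pixel positions and colors. The first (bilateral) kernel lets pixels that are close and similar in color share a label, which gives the long-range dependencies; the second kernel enforces local smoothness.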
Image under CC BY 4.0 from the Deep Learning Lecture.

If you look at the output here from DeepLab, you see how the iterative refinement of the conditional random field can help to improve the segmentation. You can also combine this with atrous convolutions, as in [4]. You could even model the conditional random field with recurrent neural networks, as shown in reference [29]. This then also allows end-to-end training of the entire conditional random field.
Image under CC BY 4.0 from the Deep Learning Lecture.

There are also a couple of advanced topics that I still want to hint at. Of course, you can also work with the losses. So far, we've only seen the segmentation loss itself, but of course, you can also mix and match ideas that we already saw earlier in this class. For example, you can use a GAN in order to augment your loss. The idea here is that you essentially create a segmentor and then use the segmentor's output as input to a GAN-type discriminator. The discriminator gets the task of deciding whether a given segmentation is an automatic one or a manual one. This can then be used as a kind of additional adversarial loss, inspired by the ideas of generative adversarial networks. You find this very often in the literature as the so-called adversarial loss.
Image under CC BY 4.0 from the Deep Learning Lecture.

So, how is this implemented? Well, the idea is that if you have a data set of N training images and the corresponding label maps, you can build the following loss function: essentially the multi-class cross-entropy loss, with the adversarial loss on the segmentation masks added on top. Here, you essentially train your segmentation both with the ground truth labels and on fooling the discriminator. This is essentially nothing other than a multi-task learning approach with an adversarial task.
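The slide formula itself is not reproduced in this transcript; a reconstruction in the spirit of Luc et al.'s adversarial segmentation loss would read:

```latex
\ell(\boldsymbol{\theta}_s, \boldsymbol{\theta}_a) = \sum_{n=1}^{N}
\mathrm{mce}\big(s(\mathbf{x}_n), \mathbf{y}_n\big)
- \lambda \Big[\mathrm{bce}\big(a(\mathbf{x}_n, \mathbf{y}_n), 1\big)
+ \mathrm{bce}\big(a(\mathbf{x}_n, s(\mathbf{x}_n)), 0\big)\Big]
```

where s(·) is the segmentor, a(·,·) the discriminator, mce the multi-class cross-entropy, bce the binary cross-entropy, and λ weights the adversarial term. The discriminator is trained to distinguish ground truth maps yₙ from predictions s(xₙ), while the segmentor additionally tries to make its predictions look manual to the discriminator.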
Image under CC BY 4.0 from the Deep Learning Lecture.

Okay, this already brings us to the end of our short video today. We've now seen the key ideas on how to build good segmentation networks. In particular, U-net is one of the key ideas that you should know about. Now that we have discussed segmentation networks, we can talk in the next lecture about object detection and how to actually implement it very quickly. This is the other side of image interpretation: we will also be able to figure out where the different instances in the image actually are. I hope you liked this small video, and I'm looking forward to seeing you in the next one. Thank you very much and bye-bye!
If you liked this post, you can find more essays here, more educational material on Machine Learning here, or have a look at our Deep Learning Lecture. I would also appreciate a follow on YouTube, Twitter, Facebook, or LinkedIn in case you want to be informed about more essays, videos, and research in the future. This article is released under the Creative Commons 4.0 Attribution License and can be reprinted and modified if referenced. If you are interested in generating transcripts from video lectures, try AutoBlog.
Translated from: https://towardsdatascience.com/segmentation-and-object-detection-part-2-a334b91255f1