Moviegoer: Scene Boundary Partitioning
This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).
Films are divided into individual scenes: self-contained series of shots that may contain dialogue, visual action, and more. Being able to programmatically identify specific scenes is key to turning a film into structured data. We attempt to identify the start and end frames of individual scenes by using Keras’ VGG16 image model to group similar frames (images) into clusters known as shots. Then an original algorithm, rooted in film-editing expertise, is applied to partition individual scenes.
Overview
Given a set of input frames, our goal is to identify the start frame and end frame of individual scenes. (This is completely unsupervised, but for the purposes of explanation, I’ll comment on our progress and provide visualizations.) In this example, 400 frames, one taken every second from The Hustle (2019), are fed into the algorithm. Keras’ VGG16 image model is used to vectorize these images, and then unsupervised HAC (hierarchical agglomerative clustering) is applied to group similar frames into clusters. Frames with equal cluster values are similar, so a run of three consecutive frames with the same cluster value could represent a three-second shot of a character.
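To illustrate the clustering step, here is a minimal runnable sketch using scikit-learn’s agglomerative (HAC) clustering. In the real pipeline each frame would first be vectorized with Keras’ VGG16; the synthetic feature vectors below stand in for those features so the clustering itself can run. This is illustrative, not the project’s actual code.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# In the real pipeline each frame is vectorized with Keras' VGG16
# (e.g. include_top=False, pooling="avg"); here, synthetic 512-dim
# feature vectors stand in so the clustering step itself is runnable.
rng = np.random.default_rng(0)
shot_a = rng.normal(0.0, 0.05, size=(5, 512))  # frames 0-4: one shot
shot_b = rng.normal(5.0, 0.05, size=(5, 512))  # frames 5-9: another shot
features = np.vstack([shot_a, shot_b])

# HAC groups similar frames; equal labels mean the same shot cluster.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(features)
```

Consecutive frames sharing a cluster label form a shot, which is the unit the rest of the algorithm operates on.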
Here is the vectorization of our sample 400 frames from The Hustle.
Clustering of 400 frames from “The Hustle”

Target visualization
In this example, we have two partial scenes and two complete scenes. Our goal is to identify the scene boundaries of each scene; in this example, we’ll try to identify the boundaries of the blue scene. I’ve colored this visualization manually to illustrate our “target”.
Manual annotation of the 400 frames, divided into scenes

Five-Step Algorithm
Step 1: Finding the A/B/A/B Shot Pattern
Among all 400 frames, we look for any pairs of shots that form an A/B/A/B pattern.
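The pattern search can be sketched as follows. This is an illustrative reimplementation, not the project’s code: per-frame cluster labels are first collapsed into a shot sequence, which is then scanned for alternating pairs.

```python
from itertools import groupby

def shot_sequence(labels):
    """Collapse per-frame cluster labels into the ordered shot list."""
    return [cluster for cluster, _ in groupby(labels)]

def find_abab_pairs(labels):
    """Return cluster pairs (a, b) whose shots alternate A/B/A/B,
    the editing signature of a two-character dialogue scene."""
    shots = shot_sequence(labels)
    pairs = set()
    for i in range(len(shots) - 3):
        a, b = shots[i], shots[i + 1]
        if a != b and shots[i + 2] == a and shots[i + 3] == b:
            pairs.add((a, b))
    return pairs
```

Collapsing to shots first matters: a three-second shot spans three frames with the same label, so the alternation must be checked between shots, not between raw frames.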
A/B/A/B pattern

Step 2: Checking for MCUs
Having found four A/B/A/B patterns, we run each shot through the MCU (medium close-up) image classifier. Two of the patterns are rejected because they contain a shot that doesn’t pass the MCU check. In the image below, the top shot-pair represents our example scene.
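A sketch of this filtering step, with hypothetical names: `mcu_classifier` stands in for the MCU image classifier from the earlier article, and any callable returning a probability works here.

```python
def passes_mcu_check(pair, frame_for_cluster, mcu_classifier, threshold=0.5):
    """True if a representative frame of each shot is classified as an MCU.
    `frame_for_cluster` maps a cluster label to one of its frames."""
    a, b = pair
    return (mcu_classifier(frame_for_cluster[a]) >= threshold
            and mcu_classifier(frame_for_cluster[b]) >= threshold)

def filter_dialogue_pairs(pairs, frame_for_cluster, mcu_classifier):
    """Keep only the A/B/A/B pairs in which both shots pass the MCU check."""
    return [p for p in pairs
            if passes_mcu_check(p, frame_for_cluster, mcu_classifier)]
```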
MCU Check

Step 3: Designating a Preliminary Scene Boundary: Anchor Start/End
Once we’ve confirmed that we’re looking at Medium Close-Up shots, we can reasonably believe we’re looking at a two-character dialogue scene. We look for the first and last appearances of either shot (whether A or B). These frames define the Anchor Start and Anchor End frames, a preliminary scene boundary.
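Finding the anchor frames is a simple scan over the cluster labels; the helper below is illustrative.

```python
def anchor_boundaries(labels, pair):
    """Return (Anchor Start, Anchor End): the first and last frame
    index at which either dialogue shot (A or B) appears."""
    indices = [i for i, cluster in enumerate(labels) if cluster in pair]
    return indices[0], indices[-1]
```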
Anchor Frames: Preliminary Scene Boundaries

Step 4: Identifying Cutaways
In between the Anchor Start and Anchor End are many other shots known as cutaways. These may represent any of the following:
- POV shots, showing what characters are looking at offscreen
- Inserts: different shots of Speaker A or B, such as a one-off close-up
- Other characters, both silent and speaking
After identifying these cutaways, we may be able to extend the scene’s start frame backward and its end frame forward: if these same cutaways appear again just before the Anchor Start or just after the Anchor End, they must still be part of the scene.
確定這些切點后,我們可以向后擴展場景的開始幀,向后擴展結束幀。 如果再次看到這些切角,但在“錨點”開始之前或“錨點”結束之后,它們仍必須是場景的一部分。
Cutaways Which Appear Between the Anchor Frames

Step 5a: Extending the Scene End
After the Anchor End come three frames of a familiar shot (cluster). Since we saw this cluster earlier as a cutaway, we incorporate these three frames into our scene. The frames that follow are unfamiliar, and are indeed not part of this scene.
錨端之后是三幀,并帶有熟悉的鏡頭(群集)。 由于我們較早地看到了該群集,因此它是Cutaway,因此將這三個幀合并到場景中。 以下幀是陌生的,并且確實不是該場景的一部分。
Extending the Scene End, Forward

Step 5b: Extending the Scene Start
We apply the same technique to the scene’s beginning, in the opposite direction. We find many cutaways, so we keep extending earlier and earlier until no more cutaways are found.
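Steps 5a and 5b can be sketched as one extension routine that grows the boundary in both directions while the neighboring frame belongs to a familiar cluster (a dialogue shot or a known cutaway). This is a simplified sketch, not the project’s actual implementation:

```python
def extend_boundaries(labels, start, end, familiar):
    """Grow the scene while the frame just outside either boundary
    belongs to a familiar cluster (dialogue shot or known cutaway)."""
    while start > 0 and labels[start - 1] in familiar:
        start -= 1  # Step 5b: extend the scene start backward
    while end < len(labels) - 1 and labels[end + 1] in familiar:
        end += 1    # Step 5a: extend the scene end forward
    return start, end
```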
Extending the Scene Start, Backward

Evaluation
Below is a visualization of the total frames in the scene, with the blue highlighted frames included in our prediction and the orange highlighted frames not included. The algorithm managed to label most frames of the scene. Although some frames were missed at the scene’s beginning, these are non-speaking introductory frames. The scene takes some time to get started, and we’ve indeed captured all frames containing dialogue, the most important criterion.
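One way to quantify this evaluation is frame coverage: the fraction of the true scene’s frames that the predicted boundary captures. The helper and the numbers below are illustrative, not measured from the film.

```python
def frame_coverage(predicted, actual):
    """Fraction of the true scene's frames captured by the predicted
    boundary; both arguments are inclusive (start, end) frame ranges."""
    pred = set(range(predicted[0], predicted[1] + 1))
    true = set(range(actual[0], actual[1] + 1))
    return len(pred & true) / len(true)
```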
Blue Frames Were Captured by the Algorithm

Wanna see more?
Repository: Moviegoer

Part 1: Can a Machine Watch a Movie?

Part 2: Cinematography Shot Modeling

Part 3: Scene Boundary Partitioning

Part 4: Dialogue Attribution

Part 5: Four Categories of Comprehension

Part 6: Vision Features
Translated from: https://medium.com/swlh/moviegoer-scene-boundary-partitioning-95a0192baf1e