Paper Translation and Reading Notes: "Facial Emotion Recognition Using Deep Learning: Review and Insights"
Contents
- Abstract
- Introduction
- Facial Available Databases
- Facial Emotion Recognition Using Deep Learning
- Discussion and Comparison
- Conclusion and future work
- Analysis and Summary
Abstract
- Automatic emotion recognition based on facial expression is an interesting research field that has been presented and applied in several areas such as safety, health and human-machine interfaces. Researchers in this field are interested in developing techniques to interpret and code facial expressions and extract these features in order to obtain better predictions by computer. With the remarkable success of deep learning, the different architectures of this technique are exploited to achieve better performance. The purpose of this paper is to study recent work on automatic facial emotion recognition (FER) via deep learning. We highlight the contributions treated, the architectures and the databases used, and we present the progress made by comparing the proposed methods and the results obtained. The interest of this paper is to serve and guide researchers by reviewing recent work and providing insights that can improve this field.
- Summary: the paper surveys automatic facial emotion recognition (FER) based on deep learning, reviewing the contributions, architectures and databases of recent work and comparing the proposed methods and their results.
Introduction
-
Automatic emotion recognition is a large and important research area that addresses two different subjects: psychological human emotion recognition and artificial intelligence (AI). The emotional state of humans can be obtained from verbal and non-verbal information captured by various sensors, for example from facial changes [1], tone of voice [2] and physiological signals [3]. In 1967, Mehrabian [4] showed that 55% of emotional information was visual, 38% vocal and 7% verbal. Facial changes during communication are the first signs that transmit the emotional state, which is why most researchers are very interested in this modality.
-
- Key phrases: address two subjects; visual / vocal / verbal
- Gist: facial changes are the first signal of emotional state (Mehrabian: 55% of emotional information is visual, 38% vocal, 7% verbal), which is why most researchers focus on this modality.
-
Extracting features from one face to another is a difficult and sensitive task, whose aim is better classification. In 1978, Ekman and Friesen [5], among the first scientists interested in facial expression, developed FACS (Facial Action Coding System), in which facial movements are described by Action Units (AUs); they broke the human face down into 46 AUs, each coded by one or more facial muscles.
-
- Gist: FACS decomposes the human face into 46 Action Units (AUs), each coded by one or more facial muscles.
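To make the AU idea concrete, here is a minimal Python sketch of mapping detected AUs to an emotion. The AU-to-emotion table uses common FACS examples (e.g. AU6 cheek raiser + AU12 lip corner puller for happiness), but it is an illustrative assumption, not taken from the paper.

```python
# Sketch: mapping detected FACS Action Units to a basic emotion.
# The AU -> emotion table below is illustrative (common FACS examples),
# not the coding used in the reviewed works.
EMOTION_AUS = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "surprise":  {1, 2, 5, 26},  # brow raisers + upper lid raiser + jaw drop
    "sadness":   {1, 4, 15},     # inner brow raiser + brow lowerer + lip corner depressor
}

def match_emotion(active_aus):
    """Return the emotions whose defining AUs are all active."""
    return [emo for emo, aus in EMOTION_AUS.items() if aus <= set(active_aus)]

print(match_emotion([6, 12, 25]))   # a smile with parted lips -> ['happiness']
print(match_emotion([1, 2, 5, 26]))  # -> ['surprise']
```

This is the direction Mohammadpour et al. take later in the review: predict activated AUs first, then read the emotion off AU combinations.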
-
According to the statistics compiled by Philipp et al. [6], automatic FER is the most studied modality, but the task is not easy because each person expresses emotion in his or her own way. Several obstacles and challenges in this area should not be neglected, such as variations in head pose, illumination, age, gender and background, as well as occlusion caused by sunglasses, scarves, skin conditions, etc.
-
- Key phrases: modality; occlusion
- Gist: FER is hard because everyone expresses emotion differently, and head pose, illumination, age, gender, background and occlusions (sunglasses, scarves, skin conditions) all vary.
-
Several traditional methods are used for extracting facial features such as geometric and texture features, for example local binary patterns (LBP) [7], facial action units (FACS) [5], local directional patterns (LDP) [8] and Gabor wavelets [9]. In recent years, deep learning has been a very successful and efficient approach thanks to the results obtained with architectures that allow automatic feature extraction and classification, such as the convolutional neural network (CNN) and the recurrent neural network (RNN); this is what prompted researchers to start using the technique to recognize human emotions. Researchers have made several efforts to develop deep neural network architectures, which produce very satisfactory results in this area.
-
- Note: the paper's "methods exist" should read "existing methods".
- Gist: traditional handcrafted features (LBP, FACS, LDP, Gabor wavelets) are giving way to deep architectures (CNN, RNN) that extract features and classify automatically.
-
In this paper, we provide a review of recent advances in sensing emotions by recognizing facial expressions using different deep learning architectures. We present recent results from 2016 to 2019 with an interpretation of the problems and contributions. The paper is organized as follows: section two introduces some available public databases; section three presents a recent state of the art on FER using deep learning; sections four and five close with a discussion and comparison, then a general conclusion with future work.
-
- Key phrases: provide a review of; with an interpretation of the problems
- Gist: structure of the paper: section 2 public databases, section 3 state of the art in deep-learning FER, sections 4 and 5 discussion/comparison, then conclusion and future work.
Facial Available Databases
- One of the success factors of deep learning is training the neural network with examples. Several FER databases are now available to researchers for this task, each differing from the others in the number and size of images and videos and in variations of illumination, population and face pose. Some are presented in Table 1; we will note their presence in the works cited in the following section.
- Gist: deep learning needs example data; public FER databases differ in size, media type, illumination, population and face pose (see Table 1).
Facial Emotion Recognition Using Deep Learning
-
Despite the notable success of traditional facial recognition methods based on extracted handcrafted features, over the past decade researchers have turned to the deep learning approach because of its high automatic recognition capacity. In this context, we present some recent FER studies that propose deep learning methods for better detection, trained and tested on several static or sequential databases.
-
- Key phrases: turn to the deep learning approach; in this context
- Gist: deep learning's high automatic-recognition capacity drew researchers away from handcrafted features; the cited works train and test on static or sequential databases.
-
Mollahosseini et al. [23] propose a deep CNN for FER across several available databases. After extracting facial landmarks from the data, the images are reduced to 48x48 pixels; a data augmentation technique is then applied. The architecture consists of two convolution-pooling layers followed by two inception-style modules containing convolutional layers of size 1x1, 3x3 and 5x5. They exploit the network-in-network technique, which increases local performance thanks to convolution layers applied locally and also helps reduce the over-fitting problem.
-
- Key phrases: across several available databases; augmentation
- Gist: landmarks extracted, images reduced to 48x48, data augmented; the network stacks two convolution-pooling layers and two inception-style modules (1x1, 3x3, 5x5 convolutions); the network-in-network technique improves local performance and curbs over-fitting.
- To do: look more closely at Mollahosseini's inception-based FER architecture.
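To see what the 1x1/3x3/5x5 filters actually compute, here is a minimal pure-Python 2D "valid" convolution over a nested-list image. A real inception module runs many such filters in parallel and concatenates their feature maps; this is only a sketch, not the authors' implementation.

```python
def conv2d_valid(img, kernel):
    """Naive 2D 'valid' convolution (really cross-correlation, as in CNNs)."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            row.append(sum(img[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
# A 1x1 kernel rescales each pixel (the network-in-network building block);
# a 3x3 averaging kernel summarizes a whole patch into one value.
print(conv2d_valid(img, [[2]]))            # 1x1 filter, output stays 3x3
print(conv2d_valid(img, [[1/9] * 3] * 3))  # 3x3 mean filter, output is 1x1
```

Note how the 1x1 filter preserves spatial size while the 3x3 filter shrinks it: that difference is why inception modules pad or mix filter sizes before concatenating.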
-
Lopes et al. [24] studied the impact of data pre-processing before training the network in order to obtain better emotion classification. Data augmentation, rotation correction, cropping, down-sampling to 32x32 pixels and intensity normalisation are the steps applied before the CNN, which consists of two convolution-pooling layers ending with two fully connected layers of 256 and 7 neurons. The best weights obtained at the training stage are used at the test stage. The experiments were evaluated on three accessible databases: CK+, JAFFE and BU-3DFE. The researchers show that combining all of these pre-processing steps is more effective than applying them separately.
-
- Key phrases: data augmentation; rotation correction; cropping; down-sampling; intensity normalisation
- Gist: the same CNN idea, but with a full pre-processing chain before training (evaluated on CK+, JAFFE and BU-3DFE); combining all the steps beats applying them separately.
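The pre-processing chain can be sketched as follows: crop a face window, down-sample it, and normalize intensities to [0, 1], on a grayscale image held as a nested list. The steps mirror Lopes et al.'s description, but the functions themselves are an illustrative assumption, not their code.

```python
def crop(img, top, left, size):
    """Crop a size x size window from a 2D grayscale image."""
    return [row[left:left + size] for row in img[top:top + size]]

def downsample(img, factor):
    """Keep every `factor`-th pixel in both dimensions (naive down-sampling)."""
    return [row[::factor] for row in img[::factor]]

def normalize(img):
    """Intensity normalisation: rescale pixel values to [0, 1]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1  # avoid division by zero on flat images
    return [[(v - lo) / span for v in row] for row in img]

img = [[10 * i + j for j in range(8)] for i in range(8)]  # toy 8x8 "face"
face = normalize(downsample(crop(img, 2, 2, 4), 2))
print(len(face), len(face[0]))  # 2 2
```

A real pipeline would also add rotation correction (align the eye line horizontally) and augmentation; the point here is that each step is a cheap, composable transform applied before the CNN ever sees the data.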
-
These pre-processing techniques were also implemented by Mohammadpour et al. [25], who propose a novel CNN for detecting facial AUs. The network uses two convolution layers, each followed by max pooling, and ends with two fully connected layers that indicate the number of activated AUs.
-
- Key phrase: novel (adjective: new, original)
- Gist: a CNN that detects activated facial AUs instead of classifying emotions directly: two convolution layers, each with max pooling, then two fully connected layers.
-
In 2018, to address the vanishing or exploding gradient problem, Cai et al. [26] proposed a novel CNN architecture with Sparse Batch normalization (SBP). The network applies two successive convolution layers at the beginning, followed by max pooling and then SBP; to reduce over-fitting, dropout is applied in the middle of three fully connected layers. For the facial occlusion problem, Li et al. [27] present a new CNN method: the data is first fed into a VGGNet, then a CNN with an attention mechanism (ACNN) is applied. The architecture is trained and tested on three large databases: FED-RO, RAF-DB and AffectNet.
-
- Key phrase: attention mechanism
- Gist:
- Cai et al. counter vanishing/exploding gradients with sparse batch normalization: two successive convolution layers, max pooling, SBP, and dropout among the fully connected layers against over-fitting.
- Li et al. handle occlusion: data goes through VGGNet, then a CNN with an attention mechanism (ACNN), trained and tested on FED-RO, RAF-DB and AffectNet.
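Dropout, used by Cai et al. between the fully connected layers, can be sketched in a few lines of pure Python: at training time each activation is zeroed with probability p and the survivors are scaled by 1/(1-p) (the "inverted dropout" convention), so at test time the layer is just the identity. A generic sketch, not the paper's code.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random.random):
    """Inverted dropout: zero each unit with prob p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)  # identity at test time
    keep = 1.0 - p
    return [a / keep if rng() >= p else 0.0 for a in activations]

acts = [0.5, 1.0, 1.5, 2.0]
print(dropout(acts, p=0.5, training=False))  # unchanged at test time
print(dropout(acts, p=0.5))                  # randomly zeroed / rescaled
```

The 1/(1-p) rescaling keeps the expected activation the same in training and test, which is why no extra correction is needed at inference.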
-
Detection of the essential parts of the face was proposed by Yolcu et al. [28]. They used three CNNs with the same architecture, each detecting one part of the face such as the eyebrow, eye or mouth. Before being fed to the CNNs, the images pass through a cropping stage and facial key-point detection. The iconic face obtained, combined with the raw image, is fed into a second type of CNN to detect the facial expression. The researchers show that this method offers better accuracy than using raw images or the iconized face alone (see Fig. 1.a).
-
- Gist: three identical CNNs each detect one facial part (eyebrow, eye, mouth) after cropping and key-point detection; the resulting iconized face, combined with the raw image, feeds a second CNN and beats either input alone.
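"Combining the iconized face with the raw image" amounts to stacking the two same-sized images as input channels before the second CNN. A minimal sketch (the function name and layout are my own, not from the paper):

```python
def stack_channels(raw, iconized):
    """Stack two same-sized 2D images into an H x W x 2 input 'tensor'."""
    assert len(raw) == len(iconized) and len(raw[0]) == len(iconized[0])
    return [[[raw[i][j], iconized[i][j]] for j in range(len(raw[0]))]
            for i in range(len(raw))]

raw = [[0.1, 0.2], [0.3, 0.4]]   # grayscale intensities
icon = [[0, 1], [1, 0]]          # binary sketch of facial landmarks
x = stack_channels(raw, icon)
print(x[0][1])  # [0.2, 1]
```

The first convolution layer of the expression CNN then sees both the appearance and the landmark sketch at every pixel, which is what lets the combined input outperform either alone.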
-
In 2019, Agrawal and Mittal [29] studied the influence of CNN parameter variation on the recognition rate using the FER2013 database. First, all images are set to 64x64 pixels; they then vary the size and number of filters and the choice of optimizer (Adam, SGD, Adadelta) on a simple CNN containing two successive convolution layers, the second followed by max pooling, and a softmax function for classification. Based on these studies, the researchers created two novel CNN models achieving on average 65.23% and 65.77% accuracy; the particularity of these models is that they contain neither fully connected layers nor dropout, and the same filter size is kept throughout the network.
-
- Key phrase: fully connected layers / dropout
- Gist: a study of how CNN hyper-parameters (filter size and count, optimizer) affect accuracy on FER2013; the two resulting models average 65.23% and 65.77% with no fully connected layers or dropout and a constant filter size.
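The softmax classifier sitting on top of these CNNs can be written in a few lines of pure Python, with the usual max-subtraction trick for numerical stability. A generic sketch of the standard function, not the authors' code:

```python
import math

def softmax(logits):
    """Turn raw per-class scores into a probability distribution."""
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]   # e.g. per-emotion logits from the last conv layer
probs = softmax(scores)
print(probs, sum(probs))   # probabilities, summing to 1
```

The predicted emotion is simply the argmax of the output; training minimizes the cross-entropy between this distribution and the one-hot label.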
-
Deepak Jain et al. [30] propose a novel deep CNN containing two residual blocks, each with four convolution layers. The model is trained on the JAFFE and CK+ databases after a pre-processing step that crops the images and normalizes their intensity.
-
- Key phrase: residual
- Gist: a deep CNN with two residual blocks of four convolution layers each, trained on JAFFE and CK+ after cropping and intensity normalization.
-
Kim et al. [31] study the variation of facial expressions during an emotional state and propose a spatio-temporal architecture combining a CNN and an LSTM. First, the CNN learns the spatial features of the facial expression in all frames of the emotional state; an LSTM is then applied to preserve the whole sequence of these spatial features. Yu et al. [32] also present a novel architecture called Spatio-Temporal Convolutional with Nested LSTM (STC-NLSTM), based on three deep-learning sub-networks: a 3DCNN for extracting spatio-temporal features, followed by a temporal T-LSTM to preserve the temporal dynamics, then a convolutional C-LSTM to model multi-level features.
-
- Key phrases: spatial feature; spatio-temporal
- Gist: Kim et al. pair a CNN (per-frame spatial features) with an LSTM (preserving the feature sequence); Yu et al.'s STC-NLSTM stacks a 3DCNN, a temporal T-LSTM and a convolutional C-LSTM.
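A single LSTM step over per-frame features can be sketched as follows, using scalar inputs and hand-set weights purely to show the gate arithmetic. This illustrates the standard LSTM cell equations, not the STC-NLSTM or Kim et al.'s implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM cell step for scalar input/state. W maps gate -> (wx, wh, b)."""
    i = sigmoid(W["i"][0] * x + W["i"][1] * h + W["i"][2])    # input gate
    f = sigmoid(W["f"][0] * x + W["f"][1] * h + W["f"][2])    # forget gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h + W["o"][2])    # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h + W["g"][2])  # candidate state
    c = f * c + i * g          # cell state carries the sequence memory
    h = o * math.tanh(c)       # hidden state exposed to the next layer
    return h, c

W = {k: (0.5, 0.5, 0.0) for k in "ifog"}  # toy weights, same for every gate
h = c = 0.0
for x in [0.2, 0.8, 0.5]:  # per-frame spatial features over time
    h, c = lstm_step(x, h, c, W)
print(round(h, 4))
```

The cell state `c` is what "preserves the whole sequence" of per-frame CNN features: each frame updates it through the forget/input gates instead of overwriting it.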
-
A deep convolutional BiLSTM architecture was proposed by Liang et al. [33]. They create two DCNNs, one designated for spatial features and the other for extracting temporal features from facial expression sequences; these features are fused into a 256-dimensional vector, and a BiLSTM network performs the classification into one of the six basic emotions. For the pre-processing stage, they use a multi-task cascaded convolutional network to detect the face, then apply data augmentation to broaden the database (see Fig. 1.b).
-
- Gist: two DCNNs (spatial and temporal features) fused into a 256-dimensional vector and classified by a BiLSTM; pre-processing uses a multi-task cascaded CNN for face detection plus data augmentation.
-
All of the researchers cited previously classify the basic emotions: happiness, disgust, surprise, anger, fear, sadness and neutral. Fig. 3 presents some of the different architectures proposed by the researchers mentioned above.
-
- Gist: all cited works classify happiness, disgust, surprise, anger, fear, sadness and neutral; Fig. 3 shows the proposed architectures.
Discussion and Comparison
-
In this paper, we clearly note the significant interest of researchers in FER via deep learning over recent years. The automatic FER task goes through different steps: data processing, the proposed model architecture and finally emotion recognition.
-
- Gist: the automatic FER pipeline: data pre-processing, model architecture, then emotion recognition.
-
Pre-processing is an important step, present in all the papers cited in this review. It comprises several techniques such as resizing and cropping images to reduce training time, spatial and pixel-intensity normalization, and data augmentation to increase the diversity of the images and eliminate the over-fitting problem. All these techniques are well presented by Lopes et al. [24].
-
- Key phrases: the diversity of the images; eliminate the over-fitting problem
- Gist: every cited paper pre-processes (resize/crop to cut training time, spatial and intensity normalization, augmentation for diversity and against over-fitting); Lopes et al. [24] present the full chain.
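Data augmentation at its simplest: a horizontal flip doubles the dataset without changing the labels, since faces are roughly left-right symmetric. A minimal sketch (real pipelines also add rotations, shifts and brightness jitter):

```python
def hflip(img):
    """Mirror a 2D image left-right."""
    return [row[::-1] for row in img]

def augment(dataset):
    """Return the originals plus flipped copies, labels unchanged."""
    return dataset + [(hflip(img), label) for img, label in dataset]

data = [([[1, 2], [3, 4]], "happy")]  # one (image, label) pair
aug = augment(data)
print(len(aug), aug[1][0])  # 2 [[2, 1], [4, 3]]
```

More training diversity for free is exactly why augmentation shows up in every paper of this review as an anti-over-fitting measure.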
-
Several methods and contributions presented in this review achieved high accuracy. Mollahosseini et al. [23] showed the performance gained by adding inception layers to the network. Mohammadpour et al. [25] prefer to extract AUs from the face rather than classify the emotions directly. Li et al. [27] study the problem of occluded images. To make the network deeper, Deepak et al. [30] propose adding residual blocks. Yolcu et al. [28] show the advantage of adding the iconized face at the input of the network, improving on training with raw images alone. Agrawal and Mittal [29] offer two new CNN architectures after an in-depth study of the impact of CNN parameters on the recognition rate. Most of these methods present competitive results above 90% (see Table 2).
-
- Key phrases: occluded images; residual blocks
- Gist: a tour of each method's strength (inception layers, AU extraction, occlusion handling, residual blocks, iconized-face input, hyper-parameter study); most report accuracy above 90%.
-
For extracting spatio-temporal features, researchers proposed different deep-learning structures such as combinations of CNN-LSTM, 3DCNN and deep CNN. According to the results obtained, the methods proposed by Yu et al. [32] and Liang et al. [33] achieve better precision than the method used by Kim et al. [31], with rates higher than 99%.
-
- Gist: on sequential data, Yu et al. [32] and Liang et al. [33] beat Kim et al. [31], exceeding 99%.
-
Researchers achieve high precision in FER by applying CNN networks to spatial data; for sequential data they combine a CNN with an RNN, especially an LSTM network, which indicates that the CNN is the basic deep-learning network for FER. Among CNN parameters, the softmax function and the Adam optimization algorithm are the most used. We also note that, to test the effectiveness of a proposed neural network architecture, researchers train and test their models on several databases, and we clearly see that the recognition rate varies from one database to another with the same DL model (see Table 2).
-
- Gist: the CNN is the backbone (plus LSTM for sequences); softmax and Adam dominate; the same model's recognition rate varies across databases.
Conclusion and future work
- This paper presented recent research on FER, allowing us to know the latest developments in this area. We described different CNN and CNN-LSTM architectures recently proposed by various researchers, and presented several databases containing spontaneous images collected from the real world as well as others produced in laboratories (see Table 1), in order to achieve accurate detection of human emotions. We also presented a discussion showing the high rates obtained by researchers, which highlights that machines today are becoming more capable of interpreting emotions, implying that human-machine interaction is becoming more and more natural.
- Key phrases: spontaneous (natural, unposed); highlight
- Gist: the review covers recent CNN and CNN-LSTM architectures and databases of both in-the-wild and laboratory images (see Table 1); the high rates obtained suggest machines are getting better at interpreting emotions, making human-machine interaction more natural.
- FER is one of the most important ways of providing information about the emotional state, but it is usually limited to learning only the six basic emotions plus neutral. This conflicts with everyday life, where emotions are more complex. This will push researchers in future work to build larger databases and create powerful deep learning architectures that recognize all basic and secondary emotions. Moreover, emotion recognition has moved from unimodal analysis to complex multimodal systems. Pantic and Rothkrantz [36] show that multimodality is one of the conditions for ideal detection of human emotion. Researchers are now pushing to create and offer powerful multimodal deep learning architectures and databases, for example the fusion of audio and visual modalities studied by Zhang et al. [37], and audio-visual and physiological modalities by Ringeval et al. [38].
- Key phrases: unimodal analysis; complex multimodal system; fusion of audio and visual; physiological (vs. psychological)
- Gist: future work: larger databases and architectures covering secondary emotions, plus multimodal fusion (audio-visual in Zhang et al. [37]; audio-visual plus physiological in Ringeval et al. [38]).
Analysis and Summary
- After reading this survey I feel more confident about the area, but problems remain: many of the state-of-the-art papers it cites I have not yet read. A careful second pass is needed.