

SR Survey Paper Summary

Published: 2023/12/16

Contents

    • Paper: A Deep Journey into Super-resolution: A Survey
        • Paper Overview
        • Background
        • SISR Taxonomy
        • Experimental Evaluation
        • Future Directions

Paper: A Deep Journey into Super-resolution: A Survey

Authors: Saeed Anwar, Salman Khan, and Nick Barnes

Paper Overview

  • Paper overview:
    Benchmarks SISR by comparing nearly 30 state-of-the-art super-resolution convolutional networks on 6 datasets (3 classic, 3 recently proposed), grouped into 9 categories. The paper also provides comparisons in terms of network complexity, memory footprint, model input and output, learning details, types of network losses, and important architectural differences.

  • Applications of SISR:

  • large computer displays
  • HD television sets
  • hand-held devices (mobile phones, tablets, cameras, etc.)
  • object detection in scenes (particularly small objects)
  • face recognition in surveillance videos
  • medical imaging
  • improving interpretation of images in remote sensing
  • astronomical images
  • forensics
  • Super-resolution is a classic problem, yet for several reasons it remains a challenging and open research topic in computer vision.
    Reasons:

  • SR is an ill-posed inverse problem
    (There exist multiple solutions for the same low-resolution image. To constrain the solution-space, reliable prior information is typically required.)
  • the complexity of the problem increases as the up-scaling factor increases (×2, ×4, ×8: the problem becomes progressively harder)
  • assessment of the quality of the output is not straightforward (quality metrics such as PSNR and SSIM correlate only loosely with human perception)
  • Applications of DL in other AI areas:

  • object classification and detection
  • natural language processing
  • image processing
  • audio signal processing
  • Contributions of this paper:

  • a comprehensive review of the latest super-resolution techniques
  • a new taxonomy based on the structural differences among super-resolution algorithms
  • a thorough analysis based on parameter counts, algorithmic settings, training details, and important architectural innovations
  • a systematic evaluation of the algorithms on 6 SISR datasets
  • a discussion of current challenges in the super-resolution field and prospects for future research

Background

  • Degradation Process:

    y = Φ(x; θ_η)    (1)

    x: HR image
    y: LR image
    Φ: degradation function
    θ_η: degradation parameters (scaling factor, noise)

    In practice, only y is available; neither the degradation process nor its parameters are known. Super-resolution tries to undo the degradation effects and recover an image x̂ that approximates the true HR image x:

    x̂ = Φ⁻¹(y, θ_ς)    (2)

    θ_ς: parameters of Φ⁻¹
    The degradation process is unknown and highly complex; it is affected by many factors, e.g. noise (sensor and speckle), compression, blur (defocus and motion), and other artifacts.

    Therefore, most research works prefer the following degradation model over (1):

    y = (x ⊗ k) ↓_s + n    (3)

    k: blurring kernel
    x ⊗ k: convolution operation
    ↓_s: downsampling operation with a scaling factor s
    n: additive white Gaussian noise (AWGN) with a standard deviation of σ (noise level)

    The goal of image super-resolution is then to minimize the data fidelity term associated with the model y = x ⊗ k + n:

    J(x̂, θ_ς, k) = ‖x ⊗ k − y‖  (data fidelity term)  +  α Ψ(x, θ_ς)  (regularizer)

    α: coefficient balancing the data fidelity term and the image prior Ψ(·)
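Eq. (3) is straightforward to simulate. Below is a minimal NumPy sketch (illustrative, not from the paper): a hypothetical `degrade` function blurs with an isotropic Gaussian kernel k, keeps every s-th pixel for ↓_s, and adds AWGN; the kernel size and σ values are assumptions.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=1.5):
    """Isotropic Gaussian blurring kernel k, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def degrade(x, k, s=2, noise_sigma=0.01, rng=None):
    """y = (x ⊗ k) ↓_s + n for a single-channel image x (Eq. 3).
    The kernel is symmetric, so correlation equals convolution here."""
    rng = np.random.default_rng(rng)
    pad = k.shape[0] // 2
    xp = np.pad(x, pad, mode="edge")
    blurred = np.zeros_like(x)
    for i in range(x.shape[0]):            # direct 2-D convolution
        for j in range(x.shape[1]):
            blurred[i, j] = np.sum(xp[i:i + k.shape[0], j:j + k.shape[1]] * k)
    down = blurred[::s, ::s]               # ↓_s : keep every s-th pixel
    n = rng.normal(0.0, noise_sigma, down.shape)   # AWGN with std σ
    return down + n

x = np.random.default_rng(0).random((32, 32))     # stand-in HR image
y = degrade(x, gaussian_kernel(), s=2)
print(y.shape)   # (16, 16)
```

Real degradations are more varied (compression, motion blur), which is exactly why Eq. (3) is only the preferred approximation.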

    Natural image priors
    In natural image processing, many problems (e.g. image denoising, deblurring, inpainting, and reconstruction) are inverse problems, i.e. the solution is not unique. To narrow the solution space, or rather to better approximate the true solution, constraints must be added. These constraints come from the properties of natural images themselves, i.e. natural image priors. If natural image priors can be exploited well, a high-quality image can be recovered from a low-quality one, so studying them is very worthwhile.
    Commonly used natural image priors include local smoothness, non-local self-similarity, non-Gaussianity, statistical properties, and sparsity.
    Author: showaichuan
    Link: https://www.jianshu.com/p/ed8a5b05c3a4
    Source: Jianshu

    Based on the image prior used, super-resolution methods can be roughly divided into the following categories:

  • prediction methods
  • edge-based methods
  • statistical methods
  • patch-based methods
  • deep learning methods

SISR Taxonomy

  • Linear networks
    only a single path for signal flow without any skip connections or multiple-branches

    note:some linear networks learn to reproduce the residual image (the difference between the LR and HR images)

    Based on the up-sampling operation, two classes can be distinguished:

    • early upsampling

      first upsample the LR input to match the desired HR output size, then learn a hierarchical feature representation to generate the output
      commonly used upsampling method: bicubic interpolation

      • SRCNN (using only convolutional layers for super-resolution)
    • Datasets:
      training set:
      HR images: synthesized by extracting non-overlapping dense patches of size 32×32 from the HR images
      LR images: the LR input patches are first downsampled and then upsampled using bicubic interpolation to the same size as the high-resolution output image
    • Layers: three convolutional and two ReLU layers
      the first convolutional layer is termed patch extraction or feature extraction (creates feature maps from the input image)
      the second convolutional layer is called non-linear mapping (transforms the feature maps into high-dimensional feature vectors)
      the third convolutional layer aggregates the feature maps to output the final high-resolution image
    • Loss function: Mean Squared Error (MSE)
      • VDSR
    • Layers: deep CNN architecture
      (based on VGG-net; uses fixed-size 3×3 convolutions in all network layers)
      To avoid slow convergence in deep networks (specifically with 20 weight layers), they propose two effective strategies:
    • learn a residual mapping that generates the difference between the HR and LR image (this simplifies the target; the network focuses only on high-frequency information)
    • gradients are clipped to the range [−θ, +θ] (so a high learning rate can be used to speed up training)
    • View: deeper networks can provide better contextualization and learn generalizable representations that can be used for multi-scale super-resolution

      VDSR vs. ResNet
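The two strategies above can be sketched in a few lines of NumPy (an illustrative sketch, not the paper's code; nearest-neighbor upsampling stands in for bicubic interpolation, and `residual` stands in for the network's prediction):

```python
import numpy as np

def upsample_nn(lr, s=2):
    """Stand-in for bicubic interpolation (nearest-neighbor for brevity)."""
    return lr.repeat(s, axis=0).repeat(s, axis=1)

def vdsr_reconstruct(lr, predicted_residual, s=2):
    """VDSR-style residual learning: the network predicts only the
    high-frequency residual, which is added back to the interpolated input."""
    return upsample_nn(lr, s) + predicted_residual

def clip_gradients(grads, theta=0.4):
    """Clip every gradient to [-theta, +theta] so a high learning rate
    can be used without divergence."""
    return [np.clip(g, -theta, theta) for g in grads]

lr = np.zeros((4, 4))
residual = np.full((8, 8), 0.1)          # hypothetical network output
hr = vdsr_reconstruct(lr, residual)
print(hr.shape)   # (8, 8)
```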
      • DnCNN
    • learns to predict a high-frequency residual directly instead of the latent super-resolved image
    • Layers :similar to SRCNN
    • depends heavily on the accuracy of noise estimation without knowing the underlying structures and textures present in the image
    • computationally expensive (batch normalization operations after every convolutional layer)
      • IRCNN (Image Restoration CNN)
    • proposes a set of CNN-based denoisers that can be used jointly for several low-level vision tasks such as image denoising, deblurring, and super-resolution
    • specifically, the Half Quadratic Splitting (HQS) technique is used to decouple the regularization and fidelity terms of the observation model; then, exploiting the strong modeling capacity and test-time efficiency of CNNs, the denoising prior is learned discriminatively
    • Layers: the CNN denoiser consists of a stack of 7 dilated convolution layers interleaved with batch normalization and ReLU non-linearities. Dilation helps model larger context by enclosing a larger receptive field.
    • residual image learning is performed in a similar manner to previous architectures (VDSR, DRCN and DRRN)
    • small training samples and zero padding are used to avoid the boundary artifacts caused by the convolution operation
    • late upsampling
      late-upsampling networks learn on the low-resolution input and upsample the features near the network output (low memory footprint)

      • FSRCNN
    • improves speed and quality over SRCNN
    • Datasets: 91-image dataset;
      data augmentation such as rotation, flipping, and scaling is also employed to increase the number of images by 19 times
    • Layers: consists of four convolution layers (feature extraction, shrinking, non-linear mapping, and expansion layers) and one deconvolution layer
    • the feature extraction step is similar to SRCNN (the difference lies in the input size and the filter size: the input to FSRCNN is the original patch without upsampling)
    • shrinking layer: reduces the feature dimensions (number of parameters) by adopting a smaller filter size (i.e. f=1)
    • non-linear mapping (critical step): the size of the filters in the non-linear mapping layer is set to three, while the number of channels is kept the same as the previous layer
    • expansion layer: an inverse operation of the shrinking step to increase the number of dimensions
    • upsampling and aggregating deconvolution layer: the stride acts as an upscaling factor
    • PReLU is used instead of the rectified linear unit (ReLU) after each convolutional layer
    • Loss Function: mean squared error
    • ESPCN(Efficient sub-pixel convolutional neural network)
      a fast SR approach that can operate in real-time
      both for images and videos
    • perform feature extraction in the LR space
    • at the very end to aggregate LR feature maps and simultaneously perform projection to high dimensional space to reconstruct the HR image.
    • the sub-pixel convolution operation used in this work is essentially similar to a convolution transpose or deconvolution operation (a fractional kernel stride is used to increase the spatial resolution of the input feature maps)
    • Loss Function: l1 loss
      A separate upscaling kernel is used to map each feature map
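The sub-pixel rearrangement at the heart of ESPCN can be expressed with plain reshapes and transposes. A minimal NumPy sketch of the rearrangement step only (the surrounding convolutions that produce the C·s² feature maps are omitted):

```python
import numpy as np

def pixel_shuffle(feats, s):
    """ESPCN-style sub-pixel rearrangement: turn C*s^2 LR feature maps of
    size H×W into C maps of size sH×sW."""
    c_s2, h, w = feats.shape
    assert c_s2 % (s * s) == 0
    c = c_s2 // (s * s)
    # (C, s, s, H, W) -> (C, H, s, W, s) -> (C, sH, sW)
    out = feats.reshape(c, s, s, h, w).transpose(0, 3, 1, 4, 2)
    return out.reshape(c, h * s, w * s)

feats = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)  # 4 maps, s=2
hr = pixel_shuffle(feats, 2)
print(hr.shape)   # (1, 6, 6)
```

Each output pixel at (i·s+a, j·s+b) is taken from feature map a·s+b at LR position (i, j), so the "upscaling kernel per feature map" phrasing above corresponds to one map per sub-pixel offset.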
  • Residual Networks
    uses skip connections in the network design (avoid gradients vanishing, more feasible)

    algorithms learn residue i.e. the high-frequencies between the input and ground-truth

    Based on the number of stages used in such networks, two classes can be distinguished:

    • Single-stage Residual Nets

      • EDSR(The Enhanced Deep Super-Resolution)
        modifies the ResNet architecture to work with the SR task
    • removing Batch Normalization layers (from each residual block) and ReLU activations (outside residual blocks) brings a substantial improvement
    • similar to VDSR, they also extended their single-scale approach to work on multiple scales
    • propose the Multi-scale Deep SR (MDSR) architecture (reduces the number of parameters because a majority of the parameters are shared)
    • scale-specific layers are applied in parallel only near the input and output blocks, to learn scale-dependent representations
    • data augmentation (rotations and flips) was used to create a 'self-ensemble' (transformed inputs are passed through the network, reverse-transformed and averaged together to create a single output)
    • better performance compared to SR-CNN, VDSR, SR-GAN
    • Loss Function: l1 loss
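The self-ensemble trick can be sketched as below (a NumPy illustration with an identity "model" standing in for a trained network; a real SR model would return an s× larger image, for which rotations and flips still commute):

```python
import numpy as np

def self_ensemble(sr_model, lr):
    """EDSR-style self-ensemble: run the model on the 8 geometric
    transforms of the input (4 rotations × optional flip), undo each
    transform on the output, and average the results."""
    outputs = []
    for flip in (False, True):
        img = np.fliplr(lr) if flip else lr
        for k in range(4):
            out = sr_model(np.rot90(img, k))
            out = np.rot90(out, -k)                     # reverse the rotation
            outputs.append(np.fliplr(out) if flip else out)
    return np.mean(outputs, axis=0)

identity = lambda x: x.copy()        # stand-in for a trained SR network
lr = np.random.default_rng(0).random((8, 8))
print(np.allclose(self_ensemble(identity, lr), lr))   # True
```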
      • CARN (Cascading Residual Network)
    • distinguished from other models by the presence of local and global cascading modules
    • the features of intermediate layers are cascaded and aggregated onto a 1×1 convolutional layer
    • local cascading connections are identical to global ones, except that the blocks are simple residual blocks
    • Datasets: 64×64 patches from BSD, Yang et al., and the DIV2K dataset, with data augmentation
    • Loss Function: l1 loss
    • Adam is used for optimization with an initial learning rate of 1e-4, which is halved after every 4×10^5 steps
    • Multi-stage Residual Nets
      composed of multiple subnets that are generally trained in succession (the first subnet usually predicts coarse features, while the others refine the initial prediction)
      encoder-decoder designs (first downsample the input using an encoder and then perform upsampling via a decoder) (hence two distinct stages)

      • FormResNet
        composed of two networks, both similar to DnCNN; the difference lies in the loss layers
    • The first network (formatting layer)
      Loss = Euclidean loss + perceptual loss
      Classical algorithms such as BM3D can also replace this formatting layer
    • The second deep network (DiffResNet)
      takes its input from the first network
    • The formatting layer removes high-frequency corruption in uniform areas
      DiffResNet learns the structured regions
      • BTSRN (Balanced Two-Stage Residual Networks)
        composed of a low-resolution stage and a high-resolution stage
    • low-resolution stage:
      the feature maps have a smaller size, the same as the input patch
      (the feature maps are upsampled via deconvolution and nearest-neighbor upsampling)
    • high-resolution stage:
      the upsampled feature maps are then fed into the high-resolution stage
    • a variant of the residual block called projected convolution is employed (in both the low-resolution and the high-resolution stages)
      the residual block consists of a 1×1 convolutional layer acting as a feature-map projection to decrease the input size of the 3×3 convolutional features
      the LR stage has six residual blocks; the HR stage consists of four residual blocks
    • Datasets: DIV2K dataset
      during training, the images are cropped to 108×108 patches and augmented using flipping and rotation operations
    • the optimization was performed using Adam
    • the residual block takes 128 feature maps as input and outputs 64
    • Loss Function: l2 loss
      • REDNet(Residual Encoder Decoder Network)
        composed of convolutional and symmetric deconvolutional layers
        (ReLU is added after each convolutional and deconvolutional layer)
    • the convolutional layers
      extract feature maps while preserving object structure and removing degradation
    • the deconvolutional layers
      reconstruct the missing details of the images
    • skip connections are added between the convolutional and the symmetric deconvolutional layers
      the feature maps of a convolutional layer are added to the output of its mirrored deconvolutional layer, followed by non-linear rectification
    • (input) bicubic-interpolated images
      (output) high-resolution image
      the network is end-to-end trainable; convergence is achieved by minimizing the l2-norm between the system output and the ground truth
    • three variants of the REDNet architecture are proposed (varying the number of convolutional and deconvolutional layers)
      the best-performing architecture has 30 weight layers, each with 64 feature maps
    • Datasets: the luminance channel of the Berkeley Segmentation Dataset (BSD) is used to generate the training image set
      Ground truth: patches of size 50×50
      Input patches: obtained by downsampling the patches and then restoring them to the original size via bicubic interpolation
    • Loss Function: mean squared error (MSE)
    • The input and output patch sizes are 9×9 and 5×5, respectively.
      The patches are normalized by their mean and variance, which are later added back to the corresponding restored final high-resolution output.
      The kernel has a size of 5×5 with 128 feature channels.
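The patch normalization described above (statistics removed from the input and added back to the restored output) can be sketched as follows; a minimal NumPy illustration, where the 1e-8 guard against flat patches is an added assumption:

```python
import numpy as np

def normalize_patch(patch):
    """Normalize an input patch by its mean and standard deviation;
    the statistics are kept so they can be restored on the output."""
    mu, sigma = patch.mean(), patch.std() + 1e-8   # guard for flat patches
    return (patch - mu) / sigma, (mu, sigma)

def denormalize(restored, stats):
    """Add the stored statistics back into the restored network output."""
    mu, sigma = stats
    return restored * sigma + mu

patch = np.random.default_rng(1).random((9, 9))
z, stats = normalize_patch(patch)
round_trip = denormalize(z, stats)      # identity when nothing is restored
print(np.allclose(round_trip, patch))   # True
```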
  • Recursive Networks
    employ recursively connected convolutional layers or recursively linked units
    The main motivation behind these designs is to progressively decompose the harder SR problem into a set of simpler ones

    • DRCN (Deep Recursive Convolutional Network)
      one advantage of this technique is that the number of parameters stays constant regardless of the number of recursions
      composed of three smaller networks:
  • embedding network: converts the input (either grayscale or color image) to feature maps
  • inference net: performs super-resolution
    analyzes image regions by recursively applying a single layer (consisting of convolution and ReLU)
    The size of the receptive field is increased after each recursion.
    The output of the inference net is high-resolution feature maps
  • reconstruction net: transforms the high-resolution feature maps into the grayscale or color output
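The parameter-sharing idea of the inference net can be illustrated as below (a NumPy sketch with a stand-in kernel, not the trained DRCN): the same 3×3 kernel is applied at every recursion, so effective depth and receptive field grow while the parameter count stays at 9.

```python
import numpy as np

def conv3x3(x, k):
    """Minimal 'same' 2-D convolution for a single-channel map."""
    xp = np.pad(x, 1, mode="edge")
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def inference_net(x, k, recursions):
    """DRCN-style inference: the *same* layer (conv + ReLU) is applied
    recursively; the parameter count is independent of the depth."""
    for _ in range(recursions):
        x = np.maximum(conv3x3(x, k), 0.0)   # shared weights k, then ReLU
    return x

k = np.full((3, 3), 1.0 / 9.0)               # one shared 3×3 kernel
x = np.random.default_rng(2).random((8, 8))
deep = inference_net(x, k, recursions=16)    # 16 layers, still 9 parameters
print(deep.shape)   # (8, 8)
```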
    • DRRN(Deep Recursive Residual Network)
      a deep CNN model but with conservative parametric complexity
  • deeper architecture with as many as 52 convolutional layers.
  • At the same time, they reduce the network complexity
    This is achieved by combining residual image learning with local identity connections between small-block layers in the network
    This parallel information flow enables stable training of deeper architectures
  • DRRN utilizes recursive learning, which replicates a basic skip-connection block several times to achieve a multi-path network block
    Since parameters are shared across the replications, memory cost and computational complexity are significantly reduced
  • used the standard SGD optimizer
  • The loss layer is based on MSE loss
    • MemNet(memory network)
      MemNet can be broken down into three parts similar to SRCNN
  • (the first part)feature extraction block :
    extracts features from the input image
  • (the second part)(crucial role):
    consists of a series of memory blocks
    memory block = a recursive unit + a gate unit
  • The recursive part is similar to ResNet
    composed of two convolutional layers with a pre-activation mechanism and dense connections to the gate unit
  • Each gate unit is a convolutional layer with a 1×1 convolutional kernel
  • Loss Function: MSE
  • DataSets:using 200 images from BSD and 91 images from Yang et al
  • The network consists of six memory blocks with six recursions.The total number of layers in MemNet is 80
  • MemNet has also been applied to other image restoration tasks, such as image denoising and JPEG deblocking
  • Progressive reconstruction designs
    To deal with large factors, predict the output in multiple steps, i.e. ×2 followed by ×4
    (a CNN can predict the output in a single step; however, for large scaling factors this may not be feasible)

    • SCN (Sparse Coding-based Network)
      combines the merits of sparse coding with the domain knowledge of deep neural networks to obtain a compact model and improve performance
      mimics a Learned Iterative Shrinkage and Thresholding Algorithm (LISTA) network to build a multi-layer neural network
  • the first convolutional layer extracts features from the low-resolution patches, which are then fed into a LISTA network
  • the LISTA network consists of a finite number of recurrent stages (to obtain the sparse code for each feature)
    A LISTA stage consists of two linear layers and a non-linear layer whose activation function has a threshold that is learned/updated during training.
    To simplify training, the non-linear neuron is decomposed into two linear scaling layers and a unit-threshold neuron
    The two scaling layers are diagonal matrices that are inverses of each other; e.g. if there is a multiplicative scaling layer, a division follows after the threshold unit
  • after the LISTA network, the sparse code is multiplied with a high-resolution dictionary to reconstruct the original high-resolution patch in successive linear layers
  • in the final step, a linear layer is used again to place the high-resolution patches at their original positions in the image and obtain the high-resolution output
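The thresholding unit that LISTA learns corresponds to the classic soft-threshold (shrinkage) operator. For context, the sketch below shows one plain ISTA iteration for sparse coding y ≈ Dz (toy sizes, step size and threshold are illustrative assumptions; LISTA unrolls a fixed number of such stages into layers and learns the matrices and threshold instead of deriving them from D):

```python
import numpy as np

def soft_threshold(v, theta):
    """Shrinkage unit of (L)ISTA: shrink every entry toward zero by theta."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista_step(z, y, D, alpha, theta):
    """One ISTA iteration: a gradient step on the data term ||Dz - y||^2
    followed by the soft threshold that enforces sparsity."""
    return soft_threshold(z - alpha * D.T @ (D @ z - y), theta)

rng = np.random.default_rng(3)
D = rng.standard_normal((8, 16))          # toy dictionary
y = rng.standard_normal(8)
z = np.zeros(16)
for _ in range(10):                        # a finite number of recurrent stages
    z = ista_step(z, y, D, alpha=0.01, theta=0.1)
print(z.shape)   # (16,)
```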
    • LapSRN (Deep Laplacian Pyramid Super-Resolution Network)
      consists of three sub-networks that progressively predict the residual images up to a factor of ×8
      the residual image of each sub-network is added to the input LR image to obtain the SR image
  • The output:
    (first sub-network) a residue of ×2
    (second sub-network) a residue of ×4
    (last sub-network) a residue of ×8
    These residual images are added to the correspondingly upscaled images to obtain the final super-resolved images.
  • the residual prediction branch is termed feature extraction
    the addition of the bicubic images with the residue is termed the image reconstruction branch
  • the LapSRN network consists of three types of elements (convolutional layers, leaky ReLU, and deconvolutional layers)
  • Loss Function: Charbonnier (a differentiable variant of the l1 loss that can handle outliers)
    employed at every sub-network, resembling a multi-loss structure
  • the filter sizes for the convolutional and deconvolutional layers are 3×3 and 4×4, with 64 channels each
  • Datasets: images from Yang et al. and 200 images from the BSD dataset
  • They also propose a single model called Multi-scale (MS) LapSRN that jointly learns to handle multiple SR scales. The single MS-LapSRN model outperforms the results of three separate models. One explanation for this effect is that the single model exploits common inter-scale features, which helps obtain more accurate results
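The Charbonnier loss mentioned above is a one-liner; a minimal NumPy sketch (the ε value 1e-3 is an assumed, commonly used setting):

```python
import numpy as np

def charbonnier(pred, target, eps=1e-3):
    """Charbonnier loss: mean of sqrt(d^2 + eps^2), a differentiable l1
    variant that stays smooth at zero and is robust to outliers."""
    d = pred - target
    return np.mean(np.sqrt(d * d + eps * eps))

pred = np.zeros((4, 4))
target = np.full((4, 4), 0.5)
loss = charbonnier(pred, target)
print(loss > 0)   # True
```

Unlike plain l1, the gradient near d = 0 is well defined, which is why it behaves well in a multi-loss setup applied at every sub-network.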
  • Densely Connected Networks
    DenseNet architecture
    The main motivation of this design is to combine hierarchical cues available along the network depth to achieve higher flexibility and richer feature representations.

    • SR-DenseNet
      based on the DenseNet which uses dense connections between the layers(a layer directly operates on the output from all previous layers)
  • this information flow from low levels to high levels avoids the vanishing-gradient problem, makes learning compact models possible, and speeds up training
  • Towards the rear part of the network, SR-DenseNet uses a couple of deconvolution layers to upscale the inputs.
  • three variants of SR-DenseNet:
  • a sequential arrangement of dense blocks followed by deconvolution layers
    in this way, only high-level features are used to reconstruct the final SR image
  • low-level features from initial layers are combined before final reconstruction
    skip connections are used to combine the low-level and high-level features
  • all features are combined by using multiple skip connections between low-level features and the dense blocks (to allow a direct flow of information for a better HR reconstruction)
    Since complementary features are encoded at multiple stages in the network, the combination of all feature maps gives the best performance
  • Loss Function: MSE (l2 loss)
    • RDN(Residual Dense Network)
      combines residual skip connections (inspired by SR-ResNet) with dense connections (inspired by SR-DenseNet)
      the main motivation is to fully exploit hierarchical feature representations to learn local patterns
  • residual connections are introduced at two levels: local and global
  • (At the local level) a novel residual dense block (RDB) is proposed: the input of each block is passed to all layers within the RDB and also added to the block's output, so each block focuses more on residual patterns.
    Since dense connections quickly produce high-dimensional outputs, each RDB uses a local feature-fusion step containing a 1×1 convolution to reduce the dimensionality
  • (At the global level) the outputs of multiple RDBs are fused together (via concatenation and a 1×1 convolution operation), and global residual learning is performed to merge features from multiple blocks in the network
  • The residual connections help stabilize network training and result in an improvement over SR-DenseNet
  • Loss Function: l1 loss
  • 32×32 patches are randomly selected in each batch for network training
  • data augmentation by flipping and rotation serves as a regularization measure
  • The authors also experiment with settings where the LR images undergo different forms of degradation (noise and artifacts). The method shows good resilience against such degradation and recovers better SR images

    • D-DBPN (Dense Deep Back-Projection Network)
      draws inspiration from traditional SR methods (iteratively performs back-projection to learn the feedback error signal between the LR and HR images)
      the motivation is that purely feed-forward approaches are not the best way to model the LR-to-HR mapping, whereas a feedback mechanism can greatly help achieve better results
  • comprises a series of up- and down-sampling layers that are densely connected with each other
    HR images from multiple depths of the network are combined to produce the final output
  • an important feature of this design is combining the upsampled output of the input feature map with a residual signal
    adding the residual signal to the upsampled feature map provides error feedback and forces the network to focus on fine details
  • Loss Function: l1 loss
  • high computational complexity (~10 million parameters for ×4 SR); a lower-complexity version of the final model is also proposed (with a slight performance drop)

  • Multi-branch designs
    The goal of multi-branch networks is to obtain a diverse set of features at multiple context scales and then fuse this complementary information for a better HR reconstruction.
    Such designs also enable multi-path signal flow, leading to a better exchange of information in the forward and backward passes during training

    • CNF (Context-wise Network Fusion)
      fuses multiple convolutional neural networks for image super-resolution
      each SRCNN is constructed with a different number of layers; the output of each SRCNN is then passed through a separate convolutional layer, and finally all outputs are fused using sum-pooling
  • Datasets: 20 million patches collected from the Open Image Dataset
    The size of each patch is 33×33 pixels, luminance channel only
  • (First) each SRCNN is trained individually (epochs = 50, learning rate = 1e-4)
    (then) the fused network is trained (epochs = 10, learning rate = 1e-4)
  • This progressive learning strategy resembles curriculum learning: starting with easier tasks and then moving to more complex ones, jointly optimizing multiple subnets to achieve improved SR.
  • Loss Function: mean squared error
    • CMSC (Cascaded Multi-Scale Cross-network)
      composed of a feature extraction layer, cascaded subnets, and a reconstruction network
  • (feature extraction layer) performs the same function as mentioned for SRCNN and FSRCNN
  • (cascaded subnets) each subnet is composed of merge-and-run (MR) blocks
    each MR block consists of two parallel branches with two convolutional layers each; the residual connections of the branches are accumulated together and then added to the outputs of both branches
    each CMSC subnet consists of four MR blocks with different receptive fields of 3×3, 5×5, and 7×7 to capture contextual information at multiple scales
    each convolutional layer in an MR block is followed by batch normalization and Leaky-ReLU
  • (last reconstruction layer) generates the final output
  • Loss Function: l1 (intermediate outputs are combined with the final output using balancing terms)
  • Input: the network input is upsampled using bicubic interpolation; the patch size is 41×41
  • The model is trained on the same 291 images as VDSR, with an initial learning rate of 1e-1 decreased every 10 epochs, for a total of 50 epochs
  • CMSC's performance lags behind EDSR and its variant MDSR

    • IDN(Information Distillation Network)
      consists of three blocks: a feature extraction block, multiple stacked information distillation blocks and a reconstruction block
  • (feature extraction block)composed of two convolutional layers to extract features

  • (distillation block)made up of two other blocks, an enhancement unit, and a compression unit.

    enhancement unit :six convolutional layers followed by leaky ReLU
    the output of the third convolutional layer is sliced: one half is concatenated with the block input, and the other half is fed to the fourth convolutional layer
    The output of the concatenated component is added to the output of the enhancement block. In total, four enhancement blocks are utilized.

    compression unit: realized using a 1×1 convolutional layer after each enhancement block

  • (reconstruction block) a deconvolution layer with a kernel size of 17×17

  • Loss Function: the network is first trained with the absolute mean error (l1) loss and then fine-tuned with the mean squared error loss

  • Input: the input patch size is 26×26

  • The initial learning rate is set to 1e-4 for a total of 10^5 iterations

  • utilizing Adam as an optimizer

  • Attention-based Networks
    In the network designs discussed so far, all spatial locations and channels have uniform importance for super-resolution. In some cases it helps to attend selectively to only a few features in a given layer.
    Attention-based models allow this flexibility and account for the fact that not all features are equally necessary for super-resolution; their importance varies. Combined with deep networks, recent attention-based models have shown significant improvements for SR.

    • SelNet
      a novel selection unit for the image super-resolution network
  • The selection unit serves as a gate between convolutional layers, allowing only selected values from the feature maps.
    the selection unit consists of an identity mapping and a cascade of a ReLU, a 1×1 convolutional layer, and a sigmoid layer
  • SelNet contains 22 convolutional layers in total, with a selection unit added after each one. Similar to VDSR, SelNet also uses residual learning and gradient switching (a version of gradient clipping) to speed up learning.
  • Datasets: low-resolution patches of size 120×120 (cropped from the DIV2K dataset)
  • epochs = 50, learning rate = 1e-1
  • Loss Function: l2
    • RCAN (Residual Channel Attention Network)
  • The main highlights of the architecture include:
    (a) a recursive residual design, where residual connections exist within each block of the global residual network
    (b) each local residual block has a channel attention mechanism: the filter activations are collapsed from h×w×c to a vector with 1×1×c dimensions (after passing through a bottleneck) that acts as a selective attention over the channel maps
  • the first novelty allows information to flow from the earliest layers to the final layers
    the second contribution lets the network focus on selective feature maps that matter more for the final task, and effectively models the relationships between feature maps
  • Loss Function: l1 loss
  • the recursive residual style architecture enables better convergence of very deep networks. It performs better than contemporary methods such as IRCNN, VDSR and RDN, which demonstrates the effectiveness of the channel attention mechanism for low-level vision tasks
  • high computational complexity compared to LapSRN, MemNet and VDSR (~15 million parameters for ×4 SR)
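The channel attention described in (b) reduces each feature map to one number, squeezes it through a bottleneck, and gates the channels. A minimal NumPy sketch with random stand-in weights (the reduction ratio r=4 and the plain matrices are illustrative assumptions; RCAN implements the bottleneck with 1×1 convolutions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feats, W_down, W_up):
    """RCAN-style channel attention for feats of shape (c, h, w):
    global-average-pool each channel (h×w×c -> 1×1×c descriptor),
    squeeze it through a bottleneck, and rescale the channels by the
    resulting sigmoid gates."""
    pooled = feats.mean(axis=(1, 2))                 # (c,) descriptor
    gates = sigmoid(W_up @ np.maximum(W_down @ pooled, 0.0))  # bottleneck
    return feats * gates[:, None, None]              # selective re-weighting

rng = np.random.default_rng(4)
c, r = 8, 4                              # channels, reduction ratio (assumed)
W_down = rng.standard_normal((c // r, c))   # c -> c/r
W_up = rng.standard_normal((c, c // r))     # c/r -> c
feats = rng.standard_normal((c, 16, 16))
out = channel_attention(feats, W_down, W_up)
print(out.shape)   # (8, 16, 16)
```

Because the gates are in (0, 1), attention can only attenuate channels, which is what lets the network suppress maps that are unimportant for the final task.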


    • SRRAM (Residual Attention Module for SR)
      SRRAM's structure is similar to RCAN; both methods are inspired by EDSR
      The SRRAM can be divided into three parts:
  • (feature extraction) similar to SRCNN
  • (feature upscaling) composed of residual attention modules (RAM)
    the basic unit of SRRAM, composed of residual blocks, spatial attention and channel attention, used to learn inter-channel and intra-channel dependencies
  • (feature reconstruction) similar to SRCNN
  • Datasets: randomly cropped 48×48 patches from the DIV2K dataset, with data augmentation
  • The filters are of size 3×3 with 64 feature maps
  • The optimizer used is Adam
  • Loss Function: l1 loss
  • learning rate = 1e-4
  • a total of 64 RAM blocks are used in the final model


  • Multiple-degradation handling networks
    in reality, multiple degradations can simultaneously occur

    • ZSSR (Zero-Shot Super-Resolution)
      builds on classical methods, exploiting internal image statistics and using a deep neural network to super-resolve the image
  • The ZSSR is trained using a downsampled version of the test image
    the aim is to predict the test image from LR images generated from the test image itself
    once the network has learned the relationship between the LR test image and the test image, the same network is used, with the test image as input, to predict the SR image
    it therefore requires no training images for a specific degradation and can learn an image-specific network on the fly during inference
  • eight convolutional layers followed by ReLU, each with 64 channels
  • Loss Function: l1 loss
    • SRMD(Super-resolution network for multiple degradations)
      takes a concatenated low-resolution image and its degradation maps.
  • The architecture of SRMD is similar to SRCNN.
    (First) a cascade of convolutional layers with 3×3 filter size is applied to the extracted features, followed by a sequence of Conv, ReLU and Batch normalization layers
    (Furthermore) similar to ESPCN, a convolution operation is used to extract the HR sub-images
    (Finally) the HR sub-images are transformed into the final single HR output
  • SRMD directly learns the HR image rather than the residual image
  • a variant called SRMDNF learns from noise-free degradations
    the connections from the first noise-level maps in the convolutional layers are removed
    the rest of the architecture is similar to SRMD
  • The authors trained individual models for each upsampling scale, in contrast to multi-scale training
  • Loss Function: l1 loss
  • Input: training patches (40×40)
  • Layers: the number of convolution layers is fixed to 12, while each layer has 128 feature maps
  • Datasets: 5,944 images from the BSD, DIV2K and Waterloo datasets
  • initial learning rate 1e-3, later decreased to 1e-5
    the criterion for decreasing the learning rate is based on the error change between successive epochs
  • Neither SRMD nor its variants break the PSNR records of earlier SR networks such as EDSR, MDSR and CMSC
    However, its ability to jointly handle multiple degradations provides a unique capability
  • GAN Models
    Adopt a game-theoretic approach in which the model consists of two components, a generator and a discriminator. The generator produces SR images for which the discriminator cannot tell whether they are real HR images or artificially super-resolved outputs.
    This yields HR images of better perceptual quality, while the corresponding PSNR values usually decrease (a smaller PSNR indicates greater distortion). This highlights that the quantitative measures popular in the SR literature fail to capture the perceptual quality of the generated HR images.

    • SRGAN
      SRGAN proposes an adversarial objective function to push super-resolved outputs close to natural images.
  • (highlight) a multi-task loss formulation that consists of three main parts:
    (1) an MSE loss that encodes pixel-wise similarity
    (2) a perceptual similarity metric in terms of a distance metric (defined over a high-level image representation, e.g. deep network features)
    (3) an adversarial loss
    that balances the min-max game between the generator and the discriminator (the standard GAN objective)
  • favors outputs that are perceptually similar to the high-resolution images
  • To quantify this capability (perceptual similarity), they introduce a new Mean Opinion Score (MOS), assigned manually by human raters to indicate the bad/excellent quality of each super-resolved image.
  • SRGAN clearly outperforms its competitors on perceptual quality metrics
    competitors: optimize direct data-dependent measures (such as pixel errors)
    • EnhanceNet
      The focus of this network design is to create faithful texture details in super-resolved images.
  • A key problem with conventional image quality measures such as PSNR is that they do not match the perceptual quality of images, which leads to overly smooth images without sharp textures. To overcome this, EnhanceNet uses two additional loss terms beyond the regular pixel-level MSE loss:
    (the perceptual loss function) defined as an l1 distance over the intermediate feature representation of a pretrained network
    (the texture matching loss) used to match the textures of the low- and high-resolution images, quantified as the l1 loss between Gram matrices computed from deep features
  • The whole network architecture is trained adversarially; the SR network aims to fool the discriminator network.
  • EnhanceNet's architecture is based on fully convolutional networks and the residual learning principle
  • Their results show that although the best PSNR is obtained when using only a pixel-level loss, the additional loss terms and the adversarial training mechanism produce more realistic and perceptually better outputs
  • On the downside, the proposed adversarial training can produce visible artifacts when super-resolving highly textured regions.
    • SRFeat
      another GAN-based super-resolution algorithm with feature discrimination
      This work focuses on the realism of the output: an additional discriminator helps the generator produce high-frequency structural features rather than noisy artifacts (achieved by discriminating between the features of machine-generated and real images)
  • the network uses a 9×9 convolutional layer to extract features
  • uses residual blocks similar to ResNet with long-range skip connections and 1×1 convolutions
  • the feature maps are upsampled via pixel shuffler layers to obtain the desired output size
  • used 16 residual blocks with two different settings of feature maps, i.e. 64 and 128
  • Loss Function: perceptual (adversarial) loss and pixel-level (l2) loss
  • Adam optimizer
  • Input: the input resolution to the system is 74×74, which only outputs a 296×296 image
  • 120k images from ImageNet are used for pre-training the generator,
    followed by fine-tuning on the augmented DIV2K dataset using learning rates of 1e-4 to 1e-6.
    • ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks)
      builds on SRGAN, removing batch normalization and incorporating dense blocks
  • Each dense block's input is also connected to the output of the respective block, making a residual connection over each dense block
  • ESRGAN also has a global residual connection to enforce residual learning
  • the authors also employ an enhanced discriminator called Relativistic GAN
  • Datasets: 3,450 images from the DIV2K and Flickr2K datasets, employing augmentation
  • Loss Function: the model is first trained with the l1 loss; the trained model is then fine-tuned with a perceptual loss
  • Input: the patch size for training is set to 128×128
  • the network has a depth of 23 blocks; each block contains five convolutional layers, each with 64 feature maps
  • Compared with RCAN, the visual results are relatively better, but on quantitative measures RCAN performs better and ESRGAN lags behind

Experimental Evaluation

  • Datasets
    Set5
    Set14
    BSD100
    Urban100
    DIV2K
    Manga109

  • Quantitative Measures
    PSNR(peak signal-to-noise ratio)
    SSIM(structural similarity index)
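Both measures are easy to compute directly. A NumPy sketch (the SSIM here is the single-window "global" form for brevity; the standard metric averages the same statistic over local Gaussian-weighted 11×11 windows):

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((x - y) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)

def ssim_global(x, y, peak=1.0):
    """Single-window (global) SSIM between two images in [0, peak]."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2   # standard constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(5)
gt = rng.random((32, 32))
noisy = np.clip(gt + rng.normal(0, 0.05, gt.shape), 0, 1)
print(round(psnr(gt, noisy), 1), round(ssim_global(gt, noisy), 3))
```

As noted earlier, both metrics correlate only loosely with human perception, which motivates the MOS-style evaluation used by the GAN models.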




  • Number of parameters

  • Choice of network loss
    Convolutional neural networks:

  • mean absolute error (l1)
  • mean squared error (MSE, l2)
  • Generative adversarial networks (GANs):

  • perceptual loss (adversarial loss)
  • pixel-level loss (MSE)
  • Network Depth
    The current crop of CNNs keeps adding convolutional layers to build deeper networks and improve both qualitative and quantitative results; this trend has been dominant in deep SR since SRCNN

  • Skip Connections
    These connections fall into four main types: global, local, recursive, and dense

Future Directions

  • Incorporation of Priors
  • Objective Functions and Metrics
  • Need for Unified Solutions
  • Unsupervised Image SR
  • Higher SR rates
  • Arbitrary SR rates
  • Real vs Artificial Degradation
