Generative Variational Autoencoder for High Resolution Image Synthesis

Published: 2023/12/15

This article presents our research on high-resolution image generation using a Generative Variational Autoencoder.

Important Points

Our work addresses, in a single model architecture, both the mode collapse issue of GANs and the blurry images generated by VAEs.

We use the VAE encoder as is, while replacing the decoder with a discriminator.

The encoder is fed data sampled from a normal distribution, while the generator is fed data sampled from a Gaussian distribution.

The outputs of both are then combined and fed to a discriminator, which predicts whether the generated images are real or not.

We evaluate our network on three datasets: MNIST, CelebA-HQ and LSUN.

Using MMD, SSIM, log-likelihood, reconstruction error, ELBO and KL divergence as evaluation metrics, we outperform previous state-of-the-art methods.

Introduction

Training deep neural networks requires hundreds or even thousands of images. The lack of labelled datasets, especially for medical images, often hinders progress, so it becomes imperative to create additional training data. One actively researched approach is to use generative adversarial networks (GANs) for image generation: by training on the existing images in a dataset, a GAN can generate new images that are realistic but different from the original data. There are two main approaches to data augmentation with GANs: image-to-image translation and sampling from a random distribution. The main challenge with GANs is the mode collapse problem, i.e. the generated images are quite similar to each other and lack variety.

Another approach to image generation uses Variational Autoencoders (VAEs). This architecture contains an encoder, also known as the inference network, which takes an observation as input and outputs the parameters of the conditional distribution of the latent representation. The decoder, also known as the generative network, takes a latent encoding as input and outputs the parameters of the conditional distribution of the observation. During training, VAEs use the reparameterization trick: a sample ε is drawn from a standard Gaussian distribution and transformed as z = μ + σ·ε, using the mean μ and standard deviation σ predicted by the encoder, so that gradients can flow through the sampling step. The main challenge with VAEs is that they tend to generate blurry rather than sharp images.
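
A minimal sketch of the reparameterization trick in plain Python, for illustration only. The encoder outputs `mu` and `log_var` are made-up numbers standing in for a real network's outputs, and the function names are hypothetical. The KL term is the standard closed form for a diagonal Gaussian against N(0, I), which is the regularizer in the usual VAE loss.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), elementwise.

    All randomness lives in eps, so for a fixed eps, z is a deterministic,
    differentiable function of (mu, log_var).
    """
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)   # log-variance -> standard deviation
        eps = rng.gauss(0.0, 1.0)    # sample from the standard Gaussian
        z.append(m + sigma * eps)
    return z

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

# Hypothetical encoder outputs for one input (latent size 3).
mu = [0.5, -1.0, 0.0]
log_var = [0.0, math.log(0.25), math.log(4.0)]

rng = random.Random(0)  # fixed seed so the sample is reproducible
z = reparameterize(mu, log_var, rng)
print("z  =", z)
print("KL =", kl_to_standard_normal(mu, log_var))
```

Because the noise is sampled separately from the parameters, an autodiff framework can backpropagate through `mu` and `log_var`; sampling z directly from N(mu, sigma²) would not allow that.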

Dataset

The following datasets are used for training and evaluation:

MNIST: a large dataset of handwritten digits that has been used successfully for training image classification and image processing algorithms. It contains 60,000 training images and 10,000 test images.

LSUN dataset: this dataset contains millions of color images in 10 scene categories and 20 object categories. It is one of the most common datasets for training and testing GAN-based neural networks.

CelebA-HQ dataset: a high-quality version of CelebA, a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. CelebA-HQ consists of 30,000 of these images at 1024 × 1024 resolution. It is also one of the most common datasets for training and testing GAN-based neural networks.

VAE vs Our Network

We show how, instead of performing inference as in the original VAE architecture, we can add the error vector to the original data and multiply it by the standard deviation. The resulting term goes to the encoder and is converted to the latent space. Similarly, in the decoder the error vector is added to the latent vector and multiplied by the standard deviation. In this way, we use the VAE encoder much as in the original VAE, while we replace the decoder with a discriminator and change the loss function accordingly. The comparison between the model architectures of the VAE and ours is shown in Fig 1.

Figure 1: Comparison between the standard VAE and our network, where e1 and e2 denote samples from some noise distribution, x denotes the image vector, z denotes the latent space vector, f and g denote the encoder and decoder functions respectively, + denotes the addition operator and the other operator symbol in the figure denotes concatenation.

Our architecture can be seen both as an extension of VAE and as an extension of GAN. Viewing it as the former is easy, since it requires a change in the decoder's loss function. The latter view follows from recalling that a GAN is essentially a zero-sum game between the generator and the discriminator, whose solution is a Nash equilibrium. In our case, the VAE encoder and the GAN discriminator play a zero-sum game, competing with each other. As training proceeds, the losses of both decrease until they stabilize.
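
For reference, the zero-sum game between generator and discriminator is usually written as the standard GAN minimax objective below, where D is the discriminator, G the generator, p_data the data distribution and p_z the noise prior. This is the textbook formulation, shown only as background: the network described in this article is trained with a WGAN loss instead, and the encoder's role in the game is not captured by this equation.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator maximizes V by telling real from generated samples, while the generator minimizes it by making D(G(z)) large; one player's gain is exactly the other's loss.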

Network Architecture

The network architecture used in this work is explained in the following points:

The discriminator and encoder networks have four convolution layers, each of which uses 3×3 filters.

We use Batch Normalization and Leaky Rectified Linear Unit (LeakyReLU) layers after each convolution layer.

We found that our architecture suffers from instability during training. We solved this using the WGAN loss function, which estimates the Wasserstein distance between two distributions.

We also used a gradient penalty term to stabilize training.

Our loss function has a total of 3 terms. During training, the encoder and the generator are treated as one network. Thus, we sum the loss functions of the two networks, the encoder-generator and the discriminator, into one objective and train the networks.

Two latent vectors are sampled, one from a normal distribution and the other from a Gaussian distribution. The one from the normal distribution is fed to the encoder, while the one from the Gaussian distribution is fed to the generator.

The outputs from both vectors are in turn fed to the discriminator, which tells whether the generated image is real or not.
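
The gradient penalty mentioned above is commonly implemented as in WGAN-GP: the critic is evaluated at random interpolates between real and generated samples, and the squared deviation of the critic's input-gradient norm from 1 is penalized. The sketch below is not the article's implementation. It assumes a toy linear critic D(x) = w·x + b, whose input gradient is exactly w, so no autodiff library is needed; the weight λ = 10 is the value commonly used for WGAN-GP, not one stated in this article, and all function names are hypothetical.

```python
import random

def critic(w, b, x):
    """Toy linear critic D(x) = w . x + b, a stand-in for a conv discriminator."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def critic_input_grad(w, x):
    """Gradient of the linear critic w.r.t. its input x: always w.

    x is unused because the gradient of a linear function is constant;
    a real critic would need autodiff evaluated at x.
    """
    return list(w)

def wgan_gp_critic_loss(w, b, real, fake, lam=10.0, rng=None):
    """E[D(fake)] - E[D(real)] + lam * E[(||grad_x D(x_hat)||_2 - 1)^2].

    Assumes real and fake batches have the same size; x_hat is a random
    interpolate between each paired real and fake sample.
    """
    if rng is None:
        rng = random.Random(0)
    n = len(real)
    wasserstein_term = (sum(critic(w, b, f) for f in fake) / n
                        - sum(critic(w, b, r) for r in real) / n)
    penalty = 0.0
    for r, f in zip(real, fake):
        t = rng.random()  # interpolation weight, uniform in [0, 1)
        x_hat = [t * ri + (1.0 - t) * fi for ri, fi in zip(r, f)]
        g = critic_input_grad(w, x_hat)
        grad_norm = sum(gi * gi for gi in g) ** 0.5
        penalty += (grad_norm - 1.0) ** 2
    return wasserstein_term + lam * penalty / n

real = [[1.0, 0.0], [0.8, 0.2]]   # made-up "real" samples
fake = [[0.0, 1.0], [0.1, 0.9]]   # made-up "generated" samples
print(wgan_gp_critic_loss([1.0, 0.0], 0.0, real, fake))  # ||w|| = 1: no penalty
print(wgan_gp_critic_loss([3.0, 4.0], 0.0, real, fake))  # ||w|| = 5: penalized
```

The penalty softly enforces the 1-Lipschitz constraint that the Wasserstein formulation requires, which is why it tends to stabilize training compared with weight clipping.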

Our network architecture is shown in Fig 2.

Figure 2: Our network architecture

Architecture Details

The layer-wise architecture details of the generator and discriminator are shown in Table 1 and Table 2, respectively. We define a ResNet block as consisting of the following layers: a convolutional layer, a max-pooling layer and a batch normalization layer, with 30 percent dropout in between the layers.
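
To make the block description above concrete, the sketch below tracks how the spatial size of a feature map changes through a stack of such blocks, using the standard output-size formula floor((n + 2p - k) / s) + 1 for convolution and pooling layers. The kernel sizes, strides, padding, the number of blocks and the 1024 × 1024 input are assumptions chosen for illustration, since the layer tables are not reproduced in this text; `conv_out` and `resnet_block_out` are hypothetical helper names.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output size of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def resnet_block_out(size):
    """Hypothetical block: 3x3 conv (stride 1, pad 1), then 2x2 max pool (stride 2).

    Dropout and batch normalization do not change the spatial size.
    """
    size = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3 conv keeps the size
    size = conv_out(size, kernel=2, stride=2, padding=0)  # 2x2 max pool halves it
    return size

size = 1024  # assumed input side length, e.g. a CelebA-HQ image
for i in range(4):  # assumed number of blocks
    size = resnet_block_out(size)
    print(f"after block {i + 1}: {size} x {size}")
```

With "same" padding, only the pooling layers shrink the feature map, so each block halves the resolution under these assumptions.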

Algorithm

Our network is trained with stochastic gradient descent (SGD) using the algorithm shown below:

Experiments

All the generated samples are generator outputs from random latent vectors. We normalize all data to the range [-1, 1] and use two evaluation metrics to measure the performance of our network. The first measures the distance between the distributions of real and generated samples using maximum mean discrepancy (MMD) scores. The second evaluates the diversity of the generated samples using the multi-scale structural similarity metric (MS-SSIM); a lower average MS-SSIM between pairs of generated samples indicates greater diversity. Table 4 compares our MMD and MS-SSIM scores with those of previous state-of-the-art architectures.
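
A minimal sketch of an unbiased MMD² estimate with a Gaussian (RBF) kernel between a set of real and a set of generated samples, written in plain Python. The kernel choice, the bandwidth `sigma` and the toy 2-D points are assumptions; the article does not state which kernel or bandwidth was used for Table 4, so numbers from this sketch are not comparable to those scores.

```python
import math

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd2_unbiased(xs, ys, sigma=1.0):
    """Unbiased MMD^2 estimate between sample sets xs and ys (each needs >= 2 items)."""
    m, n = len(xs), len(ys)
    # Average within-set similarity, excluding each point's similarity to itself.
    k_xx = sum(rbf_kernel(xs[i], xs[j], sigma)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(rbf_kernel(ys[i], ys[j], sigma)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # Average cross-set similarity.
    k_xy = sum(rbf_kernel(x, y, sigma) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy

real = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]      # made-up "real" samples
close = [[0.05, 0.0], [0.0, 0.05], [0.05, 0.05]]  # generated, similar distribution
far = [[3.0, 3.0], [3.1, 3.0], [3.0, 3.1]]        # generated, shifted distribution
print(mmd2_unbiased(real, close))  # near zero
print(mmd2_unbiased(real, far))    # much larger
```

Lower is better: MMD² is zero when the two distributions coincide, and because the estimator is unbiased, it can come out slightly negative for well-matched sample sets.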

We noticed that the model with a small latent vector size of 100 suffers from severe mode collapse. The best results are obtained with a moderately large latent vector size. Table 5 compares the effect of different latent variable sizes on the MMD and MS-SSIM scores.

As can be seen, a latent variable size of 1000 produces the best results among the sizes compared. Mode collapse, one of the main challenges in training GANs, is observed at both low and high latent variable sizes.

Four evaluation metrics are commonly used in the literature to test the performance of generative models: log-likelihood, reconstruction error, ELBO and KL divergence.

The log-likelihood measures how probable the observed samples are under the model; maximum likelihood training finds the parameters that maximize it. The reconstruction error is the distance between an original data point and its reconstruction after it has been encoded into the lower-dimensional latent space and decoded back. The optimization problem in our model involves a KL divergence term that is intractable, so we maximize the ELBO instead of minimizing the KL divergence directly; the two are equivalent because log p(x) = ELBO + KL(q(z|x) ‖ p(z|x)) and log p(x) does not depend on q. KL divergence measures how much the generated probability distribution differs from the true probability distribution. Table 6 compares our model with the original VAE architecture on the MNIST dataset using these evaluation metrics.
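
The relation between these quantities can be checked numerically on a toy model with a discrete latent variable, where everything is tractable. The prior, likelihood and approximate posterior below are made-up numbers; the point of the sketch is the identity log p(x) = ELBO + KL(q(z) ‖ p(z|x)), which is why maximizing the ELBO is equivalent to minimizing the otherwise intractable KL term.

```python
import math

# Toy generative model: discrete latent z in {0, 1, 2}, one fixed observation x.
prior = [0.5, 0.3, 0.2]        # p(z)
likelihood = [0.1, 0.6, 0.3]   # p(x | z) for the observed x
q = [0.2, 0.5, 0.3]            # approximate posterior q(z | x), chosen by hand

# Exact quantities (tractable only because z takes three values).
joint = [pz * px for pz, px in zip(prior, likelihood)]  # p(x, z)
log_px = math.log(sum(joint))                           # log p(x)
posterior = [j / sum(joint) for j in joint]             # p(z | x)

# ELBO = E_q[log p(x|z)] - KL(q || p(z)): reconstruction term minus prior KL.
recon = sum(qz * math.log(px) for qz, px in zip(q, likelihood))
kl_prior = sum(qz * math.log(qz / pz) for qz, pz in zip(q, prior))
elbo = recon - kl_prior

# KL(q || p(z|x)) is exactly the gap between log p(x) and the ELBO.
kl_post = sum(qz * math.log(qz / pz) for qz, pz in zip(q, posterior))

print(f"log p(x)        = {log_px:.6f}")
print(f"ELBO            = {elbo:.6f}")
print(f"KL(q || p(z|x)) = {kl_post:.6f}")
print(f"ELBO + KL       = {elbo + kl_post:.6f}")
```

Since the KL term is never negative, the ELBO is a lower bound on log p(x), and it becomes tight when q equals the true posterior.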

Table 7 compares our log-probability values with those obtained by previous state-of-the-art methods. The log-probability is an important evaluation metric because it reflects the diversity of the generated samples: a model that misses modes of the data assigns low probability to samples from those modes.

Results

We present generated images for all three datasets used for testing. The network was trained for 1000 iterations. The images generated using the CelebA-HQ dataset are shown in Fig 3.

Figure 3: 1024 × 1024 images generated using the CelebA-HQ dataset.

The images generated using the LSUN BEDROOM dataset are shown in Fig 4.

Figure 4: 256 × 256 images generated using the LSUN BEDROOM dataset

The images generated from different LSUN categories are shown in Fig 5.

Figure 5: Sample 256 × 256 images generated from different LSUN categories

Fig 6 compares our results with those of previous state-of-the-art networks on the MNIST dataset.

Figure 6: Generated MNIST images: a) GAN, b) WGAN, c) VAE, d) GVAE

Conclusions

In this blog, we presented a new training procedure for Variational Autoencoders based on generative models. It makes the inference model much more flexible, allowing it to represent almost any posterior distribution over the latent variables. Our network was trained and tested on three publicly available datasets. Evaluated with MMD, SSIM, log-likelihood, reconstruction error, ELBO and KL divergence as metrics, our network outperforms previous state-of-the-art algorithms. Using generative models to produce additional training data could be revolutionary, especially in fields like medical imaging, where there is a shortage of data for training deep convolutional neural networks.

Source: https://towardsdatascience.com/generative-variational-autoencoder-for-high-resolution-image-synthesis-48dd98d4dcc2
