
Bayesian Neural Networks: 2 Fully Connected in TensorFlow and Pytorch



Bayesian Neural Net

This chapter continues the series on Bayesian deep learning. Here we'll explore alternative solutions to conventional dense neural networks. These alternatives place probability distributions over each weight in the neural network, resulting in a single model that effectively contains an infinite ensemble of neural networks trained on the same data. We'll use this knowledge to solve an important problem of our age: how long to boil an egg.


Chapter Objectives:

  • Become familiar with variational inference with dense Bayesian models
  • Learn how to convert a normal fully connected (dense) neural network to a Bayesian neural network
  • Appreciate the advantages and shortcomings of the current implementation

The data is from an experiment in egg boiling. The boil durations are provided along with the egg's weight in grams and the finding on cutting it open. Findings are categorised into one of three classes: undercooked, soft-boiled and hard-boiled. We want to predict the egg's outcome from its weight and boiling time. The problem is insanely simple, so much so that the data is close to being linearly separable¹. But not quite, as the egg's pre-boil life (fridge temperature or cupboard storage at room temperature) isn't provided, and as you'll see this swings cooking times. Without the missing data we can't be certain what we'll find when opening an egg up. Knowing how certain we are lets us influence the outcome here, as it does with most problems. In this case, if we're relatively confident an egg is undercooked we'll cook it more before cracking it open.


Let's have a look at the data first to see what we're dealing with. If you want to feel the difference for yourself you can get the data at github.com/DoctorLoop/BayesianDeepLearning/blob/master/egg_times.csv. You'll need Pandas and Matplotlib for exploring the data (pip install --upgrade pandas matplotlib). Download the dataset to the same directory you're working from. If unsure where that directory is, type pwd on its own in a Jupyter notebook cell to find out.

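The gists linked below hold the actual plotting code; as a rough idea of what that exploration looks like, here's a minimal sketch. The column names (weight, time, outcome) are assumptions about the CSV layout rather than values confirmed from the file.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed column names; the real egg_times.csv may label its columns differently.
df = pd.read_csv("egg_times.csv")

fig, ax = plt.subplots(figsize=(8, 5))
colours = {"underdone": "tab:red", "soft-boiled": "tab:green", "hard-boiled": "tab:blue"}
for label, colour in colours.items():
    subset = df[df["outcome"] == label]
    ax.scatter(subset["weight"], subset["time"], label=label, color=colour, alpha=0.7)
ax.set_xlabel("Egg weight (g)")
ax.set_ylabel("Boil time (minutes)")
ax.legend(title="Outcome")
plt.show()
```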

https://gist.github.com/DoctorLoop/5a8633691f912d403e04a663fe02e6aa
https://gist.github.com/DoctorLoop/21e30bdf16d1f88830666793f0080c63
Figure 2.01 Scatter plot of egg outcomes

And let’s see it now as a histogram.

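A sketch of the histogram code under the same assumed column names; note that density is left at its default of False because we want raw counts.

```python
fig, ax = plt.subplots(figsize=(8, 5))
for label in ["underdone", "soft-boiled", "hard-boiled"]:  # assumed class labels
    # density defaults to False, so the bars show raw counts rather than a normalised distribution
    ax.hist(df.loc[df["outcome"] == label, "time"], bins=15, alpha=0.5, label=label)
ax.set_xlabel("Boil time (minutes)")
ax.set_ylabel("Count")
ax.legend(title="Outcome")
plt.show()
```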

https://gist.github.com/DoctorLoop/2a5e95a68a29315f167e0e875e7fae16
Figure 2.02 Histogram of egg times by outcome

It seems I wasn't so good at getting my eggs soft-boiled as I like them, so we see a fairly large class imbalance: twice as many underdone instances and three times as many hard-boiled instances relative to the soft-boiled lovelies. This kind of class imbalance can spell trouble for conventional neural networks, causing them to underperform, and imbalanced class sizes are a common finding.


Note that we're not setting density to True (False is the default so it doesn't need to be specified) as we're interested in comparing actual counts. If instead we were comparing probabilities sampled from one of the three random variables, we'd want to set density=True to normalise the histogram so the data sums to 1.0.


Histograms can be troublesome for discretisation of data as we need to explicitly specify (or use an algorithm to specify for us) the number of bins to gather values into. Bin size vastly influences how the data appears. As an alternative we can opt for a kernel density estimate, but when comparing groups, as we are here, it's often nicer to use a violin plot instead. A violin plot is a hybrid with a box plot, showing the smoothed distribution for easy comparison. In Python it's both easier and prettier to do violin plots with the Seaborn library (pip install --upgrade seaborn), but I'll show it here in Matplotlib to be consistent.

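And a Matplotlib violin-plot sketch under the same assumptions; showmeans=True draws the central bar mentioned in the caption.

```python
groups = [df.loc[df["outcome"] == label, "time"].values
          for label in ["underdone", "soft-boiled", "hard-boiled"]]  # assumed class labels

fig, ax = plt.subplots(figsize=(8, 5))
ax.violinplot(groups, showmeans=True, showextrema=False)
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["underdone", "soft-boiled", "hard-boiled"])
ax.set_ylabel("Boil time (minutes)")
plt.show()
```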

https://gist.github.com/DoctorLoop/c5bac12d7b7ebd7758cc83f0fe931fc0
Figure 2.03 Violin plot of egg times by outcome. Central bar indicates mean.

Alright, great. Now let's get to the architectural differences between neural networks, which is what we're actually here to learn about. We'll implement a classical dense architecture first, then transform it into a powerful Bayesianesque equivalent. We implement the dense model with the base library (either TensorFlow or Pytorch), then we use the add-on (TensorFlow-Probability or Pyro) to create the Bayesian version. Unfortunately the code for TensorFlow's implementation of a dense neural network is very different from Pytorch's, so go to the section for the library you want to use.


TensorFlow/TensorFlow-Probability

The source code is available at


github.com/DoctorLoop/BayesianDeepLearning/


https://gist.github.com/DoctorLoop/3a67fd1197a355f68f45076be0074844
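The gist above contains the model used in the article; a minimal sketch of a comparable Keras dense classifier is shown below. The layer widths and the two-column input (weight and boil time) are assumptions rather than the article's exact architecture.

```python
import tensorflow as tf

dense_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(2,)),  # weight and boil time
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(3),  # three outcome classes, output as logits
])

dense_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```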

We're using Keras for the implementation as it's what we need for our later TFP model. We start training by calling the compiled model as shown below.


https://gist.github.com/DoctorLoop/f6c74509046068ae7ac37efe00d08545
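A sketch of what the training call might look like; the log directory, array names and class weights are placeholders, not the notebook's actual values.

```python
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="logs/dense")

history = dense_model.fit(
    x_train, y_train,                        # hypothetical arrays of (weight, time) rows and labels
    epochs=800,
    validation_split=0.1,                    # hold out 10% of the data for validation
    class_weight={0: 1.0, 1: 2.0, 2: 0.7},   # illustrative weights for the class imbalance
    callbacks=[tensorboard_cb],
    verbose=0,                               # set to 1 for the default per-epoch Keras output
)
```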

You'll see TensorBoard is used to keep a record of our training. It's worth taking advantage of TensorBoard as it's a powerful means of model debugging and one of the key features of TensorFlow. If we want to output training progress beneath the cell during training we can use the Keras output by setting the verbose parameter of model.fit to 1. The problem with the Keras output is that it prints the loss for every epoch. Here we're training for 800 epochs, so we'd get a long, messy output. It's therefore nicer to create a custom logger that gives control over how frequently the loss is printed. The source code for the custom logger is very simple and available in the notebook on GitHub. The TensorBoard callback (and any custom logger) is passed to the callbacks parameter, though this can be skipped if you don't mind logging every epoch in the notebook.

The great thing about Keras is it'll split the data for validation for us, saving time. Here 10% of the data is used for validation. Some people use 25% of the data for validation, but as data is the most important consideration for a neural network, and datasets tend to be fairly large, 10% works fine and gives the model a better chance of reaching our training goals. The GitHub notebook also shows how to use class weights to address the class imbalance we discussed earlier. Finally, keep validation infrequent if training time is important, as validation is considerably slower than the training itself (roughly 10x slower with a custom logger) and therefore makes training longer.


Final loss is around 0.15 with an accuracy of 0.85. If you train for more than 800 epochs, after epoch 1200 you'll see validation accuracy decreasing even though the loss is still declining. That's because we've started to overfit to the training data. When we're overfitting we're making our model depend on the specific training instances, meaning it doesn't generalise well to our unseen validation data or the intended real-world application of the model². Overfitting is the bane of conventional neural networks, which often need large datasets and early stopping approaches to mitigate it.


But we're here to learn about solutions, not problems! Enter our Bayesian neural networks, which are resistant to overfitting, aren't particularly bothered by class imbalance and, most importantly, perform extraordinarily well with very little data.


Let’s take a look at a Bayesian implementation of the model.


https://gist.github.com/DoctorLoop/f00c18f591553685e06a79fdbe4b68e0
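As a rough sketch of how the flipout version might be assembled (scaling the KL divergence by the number of training examples is a common convention; the exact divergence function in the article's gist may differ):

```python
import tensorflow as tf
import tensorflow_probability as tfp

n_train = 100  # placeholder: number of training examples, used to scale the KL term

kl_fn = lambda q, p, _: tfp.distributions.kl_divergence(q, p) / n_train

bayesian_model = tf.keras.Sequential([
    tfp.layers.DenseFlipout(32, activation="relu", kernel_divergence_fn=kl_fn),
    tfp.layers.DenseFlipout(32, activation="relu", kernel_divergence_fn=kl_fn),
    tfp.layers.DenseFlipout(3, kernel_divergence_fn=kl_fn),
])

bayesian_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```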

It’s very similar to our conventional model except we’re using a flipout layer from TensorFlow-Probability instead. We specify a kernel divergence function which is the Kullback-Leibler divergence mentioned earlier. That’s about it.


Pytorch/Pyro

When comparing a conventional dense model to a Bayesian equivalent, Pyro does things differently. With Pyro we always create a conventional model first, then upgrade it by adding two new functions to make the conversion. The conventional model is needed to provide a way to automatically sample values from the weight distributions. The sampled values are plugged into the corresponding positions in the conventional model to produce an estimate of training progress. You'll remember from the last chapter how training conditions (trains) the weight distributions. In plain terms, training shifts the normal distributions and alters their scales to represent each weight. However, to make a prediction we still plug in solid single-weight values. While we can't use the distributions wholesale to make predictions, we can take many predictions, each with different sampled weight values, to approximate the distribution. That's where the dense model we first build fits in nicely.


We’ll start then with a class to enclose our dense model. The full source code is available in an online notebook at https://github.com/DoctorLoop/BayesianDeepLearning.


https://gist.github.com/DoctorLoop/33b192030388c46d0a2591459b2f6623
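A minimal sketch of the kind of dense Pytorch network being described; the hidden width of 32 and the two-feature input are assumptions.

```python
import torch
import torch.nn as nn

class DenseNet(nn.Module):
    def __init__(self, n_features=2, n_hidden=32, n_classes=3):
        super().__init__()
        self.fc1 = nn.Linear(n_features, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_hidden)
        self.out = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.out(x)  # raw logits; CrossEntropyLoss applies softmax internally

net = DenseNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)  # Pytorch optimizer, not Pyro's
```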

We've set up three layers, with only the first two utilising an activation function. Then we've specified our loss function and optimizer. Be careful to note that for this model we're using torch.optim.Adam (the Pytorch optimizer) rather than the Pyro optimizer. If you try to use Pyro's here it'll throw errors, as the parameters are presented differently. Our training loop isn't anything special if you've used Pytorch before. 800 epochs is entirely excessive, but so what.

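A bare-bones version of such a loop, with hypothetical tensors x_train and y_train standing in for the trainloader used in the notebook:

```python
for epoch in range(800):
    optimizer.zero_grad()
    logits = net(x_train)               # x_train: float tensor of (weight, time) rows
    loss = criterion(logits, y_train)   # y_train: long tensor of class indices
    loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(f"epoch {epoch}: loss {loss.item():.3f}")
```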

https://gist.github.com/DoctorLoop/75854f26bee106d9b596b2ee0544c5d5

[Out]:

Test accuracy: 0.88
Final loss:0.173 at epoch: 799 and learning rate: 0.001

Pretty good performance here, and quick too. If you're new to Pytorch and have experience elsewhere you might wonder about the lack of softmax. In Pytorch it's important to note that softmax is built into CrossEntropyLoss.


Let’s get on with upgrading it. Here are the two new functions I mentioned, the model and guide.


https://gist.github.com/DoctorLoop/a4dcf6b0f7f1ece43087534ebdfcaa08
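As a rough sketch, and only for the first layer of the network to keep it short, the model/guide pair might look like the following. This uses pyro.random_module to lift the dense net's parameters to random variables; the prior scales and parameter names are illustrative, not the gist's actual code.

```python
import pyro
import pyro.distributions as dist
import torch
import torch.nn.functional as F

def model(x_data, y_data):
    # Standard-normal priors over the first layer's weights and biases (illustrative only;
    # the full model declares priors for every layer of the DenseNet defined earlier).
    priors = {
        "fc1.weight": dist.Normal(torch.zeros_like(net.fc1.weight),
                                  torch.ones_like(net.fc1.weight)).to_event(2),
        "fc1.bias": dist.Normal(torch.zeros_like(net.fc1.bias),
                                torch.ones_like(net.fc1.bias)).to_event(1),
    }
    lifted = pyro.random_module("module", net, priors)
    sampled_net = lifted()  # a DenseNet whose lifted parameters are drawn from the priors
    logits = sampled_net(x_data)
    with pyro.plate("data", x_data.shape[0]):
        pyro.sample("obs", dist.Categorical(logits=logits), obs=y_data)

def guide(x_data, y_data):
    # Learnable mean and (softplus-transformed) scale for each distribution in the model.
    w_loc = pyro.param("fc1_w_loc", torch.randn_like(net.fc1.weight))
    w_scale = F.softplus(pyro.param("fc1_w_scale", torch.randn_like(net.fc1.weight)))
    b_loc = pyro.param("fc1_b_loc", torch.randn_like(net.fc1.bias))
    b_scale = F.softplus(pyro.param("fc1_b_scale", torch.randn_like(net.fc1.bias)))
    priors = {
        "fc1.weight": dist.Normal(w_loc, w_scale).to_event(2),
        "fc1.bias": dist.Normal(b_loc, b_scale).to_event(1),
    }
    lifted = pyro.random_module("module", net, priors)
    return lifted()
```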

We'll save an intricate discussion of what these functions do for another article in the series. Simply put, the model explicitly declares the distributions used for each layer to replace the point values, while the guide declares the variables used to condition (train) those distributions. You'll notice the functions look similar, but on close inspection the model lists the individual distributions for weights and for biases, then the guide lists the distributions for the mean and sigma of every weight and bias distribution in the model. That sounds a bit meta, because it is. Save pondering it until later and just have a feel for the training for now³.


https://gist.github.com/DoctorLoop/5a3837c384181b391df5927bdd5e2ab5
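The training itself then runs through Pyro's SVI with an ELBO objective, roughly as below; the learning rate and epoch count are placeholders.

```python
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

pyro.clear_param_store()
svi = SVI(model, guide, Adam({"lr": 1e-3}), loss=Trace_ELBO())

for epoch in range(800):
    # Each step samples from the guide, runs the model and takes a gradient step on the ELBO loss.
    epoch_loss = svi.step(x_train, y_train)
    if epoch % 100 == 0:
        print(f"epoch {epoch}: ELBO loss {epoch_loss:.1f}")
```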

Training this beautiful monster is fast. We arrive at an accuracy slightly higher than the dense model's, 0.88-0.90, with a loss of 0.18. I've implemented a few extra functions here to make training easier (the trainloader and a predict function) and provide the full source in the GitHub notebook.


Loss function for Bayesian deep learning

We can't backpropagate through random variables (because by definition they're random). Therefore we cheat: we reparametrize the distributions themselves and have training update the distribution parameters. With this fundamental change we need a different way to calculate training loss.


You may already know of the negative log likelihood used in conventional models. It reflects the probability that the data was generated by the model. Don't worry about what this means now; just know that we approximate this loss by taking the average of a large number of samples. But along with our negative log likelihood we combine a new loss that leverages the distributions under calculation. The new loss is the Kullback-Leibler divergence (KL divergence) and provides a measure of how different two distributions are from each other. We'll refer to KL divergence frequently and find it useful in other areas, including metrics of uncertainty⁴. Negative log likelihood is added to the KL divergence to get the ELBO loss (ELBO: evidence lower bound, a lower bound on the log marginal likelihood; the resulting loss is also known as the variational free energy). ELBO loss allows us to approximate distributions during training and benefit hugely in terms of trainability and training time.

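In symbols, and consistent with the description above (q(w) being the guide's distribution over weights, p(w) the prior and D the data), the quantity minimised during training is:

```latex
\mathcal{L} \;=\; \underbrace{-\,\mathbb{E}_{w \sim q(w)}\big[\log p(\mathcal{D} \mid w)\big]}_{\text{negative log likelihood}}
\;+\; \underbrace{\mathrm{KL}\big(q(w) \,\|\, p(w)\big)}_{\text{divergence from the prior}}
```

The first term is approximated by averaging the log likelihood over weight samples, exactly as described above, and minimising the sum is equivalent to maximising the ELBO.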

Finally, to conclude, let’s glimpse what happens when we make predictions, after all it’s what we’re here for.


https://gist.github.com/DoctorLoop/41c31bf30339dc1b4b4c7e69a001b9bf
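For the Pyro version, drawing those repeated samples can be sketched as below; each call to the guide returns a freshly sampled network, and x_test is a hypothetical tensor of test inputs.

```python
num_samples = 5000

sampled_logits = []
for _ in range(num_samples):
    sampled_net = guide(None, None)  # draw one set of weights from the learned posterior
    with torch.no_grad():
        sampled_logits.append(sampled_net(x_test))

# Shape (num_samples, n_test, n_classes) -> per-sample class predictions (num_samples, n_test)
predictions = torch.stack(sampled_logits).argmax(dim=-1)
```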

We draw 5000 samples (this can take a few minutes depending on your computer). With a conventional neural network all these predictions would be exactly the same, so it would be a pointless endeavour. But we aren't using conventional neural networks anymore.


https://gist.github.com/DoctorLoop/1ae0d213d3875dd569d582c810769fc7
Figure 2.04: Histograms of 5000 samples for 3 test inputs

The predict function samples multiple different model versions from the master model we trained with TFP or Pyro. The three test instances we've input each result in well defined peaks. The first test instance (red histogram) had a weight of 61.2g and a boiling time of 4.8 minutes. Most of the time we can see our model predicted it would be a soft-boiled egg; 5% of the time, however, it predicted an underdone egg, and 2% of the time it thought it would be a hard-boiled egg. The model is reasonably confident in this prediction, but not as confident as in the second test example (green histogram), which almost always predicts a hard-boiled result for a 53g egg boiled for 6 minutes. How do we achieve consistency from the model if predictions are different each time? We just take the average. If predictions are too variable, i.e. roughly a third in each class, we'd not want to make a prediction at all, except that we can act on the information and tell the user that the model is uncertain about the outcome (or ask ourselves if we need to work on the model/data some more!). In both cases we gain really powerful information, something a conventional model can't offer. We can therefore think of a conventional model as always arrogant, while our new model is appropriately cautious, yet confident when confidence is correct. We don't need to be computer-science psychologists to appreciate the preferable model personality.

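Turning those samples into a single answer plus an uncertainty estimate can be as simple as the sketch below; the 70% agreement threshold is an arbitrary illustration.

```python
# Fraction of sampled networks voting for each class, per test input
class_probs = torch.stack([(predictions == c).float().mean(dim=0) for c in range(3)], dim=-1)

confidence, predicted_class = class_probs.max(dim=-1)
for conf, cls in zip(confidence.tolist(), predicted_class.tolist()):
    if conf < 0.7:  # arbitrary cut-off for "too uncertain to call"
        print(f"uncertain: best guess is class {cls} with only {conf:.0%} agreement")
    else:
        print(f"class {cls} with {conf:.0%} of samples in agreement")
```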

Summary

In this chapter we’ve explored the changes in a basic Bayesian model and seen some major advantages accompanying it. We’ve seen how to do this in TensorFlow-Probability and in Pyro. While the model is fully functional, at this stage it isn’t perfect and neither is it truly Bayesian. Whether a ‘truer’ model matters depends on your circumstances. In subsequent articles we’ll include discussion about imperfections like the use of softmax and how we can address them. In the next article our main focus will be on image prediction with Bayesian convolutional deep learning. If my writing looked a little more formal I’d perhaps be more confident of seeing you there!


1 Therefore it’s somewhat of an abuse to solve the problem with a neural network when simpler models would do fine.


2 Unsure what the real-world application is here, a breakfast cafe?


3 If you're thinking "wow, the TensorFlow-Probability code looked way simpler", that's because it does the operations for you for now. That said, we can make the Pyro code simpler as well by using a Pyro autoguide. As the name suggests, this plays the role of the guide for you. But we're here to learn the wonder of Bayesian deep learning, so we need to get exposure at some point!


4 Interestingly we can make a simple classifier with reasonable performance by using KL divergence directly, i.e. an image pattern classifier can be built with the local binary pattern algorithm and KL. In this partnership KL compares the distributions of corners and edges between the input image and an image with a known pattern.


Translated from: https://towardsdatascience.com/bayesian-neural-networks-2-fully-connected-in-tensorflow-and-pytorch-7bf65fb4697

