Multi-Logistic Regression with Probabilistic Programming

There is an interesting dichotomy in the world of data science between machine learning practitioners (increasingly synonymous with deep learning practitioners) and classical statisticians (both Frequentists and Bayesians). There is generally no overlap between the techniques used in these two camps. However, there are some interesting tools and libraries that are trying to bridge the gap between the two camps, especially using Bayesian inference techniques to estimate the uncertainty of deep learning models. See this post and this paper to learn more about the historical and recent trends in this exciting new area. The biggest benefit of adopting Bayesian thinking is that it forces us to explicitly lay out all the assumptions that go into the model. It is hard to perform Bayesian inference without being fully aware of all the modeling choices along the way. The biggest downside to Bayesian inference is the time needed to run even moderately sized models.

There are several probabilistic programming languages/frameworks out there that are becoming more popular due to recent advances in computing hardware. The most common and mature language is Stan, which has APIs for other common programming languages like Python (PyStan) and R (RStan). There are also some newer players in the field like PyMC3 (Theano), Pyro (PyTorch), and Turing (Julia). Of these, Turing, written in Julia, seems like a particularly interesting option. It brings with it all the advantages of Julia, and combining it with Flux can theoretically make it “easy” to estimate the uncertainties of any deep learning model.

There are some amazing books to get you up and running with Bayesian data analysis, and the bible in the field is definitely the book by the great Andrew Gelman. He also writes short articles/opinions on his blog, which is worth following. I personally think the book “Statistical Rethinking” by Richard McElreath is the best introduction to the field for any newcomer. He walks you from the garden of forking paths all the way to multi-level models. He even has his entertaining and engaging lectures up on YouTube! No reason not to get your daily dose of Bayesian 😄

In this blog post, I just wanted to get my feet wet with Julia and Turing. I will use both PyStan and Turing to build multi-category logistic models to predict the species of penguins from features like bill length, island, sex, etc. This is similar to the Iris dataset that is so commonly used in data science tutorials. For more details on the Palmer penguin dataset, see here.

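As a minimal sketch of the data preparation (assuming the palmerpenguins Python package; the actual preprocessing lives in the repo linked at the end), the dataset can be loaded and one-hot encoded like this:

import pandas as pd
from palmerpenguins import load_penguins

penguins = load_penguins().dropna()
# 4 continuous features + one-hot encoded island (3) and sex (2) -> 9 features
X = pd.get_dummies(
    penguins[["bill_length_mm", "bill_depth_mm", "flipper_length_mm",
              "body_mass_g", "island", "sex"]],
    columns=["island", "sex"])
# Stan's categorical_logit expects 1-based integer class labels
y = penguins["species"].astype("category").cat.codes + 1
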
PyStan

First, let's use PyStan to build a multi-logit model. The code for the Stan model looks like this:

data {
  int N;              //the number of training observations
  int N2;             //the number of test observations
  int D;              //the number of features
  int K;              //the number of classes
  int y[N];           //the response
  matrix[N,D] x;      //the model matrix
  matrix[N2,D] x_new; //the matrix for the predicted values
}
parameters {
  matrix[D,K] beta;   //the regression parameters
}
model {
  matrix[N, K] x_beta = x * beta;
  to_vector(beta) ~ normal(0, 1);
  for (n in 1:N)
    y[n] ~ categorical_logit(x_beta[n]');
}

This closely follows the example in Stan's documentation. We are using a standard normal prior on all parameters. In the case of our penguin dataset, we have a total of 9 features: four of them are continuous, namely bill length, bill depth, flipper length, and body mass, and 5 are one-hot encoded features for the island and sex categorical values. Therefore, the number of parameters to estimate is 9 per category. Since we have 3 categories, that makes a total of 27 parameters to estimate. For each category, the linear combination of the coefficients and the feature values is calculated:

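In the notation of the Stan model above, this is one row of the matrix product x_beta = x * beta; for observation n and class k:

$$\eta_{nk} = \sum_{d=1}^{D} x_{nd}\,\beta_{dk}$$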

The final category for each data point is computed using softmax:

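$$P(y_n = k \mid x_n) = \operatorname{softmax}(\eta_n)_k = \frac{e^{\eta_{nk}}}{\sum_{j=1}^{K} e^{\eta_{nj}}}$$

This is what the categorical_logit distribution computes internally.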

We could also have fixed the parameters of one category at zero and estimated only the remaining 9 × 2 parameters. This is the same idea as in binary classification models, where only one set of coefficients is present:

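Written as a softmax with the second class's linear predictor fixed at zero, this reduces to the familiar sigmoid:

$$P(y_n = 1 \mid x_n) = \frac{e^{\eta_n}}{e^{\eta_n} + e^{0}} = \frac{1}{1 + e^{-\eta_n}} = \sigma(\eta_n)$$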

I will show what that looks like when we get to the Julia code using the Turing library.

Now that the model is ready, let's perform sampling to get the posteriors for all the parameters:

These are the parameters for sampling (a sketch of the corresponding PyStan call follows the list):

Algorithm: No-U-Turn Sampler (NUTS)

Warmup: 500 iterations

Samples: 500 iterations

Chains: 4

Max Tree Depth: 10

Time elapsed per chain: ~140 seconds

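A minimal sketch of the corresponding PyStan (2.x) call; model_code and data_dict are illustrative names for the Stan program above and its data, not code from the original repo:

import pystan

sm = pystan.StanModel(model_code=model_code)  # compile the Stan program above
fit = sm.sampling(
    data=data_dict,                 # N, N2, D, K, y, x, x_new
    iter=1000, warmup=500,          # 500 warmup + 500 sampling iterations
    chains=4,
    control={"max_treedepth": 10},  # NUTS is the default algorithm
)
print(fit)
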
The posterior distributions for some parameters and their corresponding trace plots for 500 iterations. The samples are too unstable to be reliable.

The chains show poor mixing and stability, and the recommendation from Stan is to increase the max tree depth for the NUTS sampler to get better stability within and across chains.

Summary of samples for some parameters. Rhat is definitely too high for the samples to be useful.

The poor stability of the chains is also reflected in the number of effective samples (n_eff), which is quite low for some parameters. The Rhat is significantly above the recommended value of 1.05 for most parameters.

In general, though, this is not an issue for most cases, and the samples are usable, as shown below for predicting the training and test set classes.

Training set predictions / Test set predictions

Now, let's increase the maximum tree depth for the NUTS sampler from 10 to 12. This increases the time taken for each chain to converge:

Max Tree Depth: 12

Time elapsed per chain: ~570 seconds

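In the PyStan call sketched earlier, this is a one-argument change:

fit = sm.sampling(data=data_dict, iter=1000, warmup=500, chains=4,
                  control={"max_treedepth": 12})
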
The posterior distributions for some parameters and their corresponding trace plots for 500 iterations.

The chains show much better mixing and stability, and we could still go higher with the max tree depth for the NUTS sampler to get better stability within and across chains.

Summary of samples for some parameters. Rhat is on the higher end.

As we can see, the number of effective samples (n_eff) has also increased considerably for some parameters, and the Rhat is approaching the recommended value of 1.05 for some parameters. These samples, as expected, provide good classification predictions:

Training set predictions / Test set predictions

Increasing the max tree depth further to 15 significantly improves the chain stability (data not shown) but also increases the computational time ~25-fold.

The code for running the above models is here. For the full project that includes setup for AWS, Sagemaker, and XGBoost models refer to my earlier blog post and Github repo.

Julia

Now, I will show you the equivalent model using Julia and Turing. The code can be found here in the main project repo. The model is defined like so:

@model logistic_regression(x, y, n, σ) = begin
    intercept_Adelie ~ Normal(0, σ)
    intercept_Gentoo ~ Normal(0, σ)
    intercept_Chinstrap ~ Normal(0, σ)

    bill_length_mm_Adelie ~ Normal(0, σ)
    bill_length_mm_Gentoo ~ Normal(0, σ)
    bill_length_mm_Chinstrap ~ Normal(0, σ)

    bill_depth_mm_Adelie ~ Normal(0, σ)
    bill_depth_mm_Gentoo ~ Normal(0, σ)
    bill_depth_mm_Chinstrap ~ Normal(0, σ)

    flipper_length_mm_Adelie ~ Normal(0, σ)
    flipper_length_mm_Gentoo ~ Normal(0, σ)
    flipper_length_mm_Chinstrap ~ Normal(0, σ)

    body_mass_g_Adelie ~ Normal(0, σ)
    body_mass_g_Gentoo ~ Normal(0, σ)
    body_mass_g_Chinstrap ~ Normal(0, σ)

    island_Biscoe_Adelie ~ Normal(0, σ)
    island_Biscoe_Gentoo ~ Normal(0, σ)
    island_Biscoe_Chinstrap ~ Normal(0, σ)
    island_Dream_Adelie ~ Normal(0, σ)
    island_Dream_Gentoo ~ Normal(0, σ)
    island_Dream_Chinstrap ~ Normal(0, σ)
    island_Torgersen_Adelie ~ Normal(0, σ)
    island_Torgersen_Gentoo ~ Normal(0, σ)
    island_Torgersen_Chinstrap ~ Normal(0, σ)

    sex_female_Adelie ~ Normal(0, σ)
    sex_female_Gentoo ~ Normal(0, σ)
    sex_female_Chinstrap ~ Normal(0, σ)
    sex_male_Adelie ~ Normal(0, σ)
    sex_male_Gentoo ~ Normal(0, σ)
    sex_male_Chinstrap ~ Normal(0, σ)

    for i = 1:n
        v = softmax([intercept_Adelie +
                     bill_length_mm_Adelie * x[i, 1] +
                     bill_depth_mm_Adelie * x[i, 2] +
                     flipper_length_mm_Adelie * x[i, 3] +
                     body_mass_g_Adelie * x[i, 4] +
                     island_Biscoe_Adelie * x[i, 5] +
                     island_Dream_Adelie * x[i, 6] +
                     island_Torgersen_Adelie * x[i, 7] +
                     sex_female_Adelie * x[i, 8] +
                     sex_male_Adelie * x[i, 9],
                     intercept_Gentoo +
                     bill_length_mm_Gentoo * x[i, 1] +
                     bill_depth_mm_Gentoo * x[i, 2] +
                     flipper_length_mm_Gentoo * x[i, 3] +
                     body_mass_g_Gentoo * x[i, 4] +
                     island_Biscoe_Gentoo * x[i, 5] +
                     island_Dream_Gentoo * x[i, 6] +
                     island_Torgersen_Gentoo * x[i, 7] +
                     sex_female_Gentoo * x[i, 8] +
                     sex_male_Gentoo * x[i, 9],
                     intercept_Chinstrap +
                     bill_length_mm_Chinstrap * x[i, 1] +
                     bill_depth_mm_Chinstrap * x[i, 2] +
                     flipper_length_mm_Chinstrap * x[i, 3] +
                     body_mass_g_Chinstrap * x[i, 4] +
                     island_Biscoe_Chinstrap * x[i, 5] +
                     island_Dream_Chinstrap * x[i, 6] +
                     island_Torgersen_Chinstrap * x[i, 7] +
                     sex_female_Chinstrap * x[i, 8] +
                     sex_male_Chinstrap * x[i, 9]])
        y[i, :] ~ Multinomial(1, v)
    end
end;
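
Sampling is then a single call. A minimal sketch of what it can look like is below; the HMC step size (0.05), the number of leapfrog steps (10), and the names train_x and train_y are illustrative, not necessarily the values used in the project repo:

using Turing

# σ = 1.0 gives the standard normal priors matching the Stan model;
# 1000 iterations to match the trace plots shown below
chain = sample(logistic_regression(train_x, train_y, n, 1.0),
               HMC(0.05, 10),  # (step size, number of leapfrog steps)
               1000)
describe(chain)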

I used the default HMC sampler as recommended in the Turing tutorial. One thing I noticed is the much better stability of the chains when using the HMC sampler from Turing:

The posterior distributions for some parameters and their corresponding trace plots for 1000 iterations.

And the summary of the samples:

The r_hat values look better.

Overall, the HMC samples from Turing seem to do a lot better than the NUTS samples from PyStan. Of course, this is not an apples-to-apples comparison, but these are interesting results. In addition, the HMC sampler was also much faster than the max_tree_depth=12 run from PyStan shown above. This is something to dig into further.

The predictions from Turing are perfect on both the training and test sets, as expected, since this is an easy prediction problem.

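For completeness, here is a rough sketch of how such point predictions can be derived from the chain. The predict_species helper and the feature ordering are illustrative assumptions, not code from the post:

using Statistics

# Hypothetical helper: pick the class with the highest posterior-mean linear predictor.
# Assumes xrow holds the 9 features in the same order as in the model above.
function predict_species(xrow, chain)
    classes = ["Adelie", "Gentoo", "Chinstrap"]
    feats = ["intercept", "bill_length_mm", "bill_depth_mm", "flipper_length_mm",
             "body_mass_g", "island_Biscoe", "island_Dream", "island_Torgersen",
             "sex_female", "sex_male"]
    # Posterior-mean coefficient for each (feature, class) pair
    β = [mean(chain[Symbol(f * "_" * c)]) for f in feats, c in classes]
    # Linear predictor per class; softmax is monotone, so argmax of η suffices
    η = [β[1, k] + sum(β[d + 1, k] * xrow[d] for d in 1:9) for k in 1:3]
    return classes[argmax(η)]
end
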
In conclusion, I like Julia and Turing so far! Another great (and fast) tool for Probabilistic Programming!

Some good things:

  • Turing is fast! (at least in this example with default samplers)

  • 1-based indexing in Julia and Turing matches Stan's 1-based indexing, whereas Python's 0-based indexing makes coordinating with Stan harder

  • Symbolic math ability with Turing and Julia

  • Some disadvantages compared to PyStan:

  • Not enough libraries to make pre-processing easy

  • Stan has a more parsimonious model-declaration syntax than Turing (probably just my ignorance of Turing)

  • No straightforward way to combine with Python (PyJulia is an option worth exploring)

  • *****************************************************************

    https://www.azquotes.com/quote/655174

    Translated from: https://medium.com/swlh/multi-logistic-regression-with-probabilistic-programming-db9a24467c0d
