Multi-Logistic Regression with Probabilistic Programming
There is an interesting dichotomy in the world of data science between machine learning practitioners (increasingly synonymous with deep learning practitioners) and classical statisticians (both Frequentists and Bayesians). There is generally no overlap between the techniques used in these two camps. However, there are some interesting tools and libraries that are trying to bridge the gap between the two camps, especially by using Bayesian inference techniques to estimate the uncertainty of deep learning models. See this post and this paper to learn more about the historical and recent trends in this exciting new area. The biggest benefit of adopting Bayesian thinking is that it forces us to explicitly lay out all the assumptions that go into the model. It is hard to perform Bayesian inference without being fully aware of all the modeling choices along the way. The biggest downside to Bayesian inference is the time needed to run even moderately sized models.
There are several probabilistic programming languages/frameworks out there that are becoming more popular due to the recent advances in computing hardware. The most common and mature language is Stan, which has APIs for other common programming languages like Python (PyStan) and R (RStan). There are also some newer players in the field like PyMC3 (Theano), Pyro (PyTorch), and Turing (Julia). Of these, Turing, written in Julia, seems like a potentially interesting option. It brings with it all the advantages of Julia, and combining it with Flux can, in theory, make it “easy” to estimate the uncertainties of any deep learning model.
There are some amazing books to get you up and running with Bayesian data analysis, and the bible in the field is definitely the book by the great Andrew Gelman. He also writes short articles/opinions on his blog, which is worth following. I personally think the book “Statistical Rethinking” by Richard McElreath is the best introduction to the field for any newcomer. He walks you from the garden of forking paths all the way to multilevel models. He even has his entertaining and engaging lectures up on YouTube! No reason not to get your daily dose of Bayesian 😄
In this blog post, I just wanted to get my feet wet with Julia and Turing. I will use both PyStan and Turing to build multi-category logistic models that predict the species of penguins based on features such as bill length, island, sex, etc. This is similar to the more popular Iris dataset that is used so commonly in data science tutorials. For more details on the Palmer penguin dataset, see here.
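Before any modeling, the raw data has to be turned into a numeric design matrix with the 9 features used in the models below. Here is a minimal preprocessing sketch in Python, assuming the palmerpenguins package and illustrative variable names (X_train, y_train, etc.); it is not the exact code from the project repo, and standardizing the continuous columns is my assumption given the standard normal priors used later:

import pandas as pd
from palmerpenguins import load_penguins   # assumed source of the Palmer penguin data
from sklearn.model_selection import train_test_split

penguins = load_penguins().dropna()

# Four continuous features plus one-hot encoded island and sex -> 9 columns total.
continuous = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
X = pd.concat([penguins[continuous],
               pd.get_dummies(penguins[["island", "sex"]])], axis=1)
# Standardizing is assumed here so that the Normal(0, 1) priors are reasonable.
X[continuous] = (X[continuous] - X[continuous].mean()) / X[continuous].std()

# Stan's categorical_logit expects integer classes 1..K, so shift the codes by 1.
y = penguins["species"].astype("category").cat.codes + 1

X_train, X_test, y_train, y_test = train_test_split(
    X.values, y.values, test_size=0.2, random_state=42)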
PyStan
First, let's use PyStan to build a multi-logit model. The code for the Stan model looks like this:
data {
  int N;                // the number of training observations
  int N2;               // the number of test observations
  int D;                // the number of features
  int K;                // the number of classes
  int y[N];             // the response
  matrix[N, D] x;       // the model matrix
  matrix[N2, D] x_new;  // the matrix for the predicted values
}
parameters {
  matrix[D, K] beta;    // the regression parameters
}
model {
  matrix[N, K] x_beta = x * beta;
  to_vector(beta) ~ normal(0, 1);
  for (n in 1:N)
    y[n] ~ categorical_logit(x_beta[n]');
}
This closely follows the example in Stan’s documentation. We are using a standard normal prior on all parameters. In the case of our penguin dataset, we have a total of 9 features: four of them are continuous, namely bill length, bill depth, flipper length, and body mass, and 5 are one-hot encoded features for the island and sex categorical variables. Therefore, the number of parameters to estimate is 9 per category, and since we have 3 categories, that is a total of 27 parameters. For each category, a linear predictor is formed as the sum of the products of the coefficients and the feature values:
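Concretely, with β the D × K coefficient matrix from the Stan model above (x_beta = x * beta), the linear predictor for observation i and category k is:

$$\eta_{i,k} = \sum_{j=1}^{D} x_{i,j}\,\beta_{j,k}$$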
The final category probabilities for each data point are computed using the softmax function:
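Written out (a reconstruction of the equation that appeared here as an image), the probability of category k for observation i is:

$$P(y_i = k \mid x_i) = \frac{\exp(\eta_{i,k})}{\sum_{l=1}^{K} \exp(\eta_{i,l})}$$

which is exactly what categorical_logit computes from x_beta[n]'.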

We could also have fixed the parameters for one (reference) category to all zeros and estimated only the remaining 9 × 2 parameters. This is the same idea as in binary classification models, where only one set of coefficients is present:
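With a single coefficient vector β and the reference category's parameters pinned at zero, the softmax reduces to the familiar logistic function:

$$P(y_i = 1 \mid x_i) = \frac{1}{1 + \exp(-x_i\,\beta)}$$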

I will show what that looks like when we get to the Julia code using the Turing library.
Now that the model is ready, let's run the sampler to get the posteriors for all the parameters; a sketch of the corresponding PyStan call is shown after the parameter list below.
These are the sampling parameters:
Algorithm: No-U-Turn Sampler (NUTS)
Warmup: 500 iterations
Samples: 500 iterations
Chains: 4
Max Tree Depth: 10
Time elapsed per chain: ~140 seconds
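A sketch of what the corresponding PyStan (2.x) call might look like, reusing the illustrative names from the preprocessing sketch above (the actual driver code lives in the linked repo):

import pystan

# Assumed filename for the Stan program shown above.
model = pystan.StanModel(file="multi_logit.stan")

stan_data = {
    "N": X_train.shape[0],   # training observations
    "N2": X_test.shape[0],   # test observations
    "D": X_train.shape[1],   # number of features (9)
    "K": 3,                  # number of penguin species
    "y": y_train,            # classes coded 1..K
    "x": X_train,
    "x_new": X_test,
}

# NUTS with 4 chains, 500 warmup + 500 sampling iterations, max tree depth 10.
fit = model.sampling(data=stan_data, iter=1000, warmup=500, chains=4,
                     control={"max_treedepth": 10})
print(fit)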

The chains show poor mixing and stability, and the recommendation from Stan is to increase the max tree depth for the NUTS sampler to get better stability within and across chains.

The poor stability of the chains is also reflected in the number of effective samples (n_eff), which is quite low for some parameters. The Rhat is significantly above the recommended value of 1.05 for most parameters.
In practice, though, this is not an issue in most cases, and the samples are usable, as shown below, for predicting the training and test set classes.
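Since the Stan program above has no generated quantities block, one simple way to get predictions is to push the posterior draws of beta through a softmax in Python. A sketch, using the same illustrative names as before:

import numpy as np
from scipy.special import softmax

# Posterior draws of beta with shape (num_draws, D, K).
beta_draws = fit.extract()["beta"]

def predict(X, beta_draws):
    # Average the per-draw class probabilities, then take the most probable class.
    logits = np.einsum("nd,sdk->snk", X, beta_draws)   # (draws, N, K)
    probs = softmax(logits, axis=-1).mean(axis=0)       # (N, K)
    return probs.argmax(axis=1) + 1                     # back to 1-based classes

print("train accuracy:", np.mean(predict(X_train, beta_draws) == y_train))
print("test accuracy:", np.mean(predict(X_test, beta_draws) == y_test))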


Now, let's increase the maximum tree depth for the NUTS sampler from 10 to 12. This increases the time taken for each chain to converge:
Max Tree Depth: 12
Time elapsed per chain: ~570 seconds
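Under the same assumptions as the earlier sketch, only the control argument changes:

fit = model.sampling(data=stan_data, iter=1000, warmup=500, chains=4,
                     control={"max_treedepth": 12})  # deeper trees, ~4x slower per chain here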

The chains now show much better mixing and stability, and we could go even higher with the max tree depth for the NUTS sampler to get better stability within and across chains.

As we can see, the number of effective samples (n_eff) has also increased considerably for some parameters, and the Rhat is approaching the recommended value of 1.05 for some parameters. As expected, these samples provide good classification predictions.


Increasing the max tree depth further to 15 significantly improves the chain stability (data not shown) but also increases the computational time ~25 fold.
The code for running the above models is here. For the full project, which includes setup for AWS, SageMaker, and XGBoost models, refer to my earlier blog post and GitHub repo.
Julia
Now, I will show you the equivalent model using Julia and Turing. The code can be found here in the main project repo. The model is defined like so:
# The model assumes a `softmax` function is available in the project environment
# (e.g. from NNlib or StatsFuns) and that each row of y is one-hot encoded.
@model logistic_regression(x, y, n, σ) = begin
    # One Normal(0, σ) prior per species-specific coefficient.
    intercept_Adelie ~ Normal(0, σ)
    intercept_Gentoo ~ Normal(0, σ)
    intercept_Chinstrap ~ Normal(0, σ)
    bill_length_mm_Adelie ~ Normal(0, σ)
    bill_length_mm_Gentoo ~ Normal(0, σ)
    bill_length_mm_Chinstrap ~ Normal(0, σ)
    bill_depth_mm_Adelie ~ Normal(0, σ)
    bill_depth_mm_Gentoo ~ Normal(0, σ)
    bill_depth_mm_Chinstrap ~ Normal(0, σ)
    flipper_length_mm_Adelie ~ Normal(0, σ)
    flipper_length_mm_Gentoo ~ Normal(0, σ)
    flipper_length_mm_Chinstrap ~ Normal(0, σ)
    body_mass_g_Adelie ~ Normal(0, σ)
    body_mass_g_Gentoo ~ Normal(0, σ)
    body_mass_g_Chinstrap ~ Normal(0, σ)
    island_Biscoe_Adelie ~ Normal(0, σ)
    island_Biscoe_Gentoo ~ Normal(0, σ)
    island_Biscoe_Chinstrap ~ Normal(0, σ)
    island_Dream_Adelie ~ Normal(0, σ)
    island_Dream_Gentoo ~ Normal(0, σ)
    island_Dream_Chinstrap ~ Normal(0, σ)
    island_Torgersen_Adelie ~ Normal(0, σ)
    island_Torgersen_Gentoo ~ Normal(0, σ)
    island_Torgersen_Chinstrap ~ Normal(0, σ)
    sex_female_Adelie ~ Normal(0, σ)
    sex_female_Gentoo ~ Normal(0, σ)
    sex_female_Chinstrap ~ Normal(0, σ)
    sex_male_Adelie ~ Normal(0, σ)
    sex_male_Gentoo ~ Normal(0, σ)
    sex_male_Chinstrap ~ Normal(0, σ)

    for i = 1:n
        # Linear predictor for each species, mapped to class probabilities with softmax.
        v = softmax([intercept_Adelie +
                     bill_length_mm_Adelie*x[i, 1] +
                     bill_depth_mm_Adelie*x[i, 2] +
                     flipper_length_mm_Adelie*x[i, 3] +
                     body_mass_g_Adelie*x[i, 4] +
                     island_Biscoe_Adelie*x[i, 5] +
                     island_Dream_Adelie*x[i, 6] +
                     island_Torgersen_Adelie*x[i, 7] +
                     sex_female_Adelie*x[i, 8] +
                     sex_male_Adelie*x[i, 9],
                     intercept_Gentoo +
                     bill_length_mm_Gentoo*x[i, 1] +
                     bill_depth_mm_Gentoo*x[i, 2] +
                     flipper_length_mm_Gentoo*x[i, 3] +
                     body_mass_g_Gentoo*x[i, 4] +
                     island_Biscoe_Gentoo*x[i, 5] +
                     island_Dream_Gentoo*x[i, 6] +
                     island_Torgersen_Gentoo*x[i, 7] +
                     sex_female_Gentoo*x[i, 8] +
                     sex_male_Gentoo*x[i, 9],
                     intercept_Chinstrap +
                     bill_length_mm_Chinstrap*x[i, 1] +
                     bill_depth_mm_Chinstrap*x[i, 2] +
                     flipper_length_mm_Chinstrap*x[i, 3] +
                     body_mass_g_Chinstrap*x[i, 4] +
                     island_Biscoe_Chinstrap*x[i, 5] +
                     island_Dream_Chinstrap*x[i, 6] +
                     island_Torgersen_Chinstrap*x[i, 7] +
                     sex_female_Chinstrap*x[i, 8] +
                     sex_male_Chinstrap*x[i, 9]])
        # Each row of y is a one-hot vector, i.e. a single draw from a Multinomial.
        y[i, :] ~ Multinomial(1, v)
    end
end;
I used the default HMC sampler, as recommended in the Turing tutorial. One thing I noticed is the much better stability of the chains when using the HMC sampler from Turing:

And the summary of the samples:

Overall, the HMC samples from Turing seem to do a lot better than the NUTS samples from PyStan. Of course, this is not an apples-to-apples comparison, but these are interesting results. In addition, the HMC sampler was also much faster than the max_tree_depth=12 run from PyStan shown above. This is something to dig into more.
The predictions from Turing are perfect on both the training and test sets, as expected, since this is an easy prediction problem.
In conclusion, I like Julia and Turing so far! Another great (and fast) tool for Probabilistic Programming!
Some good things:
- Turing is fast! (at least in this example with default samplers)
- 1-based indexing in Julia and Turing lines up with Stan’s 1-based indexing, unlike Python’s 0-based indexing, which makes coordinating with Stan harder
- Symbolic math ability with Turing and Julia
Some disadvantages compared to PyStan:
- Not enough libraries to make pre-processing easy
- Stan has a more parsimonious model declaration syntax than Turing (probably just my ignorance with Turing)
- No straightforward way to combine with Python (PyJulia is an option worth exploring)
Translated from: https://medium.com/swlh/multi-logistic-regression-with-probabilistic-programming-db9a24467c0d