How Bayesian Statistics Can Complement Frequentist Statistics

For many years, academics have been using so-called frequentist statistics to evaluate whether experimental manipulations have significant effects.

Frequentist statistics are based on the concept of hypothesis testing, a mathematically based estimation of how likely it is that your results could have been obtained by chance. The lower that value, the more significant the result would be (in frequentist terms). By the same token, you can obtain non-significant results using the same approach. Most of these "negative" results are disregarded in research, although there is tremendous added value in also knowing which manipulations do not have an effect. But that’s for another post ;)

Thing is, in such cases where no effect can be found, frequentist statistics are limited in their explanatory power, as I will argue in this post.

Below, I will be exploring one limitation of frequentist statistics, and proposing an alternative to frequentist hypothesis testing: Bayesian statistics. I will not go into a direct comparison between the two approaches; there is quite some reading out there if you are interested. I will rather explore why the frequentist approach presents some shortcomings, and how the two approaches can be complementary in some situations (rather than seeing them as mutually exclusive, as is sometimes argued).

This is the first of two posts, where I will be focusing on the inability of frequentist statistics to distinguish between the absence of evidence and the evidence of absence.

Absence of evidence vs evidence of absence

Background

In the frequentist world, statistical tests typically output some statistical measures (t, F, Z values… depending on your test), and the almighty p-value. I discuss the limitations of only using p-values in another post, which you can read to get familiar with some concepts behind its computation. Briefly, the p-value, if significant (i.e., below an arbitrarily decided threshold, called the alpha level, typically set at 0.05), indicates that your manipulation most likely has an effect.

However, what if (and that happens a lot) your p-value is > 0.05? In the frequentist world, such p-values do not allow you to distinguish between an absence of evidence and evidence of an absence of effect.

Let that sink in for a little bit, because it is the crucial point here. In other words, frequentist statistics are pretty effective at quantifying the presence of an effect, but are quite poor at quantifying evidence for the absence of an effect. See here for literature.

The demonstration below is taken from some work that was performed at the Netherlands Institute for Neuroscience, back when I was working in neuroscience research. A very nice paper was recently published on this topic, that I encourage you to read. The code below is inspired by the paper repository, written in R.

Simulated Data

Say we generate a random distribution with mean=0.5 and standard deviation=1.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)
mean = 0.5; sd = 1; sample_size = 1000
exp_distibution = np.random.normal(loc=mean, scale=sd, size=sample_size)
plt.hist(exp_distibution)

Figure 1 | Histogram depicting a random draw from a normal distribution centered at 0.5

That would be our experimental distribution, and we want to know whether that distribution is significantly different from 0. We could run a one-sample t-test (which would be okay since the distribution seems very Gaussian, but you should in theory verify that the assumptions of parametric testing are fulfilled; let’s assume they are).
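Before running the t-test, one quick sanity check for the normality assumption is a Shapiro-Wilk test. This sketch is my addition (not from the original notebook) and regenerates the same distribution:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
exp_distribution = np.random.normal(loc=0.5, scale=1, size=1000)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
stat, p = stats.shapiro(exp_distribution)
print('W = ' + str(round(stat, 4)))
print('p = ' + str(round(p, 4)))

# A non-significant p-value here gives us no evidence against normality
if p > 0.05:
    print('No evidence against normality; parametric testing seems reasonable')
```

Note the inversion of logic: here a *non-significant* result is what licenses the parametric test, which already hints at how awkward it is to interpret p > 0.05 as "evidence of absence".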

t, p = stats.ttest_1samp(a=exp_distibution, popmean=0)
print('t-value = ' + str(t))
print('p-value = ' + str(p))

Quite a nice p-value that would make every PhD student’s spine shiver with happiness ;) Note that with that kind of sample size, almost anything gets significant, but let’s move on with the demonstration.

Now let’s try a distribution centered at 0, which should not be significantly different from 0:

mean = 0; sd = 1; sample_size = 1000
exp_distibution = np.random.normal(loc=mean, scale=sd, size=sample_size)
plt.hist(exp_distibution)

t, p = stats.ttest_1samp(a=exp_distibution, popmean=0)
print('t-value = ' + str(t))
print('p-value = ' + str(p))

Here, we have, as expected, a distribution that does not significantly differ from 0. And here is where things get a bit tricky: in some situations, frequentist statistics cannot really tell whether a p-value > 0.05 reflects an absence of evidence or evidence of absence, even though that distinction is crucial: it is what would allow you to completely rule out an experimental manipulation from having an effect.

Let’s take a hypothetical situation:

You want to know whether a manipulation has an effect. It might be a novel marketing approach in your communication, an interference with biological activity, or a “picture vs no picture” test in a mail you are sending. You of course have a control group to compare your experimental group to.

When collecting your data, you could see different patterns:

  • (i) the two groups differ.
  • (ii) the two groups behave similarly.
  • (iii) you do not have enough observations to conclude (sample size too small).

While option (i) is evidence against the null hypothesis H0 (i.e., you have evidence that your manipulation had an effect), situations (ii) (= evidence for H0, i.e., evidence of absence) and (iii) (= no evidence, i.e., absence of evidence) cannot be disentangled using frequentist statistics. But maybe the Bayesian approach can add something to this story...

How p-values are affected by effect and sample sizes

The first thing is to illustrate the situations where frequentist statistics have shortcomings.

Approach background

What I will be doing is plotting how frequentist p-values behave when changing both effect size (i.e., the difference between your control, here with a mean=0, and your experimental distributions) and sample size (number of observations or data points).

Let’s first write a function that would compute these p-values:

def run_t_test(m, n, iterations):
    """
    Runs a t-test for different effect and sample sizes and stores the p-values
    """
    my_p = np.zeros(shape=[1, iterations])
    for i in range(0, iterations):
        x = np.random.normal(loc=m, scale=1, size=n)
        # Traditional two-sided one-sample t-test
        t, p = stats.ttest_1samp(a=x, popmean=0)
        my_p[0, i] = p
    return my_p

We can then define the parameters of the space we want to test, with different sample and effect sizes:

# Defines parameters to be tested
sample_sizes = [5,8,10,15,20,40,80,100,200]
effect_sizes = [0, 0.5, 1, 2]
nSimulations = 1000

We can finally run the function and visualize:

# Run the function to store all p-values in the array "my_pvalues"
my_pvalues = np.zeros((len(effect_sizes), len(sample_sizes), nSimulations))

for mi in range(0, len(effect_sizes)):
    for i in range(0, len(sample_sizes)):
        my_pvalues[mi, i, ] = run_t_test(m=effect_sizes[mi],
                                         n=sample_sizes[i],
                                         iterations=nSimulations)

I will quickly visualize the data to make sure that the p-values seem correct. The output would be:

p-values for sample size = 5
Effect sizes:     0       0.5       1.0         2
0          0.243322  0.062245  0.343170  0.344045
1          0.155613  0.482785  0.875222  0.152519

p-values for sample size = 15
Effect sizes:     0       0.5       1.0         2
0          0.004052  0.010241  0.000067  1.003960e-08
1          0.001690  0.000086  0.000064  2.712946e-07
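The printout above could be generated with something along the following lines; the DataFrame layout (one column per effect size, one row per simulation) is my assumption about how the original output was produced:

```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(42)

def simulate_p_values(m, n, iterations):
    """One-sample t-test p-values for `iterations` simulated experiments."""
    p_values = np.zeros(iterations)
    for i in range(iterations):
        x = np.random.normal(loc=m, scale=1, size=n)
        _, p_values[i] = stats.ttest_1samp(a=x, popmean=0)
    return p_values

effect_sizes = [0, 0.5, 1, 2]
for n in [5, 15]:
    # One column per effect size, one row per simulation (showing the first two)
    table = pd.DataFrame({m: simulate_p_values(m, n, iterations=2)
                          for m in effect_sizes})
    print('p-values for sample size = ' + str(n))
    print(table)
```

The exact numbers will differ from the table above since they depend on the random draws, but the pattern (smaller p-values for larger effect and sample sizes) holds.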

I would make two main observations here:

  1. When you have a high enough sample size (lower section), the p-values behave as expected and decrease with increasing effect sizes (since you have more statistical power to detect the effect).
  2. However, we also see that the p-values are not significant for small sample sizes, even if the effect sizes are quite large (upper section). That is quite striking: the effect sizes are the same, only the number of data points differs.

Let’s visualize that.

Visualization

For each sample size (5, 8, 10, 15, 20, 40, 80, 100, 200), we will count the number of p-values falling in significance level bins.

Let’s first compare two distributions of equal mean, that is, we have an effect size = 0.

Figure 2 | Number of p-values located in each “significance” bin for effect size = 0
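The per-bin counts shown in Figure 2 can be computed along these lines; the exact bin edges (0.001, 0.01, 0.05) are my assumption:

```python
import numpy as np
from scipy import stats

np.random.seed(42)

def simulate_p_values(effect, n, iterations=1000):
    """One-sample t-test p-values for `iterations` simulated experiments."""
    p = np.zeros(iterations)
    for i in range(iterations):
        x = np.random.normal(loc=effect, scale=1, size=n)
        _, p[i] = stats.ttest_1samp(x, popmean=0)
    return p

# "Significance" bins: p < 0.001, p < 0.01, p < 0.05, and non-significant
bin_edges = [0, 0.001, 0.01, 0.05, 1]
labels = ['p<0.001', 'p<0.01', 'p<0.05', 'n.s.']

for n in [5, 20, 200]:
    counts, _ = np.histogram(simulate_p_values(effect=0, n=n), bins=bin_edges)
    summary = ', '.join(lab + ': ' + str(c) for lab, c in zip(labels, counts))
    print('sample size ' + str(n) + ' -> ' + summary)
```

With effect size 0, roughly 5% of the p-values land below 0.05 at any sample size, which is exactly the false-positive rate set by the alpha level.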

As we can see from the plot above, most of the p-values computed by the t-test are not significant for an experimental distribution of mean 0. That makes sense, since these two distributions are not different in their means.

We can, however, see that in some cases we do obtain significant p-values, which can happen when using very particular data points drawn from the overall population. These are typically false positives, and the reason why it is important to repeat experiments and replicate results ;)

Let’s see what happens if we use a distribution whose mean differs by 0.5 compared to the control:

Figure 3 | Number of p-values per “significance” bin for effect size = 0.5

Now we clearly see that increasing the sample size dramatically increases the ability to detect the effect, with still many non-significant p-values at low sample sizes.

Below, as expected, you see that for highly different distributions (effect size = 2), the number of significant p-values increases:

Figure 4 | Number of p-values per “significance” bin for effect size = 2

OK, so that was it for an illustrative example of how p-values are affected by sample and effect sizes.

Now, the problem is that when you have a non-significant p-value, you are not always sure whether you might have missed the effect (say, because you had a low sample size due to limited observations or budget) or whether your data really suggest the absence of an effect. As a matter of fact, most scientific research has a problem of statistical power, because it relies on limited observations (due to experimental constraints, budget, time, publishing pressure, etc.).

Since the reality of data in research is a rather low sample size, you still might want to draw meaningful conclusions from non-significant results based on low sample sizes.
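To make the power problem concrete, here is a small simulation sketch (my addition, not in the original post) estimating the power of a one-sample t-test for a modest effect size of 0.5 at different sample sizes:

```python
import numpy as np
from scipy import stats

np.random.seed(42)

def empirical_power(effect, n, alpha=0.05, iterations=2000):
    """Fraction of simulated experiments in which the effect is detected."""
    hits = 0
    for _ in range(iterations):
        x = np.random.normal(loc=effect, scale=1, size=n)
        _, p = stats.ttest_1samp(x, popmean=0)
        hits += p < alpha
    return hits / iterations

# Power of a one-sample t-test for a modest effect (d = 0.5) at various n
for n in [10, 30, 100]:
    print('n = ' + str(n) + ': empirical power ~ '
          + str(round(empirical_power(0.5, n), 2)))
```

With a handful of observations, most of these simulated experiments miss a real effect entirely, which is exactly the regime where a non-significant p-value is least informative.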

Here, Bayesian statistics could help you make one more step with your data ;)

Stay tuned for the following post where I explore the Titanic and Boston data sets to demonstrate how Bayesian statistics can be useful in such cases!

You can find this notebook in the following repo: https://github.com/juls-dotcom/bayes

Translated from: https://medium.com/@julien.her/statistics-how-bayesian-can-complement-frequentist-9ff171bb6396
