
Introduction

Welcome to this lesson on calculating p-values.

Before we jump into how to calculate a p-value, it’s important to think about what the p-value is really for.

Hypothesis Testing Refresher

Without going into too much detail for this post: when setting up a hypothesis test, you define a null hypothesis. The null hypothesis represents the world in which the two variables you’re assessing have no relationship. Conversely, the alternative hypothesis represents the world where there is a statistically significant relationship, such that you’re able to reject the null hypothesis in favor of the alternative.

Diving Deeper

Before we move on from the idea of hypothesis testing… think about what we just said. You effectively need to show, with little room for error, that what we’re seeing in the real world could not plausibly be taking place in a world where these variables are unrelated, that is, a world where they are independent.

Sometimes when learning concepts in statistics, you hear the definition but take little time to conceptualize it. There is often a lot of memorization of rule sets… I find that understanding the intuitive foundation of these principles serves you far better when it comes to applying them in practice.

Continuing in this vein: if you want to compare your real-world statistic with the one from the fake world, that’s exactly what you should do.

As you’d guess, we can calculate our observed statistic by creating a linear regression model where we explain our response variable as a function of our explanatory variable. Once we’ve done this, we can quantify the relationship between these two variables using the slope, or coefficient, identified through our OLS regression.

But now we need to come up with this idea of the null world… the world where these variables are independent. This is something we don’t have, so we’ll need to simulate it. For convenience, we’re going to leverage the infer package.

Let’s Calculate Our Observed Statistic

First things first, let’s get our observed statistic!

The dataset we’re working with is a Seattle home prices dataset. I’ve used this dataset many times before and find it particularly flexible for demonstration. Each record in the dataset is a home, with details such as price, square footage, number of bedrooms, number of bathrooms, and so forth.

Through the course of this post, we’ll be trying to explain price as a function of square footage.

Let’s create our regression model:

fit <- lm(price_log ~ sqft_living_log,
          data = housing)
summary(fit)

In the summary output, the statistic we’re after is the Estimate for our explanatory variable, sqft_living_log.

A very clean way to get it is to tidy our results so that, rather than a linear model, we get a tibble. Tibbles, tables, or data frames make it a lot easier to interact with the results systematically.

We’ll then filter down to the sqft_living_log term and wrap it up by using the pull function to return the estimate itself. This returns the slope as a number, which will make it easier to compare with our null distribution later on.

Take a look!

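library(broom)  # tidy()
library(dplyr)  # %>%, filter(), pull()
# Store the observed slope so it can be compared with the null distribution later
obs_slope <- lm(price_log ~ sqft_living_log,
                data = housing) %>%
  tidy() %>%
  filter(term == 'sqft_living_log') %>%
  pull(estimate)
obs_slope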
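If broom and dplyr are not at hand, the same slope can be read straight off the fitted model with base R’s coef(). The sketch below is only an illustration: housing_sim is a small simulated stand-in for the housing data (made-up values, not the Seattle dataset), and only the column names match the article.

```r
# Simulated stand-in for the housing data (illustrative values only)
set.seed(42)
n <- 200
sqft_living_log <- rnorm(n, mean = 7.5, sd = 0.4)
price_log <- 6 + 0.8 * sqft_living_log + rnorm(n, sd = 0.3)
housing_sim <- data.frame(price_log, sqft_living_log)

fit_sim <- lm(price_log ~ sqft_living_log, data = housing_sim)

# coef() returns a named numeric vector; index by term name to get the slope
obs_slope_sim <- coef(fit_sim)[["sqft_living_log"]]
obs_slope_sim
```

Either route ends with a single number, which is all the later comparison against the null distribution needs.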

Time to Simulate!

To kick things off, you should know there are various types of simulation. The one we’ll be using here is called permutation.

Permutation is particularly helpful when it comes to showing a world where variables are independent of one another.

While we won’t be going into the specifics of how a permutation sample is created under the hood, it’s worth noting that the distribution of the permuted statistic will be approximately normal and centered around 0.

In this case, the slope centers around 0 because we’re operating under the premise that there is no relationship between our explanatory and response variables.
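For the curious, the core idea of permutation can be sketched in a few lines of base R. This is not how the infer package is implemented, just a minimal illustration on simulated data: x, y, and the slope() helper are hypothetical names introduced here. Shuffling the response breaks any link to the explanatory variable, so the refit slopes scatter around 0.

```r
set.seed(1)
n <- 200
x <- rnorm(n)              # simulated explanatory variable
y <- 0.5 * x + rnorm(n)    # simulated response with a real relationship

# Hypothetical helper: slope of a simple regression of y on x
slope <- function(x, y) unname(coef(lm(y ~ x))[2])

obs <- slope(x, y)

# Each replicate shuffles y, which makes x and y independent by construction
perm_slopes <- replicate(500, slope(x, sample(y)))

obs                   # observed slope on the unshuffled data
mean(perm_slopes)     # should sit close to 0
summary(perm_slopes)
```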

Infer Fundamentals

A few things for you to know:

specify is how we determine the relationship we’re modeling: price_log ~ sqft_living_log
hypothesize is where we designate independence
generate is how we determine the number of replicates of our dataset we want to make. Note that if you generate a single replicate and skip calculate, you get back a sample dataset of the same size as the original dataset.
calculate allows you to determine the calculation in question (slope, mean, median, diff in means, etc.)

library(infer)
set.seed(1)
perm <- housing %>%
  specify(price_log ~ sqft_living_log) %>%
  hypothesize(null = 'independence') %>%
  generate(reps = 100, type = 'permute') %>%
  calculate(stat = 'slope')
perm
hist(perm$stat)

Same distribution with 1000 reps


Null Sampling Distribution

OK, we’ve done it! We’ve created what is known as the null sampling distribution. What we’re seeing above is a distribution of 1000 slopes, each fit to one of 1000 simulated datasets in which the variables are independent.

This gives us just what we needed: a simulated world against which we can compare reality.

Taking the visual we just made, let’s use a density plot and add a vertical line, marked in red, for our observed slope.

library(ggplot2)
# obs_slope holds the observed slope from the regression above
ggplot(perm, aes(stat)) +
  geom_density() +
  geom_vline(xintercept = obs_slope, color = 'red')

Visually, you can see that the observed slope lies far beyond anything random chance produced.

As you can guess from looking at the plot, the p-value here is going to be 0. That is to say, 0% of the null sampling distribution is greater than or equal to our observed statistic.

If we were in fact seeing many cases where the permuted statistic was greater than or equal to our observed statistic, we would have reason to think our result was just random.

To reiterate the message here, the purpose of the p-value is to give you an idea of how plausible it is that we saw such a slope by random chance rather than because of a real relationship.

Calculating P-value

While we know what our p-value will be here, let’s get you set up with the calculation.

To re-prime this idea: the p-value is the proportion of replicates that were (randomly) greater than or equal to our observed slope.

You’ll see in our summarise function that we’re checking whether our stat, or slope, is greater than or equal to the observed slope. Each record is assigned TRUE or FALSE accordingly. When you wrap that in a mean function, TRUE counts as 1 and FALSE as 0, which gives the proportion of cases in which stat was greater than or equal to our observed slope. The code then doubles this proportion, the usual way to turn a one-sided tail into a two-sided p-value.

perm %>%
  summarise(p_val = 2 * round(mean(stat >= obs_slope),2))
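The TRUE/FALSE averaging in that summarise call can be checked by hand on a tiny vector. The numbers below are made up purely for illustration (stat and obs_slope_demo are hypothetical values, not output from the housing data), and capping the doubled value at 1 is an added safeguard that the article’s code does not include.

```r
# Hypothetical permuted slopes and observed slope (made-up numbers)
stat <- c(-0.03, 0.01, 0.05, -0.02, 0.04, 0.00, -0.01, 0.02, 0.06, -0.04)
obs_slope_demo <- 0.045

as_extreme <- stat >= obs_slope_demo  # logical vector, one entry per replicate
sum(as_extreme)                       # TRUE counts as 1: 2 of the 10 values qualify
one_sided <- mean(as_extreme)         # 2 / 10 = 0.2
two_sided <- min(1, 2 * one_sided)    # doubled as in the summarise() call, capped at 1
two_sided
```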

To see the case of a weaker relationship, where we would not have sufficient evidence to reject the null hypothesis, let’s look at price explained as a function of the year the home was built.

Using the same calculation as above, this results in a p-value of 12%, which, at the standard 5% significance level (95% confidence), is not sufficient evidence to reject the null hypothesis.
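The decision rule behind that statement can be written down as a small sketch. The decide() helper is a hypothetical name introduced here for illustration; the 0.05 threshold corresponds to the 95% confidence level mentioned above, and 0.12 and 0 are the two p-values reported in this post.

```r
# Hypothetical helper: compare a p-value with the significance level alpha
decide <- function(p_val, alpha = 0.05) {
  if (p_val < alpha) "reject the null" else "fail to reject the null"
}

decide(0.12)  # year built: 0.12 is not below 0.05
decide(0)     # square footage: 0 is below 0.05
```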

Final Notes on P-value Interpretation

One final thing I want to highlight one more time…

The meaning of 12%. We saw that when we randomly generated independent samples, a whole 12% of the time our randomly generated slope was as extreme or more extreme than the observed one.

You might see such a result as much as 12% of the time just due to random chance.

Conclusion

That’s it! You’re a master of calculating and understanding the p-value.

In a few short minutes we have learned a lot:

hypothesis testing
linear regression refresher
sampling explanation
learning about the infer package
building a sampling distribution
visualizing the p-value
calculating the p-value

It’s easy to get lost when dissecting statistics concepts like the p-value. My hope is that having a strong foundational understanding of the need and the corresponding execution allows you to understand and correctly apply this to a variety of problems.

If this was helpful, feel free to check out my other posts at https://medium.com/@datasciencelessons. Happy Data Science-ing!

Translated from: https://towardsdatascience.com/getting-to-the-bottom-of-p-value-the-intuitive-explanation-calculation-fec46bb15a92
