The Law of Large Numbers and the Central Limit Theorem
Data Science
The Central Limit Theorem is at the center of the statistical inference that data scientists and data analysts carry out every day.
The Central Limit Theorem plays a significant part in statistical inference. It describes precisely how much an increase in sample size reduces sampling error, which tells us about the precision, or margin of error, of estimates computed from samples, such as percentages.
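The link between sample size and sampling error can be checked with a small simulation. The sketch below is illustrative only: it assumes Uniform(0, 1) data, and the function name `sd_of_sample_means`, the number of replications, and the seed are choices made here, not part of the original article. It compares the simulated spread of the sample mean with the standard result sigma / sqrt(n).

```python
import math
import random
import statistics

def sd_of_sample_means(n: int, replications: int = 2000, seed: int = 0) -> float:
    """Empirical standard deviation of the mean of n Uniform(0, 1) draws."""
    rng = random.Random(seed)
    means = [
        sum(rng.random() for _ in range(n)) / n
        for _ in range(replications)
    ]
    return statistics.stdev(means)

# Standard deviation of a single Uniform(0, 1) draw (assumed population).
SIGMA = 1 / math.sqrt(12)

if __name__ == "__main__":
    for n in (25, 100, 400):
        print(f"n={n:4d}  simulated={sd_of_sample_means(n):.4f}  "
              f"sigma/sqrt(n)={SIGMA / math.sqrt(n):.4f}")
```

Under these assumptions, quadrupling the sample size should roughly halve the sampling error, which is the 1 / sqrt(n) behavior the theorem describes.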
Statistical inference rests on the idea that results from a sample can be generalized to the population. How can we be sure that a relationship seen in a sample is not simply due to chance?
Significance tests are intended to offer an objective measure to inform decisions about the validity of such a generalization. For instance, one might find a negative relationship between education and income in a sample. However, additional information is needed to show that the result is not just due to chance but is statistically significant.
The Central Limit Theorem (CLT) is a mainstay of statistics and probability. The theorem states that as the sample size grows, the distribution of the mean across multiple samples approaches a Gaussian distribution.
We can think of running a trial and getting an outcome, or observation. We can repeat the trial and get another independent observation. Collected together, many observations form a sample.
If we calculate the mean of a sample, it will approximate the mean of the population distribution. Like any estimate, though, it will not be exact and will contain some error. If we draw many independent samples and compute their means, the distribution of those means will be approximately Gaussian.
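The procedure just described (draw many independent samples, compute each mean, look at the distribution of the means) can be sketched as follows. This is a minimal illustration under assumptions made here: the data are Exponential(1), a clearly skewed distribution with mean 1 and standard deviation 1, and the helper names, sample size, and seed are hypothetical.

```python
import math
import random
import statistics

def sample_means(n: int, replications: int, seed: int = 0) -> list[float]:
    """Means of `replications` independent samples, each of n Exponential(1) draws."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(replications)
    ]

def fraction_within_one_se(means: list[float], mu: float, se: float) -> float:
    """Share of sample means that fall within one standard error of mu."""
    return sum(abs(m - mu) <= se for m in means) / len(means)

if __name__ == "__main__":
    n = 50
    means = sample_means(n, replications=5000)
    se = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1 for Exponential(1)
    print("mean of sample means:", round(statistics.fmean(means), 3))
    print("sd of sample means:  ", round(statistics.stdev(means), 3))
    print("within one SE:       ", round(fraction_within_one_se(means, 1.0, se), 3))
```

Although the individual draws are far from Gaussian, the share of sample means within one standard error of the population mean should come out close to the roughly 68% expected for a normal distribution.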
It is important that every trial that produces an observation be independent and carried out in the same way. This ensures that the sample is drawn from the same underlying population distribution. More formally, this requirement is referred to as independent and identically distributed, or iid.
Beginners often confuse the central limit theorem with the law of large numbers (LLN). They are not the same. A common way to state the difference is that the LLN concerns the size of a single sample, whereas the CLT concerns the number of samples. More precisely, the LLN says where the mean of one growing sample ends up, while the CLT describes how sample means are distributed around that value.
The LLN states that the sample mean of independent and identically distributed observations converges to a certain value in the limit. The CLT describes the distribution of the difference between the sample mean and that value.
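In symbols, the two statements can be written side by side. The notation below is an assumption introduced here for illustration: the observations X_1, X_2, ... are iid with mean \mu and finite variance \sigma^2 > 0, and the LLN is given in its weak (convergence in probability) form.

```latex
% Sample mean of the first n observations
\[
  \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i
\]
% Law of large numbers: the sample mean converges to \mu
\[
  \bar{X}_n \xrightarrow{\;p\;} \mu \qquad (n \to \infty)
\]
% Central limit theorem: the rescaled difference converges in distribution
\[
  \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{\;d\;} \mathcal{N}(0, 1) \qquad (n \to \infty)
\]
```

The LLN names the limit, and the CLT describes the size and shape of the fluctuations around it, which shrink at the rate 1/\sqrt{n}.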
Because the central limit theorem gives us a specific distribution for our estimates, we can use it to ask about the probability of an estimate that we make. For example, suppose we are trying to predict how an election will turn out.
We take a survey and find that in our sample, 30% of individuals would vote for candidate A over candidate B. Of course, we have only seen a small sample of the total population, so we would like to know whether our result can be said to hold for the whole population, and if it cannot, how large the error might be.
The central limit theorem tells us that if we ran the survey over and over, the resulting estimates would be approximately normally distributed around the true population value.
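For the survey example, this normal approximation leads to the familiar margin of error. The sketch below is hedged accordingly: the article gives the 30% figure but no sample size, so n = 1000 is an assumed value, and `poll_interval` is a hypothetical helper that computes the simple normal-approximation (Wald) interval.

```python
import math
from statistics import NormalDist

def poll_interval(p_hat: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a polled proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)         # standard error of p_hat
    return p_hat - z * se, p_hat + z * se

if __name__ == "__main__":
    # n = 1000 is an assumed sample size; the survey in the text does not state one.
    low, high = poll_interval(0.30, 1000)
    print(f"95% interval: {low:.3f} to {high:.3f}")
```

With these assumed inputs the interval runs from roughly 27% to 33%, a margin of error of about 2.8 percentage points. A smaller sample would widen it in proportion to 1 / sqrt(n).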
The CLT works from the center out. That means that if you are making claims near the center, for example, that around two-thirds of future totals will fall within one standard deviation of the mean, you can be confident even with small samples.
However, if you make claims about the tails, for example, that a total more than five standard deviations from the mean is almost impossible, you can be badly wrong, even with sizable samples.
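The weakness in the tails can be made concrete with an exact calculation. The example below is an illustration chosen here, not taken from the article: it assumes a total that is the sum of 1000 independent rare events, each with probability 0.01 (a Binomial(1000, 0.01) count), and compares the exact probability of exceeding the mean by five standard deviations with what the Gaussian approximation predicts.

```python
import math
from statistics import NormalDist

def binomial_upper_tail(n: int, p: float, k_min: int) -> float:
    """Exact P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )

def five_sigma_tails(n: int = 1000, p: float = 0.01) -> tuple[float, float]:
    """Return (exact, normal approximation) for P(X > mean + 5 standard deviations)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    k_min = math.floor(mean + 5 * sd) + 1   # smallest count beyond five sd
    exact = binomial_upper_tail(n, p, k_min)
    normal = 1 - NormalDist().cdf(5.0)      # Gaussian tail beyond five sd
    return exact, normal

if __name__ == "__main__":
    exact, normal = five_sigma_tails()
    print(f"exact tail:  {exact:.2e}")
    print(f"normal tail: {normal:.2e}")
    print(f"ratio:       {exact / normal:.0f}x")
```

Even with 1000 terms in the sum, the exact tail probability in this setup is more than an order of magnitude larger than the Gaussian figure, because the skewness of the underlying variables fades slowly far from the center.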
The CLT fails when a distribution has infinite variance. Such cases are rare but can be important in certain fields.
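A standard textbook case of this failure is the Cauchy distribution, which has no finite variance (or mean). The sketch below uses it as an assumed example; the helper names, replication count, and seed are hypothetical. It measures the interquartile range of the sample mean for a small and a large sample size.

```python
import math
import random
import statistics

def cauchy_mean(n: int, rng: random.Random) -> float:
    """Mean of n standard Cauchy draws, generated by inverse-CDF sampling."""
    return sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)) / n

def iqr_of_means(n: int, replications: int = 1000, seed: int = 0) -> float:
    """Interquartile range of the sample mean across many replications."""
    rng = random.Random(seed)
    means = [cauchy_mean(n, rng) for _ in range(replications)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

if __name__ == "__main__":
    for n in (10, 400):
        print(f"n={n:4d}  IQR of sample means = {iqr_of_means(n):.2f}")
```

For finite-variance data the spread of the sample mean would shrink by a factor of about 6 between n = 10 and n = 400. For Cauchy data it stays near 2, the interquartile range of a single draw, so averaging does not help and no Gaussian limit appears.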
The CLT establishes the prominence of the Gaussian distribution as a natural limiting distribution. It justifies many assumptions used in statistics, for example, the normality of the error terms in linear regression: when the error is the sum of many independent random variables with finite variance (many small unobserved effects), we can usually expect it to be approximately normally distributed.
Concretely, when you know nothing about the distribution of some data, you can use the CLT to reason about the approximate normality of their sample means.
The drawback of the CLT is that it is often applied without checking its assumptions. This was the case in finance for a long time: returns were assumed to be normal even though they have a fat-tailed distribution, which carries more risk than the normal distribution.
The classical CLT does not apply when you are dealing with sums of dependent random variables, sums of non-identically distributed random variables, or sums of random variables that violate both the independence condition and the identically distributed condition.
There are further central limit theorems that relax the independence or identically distributed conditions. For example, the Lindeberg-Feller theorem still requires that the random variables be independent, but it relaxes the identically distributed condition.
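For reference, the Lindeberg condition can be stated as follows. The notation is an assumption introduced here: X_1, ..., X_n are independent with means \mu_k and finite variances \sigma_k^2, and s_n^2 denotes the variance of their sum.

```latex
% Variance of the sum of the first n variables
\[
  s_n^2 = \sum_{k=1}^{n} \sigma_k^2
\]
% Lindeberg condition: no single term dominates the total variance
\[
  \lim_{n \to \infty} \frac{1}{s_n^2} \sum_{k=1}^{n}
  \mathbb{E}\!\left[ (X_k - \mu_k)^2 \,
  \mathbf{1}\{ |X_k - \mu_k| > \varepsilon s_n \} \right] = 0
  \qquad \text{for every } \varepsilon > 0
\]
% Conclusion: the standardized sum is asymptotically standard normal
\[
  \frac{1}{s_n} \sum_{k=1}^{n} (X_k - \mu_k) \xrightarrow{\;d\;} \mathcal{N}(0, 1)
\]
```

Informally, the condition says that the large deviations of any individual term contribute a vanishing share of the total variance, which is what replaces the identically distributed requirement.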
In conclusion, the advantage of the CLT is that it is robust: even if the data come from a variety of distributions, as long as their mean and variance are the same, the theorem can still be used.
Translated from: https://medium.com/towards-artificial-intelligence/why-is-central-limit-theorem-important-to-data-scientist-49a40f4f0b4f