Data Science and Statistics: Statistics in Data Science

Data Science and Statistics

Statistics

Statistics is used to tackle complex real-world problems so that data scientists and analysts can look for meaningful patterns and changes in data. In simple terms, statistics can be used to derive meaningful insights from data by performing mathematical computations on it. Several statistical functions, principles, and algorithms are applied to analyze raw data, build a statistical model, and infer or predict an outcome. The purpose of this article is to give a broad overview of the fundamentals of statistics that you'll need to start your data science journey.

Data Types

  1. Numerical:

    Data expressed with digits; it is measurable. It can be either discrete (a countable set of values) or continuous (infinitely many values within a range).

  2. Categorical:

    Qualitative data grouped into categories. It can be nominal (no inherent order) or ordinal (ordered categories).

Measures of Central Tendency

  • Mean: The average of a dataset.

  • Median: The middle value of an ordered dataset; less susceptible to outliers than the mean.

  • Mode: The most common value in a dataset; only meaningful for discrete data.
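As a minimal sketch of the three measures above, the following uses Python's standard `statistics` module. The sample values are hypothetical and include one deliberate outlier to show why the median is more robust than the mean.

```python
import statistics

# Hypothetical sample with one outlier (95).
values = [2, 3, 3, 5, 7, 8, 95]

mean = statistics.mean(values)      # arithmetic average, pulled upward by the outlier
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequent value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```

The outlier drags the mean well above most of the observations, while the median and mode stay with the bulk of the data.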


Measures of Variability

  • Range: The difference between the highest and lowest value in a dataset.

  • Variance (σ²): Measures how spread out a set of data is relative to the mean.

  • Standard Deviation (σ): Another measure of how spread out the numbers in a dataset are; it is the square root of the variance.

  • Z-score: The number of standard deviations a data point lies from the mean.

  • R-Squared: A statistical measure of fit that indicates how much of the variation of a dependent variable is explained by the independent variable(s); mainly useful for simple linear regression.

  • Adjusted R-squared: A modified version of R-squared that has been adjusted for the number of predictors in the model; it increases if a new term improves the model more than would be expected by chance, and decreases otherwise.
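The measures in this list can be sketched in a few lines of Python. The data, the helper names `r_squared` and `adjusted_r_squared`, and the predicted values are assumptions made for illustration; the variance here is the population variance (dividing by n).

```python
import math
import statistics

# Hypothetical observations.
values = [4, 8, 6, 5, 3, 10]

data_range = max(values) - min(values)
variance = statistics.pvariance(values)   # population variance (divides by n)
std_dev = math.sqrt(variance)             # standard deviation = sqrt(variance)
mean = statistics.mean(values)
z_scores = [(x - mean) / std_dev for x in values]


def r_squared(y_true, y_pred):
    """Share of the variation in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """R-squared penalized for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical targets and predictions from a one-predictor model.
y_true = [1, 2, 3, 4, 5]
y_pred = [1.1, 1.9, 3.2, 3.9, 4.9]
r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n_samples=len(y_true), n_predictors=1)

print(data_range, round(variance, 3), round(std_dev, 3))
print(round(r2, 4), round(adj_r2, 4))
```

Adjusted R-squared is never larger than R-squared for the same fit, which is what makes it useful for comparing models with different numbers of predictors.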

Measures of Relationships between Variables

  • Covariance: Measures the joint variability of two (or more) variables. If it is positive, the variables tend to move in the same direction; if it is negative, they tend to move in opposite directions; if it is zero, there is no linear relationship between them.

  • Correlation: Measures the strength of the relationship between two variables and ranges from -1 to 1; it is the normalized version of covariance. As a rule of thumb, a correlation of about ±0.7 or stronger indicates a strong relationship between two variables, while correlations between -0.3 and 0.3 indicate little to no relationship.
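A short sketch of how correlation normalizes covariance, written in plain Python so each step is visible. The function names and the two data series are hypothetical; the sample covariance (dividing by n - 1) is assumed.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

cov = covariance(hours, scores)
corr = correlation(hours, scores)
print(f"covariance={cov:.2f}, correlation={corr:.3f}")
```

The covariance depends on the units of both variables, whereas the correlation is unit-free, which is why the ±0.7 and ±0.3 rules of thumb are stated for correlation.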

Probability Distribution Functions

  • Probability Density Function (PDF): A function for continuous data whose value at any point can be interpreted as the relative likelihood that the random variable takes a value near that point.

  • Probability Mass Function (PMF): A function for discrete data that gives the probability of a given value occurring.

  • Cumulative Distribution Function (CDF): A function that gives the probability that a random variable is less than or equal to a certain value; it is the integral of the PDF.
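To make the distinction concrete, here is a small sketch using `statistics.NormalDist` from the Python standard library for a continuous variable and a hand-built dictionary for a discrete one. The fair six-sided die is a hypothetical example.

```python
from statistics import NormalDist

# Continuous case: the standard normal distribution.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
density_at_zero = standard_normal.pdf(0.0)   # height of the curve, not a probability
prob_below_zero = standard_normal.cdf(0.0)   # P(X <= 0)
# Probabilities for intervals come from differences of the CDF.
prob_within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

# Discrete case: PMF of a fair six-sided die (hypothetical example).
die_pmf = {face: 1 / 6 for face in range(1, 7)}
# The discrete CDF is the running sum of the PMF.
prob_at_most_three = sum(p for face, p in die_pmf.items() if face <= 3)

print(round(density_at_zero, 4), prob_below_zero, round(prob_within_one_sd, 4))
print(round(prob_at_most_three, 4))
```

Note that a PDF value can exceed 1 for narrow distributions; only areas under the curve are probabilities.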

Continuous Data Distributions

  • Uniform Distribution: A probability distribution in which all outcomes are equally likely.

  • Normal/Gaussian Distribution: Commonly referred to as the bell curve and closely related to the central limit theorem; in its standard form it has a mean of 0 and a standard deviation of 1.


  • T-Distribution: A probability distribution used to estimate population parameters when the sample size is small and/or when the population variance is unknown.

  • Chi-Square Distribution: The distribution of the chi-square statistic.
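The link between the normal distribution and the central limit theorem can be illustrated with a small simulation: means of samples drawn from a uniform distribution cluster around the population mean in a roughly bell-shaped way. This is only a sketch; the sample size, number of samples, and seed are arbitrary assumptions.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

sample_size = 30     # observations per sample (assumed)
num_samples = 2000   # number of samples drawn (assumed)

# Each sample is drawn from Uniform(0, 1); we keep only its mean.
sample_means = [
    statistics.mean([random.random() for _ in range(sample_size)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
spread_of_means = statistics.pstdev(sample_means)

# Theory: Uniform(0, 1) has mean 0.5 and variance 1/12, so the sample
# means should have standard deviation sqrt(1/12) / sqrt(sample_size).
expected_spread = math.sqrt(1 / 12) / math.sqrt(sample_size)

print(round(mean_of_means, 3), round(spread_of_means, 4), round(expected_spread, 4))
```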

Discrete Data Distributions

  • Poisson Distribution: A probability distribution that expresses the probability of a given number of events occurring within a fixed time interval.

  • Binomial Distribution: The probability distribution of the number of successes in a sequence of n independent experiments, each with a Boolean-valued outcome (success with probability p, failure with probability 1-p).
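Both distributions have closed-form probability mass functions, sketched below with the standard `math` module. The function names and the example parameters (an average of 3 events per interval, 10 fair coin flips) are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(exactly k events) when events occur at an average rate of lam per interval."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(exactly k successes) in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical examples.
prob_two_events = poisson_pmf(2, lam=3)        # 2 events when 3 are expected
prob_three_heads = binomial_pmf(3, n=10, p=0.5)  # 3 heads in 10 fair flips

print(round(prob_two_events, 4), round(prob_three_heads, 4))
```

Summing `binomial_pmf(k, n, p)` over k = 0..n gives 1, which is a quick sanity check for any PMF.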

Moments

Moments describe different aspects of the nature and shape of a distribution. The first moment is the mean, the second (central) moment is the variance, the third (standardized) moment is the skewness, and the fourth (standardized) moment is the kurtosis.
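A minimal sketch of the four moments computed directly from their definitions. The helper name `moments` and the data are assumptions; population formulas (dividing by n) are used, and kurtosis is reported without subtracting 3.

```python
import math


def moments(values):
    """Return (mean, variance, skewness, kurtosis) using population formulas."""
    n = len(values)
    mean = sum(values) / n

    def central(k):
        # k-th central moment: average of (x - mean)^k
        return sum((x - mean) ** k for x in values) / n

    variance = central(2)
    std_dev = math.sqrt(variance)
    skewness = central(3) / std_dev ** 3   # third standardized moment
    kurtosis = central(4) / std_dev ** 4   # fourth standardized moment (not excess)
    return mean, variance, skewness, kurtosis


# Hypothetical symmetric data, so the skewness should be zero.
mean, variance, skewness, kurtosis = moments([1, 2, 3, 4, 5])
print(mean, variance, skewness, round(kurtosis, 3))
```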

Probability

Conditional Probability [P(A|B)] is the probability of an event occurring, given that another event has already occurred.

Independent events are events whose outcome does not influence the probability of the outcome of another event; P(A|B) = P(A).

Mutually Exclusive events are events that cannot occur at the same time; P(A|B) = 0.

Bayes' Theorem: A mathematical formula for determining conditional probability. "The probability of A given B is equal to the probability of B given A times the probability of A over the probability of B".
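Bayes' theorem, P(A|B) = P(B|A) · P(A) / P(B), can be sketched as a small function. The function name, parameter names, and the diagnostic-test numbers are hypothetical; P(B) is expanded with the law of total probability.

```python
def bayes_posterior(prior_a, prob_b_given_a, prob_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A), and P(B|not A)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    prob_b = prob_b_given_a * prior_a + prob_b_given_not_a * (1 - prior_a)
    return prob_b_given_a * prior_a / prob_b


# Hypothetical diagnostic test: 1% of people have the condition (A),
# the test is positive (B) for 95% of those with it and 5% of those without.
posterior = bayes_posterior(prior_a=0.01, prob_b_given_a=0.95, prob_b_given_not_a=0.05)
print(round(posterior, 3))
```

With these assumed numbers, a positive result still leaves the condition fairly unlikely, because the low prior dominates.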


Accuracy

  • True positive: The test detects the condition when the condition is present.

  • True negative: The test does not detect the condition when the condition is absent.

  • False positive: The test detects the condition when the condition is absent.

  • False negative: The test does not detect the condition when the condition is present.

  • Sensitivity: Also known as recall; measures the ability of a test to detect the condition when the condition is present; Sensitivity = TP/(TP+FN)

  • Specificity: Measures the ability of a test to correctly exclude the condition when the condition is absent; Specificity = TN/(TN+FP)

  • Predictive value positive: Also known as precision; the proportion of positive results that correspond to the presence of the condition; PVP = TP/(TP+FP)

  • Predictive value negative: The proportion of negative results that correspond to the absence of the condition; PVN = TN/(TN+FN)
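The four ratios above follow directly from the confusion-matrix counts. The sketch below assumes a hypothetical helper `classification_metrics` and made-up counts.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the four ratios from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # recall
        "specificity": tn / (tn + fp),
        "pvp": tp / (tp + fp),          # precision
        "pvn": tn / (tn + fn),
    }


# Hypothetical counts from evaluating a test on 200 cases.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```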


Hypothesis Testing and Statistical Significance

  • Null Hypothesis: The hypothesis that sample observations result purely from chance.

  • Alternative Hypothesis: The hypothesis that sample observations are influenced by some non-random cause.

  • P-value: The probability of obtaining results at least as extreme as the observed results of a test, assuming that the null hypothesis is correct; a smaller p-value means there is stronger evidence in favor of the alternative hypothesis.

  • Alpha: The significance level; the probability of rejecting the null hypothesis when it is true, also known as a Type 1 error.

  • Beta: The probability of a Type 2 error; failing to reject a false null hypothesis.

Steps to Hypothesis Testing

  1. State the null and alternative hypotheses

  2. Determine the test size; is it a one-tailed or two-tailed test?

  3. Compute the test statistic and the probability value (p-value)

  4. Analyze the results and either reject or fail to reject the null hypothesis
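The four steps above can be sketched with a two-tailed one-sample z-test, which assumes the population standard deviation is known. The function name, the significance level of 0.05, and the sample figures are hypothetical.

```python
import math
from statistics import NormalDist


def two_tailed_z_test(sample_mean, null_mean, population_sd, n, alpha=0.05):
    """Return (z statistic, p-value, reject_null) for a two-tailed z-test."""
    # Step 3a: test statistic, in units of the standard error of the mean.
    z = (sample_mean - null_mean) / (population_sd / math.sqrt(n))
    # Step 3b: probability of a result at least this extreme under the null.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: reject the null hypothesis only if p is below alpha.
    return z, p_value, p_value < alpha


# Step 1 (hypothetical): null hypothesis says the mean is 100, alternative says it is not.
# Step 2: two-tailed test at alpha = 0.05.
z, p_value, reject_null = two_tailed_z_test(
    sample_mean=103, null_mean=100, population_sd=15, n=100
)
print(round(z, 2), round(p_value, 4), reject_null)
```

When the population variance is unknown or the sample is small, a t-test would be the usual substitute, as noted in the T-Distribution entry.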

Translated from: https://www.includehelp.com/data-science/statistics.aspx
