Causation and Correlation in Big Data
Let’s jump into it right away.
Correlation
Correlation means that two variables are related or associated: movement in one variable goes along with movement in the other. For example, ice-cream sales go up as the weather turns hot.
A positive correlation means the variables move in the same direction (left plot); a negative correlation means they move in opposite directions (middle plot). The plot on the far right shows the case where there is no correlation between the variables.
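To make the three cases concrete, here is a minimal sketch in R using synthetic data. The variables `x`, `y_pos`, `y_neg`, and `y_none` and the helper `pearson()` are invented for illustration and do not come from the original plots; the helper computes Pearson's r by hand and is checked against R's built-in `cor()`.

```r
# Synthetic data only: every variable below is made up for illustration.
set.seed(42)                    # fixed seed so the run is repeatable
n <- 500
x      <- rnorm(n)
y_pos  <- 2 * x + rnorm(n)      # tends to rise when x rises
y_neg  <- -2 * x + rnorm(n)     # tends to fall when x rises
y_none <- rnorm(n)              # generated independently of x

# Pearson's r by hand: co-variation of a and b, scaled by the spread of each
pearson <- function(a, b) {
  da <- a - mean(a)
  db <- b - mean(b)
  sum(da * db) / sqrt(sum(da^2) * sum(db^2))
}

r_pos  <- pearson(x, y_pos)
r_neg  <- pearson(x, y_neg)
r_none <- pearson(x, y_none)

# The hand-rolled value should agree with the built-in cor()
stopifnot(isTRUE(all.equal(r_pos, cor(x, y_pos))))

print(round(c(positive = r_pos, negative = r_neg, none = r_none), 2))
```

With these settings the first value should come out close to +1, the second close to -1, and the third close to 0, matching the left, middle, and right plots respectively.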
Causation
Causation means that a change in one variable causes a change in another, so one variable depends on the other. It is also called cause and effect. For example, as the weather gets hot, people get more sunburns. In this case, the hot weather is the cause and the sunburn is the effect.

Correlation vs. Causation: The Difference
Let's try another example, shown in this visualization. Your computer's battery running out causes the computer to shut down. It also causes the video player to shut down. The computer shutting down and the video player shutting down are correlated events, but the actual cause of both is the battery running out.
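The battery example can be simulated to see how a shared cause produces correlation between its effects. This is only a sketch under assumed numbers: the probabilities (0.3, 0.95, 0.05) and the variable names are hypothetical, and the two shutdown events are modeled as depending on the battery alone.

```r
# Hypothetical simulation: all probabilities are assumptions, not measurements.
set.seed(1)
n <- 10000
battery_dead <- rbinom(n, 1, 0.3)          # the common cause (1 = battery ran out)

# Each shutdown almost always follows a dead battery and is rare otherwise.
# Given the battery state, the two events are drawn independently of each other.
computer_off <- ifelse(battery_dead == 1, rbinom(n, 1, 0.95), rbinom(n, 1, 0.05))
player_off   <- ifelse(battery_dead == 1, rbinom(n, 1, 0.95), rbinom(n, 1, 0.05))

# Across all observations the two shutdown events look strongly related
overall <- cor(computer_off, player_off)

# Holding the cause fixed, the relationship largely disappears
when_dead <- cor(computer_off[battery_dead == 1], player_off[battery_dead == 1])
when_ok   <- cor(computer_off[battery_dead == 0], player_off[battery_dead == 0])

print(round(c(overall = overall, battery_dead = when_dead, battery_ok = when_ok), 2))
```

The overall correlation should be high, while the correlations within each battery state should sit near zero. That pattern is what you would expect when a third variable drives both events.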

Why is this important in data science?
How many times have you seen a study that implies A causes B? For example, that going to the gym results in higher productivity and focus. Is this really causation?
As a data scientist, you should not let a correlation bias you, because it can lead to faulty feature engineering and incorrect conclusions.
Correlation does not imply causation.
If you were to build a machine learning model of the relationship between gym attendance and productivity, then instead of focusing on features that are merely correlated (going to the gym), you should focus on the actual causes of high performance (hard work, perseverance, routine, etc.) to validate cause and effect.
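One way to picture this is a simulation in which a single underlying trait drives both gym attendance and productivity. Everything here is an assumption made for illustration: the variable `discipline`, the coefficients, and the premise that gym visits have no direct effect on productivity. Real data would not come with the true cause conveniently measured.

```r
# Hypothetical data-generating process; none of these numbers are from a study.
set.seed(7)
n <- 2000
discipline   <- rnorm(n)                               # assumed true driver
gym_visits   <- 3 + 1.0 * discipline + rnorm(n)        # driven by discipline
productivity <- 50 + 5 * discipline + rnorm(n, sd = 2) # gym has NO direct effect

# Model that only sees the correlated feature
naive <- lm(productivity ~ gym_visits)

# Model that also includes the actual cause
adjusted <- lm(productivity ~ gym_visits + discipline)

naive_slope    <- unname(coef(naive)["gym_visits"])
adjusted_slope <- unname(coef(adjusted)["gym_visits"])

print(round(c(naive = naive_slope, adjusted = adjusted_slope), 2))
```

In this setup the naive model should assign gym visits a large positive coefficient, while the adjusted model's coefficient for gym visits should shrink toward zero once the assumed cause is included.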
Correlation in R
Let's say you have a dataset and you want to evaluate whether certain features in it are correlated. I am using the mtcars dataset, one of the built-in datasets in R.
library(ggcorrplot)
# load mtcars, one of the built-in datasets in R
data(mtcars)
# use the cor function to get the correlation matrix
corr <- cor(mtcars)
# build the correlation plot
ggcorrplot(corr, hc.order = TRUE, type = "lower", lab = TRUE)
Try it yourself. Copy and paste the above code into R.

When you run the code, you should get a correlation plot labeled with the correlation values. A value close to +1 indicates a strong positive correlation, and a value close to -1 indicates a strong negative correlation. In the example above, disp and wt have a positive correlation of +0.89, whereas mpg and cyl have a negative correlation of -0.85.
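If you want the numbers without reading them off the plot, the same values can be pulled from base R directly. This sketch uses only `cor()` and `cor.test()` from base R; the 0.8 cutoff for a "strong" pair is an arbitrary threshold chosen for illustration.

```r
data(mtcars)

# Correlation for the two pairs mentioned above
r_disp_wt <- cor(mtcars$disp, mtcars$wt)
r_mpg_cyl <- cor(mtcars$mpg, mtcars$cyl)
print(round(c(disp_wt = r_disp_wt, mpg_cyl = r_mpg_cyl), 2))

# List every pair whose absolute correlation exceeds an (arbitrary) 0.8 cutoff.
# upper.tri() keeps each pair once and skips the diagonal.
corr <- cor(mtcars)
idx <- which(abs(corr) > 0.8 & upper.tri(corr), arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(corr)[idx[, "row"]],
  var2 = colnames(corr)[idx[, "col"]],
  r    = round(corr[idx], 2)
)
print(strong_pairs)

# cor.test() adds a confidence interval and p-value for a single pair
test_mpg_cyl <- cor.test(mtcars$mpg, mtcars$cyl)
print(test_mpg_cyl$conf.int)
```

Keep in mind that these are still correlations. A strong value between mpg and cyl describes how the two move together in this dataset and says nothing by itself about cause.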
Causal Impact Methods
Causation is harder to establish than correlation, but it is possible. One of the most common ways to determine causal impact is through experimentation and incremental studies.
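As a rough illustration of why experimentation helps, here is a simulated randomized experiment in R. It is a sketch, not a production analysis: the metric, the sample size, and the `true_lift` of 1 are all assumed values, known only because the data is simulated. Random assignment is what makes the difference in group means a fair estimate of the effect.

```r
# Simulated experiment; every number here is an assumption for illustration.
set.seed(123)
n <- 2000
baseline <- rnorm(n, mean = 10, sd = 2)   # hypothetical per-user metric

# Random assignment: exactly half the users get the treatment, in shuffled order
treated <- sample(rep(c(TRUE, FALSE), each = n / 2))

true_lift <- 1                             # assumed causal effect of the treatment
outcome <- baseline + ifelse(treated, true_lift, 0)

# Estimated effect: difference in mean outcome between the two groups
estimate <- mean(outcome[treated]) - mean(outcome[!treated])

# Welch two-sample t-test for the difference in means
result <- t.test(outcome[treated], outcome[!treated])

print(round(estimate, 2))
print(result$conf.int)
```

Because assignment is random, the treated and untreated groups should be similar in everything except the treatment, so the estimate should land near the assumed lift and the confidence interval should exclude zero.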

Continue learning about causal impact methods with this video. It covers causal impact methodologies, specifically digital experimentation (A/B testing) and randomization techniques, with real-world examples.
👩🏻‍💻 Learn more about me at sundaskhalid.com 📝 Connect with me on LinkedIn, Twitter, Instagram, YouTube
Translated from: https://medium.com/@sundaskhalid/correlation-vs-causation-in-data-science-66b6cfa702f0