Correlation vs. Causation in Data Science


Let’s jump into it right away.

Correlation

Correlation is a relationship or association between two variables: a movement in one variable is associated with a movement in the other. For example, ice-cream sales go up as the weather turns hot.

A positive correlation means the variables move in the same direction (left plot); a negative correlation means they move in opposite directions (middle plot). The rightmost plot shows variables with no correlation.
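The three cases can be reproduced with a small simulation. This is only a sketch on made-up data: the variable names (`temperature`, `ice_cream`, `hot_cocoa`, `random_noise`) and every coefficient are assumptions chosen for illustration, not values from a real dataset.

```r
# Simulated data for illustration only; all names and numbers are hypothetical.
set.seed(42)
n <- 200
temperature  <- runif(n, min = 10, max = 35)              # daily temperature
ice_cream    <- 20 + 3 * temperature + rnorm(n, sd = 8)   # rises with temperature
hot_cocoa    <- 120 - 3 * temperature + rnorm(n, sd = 8)  # falls with temperature
random_noise <- rnorm(n)                                  # unrelated to temperature

# Pearson correlation coefficient for each pair
r_positive <- cor(temperature, ice_cream)
r_negative <- cor(temperature, hot_cocoa)
r_none     <- cor(temperature, random_noise)

print(round(c(positive = r_positive, negative = r_negative, none = r_none), 2))
```

The first coefficient should come out close to +1, the second close to -1, and the third close to 0, matching the left, middle, and right plots.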

Causation

Causation means that one variable causes another to change, so one variable depends on the other. It is also called cause and effect. For example, as the weather gets hot, people experience more sunburns. In this case, the hot weather causes an effect: sunburn.

[Image: correlation is not causation. Photo by Anthony Figueroa]

Correlation vs Causation Difference

Let’s try another example with this visualization. Your computer running out of battery causes it to shut down. It also causes the video player to shut down. The computer and video player shutdown events are correlated, but the actual cause of both is the battery running out.
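A rough simulation of the battery example shows how a common cause produces correlation between its effects. Everything here is assumed for illustration: the 30% chance of a dead battery, the 5% of shutdowns attributed to other reasons, and all variable names.

```r
# Simulated data for illustration only; probabilities and names are hypothetical.
set.seed(1)
n <- 1000
battery_dead <- rbinom(n, size = 1, prob = 0.3)  # common cause (1 = battery ran out)

# Each event mostly follows the battery state; 5% of the time it differs
# for unrelated reasons, independently for the two events.
computer_off <- ifelse(runif(n) < 0.05, 1 - battery_dead, battery_dead)
player_off   <- ifelse(runif(n) < 0.05, 1 - battery_dead, battery_dead)

# Across all observations the two shutdown events are strongly correlated
r_overall <- cor(computer_off, player_off)

# Holding the battery state fixed, the association largely disappears
ok   <- battery_dead == 0
dead <- battery_dead == 1
r_battery_ok   <- cor(computer_off[ok], player_off[ok])
r_battery_dead <- cor(computer_off[dead], player_off[dead])

print(round(c(overall = r_overall,
              battery_ok = r_battery_ok,
              battery_dead = r_battery_dead), 2))
```

Comparing the overall coefficient with the two within-group coefficients is one simple way to check whether a third variable explains an observed correlation.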

[Image: correlation vs causation]

Why is this important in data science?

How many times have you seen studies that imply A causes B? For example, going to the gym results in higher productivity and focus. Is this really causation?

As a data scientist, you should not let correlation bias you, because it can lead to faulty feature engineering and incorrect conclusions.

Correlation does not imply causation.

If you were to build a machine learning model of the relationship between gym attendance and productivity, then instead of focusing on features that are merely correlated (going to the gym), you should focus on the actual causes of high performance (hard work, perseverance, routine, etc.) to validate cause and effect.
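To make the point concrete, here is a sketch on simulated data in which a hypothetical `discipline` variable drives both gym visits and productivity, while gym visits themselves have no effect. That causal structure is an assumption built into the simulation, not a claim about real gym-goers, and all coefficients are invented.

```r
# Simulated data for illustration only; the causal structure is assumed.
set.seed(7)
n <- 500
discipline   <- rnorm(n)                                  # hypothetical underlying cause
gym_visits   <- 3 + 1.0 * discipline + rnorm(n)           # driven by discipline
productivity <- 50 + 5.0 * discipline + rnorm(n, sd = 2)  # driven by discipline, not by gym

# Model that only sees the correlated feature
naive_fit <- lm(productivity ~ gym_visits)

# Model that also includes the underlying cause
adjusted_fit <- lm(productivity ~ gym_visits + discipline)

naive_coef    <- coef(naive_fit)["gym_visits"]
adjusted_coef <- coef(adjusted_fit)["gym_visits"]

print(round(c(naive = unname(naive_coef), adjusted = unname(adjusted_coef)), 2))
```

In this setup the naive model assigns a large coefficient to `gym_visits`, and that coefficient shrinks toward zero once `discipline` is included. In real data the underlying cause is often unmeasured, which is why a correlated feature alone cannot validate cause and effect.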

Correlation in R

Let’s say you have a dataset and you want to evaluate whether certain features in it are correlated. I am using the mtcars dataset, one of the built-in datasets in R.

library(ggcorrplot)
# read mtcars, one of the built-in datasets in R
data(mtcars)
# use the cor function to get the correlation matrix
corr <- cor(mtcars)
# build the correlation plot
ggcorrplot(corr, hc.order = TRUE, type = "lower", lab = TRUE)

Try it yourself. Copy & paste the above code in R.

[Image: output from the above code snippet]

When you run the code, you should get a correlation plot annotated with values. A value close to +1 indicates a strong positive correlation, a value close to -1 indicates a strong negative correlation, and a value near 0 indicates little or no linear correlation. In the above example, you can observe that disp and wt have a positive correlation of +0.89, whereas mpg and cyl have a negative correlation of -0.85.
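If ggcorrplot is not installed, the two values quoted above can be checked with base R alone. The sketch below uses only `cor` and `cor.test` from the built-in stats package; the object names are arbitrary.

```r
# Check the two quoted coefficients using base R only
data(mtcars)

r_disp_wt <- cor(mtcars$disp, mtcars$wt)   # displacement vs. weight
r_mpg_cyl <- cor(mtcars$mpg, mtcars$cyl)   # miles per gallon vs. cylinders

print(round(c(disp_wt = r_disp_wt, mpg_cyl = r_mpg_cyl), 2))

# cor.test adds a confidence interval and a p-value for a single pair
test_disp_wt <- cor.test(mtcars$disp, mtcars$wt)
print(test_disp_wt$conf.int)
```

`cor` uses the Pearson coefficient by default; passing `method = "spearman"` is an option when the relationship is monotonic but not linear.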

Causal Impact Methods

Causation is harder to establish than correlation, but it is possible. One of the most common ways to determine causal impact is through experimentation and incremental studies.
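As a minimal sketch of what an experiment looks like in code, the simulation below randomly assigns users to two variants and compares conversion rates with `prop.test`. The sample size, the 10% and 15% conversion rates, and the variable names are all assumptions made up for illustration.

```r
# Simulated A/B test for illustration only; rates and names are hypothetical.
set.seed(123)
n <- 10000

# Random assignment: every user has the same chance of seeing variant B,
# so the two groups differ only in the treatment, on average.
group <- sample(c("A", "B"), size = n, replace = TRUE)

# Assumed true conversion rates: 10% for A, 15% for B
true_rate <- ifelse(group == "B", 0.15, 0.10)
converted <- rbinom(n, size = 1, prob = true_rate)

conversions   <- tapply(converted, group, sum)     # conversions per group
users         <- tapply(converted, group, length)  # users per group
observed_rate <- conversions / users

# Two-sample test for equality of proportions
result <- prop.test(x = as.vector(conversions), n = as.vector(users))

print(round(observed_rate, 3))
print(result$p.value)
```

Because assignment is random, other influences on conversion are balanced across the groups on average, which is what allows a difference in rates to be read as a causal effect of the variant rather than a mere correlation.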

[Image: What’s the difference between Causality and Correlation? Photo by Analytics Vidhya]

Continue learning causal impact methods with this video. It covers causal impact methodologies, specifically digital experimentation (A/B testing) and randomization techniques with real-world examples.

Sundas YouTube Channel

👩🏻‍💻 Learn more about me at sundaskhalid.com
📝 Connect with me on LinkedIn, Twitter, Instagram, YouTube

Original article: https://medium.com/@sundaskhalid/correlation-vs-causation-in-data-science-66b6cfa702f0

