Computational Causal Inference at Netflix

Jeffrey Wong, Colin McFarland

Every Netflix data scientist, whether their background is in biology, psychology, physics, economics, math, statistics, or biostatistics, has made meaningful contributions to the way Netflix analyzes causal effects. Scientists from these fields have made many advances in causal effects research over the past few decades, spanning instrumental variables, forest methods, heterogeneous effects, time-dynamic effects, quantile effects, and much more. These methods can provide rich information for decision making, such as in experimentation platforms (“XP”) or in algorithmic policy engines.

We want to amplify the effectiveness of our researchers by providing them with software that can estimate causal effects models efficiently and can integrate causal effects into large engineering systems. This is challenging because an algorithm for causal effects needs to fit a model, condition on context and the possible actions to take, score the response variable, and compute differences between counterfactuals. Computation can explode and become overwhelming when this is done with large datasets, high-dimensional features, many possible actions to choose from, and many responses. Broad software integration of causal effects models therefore requires a significant investment in software engineering, especially in computation. To address these challenges, Netflix has been building an interdisciplinary field across causal inference, algorithm design, and numerical computing, which we now want to share with the rest of the industry as computational causal inference (CompCI). A whitepaper detailing the field can be found here.
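
The four steps named above (fit, condition, score, difference) can be sketched in a few lines. This is a minimal illustration, not Netflix's implementation: the linear model with one action-by-context interaction, the simulated data, and the helper names `design`, `fit_ols`, and `conditional_effect` are all assumptions made for the example.

```python
import numpy as np

def design(action, context):
    """Design matrix: intercept, action, context, and action x context interaction."""
    return np.column_stack([np.ones_like(context), action, context, action * context])

def fit_ols(X, y):
    """Step 1: fit a linear model by least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def conditional_effect(beta, context):
    """Steps 2-4: condition on context, score both actions, take the difference."""
    treated = design(np.ones_like(context), context) @ beta
    control = design(np.zeros_like(context), context) @ beta
    return treated - control

rng = np.random.default_rng(0)
n = 10_000
context = rng.normal(size=n)
action = rng.integers(0, 2, size=n).astype(float)
# Simulated response; the true effect 2.0 + 0.5 * context is made up for the demo.
y = 1.0 + (2.0 + 0.5 * context) * action + 0.3 * context + rng.normal(scale=0.1, size=n)

beta = fit_ols(design(action, context), y)
effects = conditional_effect(beta, np.array([-1.0, 0.0, 1.0]))
print(effects)  # conditional effects at three context values
```

Every extra action, context value, or response multiplies the number of counterfactual scores, which is where the computational cost described above comes from.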

Computational causal inference brings a software implementation focus to causal inference, especially with regard to high-performance numerical computing. We are implementing several algorithms to be highly performant with a low memory footprint. As an example, our XP is pivoting away from two-sample t-tests to models that estimate average effects, heterogeneous effects, and time-dynamic treatment effects. These effects help the business understand the user base, different segments in the user base, and whether there are trends in segments over time. We also take advantage of user covariates throughout these models in order to increase statistical power. While this rich analysis helps to inform business strategy and increase member joy, the volume of data demands large amounts of memory, and estimating causal effects on such a volume of data is computationally heavy.
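
To make the point about covariates and statistical power concrete, here is a small sketch on simulated data. It assumes a single hypothetical pre-experiment covariate `pre` and classical (homoskedastic) standard errors; it is an illustration of regression adjustment in general, not the XP's actual model. Regressing the response on an intercept and the treatment indicator reproduces the difference in means behind a pooled-variance two-sample t-test, and adding the covariate shrinks the standard error of the treatment effect.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and classical (homoskedastic) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, np.sqrt(sigma2 * np.diag(XtX_inv))

rng = np.random.default_rng(1)
n = 20_000
treat = rng.integers(0, 2, size=n).astype(float)
pre = rng.normal(size=n)  # hypothetical pre-experiment covariate
y = 0.1 * treat + 2.0 * pre + rng.normal(size=n)

ones = np.ones(n)
# Unadjusted: same point estimate as the difference in group means.
b_unadj, se_unadj = ols_with_se(np.column_stack([ones, treat]), y)
# Adjusted: the covariate absorbs variance that is unrelated to treatment.
b_adj, se_adj = ols_with_se(np.column_stack([ones, treat, pre]), y)

print("unadjusted effect and SE:", b_unadj[1], se_unadj[1])
print("adjusted effect and SE:  ", b_adj[1], se_adj[1])
```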

In the past, the computations for covariate-adjusted heterogeneous effects and time-dynamic effects were slow, memory heavy, hard to debug, a large source of engineering risk, and ultimately could not scale to many large experiments. Using optimizations from CompCI, we can estimate hundreds of conditional average effects and their variances on a dataset with 10 million observations in 10 seconds, on a single machine. In the extreme, we can also analyze conditional time-dynamic treatment effects for hundreds of millions of observations on a single machine in less than one hour. To achieve this, we leverage a software stack that is completely optimized for sparse linear algebra, a lossless data compression strategy that can reduce data volume, and mathematical formulas that are optimized specifically for estimating causal effects. We also optimize for memory and data alignment.
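
The post does not spell out the compression strategy, so the following is only one plausible reading, offered as a sketch: when features are discrete, identical feature rows can be collapsed into sufficient statistics (count, sum of the response, sum of squares), and least squares on the compressed data reproduces the coefficients and covariance of the full regression exactly. The function names `compress` and `ols_from_compressed` and the simulated experiment are assumptions for the example.

```python
import numpy as np

def compress(X, y):
    """Collapse duplicate feature rows into (unique rows, count, sum y, sum y^2)."""
    Xu, inv = np.unique(X, axis=0, return_inverse=True)
    inv = inv.reshape(-1)  # guard against NumPy versions that return a 2-D inverse
    n = np.bincount(inv).astype(float)
    sy = np.bincount(inv, weights=y)
    syy = np.bincount(inv, weights=y * y)
    return Xu, n, sy, syy

def ols_from_compressed(Xu, n, sy, syy):
    """OLS coefficients and classical covariance from sufficient statistics only."""
    XtX = Xu.T @ (Xu * n[:, None])          # sum over groups of n_g * x_g x_g'
    beta = np.linalg.solve(XtX, Xu.T @ sy)  # X'y = sum over groups of x_g * sum(y_g)
    fitted = Xu @ beta
    rss = np.sum(syy - 2.0 * fitted * sy + n * fitted**2)
    dof = n.sum() - Xu.shape[1]
    return beta, rss / dof * np.linalg.inv(XtX)

rng = np.random.default_rng(2)
N = 100_000
treat = rng.integers(0, 2, size=N)
seg = rng.integers(0, 3, size=N)  # hypothetical user segment
X = np.column_stack([
    np.ones(N), treat, seg == 1, seg == 2, treat * (seg == 1), treat * (seg == 2)
]).astype(float)
y = X @ np.array([1.0, 0.5, 0.2, -0.1, 0.3, 0.0]) + rng.normal(size=N)

Xu, n, sy, syy = compress(X, y)
beta_c, cov_c = ols_from_compressed(Xu, n, sy, syy)

# Reference fit on the uncompressed data.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_full
cov_full = resid @ resid / (N - X.shape[1]) * np.linalg.inv(X.T @ X)

print(Xu.shape[0], "compressed rows instead of", N)
```

Here 100,000 observations reduce to six rows, and the heterogeneous effects (the interaction coefficients) and their variances are unchanged. The compression ratio depends entirely on how many distinct feature rows the data contains.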

This level of computing affords us several luxuries. First, the ability to scale complex models means we can deliver rich insights for the business. Second, being able to analyze large datasets for causal effects in seconds increases research agility. Third, analyzing data on a single machine makes debugging easy. Finally, the scalability makes computation for large engineering systems tractable, reducing engineering risk.

Computational causal inference is a new, interdisciplinary field that we are announcing because we want to build it collectively with the broader community of experimenters, researchers, and software engineers. Integrating causal inference into engineering systems can lead to a great deal of innovation. Because the field is interdisciplinary, it truly requires local domain experts to unite as a community. We have released a whitepaper to begin the discussion. There, we describe the rising demand for scalable causal inference in research and in software engineering systems. Then, we describe the state of common causal effects models. Afterwards, we describe what we believe can be a good software framework for estimating and optimizing for causal effects.

Finally, we close the CompCI whitepaper with a series of open challenges that we believe require interdisciplinary collaboration and that the community can unite around. For example:

  1. Time-dynamic treatment effects are notoriously hard to scale. They require a panel of repeated observations, which generates large datasets. The observations are also autocorrelated, which complicates estimating the variance of the causal effect. How can we make the computation for the time-dynamic treatment effect, and its distribution, more scalable?

  2. In machine learning, specifying a loss function and optimizing it using numerical methods allows a developer to interact with a single, umbrella framework that can span several models. Can such an umbrella framework exist to specify different causal effects models in a unified way? For example, could it be done through the generalized method of moments? Can it be computationally tractable?

  3. How should we develop software that understands whether a causal parameter is identified? A solution to this helps to create software that is safe to use and that can provide safe, programmatic access to the analysis of causal effects. We believe there are many edge cases in identification that require an interdisciplinary group to solve.
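
As a toy illustration of the second challenge, the sketch below shows how two familiar estimators fall out of a single moment condition. It covers only the just-identified linear case, where the sample moment Z'(y - Xb) = 0 has a closed-form solution; the function name `linear_gmm` and the simulated data are assumptions for the example, and nothing here answers whether a general framework would be computationally tractable.

```python
import numpy as np

def linear_gmm(y, X, Z):
    """Solve the sample moment condition Z'(y - X b) = 0 (just-identified case)."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

rng = np.random.default_rng(3)
n = 50_000
z = rng.normal(size=n)   # instrument: shifts x, unrelated to the confounder
u = rng.normal(size=n)   # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.5 * u + rng.normal(size=n)  # true effect of x is 2.0

ones = np.ones(n)
X = np.column_stack([ones, x])
Z = np.column_stack([ones, z])

beta_ols = linear_gmm(y, X, X)  # instruments = regressors -> ordinary least squares
beta_iv = linear_gmm(y, X, Z)   # external instrument -> instrumental variables

print("OLS slope (confounded):", beta_ols[1])
print("IV slope:", beta_iv[1])
```

The same function yields a different causal effects model depending only on which moment conditions are supplied, which is the kind of unified specification the question asks about.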

We hope this begins the discussion, and over the coming months we will be sharing more on the research we have done to make estimation of causal effects performant. There are still many more challenges in the field that are not listed here. We want to form a community spanning experimenters, researchers, and software engineers to learn about problems and solutions together. If you are interested in being part of this community, please reach us at compci-public@netflix.com.

Translated from: https://netflixtechblog.com/computational-causal-inference-at-netflix-293591691c62
