netflix
Jeffrey Wong, Colin McFarland
Every Netflix data scientist, whether their background is in biology, psychology, physics, economics, math, statistics, or biostatistics, has made meaningful contributions to the way Netflix analyzes causal effects. Scientists from these fields have made many advances in causal effects research over the past few decades, spanning instrumental variables, forest methods, heterogeneous effects, time-dynamic effects, quantile effects, and much more. These methods can provide rich information for decision making, such as in experimentation platforms (“XP”) or in algorithmic policy engines.
We want to amplify the effectiveness of our researchers by providing them software that can estimate causal effects models efficiently, and can integrate causal effects into large engineering systems. This can be challenging when algorithms for causal effects need to fit a model, condition on context and possible actions to take, score the response variable, and compute differences between counterfactuals. Computation can explode and become overwhelming when this is done with large datasets, with high dimensional features, with many possible actions to choose from, and with many responses. In order to gain broad software integration of causal effects models, a significant investment in software engineering, especially in computation, is needed. To address the challenges, Netflix has been building an interdisciplinary field across causal inference, algorithm design, and numerical computing, which we now want to share with the rest of the industry as computational causal inference (CompCI). A whitepaper detailing the field can be found here.
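The four steps named above (fit a model, condition on context and actions, score the response, difference the counterfactuals) can be sketched in a few lines. This is a minimal illustration only, assuming a linear model with a single context feature and a binary action; the function names `fit_ols`, `design`, and `average_effect` are hypothetical and are not taken from any Netflix library.

```python
import numpy as np

def fit_ols(X, y):
    # Step 1: fit a model (ordinary least squares, as an assumed example).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def design(context, action):
    # Step 2: condition on context and an action.
    # Columns: intercept, context, action, action-by-context interaction.
    n = context.shape[0]
    if np.isscalar(action):
        a = np.full(n, action, dtype=float)
    else:
        a = np.asarray(action, dtype=float)
    return np.column_stack([np.ones(n), context, a, a * context])

def average_effect(context, actions, y):
    beta = fit_ols(design(context, actions), y)
    # Step 3: score the response under each counterfactual action.
    y_if_treated = design(context, 1) @ beta
    y_if_control = design(context, 0) @ beta
    # Step 4: difference the counterfactuals and average.
    return float(np.mean(y_if_treated - y_if_control))

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
context = rng.normal(size=1000)
actions = rng.integers(0, 2, size=1000)
y = 1.0 + 2.0 * context + 3.0 * actions + rng.normal(size=1000)
print(average_effect(context, actions, y))
```

Even in this toy version, every unit is scored once per candidate action, so the cost grows with the number of observations, features, actions, and responses, which is the explosion the paragraph describes.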
Computational causal inference brings a software implementation focus to causal inference, especially with regard to high performance numerical computing. We are implementing several algorithms to be highly performant, with a low memory footprint. As an example, our XP is pivoting away from two-sample t-tests to models that estimate average effects, heterogeneous effects, and time-dynamic treatment effects. These effects help the business understand the user base, different segments in the user base, and whether there are trends in segments over time. We also take advantage of user covariates throughout these models in order to increase statistical power. While this rich analysis helps to inform business strategy and increase member joy, the volume of data demands large amounts of memory, and estimating causal effects on that volume of data is computationally heavy.
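To make the point about covariates and statistical power concrete, here is a small simulated comparison. It is a sketch under assumed conditions (randomized binary treatment, one pre-treatment covariate that strongly predicts the outcome, homoskedastic errors); the helper `ols_effect` and the simulated effect size are illustrative, not from the source.

```python
import numpy as np

def ols_effect(X, y, col):
    """Return the OLS coefficient in column `col` and its homoskedastic standard error."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta[col], np.sqrt(sigma2 * XtX_inv[col, col])

# Simulated experiment, for illustration only.
rng = np.random.default_rng(0)
n = 2000
treated = rng.integers(0, 2, n).astype(float)
covariate = rng.normal(size=n)  # assumed pre-treatment user covariate
y = 0.1 * treated + 2.0 * covariate + rng.normal(size=n)

ones = np.ones(n)
# Intercept + treatment only: the point estimate is the difference in means,
# as in a pooled-variance two-sample t-test.
effect_unadj, se_unadj = ols_effect(np.column_stack([ones, treated]), y, col=1)
# Adding the covariate absorbs outcome variance and tightens the standard error.
effect_adj, se_adj = ols_effect(np.column_stack([ones, treated, covariate]), y, col=1)

print(f"unadjusted: {effect_unadj:.3f} (se {se_unadj:.3f})")
print(f"adjusted:   {effect_adj:.3f} (se {se_adj:.3f})")
```

Both estimates target the same average effect; the adjusted model has a smaller standard error because the covariate explains much of the outcome variance.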
In the past, the computations for covariate-adjusted heterogeneous effects and time-dynamic effects were slow, memory heavy, hard to debug, and a large source of engineering risk, and ultimately they could not scale to many large experiments. Using optimizations from CompCI, we can estimate hundreds of conditional average effects and their variances on a dataset with 10 million observations in 10 seconds, on a single machine. In the extreme, we can also analyze conditional time-dynamic treatment effects for hundreds of millions of observations on a single machine in less than one hour. To achieve this, we leverage a software stack that is completely optimized for sparse linear algebra, a lossless data compression strategy that can reduce data volume, and mathematical formulas that are optimized specifically for estimating causal effects. We also optimize for memory and data alignment.
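The post does not spell out the compression strategy, so the following is only one plausible illustration of what "lossless" can mean for linear models: when many observations share identical feature rows, the rows can be collapsed into group counts, sums, and sums of squares of the outcome without changing the OLS estimate or its homoskedastic covariance. The function names `compress` and `ols_compressed` are hypothetical.

```python
import numpy as np

def compress(X, y):
    """Collapse duplicate feature rows into sufficient statistics."""
    Xu, inverse = np.unique(X, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # the returned shape differs across NumPy versions
    n = np.bincount(inverse, minlength=len(Xu)).astype(float)        # group sizes
    sy = np.bincount(inverse, weights=y, minlength=len(Xu))          # sum of y per group
    syy = np.bincount(inverse, weights=y * y, minlength=len(Xu))     # sum of y^2 per group
    return Xu, n, sy, syy

def ols_compressed(Xu, n, sy, syy):
    """OLS coefficients and homoskedastic covariance from compressed data."""
    XtX = Xu.T @ (Xu * n[:, None])
    Xty = Xu.T @ sy
    beta = np.linalg.solve(XtX, Xty)
    # Residual sum of squares: y'y - 2 b'X'y + b'X'X b
    rss = syy.sum() - 2 * beta @ Xty + beta @ XtX @ beta
    dof = n.sum() - Xu.shape[1]
    return beta, rss / dof * np.linalg.inv(XtX)

# Simulated data with few distinct feature rows, for illustration only.
rng = np.random.default_rng(0)
N = 100_000
treated = rng.integers(0, 2, N)
segment = rng.integers(0, 5, N)
X = np.column_stack([np.ones(N), treated, segment]).astype(float)
y = 0.5 * treated + 0.2 * segment + rng.normal(size=N)

Xu, n, sy, syy = compress(X, y)
beta, cov = ols_compressed(Xu, n, sy, syy)
print(len(Xu), "compressed rows instead of", N)
```

The gain depends on how many distinct feature rows exist; with continuous covariates the approach helps only after discretization, which would no longer be lossless.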
This level of computing affords us a lot of luxury. First, the ability to scale complex models means we can deliver rich insights for the business. Second, being able to analyze large datasets for causal effects in seconds increases research agility. Third, analyzing data on a single machine makes debugging easy. Finally, the scalability makes computation for large engineering systems tractable, reducing engineering risk.
Computational causal inference is a new, interdisciplinary field we are announcing because we want to build it collectively with the broader community of experimenters, researchers, and software engineers. The integration of causal inference into engineering systems can lead to substantial innovation. Being an interdisciplinary field, it truly requires the community of local domain experts to unite. We have released a whitepaper to begin the discussion. There, we describe the rising demand for scalable causal inference in research and in software engineering systems. Then, we describe the state of common causal effects models. Afterwards, we describe what we believe can be a good software framework for estimating and optimizing for causal effects.
Finally, we close the CompCI whitepaper with a series of open challenges that we believe require interdisciplinary collaboration, and that the community can unite around. For example:
- Time-dynamic treatment effects are notoriously hard to scale. They require a panel of repeated observations, which generates large datasets. They also contain autocorrelation, creating complications for estimating the variance of the causal effect. How can we make the computation for the time-dynamic treatment effect, and its distribution, more scalable?
- In machine learning, specifying a loss function and optimizing it using numerical methods allows a developer to interact with a single, umbrella framework that can span several models. Can such an umbrella framework exist to specify different causal effects models in a unified way? For example, could it be done through the generalized method of moments? Can it be computationally tractable?
- How should we develop software that understands if a causal parameter is identified? A solution to this helps to create software that is safe to use, and can provide safe, programmatic access to the analysis of causal effects. We believe there are many edge cases in identification that require an interdisciplinary group to solve.
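As a very small illustration of the identification question in the last challenge, the sketch below covers only one narrow case: in a linear model, the coefficient on a treatment column cannot be estimated if that column is a linear combination of the other columns. The function `is_identified` is a hypothetical name, and a rank check like this says nothing about the harder causal identification issues (confounding, instrument validity, and so on) that the challenge has in mind.

```python
import numpy as np

def is_identified(X, col):
    """True if column `col` adds rank beyond the remaining columns of X."""
    others = np.delete(X, col, axis=1)
    return bool(np.linalg.matrix_rank(X) > np.linalg.matrix_rank(others))

# Simulated design matrices, for illustration only.
rng = np.random.default_rng(0)
n = 200
in_segment_b = rng.integers(0, 2, n).astype(float)

# Case 1: treatment randomized independently of the segment indicator.
randomized = rng.integers(0, 2, n).astype(float)
X_ok = np.column_stack([np.ones(n), in_segment_b, randomized])

# Case 2: treatment given to exactly the users in segment B, so the treatment
# column duplicates the segment indicator and its effect cannot be separated.
X_bad = np.column_stack([np.ones(n), in_segment_b, in_segment_b])

print(is_identified(X_ok, col=2))
print(is_identified(X_bad, col=2))
```

A check of this kind could run before estimation so that software refuses to report an effect it cannot estimate, rather than returning an arbitrary number.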
We hope this begins the discussion, and over the coming months we will be sharing more on the research we have done to make estimation of causal effects performant. There are still many more challenges in the field that are not listed here. We want to form a community spanning experimenters, researchers, and software engineers to learn about problems and solutions together. If you are interested in being part of this community, please reach us at compci-public@netflix.com.
Translated from: https://netflixtechblog.com/computational-causal-inference-at-netflix-293591691c62