Independent and identically distributed observations $\{W_i=(Y_i,D_i,X_i)\mid i\in \{1,\dots,n\}\}$, where $Y_i$ denotes the outcome variable, $D_i$ the treatment variable, and $X_i$ the control variables.
The target parameter $\theta_0$ is generally defined by a moment condition:
$$E[m(W;\theta_0,\eta_0)] = 0$$
where $W$ is the observed data, $\theta_0\in \Theta$ is the target parameter, and $\eta_0\in \mathcal{T}$ is the nuisance parameter.
For example, the ATE is defined as:
$$\theta_0^{ATE}\equiv E\big[E[Y_i|D_i=1,X_i] - E[Y_i|D_i=0,X_i]\big]$$
The IPW moment function for the ATE is:
$$m_{IPW}(W_i;\theta,\alpha)\equiv \alpha(D_i,X_i)\,Y_i - \theta \equiv \left[\frac{D_i}{E[D_i|X_i]} - \frac{1-D_i}{1-E[D_i|X_i]}\right]Y_i - \theta$$
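Solved for $\theta$, this moment says the IPW estimate is simply the sample mean of $\alpha(D_i,X_i)Y_i$. A minimal sketch on simulated data where the true propensity $E[D|X]$ is known (all variable names here are illustrative, not from the original):

```python
import numpy as np

def ipw_ate(y, d, e):
    # Solve (1/n) * sum(alpha(D,X)*Y - theta) = 0 for theta,
    # with alpha = D/e(X) - (1-D)/(1-e(X))
    return np.mean(d * y / e - (1 - d) * y / (1 - e))

# Simulated data: true ATE = 2, propensity e(x) known exactly
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))              # true E[D|X]
d = rng.binomial(1, e)
y = 2 * d + x + rng.normal(size=n)    # confounding enters through x

theta_hat = ipw_ate(y, d, e)          # close to the true ATE of 2
```

With the true propensity, the naive difference in means would be biased by the confounder $x$, while the weighted mean recovers the ATE.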
The doubly robust (DR) moment function for the ATE is:
$$m_{DR}(W_i;\theta,\eta)\equiv \alpha(D_i,X_i)\,\big(Y_i - E[Y_i|D_i,X_i]\big) + E[Y_i|D_i=1,X_i] - E[Y_i|D_i=0,X_i] - \theta$$
$$\equiv \left[\frac{D_i}{E[D_i|X_i]} - \frac{1-D_i}{1-E[D_i|X_i]}\right]\big(Y_i - E[Y_i|D_i,X_i]\big) + E[Y_i|D_i=1,X_i] - E[Y_i|D_i=0,X_i] - \theta$$
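The defining property of $m_{DR}$ is double robustness: the estimate remains consistent if either the propensity model or the outcome model is misspecified, as long as the other is correct. A sketch on simulated data (variable names illustrative):

```python
import numpy as np

def dr_ate(y, d, e, mu1, mu0):
    # Average of alpha(D,X)*(Y - E[Y|D,X]) + E[Y|D=1,X] - E[Y|D=0,X]
    mu_d = np.where(d == 1, mu1, mu0)
    alpha = d / e - (1 - d) / (1 - e)
    return np.mean(alpha * (y - mu_d) + mu1 - mu0)

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))              # true propensity
d = rng.binomial(1, e)
y = 2 * d + x + rng.normal(size=n)    # true ATE = 2, mu1 = 2 + x, mu0 = x

# Wrong outcome model (all zeros), correct propensity: still consistent
t1 = dr_ate(y, d, e, np.zeros(n), np.zeros(n))
# Correct outcome model, wrong propensity (constant 0.5): still consistent
t2 = dr_ate(y, d, np.full(n, 0.5), 2 + x, x)
```

Both `t1` and `t2` land near the true ATE of 2, even though each run deliberately breaks one of the two nuisance models.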
In general, the estimator $\hat{\theta}$ of the target parameter $\theta_0$ is defined by:
$$\hat{\theta}:\ \frac{1}{n}\sum_{i=1}^n m(W_i;\hat{\theta},\hat{\eta}) = 0$$
A first-order Taylor expansion gives:
$$\frac{1}{n}\sum_{i=1}^n m(W_i;\hat{\theta},\hat{\eta}) \approx \frac{1}{n}\sum_{i=1}^n m(W_i;\theta_0,\eta_0) + \frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\theta}m(W_i;\theta_0,\eta_0)\,(\hat{\theta} - \theta_0) + \frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\eta}m(W_i;\theta_0,\eta_0)\,(\hat{\eta} - \eta_0) \approx 0$$
$$(\theta_0 - \hat{\theta}) \approx \left[\frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\theta}m(W_i;\theta_0,\eta_0)\right]^{-1}\frac{1}{n}\sum_{i=1}^n m(W_i;\theta_0,\eta_0) + \left[\frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\theta}m(W_i;\theta_0,\eta_0)\right]^{-1}\frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\eta}m(W_i;\theta_0,\eta_0)\,(\hat{\eta} - \eta_0)$$
The estimation error $(\theta_0 - \hat{\theta})$ of the target parameter is therefore affected by the estimation error $(\hat{\eta} - \eta_0)$ of the nuisance parameter. This reveals two sources of bias in the target parameter:
- the nuisance estimation error $(\hat{\eta} - \eta_0)$ itself, called regularization bias;
- strong correlation between the nuisance estimation error $(\hat{\eta} - \eta_0)$ and $W_i$, called overfitting bias.
Neyman Orthogonality
$$\frac{\partial}{\partial\lambda}\Big\{E\big[\psi(W_i;\theta_0,\eta_0 + \lambda(\eta-\eta_0))\big]\Big\}\Big|_{\lambda=0}= 0,\quad \forall\eta\in \mathcal{T}$$
$m_{IPW}$ is not Neyman orthogonal; $m_{DR}$ is Neyman orthogonal.
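This contrast can be checked numerically: perturb the propensity by $\lambda\Delta$ and take a finite-difference derivative of the averaged moment at $\lambda=0$. For $m_{DR}$ (with a correct outcome model) the derivative is near zero; for $m_{IPW}$ it is not. A sketch under a simple simulated design (names and the perturbation direction are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))                  # true propensity
d = rng.binomial(1, e)
y = 2 * d + x + rng.normal(size=n)        # true ATE = 2
mu1, mu0 = 2 + x, x                       # true outcome regressions
mu_d = np.where(d == 1, mu1, mu0)
theta0, delta = 2.0, 0.1                  # perturbation direction for e(x)

def mean_ipw(lam):
    ep = np.clip(e + lam * delta, 1e-3, 1 - 1e-3)
    return np.mean((d / ep - (1 - d) / (1 - ep)) * y) - theta0

def mean_dr(lam):
    ep = np.clip(e + lam * delta, 1e-3, 1 - 1e-3)
    alpha = d / ep - (1 - d) / (1 - ep)
    return np.mean(alpha * (y - mu_d) + mu1 - mu0) - theta0

h = 1e-4
grad_ipw = (mean_ipw(h) - mean_ipw(-h)) / (2 * h)   # noticeably non-zero
grad_dr = (mean_dr(h) - mean_dr(-h)) / (2 * h)      # approximately zero
```

The non-zero `grad_ipw` is exactly the first-order sensitivity to nuisance error that the Taylor expansion above warns about; `grad_dr` being near zero is what makes the DR score robust to small propensity errors.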
Cross Fitting
$$\hat{\theta}:\ \frac{1}{n}\sum_{k=1}^K\sum_{i\in I_k} m(W_i;\hat{\theta},\hat{\eta}_{-k}) = 0$$
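When the nuisances are unknown, cross-fitting estimates $\hat{\eta}_{-k}$ on the folds that exclude $I_k$ and evaluates the moment on $I_k$, breaking the correlation between nuisance error and $W_i$. A sketch with scikit-learn using the doubly robust score (the model choices here are illustrative):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=(n, 1))
e = 1 / (1 + np.exp(-x[:, 0]))
d = rng.binomial(1, e)
y = 2 * d + x[:, 0] + rng.normal(size=n)   # true ATE = 2

psi = np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(x):
    # nuisances fit on the complement folds I_{-k}
    ps = LogisticRegression().fit(x[train], d[train])
    reg1 = LinearRegression().fit(x[train][d[train] == 1], y[train][d[train] == 1])
    reg0 = LinearRegression().fit(x[train][d[train] == 0], y[train][d[train] == 0])
    # score evaluated on the held-out fold I_k
    e_hat = np.clip(ps.predict_proba(x[test])[:, 1], 0.01, 0.99)
    mu1, mu0 = reg1.predict(x[test]), reg0.predict(x[test])
    mu_d = np.where(d[test] == 1, mu1, mu0)
    alpha = d[test] / e_hat - (1 - d[test]) / (1 - e_hat)
    psi[test] = alpha * (y[test] - mu_d) + mu1 - mu0

theta_hat = psi.mean()   # cross-fitted DR estimate of the ATE
```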
DML
$$\hat{\theta}:\ \frac{1}{n}\sum_{k=1}^K\sum_{i\in I_k} \psi(W_i;\hat{\theta},\hat{\eta}_{-k}) = 0$$
Direct regression does not satisfy Neyman orthogonality
$$Y = \theta T + g(X) + \epsilon$$
$$m(W;\theta,g) = Y - \theta T - g(X)$$
$$\frac{\partial }{\partial \lambda}E\big[m(W;\theta_0,g_0 + \lambda\Delta g)\big]\Big|_{\lambda=0} = E[-\Delta g(X)] \ne 0$$
The DML (partialling-out) score satisfies Neyman orthogonality
$$Y - l(X) = \theta\,(T - m(X)) + \epsilon',\qquad l(X) = E[Y|X],\quad m(X)=E[T|X]$$
The orthogonal score multiplies the outcome residual by the treatment residual:
$$\psi(W;\theta,\eta) = \big(Y - l(X) - \theta\,(T - m(X))\big)\,(T - m(X)),\qquad \eta = (l, m)$$
$$\frac{\partial}{\partial\lambda}E\big[\psi(W;\theta_0,\eta_0 + \lambda\Delta\eta)\big]\Big|_{\lambda=0} = E\big[(-\Delta l(X) + \theta_0\Delta m(X))(T - m(X))\big] - E\big[\epsilon'\,\Delta m(X)\big] = 0,$$
since $E[T - m(X)\mid X] = 0$ and $E[\epsilon'\mid X] = 0$.
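In practice, the partialling-out estimate is obtained by regressing the outcome residual on the treatment residual, with both nuisances cross-fitted: $\hat{\theta} = \sum_i (T_i-\hat m(X_i))(Y_i-\hat l(X_i)) \big/ \sum_i (T_i-\hat m(X_i))^2$. A sketch with scikit-learn (the learner and data-generating choices are illustrative):

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=(n, 2))
g = np.sin(x[:, 0]) + x[:, 1] ** 2        # non-linear confounding
t = g + rng.normal(size=n)                # treatment depends on x
y = 1.5 * t + g + rng.normal(size=n)      # true theta = 1.5

# cross-fitted nuisances l(x) = E[Y|X] and m(x) = E[T|X]
# (cross_val_predict returns out-of-fold predictions)
l_hat = cross_val_predict(GradientBoostingRegressor(), x, y, cv=5)
m_hat = cross_val_predict(GradientBoostingRegressor(), x, t, cv=5)

# residual-on-residual regression solves the orthogonal moment for theta
ry, rt = y - l_hat, t - m_hat
theta_hat = (rt @ ry) / (rt @ rt)
```

A naive regression of `y` on `t` alone would be badly biased here because `g(x)` drives both; the residualization removes that confounding before the final regression.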
Example
Simulated data:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import dowhy.datasets, dowhy.plotter

# Randomly chosen true effect: 1 with probability 0.8, otherwise 0
rvar = 1 if np.random.uniform() > 0.2 else 0
is_linear = False  # A non-linear dataset. Change to True to see results for a linear dataset.
data_dict = dowhy.datasets.xy_dataset(10000, effect=rvar,
                                      num_common_causes=2,
                                      is_linear=is_linear,
                                      sd_error=0.2)
df = data_dict['df']
print(df.head())
dowhy.plotter.plot_treatment_outcome(df[data_dict["treatment_name"]],
                                     df[data_dict["outcome_name"]],
                                     df[data_dict["time_val"]])
```
Modeling the causal assumptions:
- Based on domain knowledge, state the assumed causal relationships to define the model structure
```python
from dowhy import CausalModel

model = CausalModel(data=df,
                    treatment=data_dict["treatment_name"],
                    outcome=data_dict["outcome_name"],
                    common_causes=data_dict["common_causes_names"],
                    instruments=data_dict["instrument_names"])
model.view_model(layout="dot")
```
Identifying the causal effect:

```python
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
```
Estimating the causal effect:

```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor

dml_estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.econml.dml.DML",
    control_value=0,
    treatment_value=1,
    confidence_intervals=False,
    method_params={
        "init_params": {
            "model_y": GradientBoostingRegressor(),
            "model_t": GradientBoostingRegressor(),
            "model_final": LassoCV(fit_intercept=False),
            "featurizer": PolynomialFeatures(degree=2, include_bias=True),
        },
        "fit_params": {},
    },
)
print(dml_estimate)
```
Refuting the estimate:

```python
res_placebo = model.refute_estimate(
    identified_estimand,
    dml_estimate,
    method_name="placebo_treatment_refuter",
    placebo_type="permute",
    num_simulations=20,
)
print(res_placebo)
```