Provide nuisance estimates to pseudo-outcome methods #82

kklein · 2024-08-14T20:06:48Z

Status quo

As of now we have the following interface for the pseudo-outcome methods in the DR-Learner and R-Learner:

DR-Learner

Lines 381 to 390 in d863df1

    
    def _pseudo_outcome(
        self,
        X: Matrix,
        y: Vector,
        w: Vector,
        treatment_variant: int,
        is_oos: bool,
        oos_method: OosMethod = OVERALL,
        epsilon: float = _EPSILON,
    ) -> np.ndarray:

R-Learner

metalearners/metalearners/rlearner.py

Lines 469 to 479 in d863df1

    
    def _pseudo_outcome_and_weights(
        self,
        X: Matrix,
        y: Vector,
        w: Vector,
        treatment_variant: int,
        is_oos: bool,
        oos_method: OosMethod = OVERALL,
        mask: Vector | None = None,
        epsilon: float = _EPSILON,
    ) -> tuple[np.ndarray, np.ndarray]:

Since both pseudo outcome kinds require nuisance model estimates and since these are visibly not provided as input arguments, they are estimated as part of the respective pseudo outcome method.

Importantly, the pseudo outcome methods are treatment-variant specific. Yet the nuisance estimates computed inside the pseudo outcome methods are not treatment-variant specific:

In the case of the R-Learner, both the overall outcome model $\hat{\mu}$ and the overall propensity model $\hat{e}$ are applied to all data. Only after the estimation is the data filtered with respect to the treatment variant at hand:

metalearners/metalearners/rlearner.py

Lines 495 to 508 in d863df1

    
    y_estimates = self.predict_nuisance(
        X=X,
        is_oos=is_oos,
        model_kind=OUTCOME_MODEL,
        model_ord=0,
        oos_method=oos_method,
    )[mask]
    w_estimates = self.predict_nuisance(
        X=X,
        is_oos=is_oos,
        model_kind=PROPENSITY_MODEL,
        model_ord=0,
        oos_method=oos_method,
    )[mask]

In the case of the DR-Learner, the propensity $\hat{e}$ and all conditional average outcomes $\hat{\mu}_k$ are estimated for all data points; filtering of variant-specific information only happens thereafter:

metalearners/metalearners/drlearner.py

Lines 394 to 411 in d863df1

    
    conditional_average_outcome_estimates = (
        self.predict_conditional_average_outcomes(
            X=X,
            is_oos=is_oos,
            oos_method=oos_method,
        )
    )
    propensity_estimates = self.predict_nuisance(
        X=X,
        is_oos=is_oos,
        oos_method=oos_method,
        model_kind=PROPENSITY_MODEL,
        model_ord=0,
    )
    y0_estimate = conditional_average_outcome_estimates[:, 0]
    y1_estimate = conditional_average_outcome_estimates[:, treatment_variant]

Assessment

With more than two treatment variants ($k>2$), the above approach causes needless effort: the same nuisance estimates are recomputed for every treatment variant that is not the 'control'.
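To make the repeated work concrete, here is a minimal, self-contained sketch. It does not use the library: `CountingPredictor`, `per_variant_estimation` and `shared_estimation` are hypothetical names, and the predictor is a stand-in that merely counts how often an (assumed expensive) nuisance prediction over all data is triggered.

```python
import numpy as np


class CountingPredictor:
    """Hypothetical stand-in for an expensive nuisance prediction."""

    def __init__(self, n_variants: int):
        self.n_variants = n_variants
        self.n_calls = 0

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Count every full pass over the data.
        self.n_calls += 1
        # Dummy estimates of shape (n_obs, n_variants).
        return np.zeros((len(X), self.n_variants))


def per_variant_estimation(predictor: CountingPredictor, X: np.ndarray) -> dict:
    """Mimics the status quo: estimates are recomputed per variant."""
    results = {}
    for variant in range(1, predictor.n_variants):
        estimates = predictor.predict(X)  # same result on every iteration
        results[variant] = estimates[:, variant] - estimates[:, 0]
    return results


def shared_estimation(predictor: CountingPredictor, X: np.ndarray) -> dict:
    """Estimates are computed once and reused for every variant."""
    estimates = predictor.predict(X)
    return {
        variant: estimates[:, variant] - estimates[:, 0]
        for variant in range(1, predictor.n_variants)
    }
```

Under these assumptions, the per-variant version triggers one prediction pass per non-control variant, while the shared version always triggers exactly one.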

Computational burden aside, it is not clear that having the pseudo outcome methods do the estimation themselves makes for the better method interface. Wouldn't it feel more natural (and wouldn't concerns be better separated) if the pseudo outcome methods merely defined the pseudo outcome given the nuisance estimates, rather than estimating quantities themselves?
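As a rough illustration of what such an interface could look like, the sketch below defines a DR-style pseudo outcome as a pure function of precomputed nuisance estimates. The function name `dr_pseudo_outcome`, its parameters and the default `epsilon` are hypothetical and not taken from the library. The formula is the textbook doubly robust (AIPW) form for comparing variant $k$ to control, and may differ in detail from what `_pseudo_outcome` currently computes. It assumes that both estimate arrays have shape `(n_obs, n_variants)` with the control in column 0.

```python
import numpy as np


def dr_pseudo_outcome(
    y: np.ndarray,
    w: np.ndarray,
    cao_estimates: np.ndarray,
    propensity_estimates: np.ndarray,
    treatment_variant: int,
    epsilon: float = 1e-9,  # assumed default, guards against division by ~0
) -> np.ndarray:
    """Hypothetical pure function: no model is fitted or applied in here."""
    mu_0 = cao_estimates[:, 0]
    mu_k = cao_estimates[:, treatment_variant]
    # Clip propensities from below to keep the inverse weights finite.
    e_0 = np.clip(propensity_estimates[:, 0], epsilon, None)
    e_k = np.clip(propensity_estimates[:, treatment_variant], epsilon, None)

    # Residual corrections only apply to observations of the respective arm.
    correction_k = (w == treatment_variant) * (y - mu_k) / e_k
    correction_0 = (w == 0) * (y - mu_0) / e_0
    return mu_k - mu_0 + correction_k - correction_0
```

A caller would then compute `cao_estimates` and `propensity_estimates` once and loop over `treatment_variant` in `range(1, n_variants)`, passing the same arrays each time.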

kklein added the enhancement New feature or request label Aug 14, 2024

kklein mentioned this issue Aug 14, 2024

Reuse conditional average outcome estimates for X-Learner pseudo outcome #83

Merged

1 task
