Provide nuisance estimates to pseudo-outcome methods #82

Open
kklein opened this issue Aug 14, 2024 · 0 comments
Labels
enhancement New feature or request

Collaborator

kklein commented Aug 14, 2024

Status quo

As of now, the pseudo-outcome methods of the DR-Learner and the R-Learner have the following interfaces:

  • DR-Learner

    def _pseudo_outcome(
        self,
        X: Matrix,
        y: Vector,
        w: Vector,
        treatment_variant: int,
        is_oos: bool,
        oos_method: OosMethod = OVERALL,
        epsilon: float = _EPSILON,
    ) -> np.ndarray:
        ...

  • R-Learner

    def _pseudo_outcome_and_weights(
        self,
        X: Matrix,
        y: Vector,
        w: Vector,
        treatment_variant: int,
        is_oos: bool,
        oos_method: OosMethod = OVERALL,
        mask: Vector | None = None,
        epsilon: float = _EPSILON,
    ) -> tuple[np.ndarray, np.ndarray]:
        ...

Both kinds of pseudo outcome require nuisance model estimates. Since these estimates are not passed in as arguments, each pseudo-outcome method computes them itself.

Importantly, the pseudo-outcome methods are specific to a treatment variant. The nuisance estimates computed inside them, however, are not:

  • In the case of the R-Learner, the overall outcome model $\hat{\mu}$ and the overall propensity model $\hat{e}$ are both applied to all data. Only after estimation are the data filtered with respect to the treatment variant at hand:

    y_estimates = self.predict_nuisance(
        X=X,
        is_oos=is_oos,
        model_kind=OUTCOME_MODEL,
        model_ord=0,
        oos_method=oos_method,
    )[mask]
    w_estimates = self.predict_nuisance(
        X=X,
        is_oos=is_oos,
        model_kind=PROPENSITY_MODEL,
        model_ord=0,
        oos_method=oos_method,
    )[mask]

  • In the case of the DR-Learner, the propensities $\hat{e}$ and all conditional average outcomes $\hat{\mu}_k$ are estimated for all data points; variant-specific information is only selected afterwards:

    conditional_average_outcome_estimates = (
        self.predict_conditional_average_outcomes(
            X=X,
            is_oos=is_oos,
            oos_method=oos_method,
        )
    )
    propensity_estimates = self.predict_nuisance(
        X=X,
        is_oos=is_oos,
        oos_method=oos_method,
        model_kind=PROPENSITY_MODEL,
        model_ord=0,
    )
    y0_estimate = conditional_average_outcome_estimates[:, 0]
    y1_estimate = conditional_average_outcome_estimates[:, treatment_variant]
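For reference, here are the textbook forms of both pseudo outcomes for variant $k$ versus control $0$, written purely in terms of nuisance estimates. These are assumed standard forms, not copied from the code; the implementations may differ in details such as how `epsilon` guards the divisions. The binarized R-Learner propensity $\hat{e}^{(k)}$ is likewise an assumption, namely the conditional probability of variant $k$ given that the observation received $0$ or $k$.

```latex
% DR-Learner, variant k vs. control 0, for every observation i
\tilde{Y}^{(k)}_i = \hat{\mu}_k(X_i) - \hat{\mu}_0(X_i)
  + \frac{\mathbb{1}[W_i = k]\,\bigl(Y_i - \hat{\mu}_k(X_i)\bigr)}{\hat{e}_k(X_i)}
  - \frac{\mathbb{1}[W_i = 0]\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{\hat{e}_0(X_i)}

% R-Learner, variant k vs. control 0, for observations with W_i \in \{0, k\}
\hat{e}^{(k)}(X_i) = \frac{\hat{e}_k(X_i)}{\hat{e}_0(X_i) + \hat{e}_k(X_i)}, \qquad
\tilde{Y}^{(k)}_i = \frac{Y_i - \hat{\mu}(X_i)}{\mathbb{1}[W_i = k] - \hat{e}^{(k)}(X_i)}, \qquad
\omega^{(k)}_i = \bigl(\mathbb{1}[W_i = k] - \hat{e}^{(k)}(X_i)\bigr)^2
```

In both cases the arrays of nuisance estimates ($\hat{\mu}$, $\hat{\mu}_j$, $\hat{e}$) are identical for every variant; only the columns that are read and the rows that are kept depend on $k$.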

Assessment

With $k > 2$ treatment variants (control included), this approach does needless work: the same nuisance estimates are recomputed for every non-control treatment variant, i.e. $k-1$ times instead of once.
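A toy count makes the overhead concrete for the DR-Learner. It assumes one outcome model per treatment variant (as the $\hat{\mu}_k$ notation suggests) plus one propensity model, and counts one prediction pass per model per call. The class and function names are hypothetical, and the fold models touched by cross-fitting are ignored.

```python
class CountingNuisanceModels:
    """Hypothetical stand-in for a fitted DR-Learner's nuisance models.

    Each predict_* call simulates one prediction pass per underlying model
    and records the total number of passes.
    """

    def __init__(self, n_variants: int) -> None:
        self.n_variants = n_variants
        self.passes = 0

    def predict_conditional_average_outcomes(self) -> None:
        # Assumed: one outcome model per treatment variant, control included.
        self.passes += self.n_variants

    def predict_propensity(self) -> None:
        # A single propensity model.
        self.passes += 1


def current_dr_passes(n_variants: int) -> int:
    models = CountingNuisanceModels(n_variants)
    for _treatment_variant in range(1, n_variants):
        # Status quo: every pseudo-outcome call re-predicts all nuisances.
        models.predict_conditional_average_outcomes()
        models.predict_propensity()
    return models.passes


def compute_once_dr_passes(n_variants: int) -> int:
    models = CountingNuisanceModels(n_variants)
    # Alternative: predict the nuisances once and reuse them for all variants.
    models.predict_conditional_average_outcomes()
    models.predict_propensity()
    return models.passes


for k in (2, 3, 5):
    print(k, current_dr_passes(k), compute_once_dr_passes(k))
# 2 3 3
# 3 8 4
# 5 24 6
```

Under these assumptions the status quo needs $(k-1)(k+1)$ passes versus $k+1$; by the same reasoning the R-Learner needs $2(k-1)$ versus $2$.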

Computational burden aside, it is not clear that having the pseudo-outcome methods do the estimation themselves is the better interface. Wouldn't it be more natural, and wouldn't concerns be better separated, if the pseudo-outcome methods merely defined the pseudo outcome given the nuisance estimates, rather than estimating those quantities themselves?
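A minimal sketch of what such a separation could look like, written as free NumPy functions rather than learner methods. The function names, argument names and array shapes are hypothetical; the formulas are the textbook DR and R pseudo outcomes (not necessarily identical to the library's, e.g. in how `epsilon` is applied), and out-of-sample handling (`is_oos`, `oos_method`) is left to whatever code produces the nuisance estimates.

```python
import numpy as np


def dr_pseudo_outcome(y, w, treatment_variant, mu_hat, e_hat, epsilon=1e-9):
    """DR pseudo outcome of `treatment_variant` vs. control (variant 0).

    mu_hat: (n, n_variants) conditional average outcome estimates.
    e_hat: (n, n_variants) propensity estimates.
    """
    k = treatment_variant
    is_k = (w == k).astype(float)
    is_0 = (w == 0).astype(float)
    # Keep propensities away from zero before dividing by them.
    e_k = np.clip(e_hat[:, k], epsilon, None)
    e_0 = np.clip(e_hat[:, 0], epsilon, None)
    mu_k, mu_0 = mu_hat[:, k], mu_hat[:, 0]
    return mu_k - mu_0 + is_k * (y - mu_k) / e_k - is_0 * (y - mu_0) / e_0


def r_pseudo_outcome_and_weights(y, w, treatment_variant, mu_hat, e_hat, epsilon=1e-9):
    """R pseudo outcome and weights on observations with w in {0, treatment_variant}.

    mu_hat: (n,) overall outcome estimates.
    e_hat: (n, n_variants) propensity estimates.
    """
    k = treatment_variant
    mask = (w == 0) | (w == k)
    y_residual = y[mask] - mu_hat[mask]
    # Propensity of variant k among observations that received 0 or k.
    e_k = e_hat[mask, k] / (e_hat[mask, 0] + e_hat[mask, k])
    w_residual = (w[mask] == k).astype(float) - e_k
    # Push near-zero treatment residuals away from zero, keeping their sign.
    sign = np.where(w_residual < 0, -1.0, 1.0)
    safe_residual = sign * np.maximum(np.abs(w_residual), epsilon)
    return y_residual / safe_residual, w_residual**2


rng = np.random.default_rng(0)
n, n_variants = 200, 4
w = rng.integers(0, n_variants, size=n)
y = rng.normal(size=n)

# Nuisance estimates are produced ONCE (random stand-ins here) ...
mu_hat = rng.normal(size=(n, n_variants))  # one column per variant
mu_overall = rng.normal(size=n)
e_raw = rng.uniform(0.1, 1.0, size=(n, n_variants))
e_hat = e_raw / e_raw.sum(axis=1, keepdims=True)

# ... and reused for every non-control treatment variant.
dr = {k: dr_pseudo_outcome(y, w, k, mu_hat, e_hat) for k in range(1, n_variants)}
r = {
    k: r_pseudo_outcome_and_weights(y, w, k, mu_overall, e_hat)
    for k in range(1, n_variants)
}
```

Because such functions are pure, they can be unit-tested on hand-computed inputs, and the caller decides once how, and whether out-of-sample, the nuisance estimates are produced.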
