Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal refactor #804

Merged
merged 21 commits into from
Jul 4, 2024
Merged

Conversation

tomicapretto
Copy link
Collaborator

@tomicapretto tomicapretto commented Apr 14, 2024

This started as a refactor with the goal to change how we name the parameters of the response distribution. So far, we named the parent parameter with f"{response_name}_mean" and all the other parameters with f"{response_name}_{param_name}". I no longer think appending _mean is a sensible approach as the parent parameter can be something different than the mean. And also, I realized prepending the name of the response resulted in very long and confusing names. For that reason, I decided it's better to just use the name of the parameter (i.e. mu for the mean in many families, p for probability of success, kappa, sigma, and so on).

But that is not the only change. Many other changes come with this PR. I summarize them below, and they will be added to the changelog. After this is merged, we can have a 0.14.0 release.

Summary of changes

  • Use "__obs__" instead of f"{response_name}_obs" as the dimension name for the observation index. This makes it easier to avoid errors in our code base and results in a clear an unique name for all cases. Previously we had to worry about the possible different names of the response and their aliases.
  • Use param_name instead of f"{response_name}_{param_name" (mentioned above).
  • Use the name of the parent parameter instead of f"{response_name}_{mean}" (mentioned above).
  • kind in Model.predict() now use "params" and "response" instead of "mean" and "pps".
    • The old versions still work, with a future warning.
  • include_mean has been replaced by include_params in Model.fit(). The old version still works, with a future warning.
  • Wrap all the parameters of the response distribution (the likelihood) with a pm.Deterministic. The benefit is that the model graphs are clearer and we don't incur in any penalty related to the computation and storage of deterministics as they are not computed and stored when sampling. This is thanks to some recent developments in PyMC.
  • Keep bayeux-ml as the single direct JAX-related dependency. Bayeux itself requires all the other libraries providing sampling backends for us so we don't list them directly. This may change in the future if we need to pin specific versions but I hope that's not the case.
  • Implemented a function called create_posterior_bayeux that creates the xarray.Dataset that holds the samples from the posterior when doing inference via bayeux. This makes sure we use the right dim/coord names and coord levels.
  • The response component only holds response information about the response. Previously, the response component used to hold information about the parent parameter and the response component. There's a clearer separation of concerns now. As an example, to access things related to the linear predictor of the parent parameter in a normal model now we do model.components["mu"]. Before we needed model.response_component. Now, that component still exists but it doesn't hold information about mu.

Other important notes

  • I've reran all the examples.
  • I've adapted the tests and made sure they pass.
  • I'm pining PyMC from main, this is because truncated responses need a fix in there. We can change it once a new version (5.16?) is released.

EDIT Bayeux based inferences don't include the observed_data group. This is problematic if we want to do ppchecks. We need to add them. Done

EDIT: It also closes #814

@tomicapretto
Copy link
Collaborator Author

This is temporarily suspended because the var_names parameter in pm.sample() is not working as expected pymc-devs/pymc#7258, and we need it for the modifications here.

@tomicapretto tomicapretto changed the title Rename likelihood params [WIP] Rename likelihood params May 13, 2024
@tomicapretto
Copy link
Collaborator Author

Now there's a new blocker which is pymc-devs/pymc#7312

And I would also like to have pymc-devs/pymc#7290 merged. Currently, the progressbar has many updates which slows down the sampler in jupyter notebooks.

@tomicapretto
Copy link
Collaborator Author

Everything should be fixed now. The pyproject.toml file is pointing to the dev version of PyMC. This should be good to go once PyMC releases a new version.

Now I'm going to re-run examples.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@tomicapretto tomicapretto changed the title Rename likelihood params Internal refactor May 27, 2024
@tomicapretto tomicapretto marked this pull request as ready for review May 27, 2024 01:18
@aloctavodia
Copy link
Collaborator

I like the changes, but I find include_params and kind="params" ambiguous. Could we use something like linear_predictor, linear_term or something like that?

@tomicapretto
Copy link
Collaborator Author

include_params

Thanks for the review! I'm not sure I understand what is the suggestion (i.e. if you want argument(s) called linear_predictor, linear_term or those should be argument values). I also see the current names are not the best as it's as "params" can be many things in a model.

@aloctavodia
Copy link
Collaborator

I am saying that the name "params" is very vague.

Copy link
Collaborator

@GStechschulte GStechschulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been holding this PR back for too long. I reviewed the changes, but paid closer attention to the changes in interpret and with bayeux.

I really like the changes, and everything LGTM.

bambi/backend/pymc.py Show resolved Hide resolved
@tomicapretto
Copy link
Collaborator Author

@GStechschulte thanks for the review! Do you have any ideas for the issue @aloctavodia mentioned? I agree the current approach is a bit vague. But I don't have lots of ideas right now :D

@GStechschulte
Copy link
Collaborator

GStechschulte commented Jun 21, 2024

@tomicapretto @aloctavodia apologies for the delayed response. This notification slipped through my GitHub 😞

Before, include_mean would compute the "posterior of the mean response" which technically may not always be a mean as you stated above. This computation is $g^{-1}(\eta)$ which relates the linear predictor to the response. Since it is a response parameter, but not always the mean, maybe include_response_params. It's a longer name, but less vague?

To be consistent in model.predict:

  • kind="response_params" as calling model.fit(include_response_params=True) will result in the same InferenceData as the sequence of calls: (1) model.fit(include_response_params=False), then (2) model.predict(kind="response_params", data=None)

  • kind="response_obs" or response so users "know" posterior predictive samples are returned and the samples are on the response scale.

@tomicapretto tomicapretto merged commit 4cc3103 into bambinos:main Jul 4, 2024
4 checks passed
@tomicapretto tomicapretto deleted the rename_likelihood_params branch July 4, 2024 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Broken link in 1D HGSP example
3 participants