Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update from_xyz converters to follow schema conventions #1514

Closed
7 tasks done
OriolAbril opened this issue Jan 21, 2021 · 23 comments
Closed
7 tasks done

Update from_xyz converters to follow schema conventions #1514

OriolAbril opened this issue Jan 21, 2021 · 23 comments
Labels

Comments

@OriolAbril
Copy link
Member

OriolAbril commented Jan 21, 2021

We recently merged #1063 after deciding on the "ArviZ name" for each of the parameters.

Thoughts on implementation

Some of the libraries already rename some parameters, in many cases it will only mean adding some extra mappings to what is already happening. See for example https://github.com/arviz-devs/arviz/blob/master/arviz/data/io_numpyro.py#L136-L141.

Feel free to work only on one or several converters, but please comment below which are you targetting.

@utkarsh-maheshwari
Copy link
Contributor

Hello community,
I am a beginner in Open Source. I have been following all the issues related to this issue and read the whole discussion.

I maybe wrong, but shouldn't "potential energy" be matched to "model_logp" and not "lp" because pyMC3 uses "model_logp" to depict potential energy according to this comment ?
image

@OriolAbril
Copy link
Member Author

Hi, welcome! This screenshot is actually a very good example to illustrate the issue and the motivation behind establishing this sample stats naming convention.

One of the goals of ArviZ is to be backend agnostic, thus, once converted to InferenceData, there should be no differences other than the attributes (where we store the date, libraries used to get the samples and versions...). However, each of the inference libraries uses its own naming convention, precisely as you have seen in the comment you mention above. Currently, some sample stats are already renamed to the ArviZ convention, but not all of them. This issue is to make sure we enforce the sample stats names defined in the schema.

Now to the example at hand. Before agreeing on the names we just added to the schema, from_numpyro was written to rename sample stats and match the ones from PyMC3 with a little caveat which is that by that time there was already an unwritten ArviZ convention: lp, diverging and energy (all of which we have kept in the now written sample stats convention). A similar thing happened in ArviZ.jl with samples coming from Turing. In that case however, the unwritten convention was used for the 3 parameters that had it and the rest were renamed to match Stan instead of PyMC3. All of this was one of the reasons to define the names in the schema.

As for you question, to have the comment be exact yes, potential_energy should be matched to model_logp but we do not want that, lp is one of the sample stats that is already right in all converters (as you will see in io_pymc3 where model_logp is renamed to lp). From now on, all sample stats should match ArviZ names. In the case of numpyro, lp and step_size are already correct, whereas tree_size should instead be n_steps and mean_tree_accept should be acceptance_rate.

@utkarsh-maheshwari
Copy link
Contributor

If no one is working on from_numpyro, may I start with it?

@OriolAbril
Copy link
Member Author

That would be great, thanks!

@utkarsh-maheshwari
Copy link
Contributor

Done. submitted my first pull request in any Organization. Thanks a lot @OriolAbril for helping with understanding the issue.

@utkarsh-maheshwari
Copy link
Contributor

I think nothing need to be changed in from_pyro.
It uses only 1 sample_stats diverging which is already mapped.
Group names need not be changed as they already follow arviz naming scema.

@OriolAbril
Copy link
Member Author

I think nothing need to be changed in from_pyro.
It uses only 1 sample_stats diverging which is already mapped.

That is great then! Updating the issue description

Group names need not be changed as they already follow arviz naming scema.

Group names should already be right for all converters, except a couple hiccups in tfp case, see #812. Missmatches there would be a clear bug.

@utkarsh-maheshwari
Copy link
Contributor

Group names should already be right for all converters, except a couple hiccups in tfp case, see #812. Missmatches there would be a clear bug.

I see.

I would like to start updating from_cmdstan next if nobody else is working on it.

@OriolAbril
Copy link
Member Author

Great, thanks!

@madhucharan
Copy link
Contributor

madhucharan commented Feb 8, 2021

Hi @OriolAbril, I would like to work on this issue. I followed the conversation. I would like to know what's the remaining part to be done and kindly provide me some information towards the same. If there is any .py file I need to work on specifically etc, Please guide me through it.

@OriolAbril
Copy link
Member Author

Hi @madhucharan! @utkarsh-maheshwari has already worked on a couple converters, I have linked the PRs in the description above but there are still some converters to edit (also edited the description to show that). Let me know which would you like to work on and I can give some extra specific guidance if needed.

@madhucharan
Copy link
Contributor

Hi @OriolAbril , I would like to work on pymc3 or pystan.Could you please guide me further? Meanwhile I will look at the previous changes to numpyro and others to get some sense of it.

@OriolAbril
Copy link
Member Author

I would recommend starting with PyMC3, which will be quite similar to numpyro. There already is a rename dict defined at https://github.com/arviz-devs/arviz/blob/main/arviz/data/io_pymc3.py#L277 which has to be updated to have all the renamed defined in #1063 (comment).

PyStan will be somewhat similar to cmdstan, and will be a bit more convoluted. I think there are 3 places

for key, values in extraction.items():
values = np.stack(values, axis=0)
dtype = dtypes.get(key)
values = values.astype(dtype)
name = re.sub("__$", "", key)
name = "diverging" if name == "divergent" else name
data[name] = values
data_warmup = OrderedDict()
if warmup:
for key, values in extraction_warmup.items():
values = np.stack(values, axis=0)
dtype = dtypes.get(key)
values = values.astype(dtype)
name = re.sub("__$", "", key)
name = "diverging" if name == "divergent" else name
data_warmup[name] = values
(data and warmup) and
for key in fit.sample_and_sampler_param_names:
if (variables and key not in variables) or (ignore and key in ignore):
continue
new_shape = -1, fit.num_chains
values = fit._draws[fit._parameter_indexes(key)] # pylint: disable=protected-access
values = values.reshape(new_shape, order="F")
values = np.moveaxis(values, [-2, -1], [1, 0])
dtype = dtypes.get(key)
values = values.astype(dtype)
name = re.sub("__$", "", key)
name = "diverging" if name == "divergent" else name
data[name] = values
that have to be updated to use a rename dict (like it was done in cmdstan case), it should be checked that I have not missed any

@madhucharan
Copy link
Contributor

Thank you for the clear explanation @OriolAbril .Will get started with my first contribution to get familiar with the codebase. Will reach out after working on pymc3

@utkarsh-maheshwari
Copy link
Contributor

Can I start updating from_cmdstanpy? Since, It is much similar to from_cmdstan, it won't take much time to update it.

@OriolAbril
Copy link
Member Author

Updated the description, now only from_pystan is unassigned, thanks :)

@madhucharan
Copy link
Contributor

Hi @OriolAbril , I have changed the pymc3 rename dict and I tried a lot today to rebase the repo to sync my fork, but unable to do so Not sure why.

[email protected]: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

getting this error.Could you please guide me how to fix this? or should I send a pr regardless of this but I think I will need to sync my fork with main repo in long run

@utkarsh-maheshwari
Copy link
Contributor

Hi @madhucharan
I was having the same issue while rebasing. The following steps solved the issue.

https://docs.github.com/en/github/authenticating-to-github/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent : Generating a new SSH key and adding it to the ssh-agent.

https://docs.github.com/en/github/authenticating-to-github/adding-a-new-ssh-key-to-your-github-account : Adding a new SSH key to your GitHub account.

@madhucharan
Copy link
Contributor

I will be updating from_cmdstan

Hi @madhucharan
I was having the same issue while rebasing. The following steps solved the issue.

https://docs.github.com/en/github/authenticating-to-github/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent : Generating a new SSH key and adding it to the ssh-agent.

https://docs.github.com/en/github/authenticating-to-github/adding-a-new-ssh-key-to-your-github-account : Adding a new SSH key to your GitHub account.

Thank you @utkarsh-maheshwari. @OriolAbril I resolved it by removing the git URL to HTTP URL of base repository. Is it fine to do so or do we need to follow above steps itself?

@madhucharan
Copy link
Contributor

@OriolAbril , I have sent a PR for pymc3, please review. I would like to update pystan as well. Shall I start working on it?

@OriolAbril
Copy link
Member Author

There are multiple ways to authenticate, if you are now able to work without issues that is fine, no need for extra changes.

I would like to update pystan as well. Shall I start working on it?

Okey, I will update the description

@madhucharan madhucharan mentioned this issue Feb 26, 2021
6 tasks
@Rishabh261998
Copy link
Contributor

I guess this issue can be closed now. @OriolAbril ?

@OriolAbril
Copy link
Member Author

yeah, let's keep track of everything tfp related in #812

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants