Update from_xyz converters to follow schema conventions #1514

OriolAbril · 2021-01-21T19:37:43Z

We recently merged #1063 after deciding on the "ArviZ name" for each of the parameters.

from_numpyro: @utkarsh-maheshwari updated it with Update from_numpyro converter to follow schema convention #1525
from_pyro requires no extra work
from_emcee requires no extra work
from_cmdstan: @utkarsh-maheshwari updated it with Update from_cmdstan converter to follow schema convention #1541
from_pymc3: @madhucharan working on it
from_pystan: @madhucharan working on it
from_cmdstanpy: @utkarsh-maheshwari working on it

Thoughts on implementation

Some of the libraries already rename some parameters, in many cases it will only mean adding some extra mappings to what is already happening. See for example https://github.com/arviz-devs/arviz/blob/master/arviz/data/io_numpyro.py#L136-L141.

Feel free to work only on one or several converters, but please comment below which are you targetting.

utkarsh-maheshwari · 2021-01-23T20:03:59Z

Hello community,
I am a beginner in Open Source. I have been following all the issues related to this issue and read the whole discussion.

I maybe wrong, but shouldn't "potential energy" be matched to "model_logp" and not "lp" because pyMC3 uses "model_logp" to depict potential energy according to this comment ?

OriolAbril · 2021-01-24T00:19:27Z

Hi, welcome! This screenshot is actually a very good example to illustrate the issue and the motivation behind establishing this sample stats naming convention.

One of the goals of ArviZ is to be backend agnostic, thus, once converted to InferenceData, there should be no differences other than the attributes (where we store the date, libraries used to get the samples and versions...). However, each of the inference libraries uses its own naming convention, precisely as you have seen in the comment you mention above. Currently, some sample stats are already renamed to the ArviZ convention, but not all of them. This issue is to make sure we enforce the sample stats names defined in the schema.

Now to the example at hand. Before agreeing on the names we just added to the schema, from_numpyro was written to rename sample stats and match the ones from PyMC3 with a little caveat which is that by that time there was already an unwritten ArviZ convention: lp, diverging and energy (all of which we have kept in the now written sample stats convention). A similar thing happened in ArviZ.jl with samples coming from Turing. In that case however, the unwritten convention was used for the 3 parameters that had it and the rest were renamed to match Stan instead of PyMC3. All of this was one of the reasons to define the names in the schema.

As for you question, to have the comment be exact yes, potential_energy should be matched to model_logp but we do not want that, lp is one of the sample stats that is already right in all converters (as you will see in io_pymc3 where model_logp is renamed to lp). From now on, all sample stats should match ArviZ names. In the case of numpyro, lp and step_size are already correct, whereas tree_size should instead be n_steps and mean_tree_accept should be acceptance_rate.

utkarsh-maheshwari · 2021-01-27T18:40:53Z

If no one is working on from_numpyro, may I start with it?

OriolAbril · 2021-01-27T18:42:00Z

That would be great, thanks!

utkarsh-maheshwari · 2021-01-27T20:00:34Z

Done. submitted my first pull request in any Organization. Thanks a lot @OriolAbril for helping with understanding the issue.

utkarsh-maheshwari · 2021-02-03T20:10:14Z

I think nothing need to be changed in from_pyro.
It uses only 1 sample_stats diverging which is already mapped.
Group names need not be changed as they already follow arviz naming scema.

OriolAbril · 2021-02-04T07:52:26Z

I think nothing need to be changed in from_pyro.
It uses only 1 sample_stats diverging which is already mapped.

That is great then! Updating the issue description

Group names need not be changed as they already follow arviz naming scema.

Group names should already be right for all converters, except a couple hiccups in tfp case, see #812. Missmatches there would be a clear bug.

utkarsh-maheshwari · 2021-02-06T19:05:18Z

Group names should already be right for all converters, except a couple hiccups in tfp case, see #812. Missmatches there would be a clear bug.

I see.

I would like to start updating from_cmdstan next if nobody else is working on it.

OriolAbril · 2021-02-06T19:12:48Z

Great, thanks!

madhucharan · 2021-02-08T18:57:41Z

Hi @OriolAbril, I would like to work on this issue. I followed the conversation. I would like to know what's the remaining part to be done and kindly provide me some information towards the same. If there is any .py file I need to work on specifically etc, Please guide me through it.

OriolAbril · 2021-02-08T20:50:15Z

Hi @madhucharan! @utkarsh-maheshwari has already worked on a couple converters, I have linked the PRs in the description above but there are still some converters to edit (also edited the description to show that). Let me know which would you like to work on and I can give some extra specific guidance if needed.

madhucharan · 2021-02-09T03:24:07Z

Hi @OriolAbril , I would like to work on pymc3 or pystan.Could you please guide me further? Meanwhile I will look at the previous changes to numpyro and others to get some sense of it.

OriolAbril · 2021-02-09T03:39:41Z

I would recommend starting with PyMC3, which will be quite similar to numpyro. There already is a rename dict defined at https://github.com/arviz-devs/arviz/blob/main/arviz/data/io_pymc3.py#L277 which has to be updated to have all the renamed defined in #1063 (comment).

PyStan will be somewhat similar to cmdstan, and will be a bit more convoluted. I think there are 3 places

arviz/arviz/data/io_pystan.py

Lines 651 to 667 in 6fa1ce8

    
           for key, values in extraction.items(): 
        
               values = np.stack(values, axis=0) 
        
               dtype = dtypes.get(key) 
        
               values = values.astype(dtype) 
        
               name = re.sub("__$", "", key) 
        
               name = "diverging" if name == "divergent" else name 
        
               data[name] = values 
        
           data_warmup = OrderedDict() 
        
           if warmup: 
        
               for key, values in extraction_warmup.items(): 
        
                   values = np.stack(values, axis=0) 
        
                   dtype = dtypes.get(key) 
        
                   values = values.astype(dtype) 
        
                   name = re.sub("__$", "", key) 
        
                   name = "diverging" if name == "divergent" else name 
        
                   data_warmup[name] = values

(data and warmup) and

arviz/arviz/data/io_pystan.py

Lines 782 to 793 in 6fa1ce8

    
           for key in fit.sample_and_sampler_param_names: 
        
               if (variables and key not in variables) or (ignore and key in ignore): 
        
                   continue 
        
               new_shape = -1, fit.num_chains 
        
               values = fit._draws[fit._parameter_indexes(key)]  # pylint: disable=protected-access 
        
               values = values.reshape(new_shape, order="F") 
        
               values = np.moveaxis(values, [-2, -1], [1, 0]) 
        
               dtype = dtypes.get(key) 
        
               values = values.astype(dtype) 
        
               name = re.sub("__$", "", key) 
        
               name = "diverging" if name == "divergent" else name 
        
               data[name] = values

that have to be updated to use a rename dict (like it was done in cmdstan case), it should be checked that I have not missed any

madhucharan · 2021-02-09T03:47:04Z

Thank you for the clear explanation @OriolAbril .Will get started with my first contribution to get familiar with the codebase. Will reach out after working on pymc3

utkarsh-maheshwari · 2021-02-09T17:45:30Z

Can I start updating from_cmdstanpy? Since, It is much similar to from_cmdstan, it won't take much time to update it.

OriolAbril · 2021-02-09T18:00:51Z

Updated the description, now only from_pystan is unassigned, thanks :)

madhucharan · 2021-02-12T17:58:21Z

Hi @OriolAbril , I have changed the pymc3 rename dict and I tried a lot today to rebase the repo to sync my fork, but unable to do so Not sure why.

[email protected]: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

getting this error.Could you please guide me how to fix this? or should I send a pr regardless of this but I think I will need to sync my fork with main repo in long run

utkarsh-maheshwari · 2021-02-12T18:10:14Z

Hi @madhucharan
I was having the same issue while rebasing. The following steps solved the issue.

https://docs.github.com/en/github/authenticating-to-github/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent : Generating a new SSH key and adding it to the ssh-agent.

https://docs.github.com/en/github/authenticating-to-github/adding-a-new-ssh-key-to-your-github-account : Adding a new SSH key to your GitHub account.

madhucharan · 2021-02-12T18:57:48Z

I will be updating from_cmdstan

Hi @madhucharan
I was having the same issue while rebasing. The following steps solved the issue.

https://docs.github.com/en/github/authenticating-to-github/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent : Generating a new SSH key and adding it to the ssh-agent.

https://docs.github.com/en/github/authenticating-to-github/adding-a-new-ssh-key-to-your-github-account : Adding a new SSH key to your GitHub account.

Thank you @utkarsh-maheshwari. @OriolAbril I resolved it by removing the git URL to HTTP URL of base repository. Is it fine to do so or do we need to follow above steps itself?

madhucharan · 2021-02-12T18:59:26Z

@OriolAbril , I have sent a PR for pymc3, please review. I would like to update pystan as well. Shall I start working on it?

OriolAbril · 2021-02-12T21:03:44Z

There are multiple ways to authenticate, if you are now able to work without issues that is fine, no need for extra changes.

I would like to update pystan as well. Shall I start working on it?

Okey, I will update the description

Rishabh261998 · 2021-03-10T09:26:31Z

I guess this issue can be closed now. @OriolAbril ?

OriolAbril · 2021-03-10T10:53:28Z

yeah, let's keep track of everything tfp related in #812

OriolAbril added the Beginner label Jan 21, 2021

utkarsh-maheshwari mentioned this issue Jan 27, 2021

Update from_numpyro converter to follow schema convention #1525

Merged

3 tasks

utkarsh-maheshwari mentioned this issue Feb 6, 2021

Update from_cmdstan converter to follow schema convention #1541

Merged

3 tasks

utkarsh-maheshwari mentioned this issue Feb 9, 2021

Update from_cmdstanpy converter to follow schema convention #1550

Merged

3 tasks

madhucharan mentioned this issue Feb 12, 2021

Update rename_dict on io_pymc3.py converter to follow schema convention #1555

Merged

3 tasks

madhucharan mentioned this issue Feb 26, 2021

Update io pystan #1585

Merged

6 tasks

OriolAbril closed this as completed Mar 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update from_xyz converters to follow schema conventions #1514

Update from_xyz converters to follow schema conventions #1514

OriolAbril commented Jan 21, 2021 •

edited

Loading

utkarsh-maheshwari commented Jan 23, 2021

OriolAbril commented Jan 24, 2021

utkarsh-maheshwari commented Jan 27, 2021

OriolAbril commented Jan 27, 2021

utkarsh-maheshwari commented Jan 27, 2021

utkarsh-maheshwari commented Feb 3, 2021

OriolAbril commented Feb 4, 2021

utkarsh-maheshwari commented Feb 6, 2021

OriolAbril commented Feb 6, 2021

madhucharan commented Feb 8, 2021 •

edited

Loading

OriolAbril commented Feb 8, 2021

madhucharan commented Feb 9, 2021

OriolAbril commented Feb 9, 2021

madhucharan commented Feb 9, 2021

utkarsh-maheshwari commented Feb 9, 2021

OriolAbril commented Feb 9, 2021

madhucharan commented Feb 12, 2021

utkarsh-maheshwari commented Feb 12, 2021

madhucharan commented Feb 12, 2021

madhucharan commented Feb 12, 2021

OriolAbril commented Feb 12, 2021

Rishabh261998 commented Mar 10, 2021

OriolAbril commented Mar 10, 2021

Update from_xyz converters to follow schema conventions #1514

Update from_xyz converters to follow schema conventions #1514

Comments

OriolAbril commented Jan 21, 2021 • edited Loading

Thoughts on implementation

utkarsh-maheshwari commented Jan 23, 2021

OriolAbril commented Jan 24, 2021

utkarsh-maheshwari commented Jan 27, 2021

OriolAbril commented Jan 27, 2021

utkarsh-maheshwari commented Jan 27, 2021

utkarsh-maheshwari commented Feb 3, 2021

OriolAbril commented Feb 4, 2021

utkarsh-maheshwari commented Feb 6, 2021

OriolAbril commented Feb 6, 2021

madhucharan commented Feb 8, 2021 • edited Loading

OriolAbril commented Feb 8, 2021

madhucharan commented Feb 9, 2021

OriolAbril commented Feb 9, 2021

madhucharan commented Feb 9, 2021

utkarsh-maheshwari commented Feb 9, 2021

OriolAbril commented Feb 9, 2021

madhucharan commented Feb 12, 2021

utkarsh-maheshwari commented Feb 12, 2021

madhucharan commented Feb 12, 2021

madhucharan commented Feb 12, 2021

OriolAbril commented Feb 12, 2021

Rishabh261998 commented Mar 10, 2021

OriolAbril commented Mar 10, 2021

OriolAbril commented Jan 21, 2021 •

edited

Loading

madhucharan commented Feb 8, 2021 •

edited

Loading