New predictions group! #1030

Closed · 6 of 8 tasks
OriolAbril opened this issue Jan 26, 2020 · 27 comments · Fixed by #1125
Labels: Enhancement (Improvements to ArviZ)

Comments

@OriolAbril (Member) commented Jan 26, 2020

Tell us about it

Two new groups were recently added to the InferenceData schema: predictions and predictions_constant_data. #983 added them for from_pymc3; support in the other converters is still pending.
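For context, a minimal sketch of the PyMC3 side of this workflow (the toy model and variable names are made up for illustration, and the exact from_pymc3_predictions signature should be checked against #983):

```python
import arviz as az
import numpy as np
import pymc3 as pm

x = np.linspace(0, 1, 10)
y = 2 * x + np.random.normal(0, 0.1, size=10)

with pm.Model() as model:
    x_shared = pm.Data("x", x)
    slope = pm.Normal("slope", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y", mu=slope * x_shared, sigma=sigma, observed=y)
    trace = pm.sample(500)
    idata = az.from_pymc3(trace)

# out-of-sample posterior predictive: swap in new data and resample "y"
with model:
    pm.set_data({"x": np.linspace(1, 2, 10)})
    preds = pm.sample_posterior_predictive(trace)

# store the result in the predictions group and the current pm.Data values
# in predictions_constant_data, merging into the original InferenceData
idata = az.from_pymc3_predictions(
    preds, posterior_trace=trace, model=model, idata_orig=idata
)
```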

Current status:

  • from_pymc3
  • from_pystan
  • from_pyro
  • from_cmdstan
  • from_cmdstanpy
  • (from_emcee)
  • from_numpyro
  • from_dict

Thoughts on implementation

PyStan

@ahartikainen what do you think would be better? Mimicking posterior_predictive or log_likelihood and adding the dict option to rename variables on creation?

@OriolAbril added the Enhancement label on Jan 26, 2020
@ahartikainen (Contributor)

I think following posterior_predictive would be the correct approach.

@ahartikainen (Contributor)

For the future, we need to think about how we handle post-run generated_quantities.

@nitishp25 (Contributor)

Can I work on this?

@OriolAbril (Member, Author)

Yes! #1032 added predictions and predictions_constant_data to from_pystan (we forgot to link it here), but other converters are still missing them.

@nitishp25 (Contributor)

@OriolAbril which converters are you referring to? predictions_to_xarray and predictions_constant_data_to_xarray are already present.

@OriolAbril (Member, Author)

Other converters such as from_pyro and from_cmdstan, and those for other inference libraries that allow out-of-sample posterior predictive sampling.

@nitishp25 (Contributor)

Should the predictions implementation in from_pyro also follow the posterior_predictive implementation?

Also, it doesn't seem to have constant_data, so should both constant_data and predictions_constant_data be added?

@OriolAbril (Member, Author)

I don't know the details of Pyro well enough; I don't even know whether this is possible or whether this data is stored by Pyro. A good start would be to replicate the model and workflow from the examples used to illustrate the schema specification. There is a version in PyStan and one in PyMC3 implementing the same model and populating all the groups in InferenceData (with the exception of sample_stats_prior in PyMC3).

@ahartikainen (Contributor)

from_cmdstan and from_cmdstanpy are implemented in #1064

@nitishp25 (Contributor)

Hi, I was trying to implement this example in Pyro. I have come up with this so far, but as you can see the dimensions in the InferenceData are incorrect. Can you please help me understand what the problem is?

@OriolAbril (Member, Author)

The dimensions look OK (the extra dims have length 1, and the data contained there looks correct). Maybe it is an issue with a missing squeeze?

Also, it looks like you have no log_likelihood group; are you using the latest Pyro release and ArviZ master?

@nitishp25 (Contributor) commented Feb 14, 2020

> The dimensions look OK (the extra dims have length 1, and the data contained there looks correct). Maybe it is an issue with a missing squeeze?

If you compare it with the PyMC3 example, the prior and log_likelihood have different dimensions, right? I don't understand how the parameters have entered the prior dimensions. Does this require a squeeze?

> Also, it looks like you have no log_likelihood group; are you using the latest Pyro release and ArviZ master?

I had not updated ArviZ, but I have now; you can review again.

@OriolAbril (Member, Author)

In the case of log_likelihood it looks like an error in dimension handling in io_pyro. In the case of the prior, all extra dimensions (b0_dim_0, c1_dim_0, ...) have length 1, which is why I believe a squeeze may solve the "problem": the data is the same for a shape (2,) array [3, 4] and a shape (2, 1) array [[3], [4]].
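A tiny numpy illustration of that last point (the values are taken from the comment above):

```python
import numpy as np

a = np.array([3, 4])        # shape (2,)
b = np.array([[3], [4]])    # shape (2, 1), a column vector
print(np.array_equal(a, b.squeeze()))  # True: same data, only the length-1 dim differs
```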

@OriolAbril (Member, Author)

I was reading the code on my phone on the train, so double-check, but it looks like the log likelihood to xarray conversion uses dims instead of self.dims; it should be an easy fix.

I am a little bit puzzled by the extra dims in the prior; a squeeze (before adding the chain dim) should solve the issue, but they probably should not be there to begin with.

@nitishp25 (Contributor)

> I was reading the code on my phone on the train, so double-check, but it looks like the log likelihood to xarray conversion uses dims instead of self.dims; it should be an easy fix.

Got it.

> I am a little bit puzzled by the extra dims in the prior; a squeeze (before adding the chain dim) should solve the issue, but they probably should not be there to begin with.

Does it have anything to do with my implementation? I don't have much experience with Pyro.
I'll keep looking for something in io_pyro in the meantime.

@ahartikainen (Contributor)

Is the posterior input squeezed in io_pyro?

In Pyro, 1D arrays are actually 2D with the last dimension being 1, i.e. column vectors.

@nitishp25 (Contributor)

> Is the posterior input squeezed in io_pyro?

There seems to be no squeeze for posterior.

> In Pyro, 1D arrays are actually 2D with the last dimension being 1, i.e. column vectors.

Yes, I just checked the shape of the prior that is passed to from_pyro and it is (400, 1). Calling expand_dims on it makes its shape (1, 400, 1). I think this is what causes the extra dim, right?
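For illustration, a minimal numpy sketch of what a squeeze before adding the chain dim would change (shapes taken from this comment, not the actual io_pyro code):

```python
import numpy as np

prior = np.random.randn(400, 1)                  # Pyro-style column vector: (draw, 1)
print(np.expand_dims(prior, 0).shape)            # (1, 400, 1): the trailing dim survives

print(np.expand_dims(prior.squeeze(), 0).shape)  # (1, 400): (chain, draw) as expected
```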

@nitishp25 (Contributor)

Hi, I am now trying to get predictions on new data, but the output has the shape of posterior_predictive even though the new data has a different shape than the data passed to get the posterior. I have tried removing pyro.plate, using a for loop instead, and using a guide, but the results were the same.

It seems like every time you need new predictions you generate a new trace in Pyro? But there must be some way to get the right dims on new data with the same trace. Here's the example @OriolAbril @ahartikainen.

@OriolAbril (Member, Author)

@nitishp25 I don't know how to fix this; as I said above, I don't even know if it is possible to get out-of-sample posterior predictive samples (what we shortened to predictions) in Pyro. If you want to keep working on this, their forum is probably a good place to get some help.

Side comment: I was looking at the posterior_predictive and prior_predictive groups and it looks like Pyro is only repeating the observed data over and over. I can't explain why, though; it could be either an error in the notebook or a bug in Pyro.

@nitishp25 (Contributor)

I have asked these questions on their forum. I'll wait for the reply.

@nitishp25 (Contributor)

Thanks to the Pyro forum I was able to figure out how to make out-of-sample predictions. Here's the notebook.
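For reference, here is a minimal sketch of the out-of-sample workflow with pyro.infer.Predictive (the model and data are made up; see the linked notebook for the real example):

```python
import pyro
import pyro.distributions as dist
import torch
from pyro.infer import MCMC, NUTS, Predictive

def model(x, y=None):
    slope = pyro.sample("slope", dist.Normal(0.0, 1.0))
    sigma = pyro.sample("sigma", dist.HalfNormal(1.0))
    with pyro.plate("obs", x.shape[0]):
        pyro.sample("y", dist.Normal(slope * x, sigma), obs=y)

x = torch.linspace(0, 1, 10)
y = 2 * x + 0.1 * torch.randn(10)

mcmc = MCMC(NUTS(model), num_samples=400, warmup_steps=200)
mcmc.run(x, y)

# out-of-sample predictions: rerun the model on new data, conditioning on the posterior draws
x_new = torch.linspace(1, 2, 5)
predictive = Predictive(model, posterior_samples=mcmc.get_samples())
predictions = predictive(x_new)  # dict of site name -> tensor, one draw per posterior sample
```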

I'm thinking of implementing this similarly to PyMC3 by creating a from_pyro_predictions API.
For constant_data, I'm thinking of adding a constant_data argument to from_pyro and a predictions_constant_data argument to from_pyro_predictions. What do you think @OriolAbril @ahartikainen?

@OriolAbril (Member, Author)

I am not sure about following the PyMC3 or the PyStan approach. PyMC3 needs a separate function because the information stored in the model is needed for the conversion, and to obtain predictions the model changes, so a single function would not be able to access both the old and the new instances of the model.

It is probably better to follow the PyStan converter, because then both options become possible: users can merge the current InferenceData object with a new InferenceData containing only predictions (the PyMC3-like workflow) or generate the InferenceData with everything at once. In that case it would be important to test that from_pyro works with only the predictions and predictions_constant_data arguments.
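To make the comparison concrete, a sketch of what the PyStan-like interface could look like for from_pyro, reusing the names from the Pyro sketch a few comments above (the predictions-related argument names are assumptions here; #1125 settles the actual API):

```python
import arviz as az

# Option 1: build everything at once
idata = az.from_pyro(
    mcmc,
    constant_data={"x": x.numpy()},
    predictions=predictions,
    predictions_constant_data={"x_new": x_new.numpy()},
)

# Option 2 (PyMC3-like): convert the predictions on their own and merge afterwards
idata = az.from_pyro(mcmc, constant_data={"x": x.numpy()})
idata_pred = az.from_pyro(
    predictions=predictions,
    predictions_constant_data={"x_new": x_new.numpy()},
)
idata_full = az.concat(idata, idata_pred)
```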

@nitishp25 (Contributor) commented Feb 26, 2020

Yes that makes more sense.

Also note that there currently seems to be no way to store constant data directly in Pyro; pyro.deterministic exists, but it samples the constant data along with the rest of the data. So the only way for the user is to manually pass dicts for constant_data and predictions_constant_data mapping variable names to data values, and we basically just use dict_to_dataset. Is it worth adding these groups right now?

@OriolAbril (Member, Author)

I'd say it is useful to include the group because it lets users keep track of how the predictions were generated. For example, in the model you are working with, if you only get the InferenceData without the constant data groups, you are missing quite a bit of information, as you would not have the times used to generate the posterior predictive samples.

@ahartikainen (Contributor)

Why can't it (the group input) be a dictionary that the user provides?

@OriolAbril (Member, Author)

> Why can't it (the group input) be a dictionary that the user provides?

I think this was the idea? @nitishp25

@nitishp25 (Contributor) commented Feb 26, 2020

> Why can't it (the group input) be a dictionary that the user provides?

It can be. I was just trying to find an easier way from the user's perspective: if Pyro had something like pymc3.Data or pymc3.Deterministic, then all the user would have to do is declare the constant data in the model, and ArviZ would convert it to InferenceData automatically (like in PyMC3) without having to pass a separate argument for it.

But since this is not yet possible in Pyro, a dictionary can definitely be the desired input type. I was just looking for a better way and wondering whether waiting for such a solution is worth it or not :)
