Add an InferenceData object #173
I can update PyStan stuff, I just need the normalized names.
And then we also have |
Heh, this may mean we need better names. I took these straight from their names in PyMC3 after a NUTS run:
data.sample_stats
<xarray.Dataset>
Dimensions: (chain: 4, draw: 500)
Coordinates:
* chain (chain) int64 0 1 2 3
* draw (draw) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ...
Data variables:
depth (chain, draw) int64 ...
diverging (chain, draw) bool False False False False False False ...
energy (chain, draw) float64 ...
energy_error (chain, draw) float64 ...
max_energy_error (chain, draw) float64 ...
mean_tree_accept (chain, draw) float64 ...
step_size (chain, draw) float64 ...
step_size_bar (chain, draw) float64 ...
tree_size (chain, draw) float64 ...
tune (chain, draw) bool ...
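As a starting point for the naming discussion, here is a minimal sketch of how a PyStan converter might rename sampler stats to the PyMC3-style names listed above. The Stan-side names are the sampler parameters Stan reports; the pairings themselves (in particular `n_leapfrog__` to `tree_size`) and the helper `normalize_stat_names` are assumptions for illustration, not an agreed schema.

```python
# Tentative mapping from Stan's sampler parameter names to the PyMC3-style
# names shown above. The pairings are an assumption for illustration only.
STAN_TO_NORMALIZED = {
    "divergent__": "diverging",
    "energy__": "energy",
    "stepsize__": "step_size",
    "treedepth__": "depth",
    "n_leapfrog__": "tree_size",
    "accept_stat__": "mean_tree_accept",
}


def normalize_stat_names(stats):
    """Return a copy of `stats` with known Stan names renamed; others are kept."""
    return {
        STAN_TO_NORMALIZED.get(name, name): values
        for name, values in stats.items()
    }
```

Names without an entry in the mapping pass through unchanged, so stats that only one sampler provides can stay optional.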
@@ -1,18 +1,21 @@
from abc import ABC, abstractmethod, abstractstaticmethod
import re
Does it make sense to call this xarray_utils now?
Good point. I think we're close to getting rid of some of the original dataframe functions in utils.py. There should probably be a refactor then. Really, these classes to convert objects should probably live in their own module, with the inference_data object.
I have updated the PyStan code (I also updated some dim calculations).
Also, I found that only |
sounds good to me! most sampler stats will have to be optional ( |
This is a pretty big change; see also #169. I am sure there are some rough edges, but this seems like a flexible way to get to feature parity with current PyMC3 plotting.

InferenceData is the object that carries all the schema data that is available. Having worked with it a little, I like it: it is just a light wrapper on a netCDF.Dataset that accesses xarray.Datasets. It supports tab completion, and usage looks like:

I added sample_stats to the PyMC3 extractor. It was pretty easy, and I can follow up with a similar job on PyStan (or @ahartikainen can!). We should argue about names for those sample_stats as well as the required/optional stats then, and update the schema accordingly.

A funny thing about InferenceData is that it is file based, not memory based. I did not want every plotting function to require a filename, and I want them to work out of the box with PyMC3 or PyStan objects, so making a plot with one of these objects will write a file to disk. It uses tempfile to get a unique filename, and will always write into the same directory. By default, it writes to .arviz_data/, but that can be updated with

If anyone has a more elegant way of handling this, I'm all ears. I was thinking of at least adding a warning every once in a while about the existence of this folder along with a suggestion to clean it out. Maybe every time the number of files in the directory is a multiple of 10, spawn a warning?
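To make the warning idea concrete, here is a rough sketch using only the standard library. The function name `new_data_file` and its parameters are hypothetical and not part of this PR; it only illustrates getting a unique filename in one directory and warning whenever the file count reaches a multiple of 10.

```python
import os
import tempfile
import warnings


def new_data_file(data_dir=".arviz_data", warn_every=10):
    """Create a uniquely named, empty .nc file in `data_dir` and return its path.

    Hypothetical helper: warns whenever the number of entries in the
    directory is a multiple of `warn_every`.
    """
    os.makedirs(data_dir, exist_ok=True)
    # mkstemp creates the file atomically, so the name cannot collide.
    fd, path = tempfile.mkstemp(suffix=".nc", dir=data_dir)
    os.close(fd)

    n_files = len(os.listdir(data_dir))
    if n_files % warn_every == 0:
        warnings.warn(
            f"{data_dir} now holds {n_files} files; consider cleaning it out."
        )
    return path
```

Counting directory entries on each write keeps the check stateless, at the cost of one `os.listdir` call per file.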
I updated the sample data to use InferenceData. az.load_arviz_data('centered_eight') is a nice way to start playing with this.

I tried to update documentation and function names, and got most of the way there.