Since PyMC is the main user of ArviZ, I would like to coordinate on the ongoing refactor on the ArviZ side, as it brings a lot of breaking changes.
General idea
Split ArviZ into multiple smaller subpackages, so it is no longer one huge monolithic block but something more modular. Each of these smaller libraries (arviz-base, arviz-stats and arviz-plots) has as dependencies only the minimal set strictly needed. Anything that extends functionality, or that can be done through different alternatives (like the plotting backend or the idata io engine), is an optional dependency.
We still plan to have an arviz package which would install all 3 of them (unclear if along with some "default" optional dependencies to have a feel closer to what it is now) and which exposes the functions from all 3 libraries through a common namespace. But for people running a model in the cloud, for example, it might be best to install only pymc and arviz-base, save the output as zarr or netcdf and download it, then run convergence checks and analyze the results locally or on a smaller machine.
Module/library highlight of breaking changes
arviz-base
Uses DataTree instead of InferenceData. This will probably be the main pain point but also a source of nice new features.
New features: more io backends and support for nested hierarchies. Potential pain points: idata[group] will be a DataTree instead of a Dataset even if there are no nested groups. DataTree is new, so it will probably have some rough edges for a while, and the custom methods like .map or .extend won't exist anymore (there are things like merge, map_over_subtree...).
A bit more flexible in general, especially when it comes to groups: no more warnings for "unrecognized" groups and things like that.
Small ask for help. DataTree supports nested groups, but I don't have an example of this, nor am I sure how nested groups should behave.
arviz-stats
Very unclear as of now, since it is the last module to be worked on. For now it mostly has what we need for arviz-plots to work.
arviz-plots
The main focus on this end has actually been easing development and maintenance. Thanks to the refactor, though, it is also more flexible when it comes to faceting and aesthetic mappings, and plotting backend support is more homogeneous: instead of nice matplotlib and barely working bokeh, it now supports matplotlib, bokeh and plotly.
Several plots have been renamed, such as plot_posterior -> plot_dist and plot_trace -> plot_trace_dist (plot_trace continues to exist but now plots only the traces). All plots return a new class defined in arviz-plots called PlotCollection, which contains the figure, axes and artist objects (in matplotlib lingo).
This is the most advanced of the 3 libraries in my opinion and it is ready to use, so it would be great to get people to test it out. My recommendation is to install arviz-plots from github along with pymc+arviz; then you can pass arviz.InferenceData to arviz-plots functions. Useful docs: example gallery of updated plots (showing all 3 backends) and main intro notebook
Regarding PyMC itself. What would you like PyMC to depend on? And how would you like PyMC to behave?
For me, continuing to depend on arviz (provided it only installs the 3 arviz-xyz, numpy, scipy and xarray) would probably be best so functionality continues to be the same, convergence checks continue to be run by default and stats and plots can continue to be exposed if desired (even if plotting won't work unless at least one of the plotting backends is installed).
And how would you coordinate updates in pymc to account for the breaking changes that will happen? Keep in mind the timeline on arviz-xyz is still unclear, so I don't think it is anything urgent and there is a lot of room to do things however we want on this end.