
Histogram distribution plot fails on hist, succeeds on KDE #1104

Closed
rpgoldman opened this issue Mar 3, 2020 · 12 comments · Fixed by #1115

Comments

@rpgoldman (Contributor)

Describe the bug
I was plotting posterior data from an InferenceData object, and the KDE worked fine. But I was interested to see more exactly what the samples looked like, so I tried plotting with a histogram instead. This errored out, with the backtrace below.

As far as I can tell, it looks like ArviZ grabbed a multi-dimensional array (chains x samples x inputs, 4 x 500 x 4) and, instead of flattening it -- as the KDE path seems to do -- or splitting it into four plots along the inputs dimension, it errored out.

To Reproduce
Steps to reproduce the behavior:

import arviz as az
idata = az.from_netcdf('/Users/rpg/Dropbox/arviz-hist-bug.nc')  
az.plot_dist(idata.posterior['err_sd'])  # works fine
az.plot_dist(idata.posterior['err_sd'], kind='hist')   # error in np.reshape()
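The root cause can be reproduced without the netCDF file (the path above is local to the reporter): matplotlib's `ax.hist` only accepts inputs with 2 or fewer dimensions, so the 3-D `(chain, draw, input)` array fails on the hist path while the KDE path, which flattens internally, succeeds. A minimal sketch with a synthetic array of the same shape:

```python
import numpy as np

# Synthetic stand-in for the posterior variable in the issue,
# with shape (chain, draw, input) = (4, 500, 4).
values = np.random.default_rng(0).normal(size=(4, 500, 4))

# matplotlib's ax.hist() raises "x must have 2 or fewer dimensions" on this
# array. Flattening to 1-D first is the workaround (and the fix later added
# inside plot_dist):
flat = np.asarray(values).flatten()
print(flat.shape)  # (8000,)
```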

Expected behavior
I expected to get a histogram plot, or possibly four.

Alternatively, if I was using this wrong, I would expect to get an error for the KDE case as well as the histogram case.

Additional context

Backtrace

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-c027c32088dc> in <module>
----> 1 az.plot_dist(idata.posterior['err_sd'], kind='hist')

~/src/arviz/arviz/plots/distplot.py in plot_dist(values, values2, color, kind, cumulative, label, rotated, rug, bw, quantiles, contour, fill_last, textsize, plot_kwargs, fill_kwargs, rug_kwargs, contour_kwargs, contourf_kwargs, pcolormesh_kwargs, hist_kwargs, ax, backend, backend_kwargs, show, **kwargs)
    197 
    198     plot = get_plotting_function("plot_dist", "distplot", backend)
--> 199     ax = plot(**dist_plot_args)
    200     return ax

~/src/arviz/arviz/plots/backends/matplotlib/distplot.py in plot_dist(values, values2, color, kind, cumulative, label, rotated, rug, bw, quantiles, contour, fill_last, textsize, plot_kwargs, fill_kwargs, rug_kwargs, contour_kwargs, contourf_kwargs, pcolormesh_kwargs, hist_kwargs, ax, backend_kwargs, show)
     47     if kind == "hist":
     48         ax = _histplot_mpl_op(
---> 49             values=values, values2=values2, rotated=rotated, ax=ax, hist_kwargs=hist_kwargs
     50         )
     51 

~/src/arviz/arviz/plots/backends/matplotlib/distplot.py in _histplot_mpl_op(values, values2, rotated, ax, hist_kwargs)
     93     bins = hist_kwargs.pop("bins")
     94 
---> 95     ax.hist(values, bins=bins, **hist_kwargs)
     96     if rotated:
     97         ax.set_yticks(bins[:-1])

~/.virtualenvs/xplan-dev-env/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1599     def inner(ax, *args, data=None, **kwargs):
   1600         if data is None:
-> 1601             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1602 
   1603         bound = new_sig.bind(ax, *args, **kwargs)

~/.virtualenvs/xplan-dev-env/lib/python3.6/site-packages/matplotlib/axes/_axes.py in hist(self, x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, normed, **kwargs)
   6686         input_empty = np.size(x) == 0
   6687         # Massage 'x' for processing.
-> 6688         x = cbook._reshape_2D(x, 'x')
   6689         nx = len(x)  # number of datasets
   6690 

~/.virtualenvs/xplan-dev-env/lib/python3.6/site-packages/matplotlib/cbook/__init__.py in _reshape_2D(X, name)
   1428         return [np.reshape(x, -1) for x in X]
   1429     else:
-> 1430         raise ValueError("{} must have 2 or fewer dimensions".format(name))
   1431 
   1432 

ValueError: x must have 2 or fewer dimensions
@OriolAbril (Member)

A flatten is missing. To get 4 plots, plot_posterior should be used (with point_estimate=None and credible_interval=None to get a plot_dist look).

However, I have just noticed that credible_interval=None used to mean "no black line indicating the credible interval" but now means "use the value in rcParams". So there are 2 things to fix: the missing flatten in plot_dist, and making "auto" the default (meaning "use the rcParams credible interval") with None removing the line from the plot.

@rpgoldman (Contributor, Author)

@OriolAbril Thanks. That helped. BTW, I tried using plot_posterior instead, and passed it axes, because I wanted to stack the four plots (with sharex) for easier comparison.

I find that the ArviZ plots put their suptitles way up high, which makes it hard to stack them without overlap. Presumably Figure.subplots_adjust() can fix this, but I haven't found a value of hspace that does anything useful.
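One way around the hspace tuning is to pre-create the stacked axes and let matplotlib's constrained layout manage the spacing; the axes can then be passed to the plotting function via its `ax` argument (e.g. `az.plot_posterior(..., ax=axes)`). A minimal matplotlib-only sketch, using plain histograms as stand-ins for the ArviZ plots:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Four stacked axes sharing an x-axis; constrained_layout spaces the titles
# automatically instead of requiring a hand-tuned hspace.
fig, axes = plt.subplots(4, 1, sharex=True, constrained_layout=True)
rng = np.random.default_rng(0)
for i, ax in enumerate(axes):
    ax.hist(rng.normal(size=500), bins=30)
    ax.set_title(f"input {i}")  # per-axes titles stay close to their plot
```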

@smit-s (Contributor) commented Mar 10, 2020

Hi there @OriolAbril @rpgoldman. I am new here and would like to take up this issue; I have already solved half of it locally. Will you assign it to me, please?

@OriolAbril (Member)

No need to be assigned; for issues nobody is working on, just say you'll work on it in order to avoid duplicated work. There is a more detailed description in the contributing guide.

We count on you working on it now; do not hesitate to ask for guidance if you need any.

@smit-s (Contributor) commented Mar 11, 2020

Yes @OriolAbril, I have started working on it. I have already added the flatten and it seems to work fine.
Can you please explain what you mean by 'making "auto" the default to use the rcParams credible interval'? What does 'making "auto" the default' mean here?

@OriolAbril (Member)

Currently the credible_interval default is None, in which case the value stored in rcParams is used. However, this should not be the behaviour; what should happen is:

  • credible_interval="auto" -> use value in rcParams (default)
  • credible_interval=None -> keep none and pass it to backend so that no credible interval line is plotted
  • credible_interval=<float in 0-1> -> use this value
  • Otherwise, raise an error.

You'll see in the code that the last 2 cases already behave as desired; however, the first 2 do not, and this is what should be fixed.
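The validation described above could be sketched as follows. This is a hypothetical standalone sketch: the function name and the rcParams key are assumptions, standing in for the actual helper and `arviz.rcParams` entry in the codebase.

```python
# Stand-in for arviz.rcParams; the key name here is an assumption.
rc_params = {"stats.credible_interval": 0.94}

def check_credible_interval(credible_interval):
    """Hypothetical validator implementing the 4 cases described above."""
    if credible_interval == "auto":
        # Default: use the value stored in rcParams.
        return rc_params["stats.credible_interval"]
    if credible_interval is None:
        # Pass None through so the backend draws no credible-interval line.
        return None
    if isinstance(credible_interval, float) and 0 < credible_interval < 1:
        return credible_interval
    raise ValueError(
        f"credible_interval must be 'auto', None or a float in (0, 1), "
        f"got {credible_interval!r}"
    )
```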

@smit-s (Contributor) commented Mar 11, 2020

@OriolAbril, I have made the required changes, but I would like to highlight another possible problem related to this issue:

import arviz as az
idata = az.from_netcdf('arviz-hist-bug.nc')
idata = az.convert_to_inference_data(idata)
az.plot_posterior(idata.posterior['err_sd'], point_estimate=None, credible_interval=None)

When we run the above code we get the following error:

Trace
Traceback (most recent call last):
File "F:/ppts/gsoc/repro.py", line 5, in
az.plot_posterior(idata.posterior['err_sd'],point_estimate=None,credible_interval=None)
File "F:\ppts\gsoc\arvizinst\lib\site-packages\arviz-0.7.0-py3.8.egg\arviz\plots\posteriorplot.py", line 184, in plot_posterior
data = convert_to_dataset(data, group=group)
File "F:\ppts\gsoc\arvizinst\lib\site-packages\arviz-0.7.0-py3.8.egg\arviz\data\converters.py", line 170, in convert_to_dataset
inference_data = convert_to_inference_data(obj, group=group, coords=coords, dims=dims)
File "F:\ppts\gsoc\arvizinst\lib\site-packages\arviz-0.7.0-py3.8.egg\arviz\data\converters.py", line 126, in convert_to_inference_data
raise ValueError(
ValueError: Can only convert xarray dataset, dict, netcdf filename, numpy array, pystan fit, pymc3 trace, emcee fit, pyro mcmc fit, numpyro mcmc fit, cmdstan fit csv filename, cmdstanpy fit to InferenceData, not DataArray

Is this correct behaviour?
I think DataArrays should be allowed. Currently the DataArray case is not handled in converters.py. I tried converting DataArrays to a Dataset by making changes to converters.py; it seems to work fine and creates 4 separate plots, as shown in the attached figure (Figure_1).

What is your opinion on this?
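The conversion suggested above could look roughly like the sketch below. xarray's real API for this is `DataArray.to_dataset(name=...)`; the duck-typed check is a stand-in for `isinstance(obj, xr.DataArray)` so the sketch stays self-contained, and the function name is hypothetical.

```python
def promote_dataarray(obj, fallback_name="x"):
    """Promote an xarray.DataArray to a Dataset before conversion.

    DataArrays expose .to_dataset() but not .data_vars; Datasets have both,
    so this check only triggers for DataArray-like objects.
    """
    if hasattr(obj, "to_dataset") and not hasattr(obj, "data_vars"):
        # Unnamed DataArrays need an explicit variable name.
        return obj.to_dataset(name=getattr(obj, "name", None) or fallback_name)
    return obj
```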

@ahartikainen (Contributor)

It is not supported for now.

There are "problems" with assuming whether we have draws + shape or chain + draws + shape.

Does the DataArray contain axis names? And coordinates? I mean, if we have a good way to handle these, then we should implement this.

@OriolAbril (Member)

It would be a good idea to add the DataArray case to convert_to_inference_data. It'd be great if you could make a PR for each feature.

I have two comments on the code above. To plot only err_sd, you should use plot_posterior(idata, var_names="err_sd",...), which will give the same result.

The second comment is that even though credible interval is None, the line is still there 🤔

@smit-s (Contributor) commented Mar 11, 2020

@ahartikainen Yes, the DataArray has all this information. You can refer to the following text file, which shows the structure of the DataArray for the data used in this issue:
Dataarray_structure.txt

@OriolAbril I am sorry for posting the wrong image previously. The correct image for that code snippet is
Figure_1.

Also, I would like to add support for DataArray if possible, but first I will make a pull request for the existing issue.
For adding the DataArray support, shall I create another issue or send a separate pull request?

@ahartikainen (Contributor)

You can create another PR for the DataArray case.

@smit-s (Contributor) commented Mar 11, 2020

Ok, I will create another PR.
