ppcplot needs refining #525

ahartikainen · 2019-01-12T21:33:14Z

Describe the bug

az.plot_ppc throws an error when posterior_predictive contains more than one chain.

It also fails when a data variable has only a few observations.

To Reproduce

This works (single chain)

import arviz as az
import numpy as np

idata = az.from_dict(
    posterior_predictive={"y": np.random.randn(1, 200, 10), "x": 3 + np.random.randn(1, 200, 20)},
    observed_data={"y": np.random.randn(10), "x": 3 + np.random.randn(20)},
)

az.plot_ppc(idata)

This fails (multichain issue)

idata = az.from_dict(
    posterior_predictive={"y": np.random.randn(4, 200, 10), "x": 3 + np.random.randn(4, 200, 20)},
    observed_data={"y": np.random.randn(10), "x": 3 + np.random.randn(20)},
)

az.plot_ppc(idata)

Errors

plot_ppc(data, kind, alpha, mean, figsize, textsize, data_pairs, var_names, coords, flatten, num_pp_samples, random_seed)
    205         if len(pp_vals.shape) > 2:
    206             pp_vals = pp_vals.reshape((pp_vals.shape[0], np.prod(pp_vals.shape[1:])))
--> 207         pp_sampled_vals = pp_vals[pp_sample_ix]
    208 
    209         if kind == "density":

IndexError: index 590 is out of bounds for axis 0 with size 4
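
Judging from the traceback, the reshape on line 206 keeps axis 0 as the chain axis (size 4), while pp_sample_ix appears to hold indices drawn from a larger range. A minimal sketch of one way to avoid this, assuming the array layout is (chain, draw, observation): stack chain and draw into a single sample axis before indexing. The names total and stacked are illustrative, and this is not necessarily how plot_ppc itself handles it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layout: (chain, draw, observation), as in the failing example.
pp_vals = rng.standard_normal((4, 200, 10))

n_chains, n_draws = pp_vals.shape[:2]
total = n_chains * n_draws

# Stack chain and draw into one sample axis: (800, 10).
stacked = pp_vals.reshape(total, -1)

# Draw sample indices from the stacked axis, so every index is in bounds.
num_pp_samples = 50
pp_sample_ix = rng.choice(total, size=num_pp_samples, replace=False)
pp_sampled_vals = stacked[pp_sample_ix]

print(pp_sampled_vals.shape)
```

With this layout, an index such as 590 is valid because axis 0 has 800 entries rather than 4.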

This fails (low data count)

idata = az.from_dict(
    posterior_predictive={"y": np.random.randn(1, 200, 10), "x": 3 + np.random.randn(1, 200, 3)},
    observed_data={"y": np.random.randn(10), "x": 3 + np.random.randn(3)},
)

az.plot_ppc(idata)

Errors

plot_ppc(data, kind, alpha, mean, figsize, textsize, data_pairs, var_names, coords, flatten, num_pp_samples, random_seed)
    234                 vals = np.array([vals]).flatten()
    235                 if dtype == "f":
--> 236                     pp_density, lower, upper = _fast_kde(vals)
    237                     pp_x = np.linspace(lower, upper, len(pp_density))
    238                     pp_densities.extend([pp_x, pp_density])

arviz\plots\kdeplot.py in _fast_kde(x, cumulative, bw)
    256 
    257     n_bins = min(int(len_x ** (1 / 3) * std_x * 2), 200)
--> 258     grid, _ = np.histogram(x, bins=n_bins)
    259 
    260     scotts_factor = len_x ** (-0.2)

~\miniconda3\envs\stan\lib\site-packages\numpy\lib\histograms.py in histogram(a, bins, range, normed, weights, density)
    674     a, weights = _ravel_and_check_weights(a, weights)
    675 
--> 676     bin_edges, uniform_bins = _get_bin_edges(a, bins, range, weights)
    677 
    678     # Histogram is an integer or a float array depending on the weights.

~\miniconda3\envs\stan\lib\site-packages\numpy\lib\histograms.py in _get_bin_edges(a, bins, range, weights)
    325                 '`bins` must be an integer, a string, or an array')
    326         if n_equal_bins < 1:
--> 327             raise ValueError('`bins` must be positive, when an integer')
    328 
    329         first_edge, last_edge = _get_outer_edges(a, range)

ValueError: `bins` must be positive, when an integer
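
The n_bins expression on line 257 truncates to 0 when there are few points with a small spread, and np.histogram rejects bins=0. A minimal sketch of the arithmetic, assuming std_x is the standard deviation of x and len_x its length; the helper n_bins_for and its min_bins clamp are hypothetical, shown only as one possible guard.

```python
import numpy as np

def n_bins_for(x, max_bins=200, min_bins=1):
    """Bin count following the expression in the traceback, clamped to >= min_bins."""
    x = np.asarray(x, dtype=float)
    raw = int(len(x) ** (1 / 3) * np.std(x) * 2)
    return max(min(raw, max_bins), min_bins)

# Three closely spaced values, similar to a variable with 3 observations.
x = np.array([3.0, 3.1, 3.2])

# Unclamped expression: truncates to 0, which np.histogram refuses.
unclamped = min(int(len(x) ** (1 / 3) * np.std(x) * 2), 200)

# Clamped version always yields a usable bin count.
grid, _ = np.histogram(x, bins=n_bins_for(x))
print(unclamped, n_bins_for(x), grid)
```

Whether a one-bin histogram gives a meaningful density estimate is a separate question; the clamp only avoids the ValueError.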

ppcplot could offer a few different options:

Data from one distribution --> the current plot, fixed for multichain input
Data with an independent view --> each point gets a unique "x" value, and the ppc distribution at each point is shown with some form of distribution visualization
Data that is part of the model --> a structured view (regression plot, hpdplot, etc.)


ahartikainen · 2019-01-16T20:08:08Z

I will close this one, as the errors were fixed by #526

ahartikainen closed this as completed Jan 16, 2019
