ppcplot needs refining #525

Closed
ahartikainen opened this issue Jan 12, 2019 · 1 comment
Comments

@ahartikainen
Contributor

Describe the bug

plot_ppc throws an error when posterior_predictive contains more than one chain.

It also fails when a variable has only a few data points.

To Reproduce

This works

import arviz as az
import numpy as np

idata = az.from_dict(
    posterior_predictive={"y": np.random.randn(1, 200, 10), "x": 3 + np.random.randn(1, 200, 20)},
    observed_data={"y": np.random.randn(10), "x": 3 + np.random.randn(20)},
)

az.plot_ppc(idata)

This fails (multichain issue)

idata = az.from_dict(
    posterior_predictive={"y": np.random.randn(4, 200, 10), "x": 3 + np.random.randn(4, 200, 20)},
    observed_data={"y": np.random.randn(10), "x": 3 + np.random.randn(20)},
)

az.plot_ppc(idata)

Errors

plot_ppc(data, kind, alpha, mean, figsize, textsize, data_pairs, var_names, coords, flatten, num_pp_samples, random_seed)
    205         if len(pp_vals.shape) > 2:
    206             pp_vals = pp_vals.reshape((pp_vals.shape[0], np.prod(pp_vals.shape[1:])))
--> 207         pp_sampled_vals = pp_vals[pp_sample_ix]
    208 
    209         if kind == "density":

IndexError: index 590 is out of bounds for axis 0 with size 4
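
The traceback suggests that the reshape keeps the chain axis (size 4) as axis 0, while pp_sample_ix appears to hold indices over all chain-by-draw samples. A minimal NumPy-only sketch of that mismatch follows; the alternative reshape is an assumption for illustration and not necessarily the change made in the actual fix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Posterior predictive values shaped (chain, draw, observation), as in the failing example
pp_vals = rng.standard_normal((4, 200, 10))

# Reshape as shown in the traceback: axis 0 is still the chain axis, size 4
flat_by_chain = pp_vals.reshape((pp_vals.shape[0], np.prod(pp_vals.shape[1:])))

# Assumed alternative: merge chain and draw into one sample axis of size 800
n_samples = pp_vals.shape[0] * pp_vals.shape[1]
flat_by_sample = pp_vals.reshape((n_samples, -1))

# Sample indices over all chain*draw samples (assumed to mirror pp_sample_ix)
pp_sample_ix = rng.choice(n_samples, size=50, replace=False)

try:
    flat_by_chain[pp_sample_ix]
    chain_indexing_failed = False
except IndexError:
    # Indices above 3 do not exist on an axis of size 4
    chain_indexing_failed = True

# Indexing the merged sample axis works for every index below 800
pp_sampled_vals = flat_by_sample[pp_sample_ix]
```

With a single chain of 200 draws the two axes cannot be told apart as easily, which may be why the one-chain example happens to run.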

This fails (low data count)

idata = az.from_dict(
    posterior_predictive={"y": np.random.randn(1, 200, 10), "x": 3 + np.random.randn(1, 200, 3)},
    observed_data={"y": np.random.randn(10), "x": 3 + np.random.randn(3)},
)

az.plot_ppc(idata)

Errors

plot_ppc(data, kind, alpha, mean, figsize, textsize, data_pairs, var_names, coords, flatten, num_pp_samples, random_seed)
    234                 vals = np.array([vals]).flatten()
    235                 if dtype == "f":
--> 236                     pp_density, lower, upper = _fast_kde(vals)
    237                     pp_x = np.linspace(lower, upper, len(pp_density))
    238                     pp_densities.extend([pp_x, pp_density])

arviz\plots\kdeplot.py in _fast_kde(x, cumulative, bw)
    256 
    257     n_bins = min(int(len_x ** (1 / 3) * std_x * 2), 200)
--> 258     grid, _ = np.histogram(x, bins=n_bins)
    259 
    260     scotts_factor = len_x ** (-0.2)

~\miniconda3\envs\stan\lib\site-packages\numpy\lib\histograms.py in histogram(a, bins, range, normed, weights, density)
    674     a, weights = _ravel_and_check_weights(a, weights)
    675 
--> 676     bin_edges, uniform_bins = _get_bin_edges(a, bins, range, weights)
    677 
    678     # Histogram is an integer or a float array depending on the weights.

~\miniconda3\envs\stan\lib\site-packages\numpy\lib\histograms.py in _get_bin_edges(a, bins, range, weights)
    325                 '`bins` must be an integer, a string, or an array')
    326         if n_equal_bins < 1:
--> 327             raise ValueError('`bins` must be positive, when an integer')
    328 
    329         first_edge, last_edge = _get_outer_edges(a, range)

ValueError: `bins` must be positive, when an integer
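
The n_bins expression on line 257 of the traceback truncates to 0 when there are few points with a small spread, and np.histogram rejects bins=0. The sketch below reproduces that with NumPy alone. It assumes std_x is the plain standard deviation of x, and the max(..., 1) guard is a hypothetical workaround, not necessarily what the fix does.

```python
import numpy as np


def n_bins_from_traceback(x):
    """Bin count as computed on line 257 of the traceback (std_x assumed to be x.std())."""
    x = np.asarray(x, dtype=float)
    len_x = len(x)
    std_x = x.std()
    return min(int(len_x ** (1 / 3) * std_x * 2), 200)


# Three closely spaced points, similar to one draw of "x" with 3 observations
vals = np.array([2.9, 3.0, 3.1])
n_bins = n_bins_from_traceback(vals)

try:
    np.histogram(vals, bins=n_bins)
    histogram_failed = False
except ValueError:
    # NumPy requires a positive integer bin count
    histogram_failed = True

# Hypothetical guard: never request fewer than one bin
safe_bins = max(n_bins, 1)
grid, _ = np.histogram(vals, bins=safe_bins)
```

Because the bin count depends on the standard deviation of randomly generated values, the original example may not fail on every run.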

ppcplot could offer a few different options:

  • Data from one distribution --> the current plot, fixed for multichain input
  • Data with an independent view --> each point gets a unique "x" value, and the ppc distribution at each point is shown with some form of distribution visualization
  • Data that is part of the model --> a structured view (regression plot, hpdplot, etc.)
@ahartikainen
Contributor Author

I will close this one, as the errors were fixed by #526
