pclean.cc integration tests are failing #233
Comments
Maybe it isn't roundoff error. When I change the max length of the Time string from 40 to 30, the logp discrepancy is higher:
I commented out the failing logp checks in CleanRelation to try to get some insight into why the Dirichlet hparams were NaN, and it appears there are NaNs in the counts vector of the bigram insertions distribution.
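For anyone trying to reproduce this, a small check like the one below could help pin down which entry of the counts vector goes bad first. This is only a debugging sketch: `first_nan_index` is a hypothetical helper, not something in the repo, and it assumes the counts are available as a `std::vector<double>`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical debugging helper (not part of hierarchical-irm).
// Returns the index of the first NaN entry in `counts`, or -1 if
// every entry is a number.
long first_nan_index(const std::vector<double>& counts) {
  for (std::size_t i = 0; i < counts.size(); ++i) {
    if (std::isnan(counts[i])) {
      return static_cast<long>(i);
    }
  }
  return -1;
}
```

Calling it right after each update to the counts would show whether the NaN is introduced by a single observation or is already present from the start.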
This looks weird to me: https://github.com/probcomp/hierarchical-irm/blob/master/cxx/emissions/bigram_string.cc#L138
We're adding up exp(a.cost) to get the total prob, but then we weight by a.cost without the exp (and divide by total_prob). I suspect it should be exp(a.cost) on the last line of the snippet too, but when I make that change just to try it, the code hangs.
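To make the suspected fix concrete, here is a minimal sketch of the normalization I would expect, assuming a.cost is a log-probability. `normalized_weights` is a hypothetical standalone function, not the code in bigram_string.cc. It also subtracts the maximum cost before exponentiating, since a plain sum of exp(cost) can underflow to zero for very negative costs and then produce NaNs on division. Whether that is what is happening here is a guess.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not from bigram_string.cc): converts log-space
// costs into normalized weights, w_i = exp(c_i) / sum_j exp(c_j).
// Assumes at least one cost is finite.
std::vector<double> normalized_weights(const std::vector<double>& log_costs) {
  assert(!log_costs.empty());
  // Shift by the largest cost so the largest term is exp(0) = 1 and the
  // sum cannot underflow to zero.
  const double max_cost =
      *std::max_element(log_costs.begin(), log_costs.end());
  std::vector<double> weights;
  weights.reserve(log_costs.size());
  double total = 0.0;
  for (double c : log_costs) {
    weights.push_back(std::exp(c - max_cost));
    total += weights.back();
  }
  // The same exponentiated quantity appears in the numerator and the
  // denominator, so the weights sum to 1.
  for (double& w : weights) {
    w /= total;
  }
  return weights;
}
```

With costs log(1) and log(3) this gives weights 0.25 and 0.75. Weighting by the raw cost instead would give values that are not probabilities and can be negative.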
The assertions in CleanRelation::logp_gibbs_exact* are failing, apparently due to roundoff error. When I comment out the assertions, it still crashes with "Warning: all Dirichlet hyperparameters give nans!"
Below is the log when I run
./bazel-bin/pclean/pclean --schema=assets/flights.schema --obs=assets/flights_dirty.10.csv --iters=5 --output=/tmp/flights.out
on the branch 100424-emilyaf-bigram-debug.