Fix, improve, complete 'training loss' computation for *2Vec models #2617
In addition to adding loss tallying where it's missing (FastText, Doc2Vec)…

To address the potential multithreading issues (#2743), each thread should have its own loss-tally, combined safely only at the end of an epoch. To address the precision issue of #2735, wider types should be used where appropriate. Tallying the loss from a single call/batch into a local variable first, before adding it to a much larger running total (which may sit in lower-precision ranges of the floating-point representation), could also help, as could the per-thread splitting above. (A rough sketch of this combination follows below.)

Ensuring there's an easy way to get a loss summary from a single training batch (or, for Doc2Vec, from non-training inference) might offer new or improved ways of doing a "does this text match this model's expectations?" calculation, which might enable new uses (and/or replace the old 'scoring' feature @mataddy added to some Word2Vec modes long ago).

Potentially, even offering a way to tally loss per word (or per other model aspect), if low-overhead, could give new insight into whether some parts of a model are relatively undertrained compared to others, provide warnings when parts of a model are updated heavily while others receive no updates (as in incremental training), or even drive dynamic per-epoch or per-word learning-rate choices (as in Adagrad etc.).

Having loss-tracking really working might also allow a mode that avoids any explicit/fixed choice of …
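A minimal sketch of the per-thread, batch-local tallying idea described above. Everything here (`EpochLossTally`, `_WorkerTally`, and their methods) is hypothetical illustration, not existing gensim API:

```python
import threading
import numpy as np


class EpochLossTally:
    """Hypothetical per-epoch loss aggregator: each worker thread keeps its
    own tally, and totals are combined, under a lock, only at epoch end."""

    def __init__(self):
        self._lock = threading.Lock()
        self.epoch_total = np.float64(0.0)  # wide type for the long-running sum

    def worker_tally(self):
        return _WorkerTally(self)


class _WorkerTally:
    """Per-thread tally; no locking needed during the hot training loop."""

    def __init__(self, parent):
        self._parent = parent
        self._thread_total = np.float64(0.0)

    def add_batch(self, per_example_losses):
        # Sum the (typically float32) per-example losses of one batch into a
        # local float64 first, so small values aren't lost against a huge total.
        batch_loss = np.sum(per_example_losses, dtype=np.float64)
        self._thread_total += batch_loss
        return batch_loss  # also usable as a per-batch loss summary

    def flush(self):
        # Combine safely into the shared epoch total at the end of the epoch.
        with self._parent._lock:
            self._parent.epoch_total += self._thread_total
        self._thread_total = np.float64(0.0)
```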
Word2Vec training-loss isn't quite yet the epoch-based loss most would expect – as pending PR #2135 might address – but Doc2Vec and FastText should also offer functional, analogous reporting, and the docs should make clear what this loss is good for (monitoring training progress) and what it's not good for (assessing overall model fitness for downstream tasks).

(Loss for Doc2Vec looks like it might be there due to inherited interfaces; it was requested along with Word2Vec in #1272, but that request was closed as a duplicate of #999, which wound up only implementing it for Word2Vec.)
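For reference, a sketch of how the current Word2Vec reporting can be turned into approximate per-epoch figures today, by differencing the running total from the existing `compute_loss=True` / `get_latest_training_loss()` API inside a `CallbackAny2Vec` hook (the differencing is a workaround, not a built-in feature, and it inherits the same float32 precision limits; gensim 4.x parameter names assumed):

```python
from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec


class EpochLossLogger(CallbackAny2Vec):
    """Print the loss accrued during each epoch by differencing the running
    total that get_latest_training_loss() currently returns."""

    def __init__(self):
        self.previous_total = 0.0
        self.epoch = 0

    def on_epoch_end(self, model):
        total = model.get_latest_training_loss()  # running total, not per-epoch
        print("epoch %d loss: %.1f" % (self.epoch, total - self.previous_total))
        self.previous_total = total
        self.epoch += 1


# Toy corpus just to make the example self-contained; any iterable of
# token lists works in its place.
sentences = [["hello", "world"], ["training", "loss", "example"]]

model = Word2Vec(
    sentences=sentences,
    vector_size=50,
    min_count=1,
    epochs=5,
    compute_loss=True,
    callbacks=[EpochLossLogger()],
)
```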