Fix, improve, complete 'training loss' computation for *2Vec models #2617
In addition to adding loss tallying where it's missing (FastText, Doc2Vec)…

To address the potential multithreading issues (#2743), each thread should have its own loss-tally, combined safely only at the end of an epoch. To address the precision issue of #2735, wider types should be used where appropriate. Tallying the loss from a single call/batch into a local variable first, before adding it to a much larger running total (which may sit in lower-precision ranges of the floating-point representation), could also help, as could the per-thread splitting above. (A rough sketch of this combination follows below.)

Ensuring there's an easy way to get a loss summary from a single training batch (or, for Doc2Vec, from non-training inference) might offer new or improved ways of doing a "does this text match this model's expectations?" calculation, which might enable new uses (and/or replace the old 'scoring' feature @mataddy added to some Word2Vec modes long ago).

Potentially, even offering a way to tally loss per word (or per other model aspect), if low-overhead, could give new insight into whether some parts of a model are relatively undertrained compared to others, provide warnings when parts of a model are updated heavily while others receive no updates (as in incremental training), or even drive dynamic per-epoch or per-word learning-rate choices (as in Adagrad etc.).

Having loss-tracking really working might also allow a mode that avoids any explicit/fixed choice of …
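A minimal sketch of the per-thread, batch-local tallying idea described above. Everything here (`EpochLossTally`, `_WorkerTally`, and their methods) is hypothetical illustration, not existing gensim API:

```python
import threading
import numpy as np


class EpochLossTally:
    """Hypothetical per-epoch loss aggregator: each worker thread keeps its
    own tally, and totals are combined, under a lock, only at epoch end."""

    def __init__(self):
        self._lock = threading.Lock()
        self.epoch_total = np.float64(0.0)  # wide type for the long-running sum

    def worker_tally(self):
        return _WorkerTally(self)


class _WorkerTally:
    """Per-thread tally; no locking needed during the hot training loop."""

    def __init__(self, parent):
        self._parent = parent
        self._thread_total = np.float64(0.0)

    def add_batch(self, per_example_losses):
        # Sum the (typically float32) per-example losses of one batch into a
        # local float64 first, so small values aren't lost against a huge total.
        batch_loss = np.sum(per_example_losses, dtype=np.float64)
        self._thread_total += batch_loss
        return batch_loss  # also usable as a per-batch loss summary

    def flush(self):
        # Combine safely into the shared epoch total at the end of the epoch.
        with self._parent._lock:
            self._parent.epoch_total += self._thread_total
        self._thread_total = np.float64(0.0)
```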
Word2Vec training-loss isn't quite yet the epoch-based loss most would expect – as pending PR #2135 might address – but Doc2Vec and FastText should also offer functional, analogous reporting, and the docs should make clear what this loss is good for (monitoring training progress) and what it's not good for (assessing overall model fitness for downstream tasks).

(Loss for Doc2Vec looks like it might be there due to inherited interfaces; it was requested along with Word2Vec in #1272, but that request was closed as a duplicate of #999, which wound up only implementing it for Word2Vec.)
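For reference, a sketch of how the current Word2Vec reporting can be turned into approximate per-epoch figures today, by differencing the running total from the existing `compute_loss=True` / `get_latest_training_loss()` API inside a `CallbackAny2Vec` hook (the differencing is a workaround, not a built-in feature, and it inherits the same float32 precision limits; gensim 4.x parameter names assumed):

```python
from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec


class EpochLossLogger(CallbackAny2Vec):
    """Print the loss accrued during each epoch by differencing the running
    total that get_latest_training_loss() currently returns."""

    def __init__(self):
        self.previous_total = 0.0
        self.epoch = 0

    def on_epoch_end(self, model):
        total = model.get_latest_training_loss()  # running total, not per-epoch
        print("epoch %d loss: %.1f" % (self.epoch, total - self.previous_total))
        self.previous_total = total
        self.epoch += 1


# Toy corpus just to make the example self-contained; any iterable of
# token lists works in its place.
sentences = [["hello", "world"], ["training", "loss", "example"]]

model = Word2Vec(
    sentences=sentences,
    vector_size=50,
    min_count=1,
    epochs=5,
    compute_loss=True,
    callbacks=[EpochLossLogger()],
)
```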