Add label smoothing to CopyNet #287
Conversation
Looks good, just a small comment.
```python
one_hot_targets = torch.zeros_like(log_probs).scatter_(
    -1, target_tokens.unsqueeze(1), 1.0 - self._label_smoothing
)
smoothed_targets = one_hot_targets + smoothing_value
```
It seems like it would be faster to start with `torch.full_like(log_probs, smoothing_value)` and go from there? I did not measure it, though.

`one_hot_targets` isn't used anywhere else, is it? It's not even properly "one hot" like this. Maybe it should be called `one_warm_targets` 🤣.
> It seems like it would be faster to start with `torch.full_like(log_probs, smoothing_value)` and go from there? I did not measure it, though.
Something like this?
```python
one_hot_targets = torch.full_like(log_probs, smoothing_value).scatter_(
    1, target_tokens.unsqueeze(1), 1.0 - self._label_smoothing + smoothing_value
)
```
It appears to be slightly faster based on a quick test:
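A minimal sketch of that kind of micro-benchmark (the shapes, smoothing value, and iteration count below are made-up assumptions, not the actual test):

```python
import timeit

import torch

# Hypothetical shapes; the real test would use CopyNet's actual tensors.
batch_size, num_classes = 64, 20000
label_smoothing = 0.1
smoothing_value = label_smoothing / num_classes

log_probs = torch.randn(batch_size, num_classes)
target_tokens = torch.randint(num_classes, (batch_size,))

def zeros_then_add():
    # Original: scatter into zeros, then add the smoothing value everywhere.
    one_hot = torch.zeros_like(log_probs).scatter_(
        -1, target_tokens.unsqueeze(1), 1.0 - label_smoothing
    )
    return one_hot + smoothing_value

def full_then_scatter():
    # Suggested: fill with the smoothing value and scatter once, no extra add.
    return torch.full_like(log_probs, smoothing_value).scatter_(
        1, target_tokens.unsqueeze(1), 1.0 - label_smoothing + smoothing_value
    )

print("zeros + add:   ", timeit.timeit(zeros_then_add, number=1000))
print("full + scatter:", timeit.timeit(full_then_scatter, number=1000))
```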
> `one_hot_targets` isn't used anywhere else, is it? It's not even properly "one hot" like this. Maybe it should be called `one_warm_targets` 🤣.
Good point! I am stealing the variable `one_hot_targets` right from here. If we switch to the approach above, we could just call it `smoothed_targets`:
```python
smoothed_targets = torch.full_like(log_probs, smoothing_value).scatter_(
    1, target_tokens.unsqueeze(1), 1.0 - self._label_smoothing + smoothing_value
)
```
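A quick, self-contained check (small made-up shapes) that the single-scatter version matches the original two-step construction, and that each row still sums to one:

```python
import torch

num_classes = 10
label_smoothing = 0.1
smoothing_value = label_smoothing / num_classes  # as in sequence_cross_entropy_with_logits

log_probs = torch.randn(4, num_classes)
target_tokens = torch.randint(num_classes, (4,))

# Original two-step construction: scatter into zeros, then add everywhere.
two_step = torch.zeros_like(log_probs).scatter_(
    -1, target_tokens.unsqueeze(1), 1.0 - label_smoothing
) + smoothing_value

# Single-scatter construction from above.
single_scatter = torch.full_like(log_probs, smoothing_value).scatter_(
    1, target_tokens.unsqueeze(1), 1.0 - label_smoothing + smoothing_value
)

assert torch.allclose(two_step, single_scatter)
# Each row is a valid distribution: the row sum is
# num_classes * smoothing_value + (1 - label_smoothing) = 1.
assert torch.allclose(single_scatter.sum(-1), torch.ones(4))
```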
Might be worth updating https://github.com/allenai/allennlp/blob/cf113d705b9054d329c67cf9bb29cbc3f191015d/allennlp/nn/util.py#L825-L828 to use this micro-optimization.
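The change there would presumably look something like this (a sketch based on a reading of the linked lines, not an actual patch; `log_probs_flat` and `targets_flat` are the names used in `sequence_cross_entropy_with_logits`):

```python
# Before (scatter into zeros, then add the smoothing value everywhere):
#
#     one_hot_targets = torch.zeros_like(log_probs_flat).scatter_(
#         -1, targets_flat, 1.0 - label_smoothing
#     )
#     smoothed_targets = one_hot_targets + smoothing_value
#
# After (fill with the smoothing value and scatter once):
smoothed_targets = torch.full_like(log_probs_flat, smoothing_value).scatter_(
    -1, targets_flat, 1.0 - label_smoothing + smoothing_value
)
```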
Oh, I see it's also missing a changelog entry. This definitely has enough magnitude to warrant one.

Done!

Thanks! I'll make the switch in util.py.
This PR adds label smoothing to `CopyNetSeq2Seq`. As discussed in allenai/allennlp#5276, label smoothing is applied to the generation scores only. It is mostly a re-working of the existing label smoothing code in `sequence_cross_entropy_with_logits`.

As a sanity check, I ran the code with my own model. A model with a small `label_smoothing` value reaches performance similar to a model with `label_smoothing == 0.0`. As for additional unit tests, I think a modification of `test_get_ll_contrib` might make the most sense.
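The core idea is to replace the hard one-hot target over the generation vocabulary with a smoothed distribution before computing the log-likelihood, leaving the copy scores untouched. A rough sketch (the helper name and signature here are made up for illustration; the real change lives in the code that `test_get_ll_contrib` covers):

```python
import torch

def smoothed_generation_log_likelihood(
    generation_log_probs: torch.Tensor,  # shape: (batch_size, gen_vocab_size)
    target_tokens: torch.Tensor,         # shape: (batch_size,)
    label_smoothing: float,
) -> torch.Tensor:
    """Hypothetical helper: log-likelihood of the targets under a smoothed
    target distribution, mirroring sequence_cross_entropy_with_logits."""
    num_classes = generation_log_probs.size(-1)
    smoothing_value = label_smoothing / num_classes
    smoothed_targets = torch.full_like(generation_log_probs, smoothing_value).scatter_(
        -1, target_tokens.unsqueeze(1), 1.0 - label_smoothing + smoothing_value
    )
    # Expected log-likelihood under the smoothed distribution; copy scores
    # are left alone, per the discussion in allenai/allennlp#5276.
    return (smoothed_targets * generation_log_probs).sum(-1)
```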