This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Add label smoothing to CopyNet #287

Merged: 5 commits merged into allenai:main on Jun 30, 2021

Conversation

@JohnGiorgi (Contributor) commented Jun 28, 2021

This PR adds label smoothing to CopyNetSeq2Seq. As discussed in allenai/allennlp#5276, label smoothing is added to the generation scores only. It is mostly a re-working of the existing label smoothing code in sequence_cross_entropy_with_logits.
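
For context, here is a minimal sketch of the smoothing computation being added, with simplified names and shapes (illustrative only, not the exact CopyNetSeq2Seq code):

import torch

# Illustrative sketch of label smoothing over generation scores.
# log_probs: (batch, num_classes) log-probabilities over the vocabulary.
# target_tokens: (batch,) gold token indices.
def smoothed_nll(log_probs, target_tokens, label_smoothing=0.1):
    num_classes = log_probs.size(-1)
    smoothing_value = label_smoothing / num_classes
    # Put 1 - label_smoothing on the gold class...
    one_hot_targets = torch.zeros_like(log_probs).scatter_(
        -1, target_tokens.unsqueeze(-1), 1.0 - label_smoothing
    )
    # ...then spread the remaining label_smoothing mass uniformly.
    smoothed_targets = one_hot_targets + smoothing_value
    # Cross entropy against the smoothed distribution, per example.
    return -(smoothed_targets * log_probs).sum(-1)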

As a sanity check, I ran the code with my own model. A model trained with a small label_smoothing value reaches performance similar to a model with label_smoothing == 0.0. As for additional unit tests, I think a modification of test_get_ll_contrib would make the most sense.
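
One simple property such a test could assert is that the smoothed targets still form a proper distribution, i.e. each row sums to 1. A hypothetical check, using the same construction as the sketch above (tensor sizes are made up):

import torch

# Hypothetical sanity check for the smoothed target distribution.
log_probs = torch.randn(4, 10)
target_tokens = torch.randint(0, 10, (4,))
label_smoothing = 0.1
smoothing_value = label_smoothing / log_probs.size(-1)
one_hot_targets = torch.zeros_like(log_probs).scatter_(
    1, target_tokens.unsqueeze(1), 1.0 - label_smoothing
)
smoothed_targets = one_hot_targets + smoothing_value
# Each row sums to (1 - eps) + C * (eps / C) == 1.
assert torch.allclose(smoothed_targets.sum(-1), torch.ones(4))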

@dirkgr (Member) left a comment

Looks good, just a small comment.

Comment on lines 457 to 460
one_hot_targets = torch.zeros_like(log_probs).scatter_(
    -1, target_tokens.unsqueeze(1), 1.0 - self._label_smoothing
)
smoothed_targets = one_hot_targets + smoothing_value
@dirkgr (Member):

It seems like it would be faster to start with torch.full_like(log_probs, smoothing_value) and go from there? I did not measure it though.

one_hot_targets isn't used anywhere else, is it? It's not even properly "one hot" like this. Maybe it should be called one_warm_targets 🤣.

@JohnGiorgi (Contributor, Author) commented Jun 30, 2021

> It seems like it would be faster to start with torch.full_like(log_probs, smoothing_value) and go from there? I did not measure it though.

Something like this?

one_hot_targets = torch.full_like(log_probs, smoothing_value).scatter_(
    1, target_tokens.unsqueeze(1), 1.0 - self._label_smoothing + smoothing_value
)

It appears to be slightly faster based on a quick test:

[Screenshot: timing comparison of the two approaches]
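
A quick comparison along these lines can be reproduced with timeit; the sizes below are assumptions, not necessarily the original test setup:

import timeit
import torch

# Assumed sizes; the original quick test may have used different ones.
log_probs = torch.randn(64, 30000)
target_tokens = torch.randint(0, 30000, (64,))
label_smoothing = 0.1
smoothing_value = label_smoothing / log_probs.size(-1)

def zeros_then_add():
    # Original approach: zeros, scatter the gold mass, then add smoothing.
    one_hot = torch.zeros_like(log_probs).scatter_(
        1, target_tokens.unsqueeze(1), 1.0 - label_smoothing
    )
    return one_hot + smoothing_value

def full_then_scatter():
    # Suggested approach: start from the smoothing value, one scatter, no add.
    return torch.full_like(log_probs, smoothing_value).scatter_(
        1, target_tokens.unsqueeze(1), 1.0 - label_smoothing + smoothing_value
    )

print("zeros_like + add:", timeit.timeit(zeros_then_add, number=1000))
print("full_like only:  ", timeit.timeit(full_then_scatter, number=1000))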

> one_hot_targets isn't used anywhere else, is it? It's not even properly "one hot" like this. Maybe it should be called one_warm_targets 🤣.

Good point! I am stealing the variable name one_hot_targets straight from the existing code in sequence_cross_entropy_with_logits. If we switch to the approach above, we could just call it smoothed_targets:

smoothed_targets = torch.full_like(log_probs, smoothing_value).scatter_(
    1, target_tokens.unsqueeze(1), 1.0 - self._label_smoothing + smoothing_value
)


@dirkgr (Member) commented Jun 30, 2021

Oh, I see it's also missing a changelog entry. This definitely has enough magnitude to warrant one.

@JohnGiorgi (Contributor, Author)

> Oh, I see it's also missing a changelog entry. This definitely has enough magnitude to warrant one.

Done!

@dirkgr (Member) commented Jun 30, 2021

Thanks! I'll make the switch in util.py.

@dirkgr merged commit 07fa124 into allenai:main on Jun 30, 2021