This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Implementation of Weighted CRF Tagger (handling unbalanced datasets) #341

Merged (14 commits) on Jul 14, 2022


eraldoluis (Contributor)

Closes allennlp issue #4619.

Depends on allennlp PR #5676

Changes proposed in this pull request:

  • I implemented and experimentally compared three sample weighting strategies for CrfTagger.
  • I added two parameters to CrfTagger: label_weights and weight_strategy.
  • The parameter label_weights is a Dict[str, float] with a mapping {label : weight} to be used in the loss function in order to give different weights for each token depending on its label.
  • The parameter weight_strategy can be None, 'emission', 'emission_transition', or 'lannoy'.
  • If label_weights is given and weight_strategy is None or 'emission', then the emission score of each tag is multiplied by the corresponding weight (as given by label_weights).
  • If weight_strategy is 'emission_transition', both the emission and the transition scores of each tag are multiplied by the corresponding weight.
  • If weight_strategy is 'lannoy', then we use the strategy proposed by Lannoy et al. (2019).
  • An experimental comparison of these three strategies, with a brief discussion of their differences, is available here.
  • Tests were created to cover the new feature.
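To make the 'emission' and 'emission_transition' strategies concrete, here is a minimal, dependency-free sketch of a weighted linear-chain CRF loss for a single sequence. It is not the AllenNLP implementation: the helper names (`crf_nll`, `weighted_crf_nll`) are hypothetical, start/end transitions are omitted, labels missing from `label_weights` are assumed to get weight 1.0, and for 'emission_transition' it assumes each transition score is scaled by the weight of its destination tag. The 'lannoy' strategy is not sketched.

```python
import math
from typing import Dict, List, Optional, Sequence

Matrix = List[List[float]]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_nll(emissions: Matrix, transitions: Matrix, tags: List[int]) -> float:
    """Negative log-likelihood of `tags` under a linear-chain CRF.

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    Start/end transitions are omitted to keep the sketch short.
    """
    num_tags = len(transitions)
    # Score of the gold tag sequence.
    gold = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        gold += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    # Forward algorithm: alpha[j] is the log-sum of scores of all prefixes ending in tag j.
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)]) + emissions[t][j]
            for j in range(num_tags)
        ]
    return logsumexp(alpha) - gold


def weighted_crf_nll(
    emissions: Matrix,
    transitions: Matrix,
    tags: List[int],
    labels: List[str],
    label_weights: Dict[str, float],
    weight_strategy: Optional[str] = None,
) -> float:
    """Hypothetical helper: CRF loss with per-label weights (labels[j] names tag index j)."""
    if weight_strategy not in (None, "emission", "emission_transition"):
        # 'lannoy' (Lannoy et al., 2019) is not covered by this sketch.
        raise ValueError(f"unsupported weight_strategy: {weight_strategy!r}")
    # Assumption: labels absent from label_weights get weight 1.0.
    weights = [label_weights.get(label, 1.0) for label in labels]
    num_tags = len(labels)
    # Both strategies multiply the emission score of each tag by that tag's weight.
    emissions = [[score * w for score, w in zip(row, weights)] for row in emissions]
    if weight_strategy == "emission_transition":
        # Assumption: transition i -> j is scaled by the weight of the destination tag j.
        transitions = [
            [transitions[i][j] * weights[j] for j in range(num_tags)] for i in range(num_tags)
        ]
    return crf_nll(emissions, transitions, tags)


if __name__ == "__main__":
    labels = ["O", "B-PER", "I-PER"]
    emissions = [[1.0, 0.5, -0.2], [0.3, 0.1, 0.8], [0.0, -0.5, 1.2]]
    transitions = [[0.2, 0.1, -1.0], [0.0, -0.3, 0.9], [0.1, 0.0, 0.4]]
    tags = [1, 2, 0]  # B-PER I-PER O
    weights = {"B-PER": 5.0, "I-PER": 5.0}  # up-weight the rare entity labels
    print(weighted_crf_nll(emissions, transitions, tags, labels, weights, "emission"))
```

With all weights equal to 1.0, both strategies reduce to the ordinary CRF loss. Giving rare labels larger weights scales their scores in the loss, which is how these strategies try to counteract an unbalanced label distribution.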

@epwalsh (Member) left a comment


LGTM!

Review thread on CHANGELOG.md (outdated, resolved)
@epwalsh enabled auto-merge (squash) on July 14, 2022 00:42
@epwalsh merged commit 97df196 into allenai:main on Jul 14, 2022

Linked issue: Handling unbalanced datasets in the CRF tagger