
Model Parameters


General

A Unified End Model

For PyTorch-based models, wrench supports changing the optimizer, LR scheduler, and backbone via arguments. We take the EndClassifierModel as an example.

Optimizer & LR Scheduler

We adopt the optimizers and LR schedulers from PyTorch. For example, to use the Adam optimizer with lr=1e-2 and weight_decay=1e-5, we could do

model = EndClassifierModel(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='MLP',
    optimizer='Adam',
    optimizer_lr=1e-2,
    optimizer_weight_decay=1e-5,
)

or

model = EndClassifierModel(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='MLP',
)
model.fit(
    dataset_train=train_data,
    y_train=aggregated_hard_labels,
    dataset_valid=valid_data,
    optimizer='Adam',
    optimizer_lr=1e-2,
    optimizer_weight_decay=1e-5,
)

Similarly, for the LR scheduler, if we want to use StepLR with step_size=10, we could do

model = EndClassifierModel(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='MLP',
    optimizer='Adam',
    optimizer_lr=1e-2,
    optimizer_weight_decay=1e-5,
    use_lr_scheduler=True,
    lr_scheduler='StepLR',
    lr_scheduler_step_size=10,
)

or

model = EndClassifierModel(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='MLP',
    use_lr_scheduler=True,
)
model.fit(
    dataset_train=train_data,
    y_train=aggregated_hard_labels,
    dataset_valid=valid_data,
    optimizer='Adam',
    optimizer_lr=1e-2,
    optimizer_weight_decay=1e-5,
    lr_scheduler='StepLR',
    lr_scheduler_step_size=10,
)

Note that the argument use_lr_scheduler should be set to True when initializing the model, even when the scheduler arguments themselves are passed to fit() as in the second example.

Backbone

The EndClassifierModel supports three backbones: LogReg (logistic regression), MLP, and BERT. We take BERT as an example:

model = EndClassifierModel(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='BERT',
    backbone_model_name='bert-base-uncased',
)

We list the parameters for the three backbones below:

LogReg

No parameters.

MLP

hidden_size: the number of neurons in the hidden layer; default is 100.

dropout: the dropout probability; default is 0.0.
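
Backbone parameters are passed to the model constructor with a backbone_ prefix, as backbone_model_name in the BERT example above. A minimal sketch for MLP, assuming backbone_hidden_size and backbone_dropout follow that convention:

model = EndClassifierModel(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='MLP',
    backbone_hidden_size=100,  # number of neurons in the hidden layer
    backbone_dropout=0.2,      # dropout probability
)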

BERT

Note that wrench automatically decides which BERT model to use for each task: for text classification it uses BertTextClassifier, and for relation classification it uses BertRelationClassifier.

model_name: the name of pretrained models in HuggingFace, default is 'bert-base-cased'.

fine_tune_layers: default is -1, which means fine-tuning all layers. If fine_tune_layers>=0, the last fine_tune_layers layers will be fine-tuned.

max_length: for BertTextClassifier only; the maximum number of tokens to consider.
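
Putting these together, a BERT backbone could be configured as in the following sketch; backbone_fine_tune_layers and backbone_max_length are assumed to follow the same backbone_ prefix convention, and the values are illustrative:

model = EndClassifierModel(
    batch_size=32,
    test_batch_size=512,
    n_steps=10000,
    backbone='BERT',
    backbone_model_name='bert-base-cased',
    backbone_fine_tune_layers=4,  # fine-tune only the last 4 layers
    backbone_max_length=128,      # truncate inputs to 128 tokens
)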

Classification

Label Model

Majority Voting

No parameters.

Dawid-Skene

n_epochs: the maximum number of epochs.

tolerance: training terminates when the change in the model parameters is less than tolerance.

Data Programming

n_epochs: the number of epochs.

lr: the learning rate.

l2: the weight of the L2 regularization term.

MeTaL

n_epochs: the number of epochs.

lr: the learning rate.

l2: the weight of the L2 regularization term.

FlyingSquid

No parameters.
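
All label models share the same fit/predict interface. A minimal sketch with majority voting, assuming the wrench.labelmodel import path and the train_data/valid_data objects used above:

from wrench.labelmodel import MajorityVoting

label_model = MajorityVoting()
label_model.fit(
    dataset_train=train_data,
    dataset_valid=valid_data,
)
# hard labels for training an end model, as in the fit() examples above
aggregated_hard_labels = label_model.predict(train_data)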

End Model

EndClassifierModel

Logistic Regression, MLP, and BERT (see above).

Cosine

Cosine is a PyTorch-based model, so its optimizer, LR scheduler, and backbone can be changed in the same way as for EndClassifierModel; however, Cosine does not support LogReg as a backbone.

teacher_update: the number of steps between updates of the teacher model.

margin: the distance threshold for calculating the contrastive loss.

thresh: the threshold for sampling high-confidence data.

lamda: the weight of the confidence regularization loss.
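
These parameters are passed alongside the usual optimizer and backbone arguments. A sketch with illustrative (not recommended) values:

model = Cosine(
    batch_size=32,
    test_batch_size=512,
    n_steps=10000,
    backbone='BERT',
    backbone_model_name='bert-base-cased',
    optimizer='Adam',
    optimizer_lr=1e-5,
    teacher_update=100,  # update the teacher model every 100 steps
    margin=1.0,          # distance threshold for the contrastive loss
    thresh=0.6,          # confidence threshold for sampling data
    lamda=0.05,          # weight of the confidence regularization loss
)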

Joint Model

Denoise

Denoise is a PyTorch-based model, so its optimizer, LR scheduler, and backbone can be changed in the same way as for EndClassifierModel; however, Denoise does not support BERT as a backbone.

alpha: the moving-average weight for aggregating the predictions of the current epoch and the previous epoch.

c1: the weight of the loss for the rule denoising module.

c2: the weight of the loss for the feature-based neural classifier module.

hidden_size: the dimension of the hidden layers in the label denoiser and the feature-based neural classifier.
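
A parallel sketch for Denoise with illustrative values, using an MLP backbone since BERT is not supported:

model = Denoise(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='MLP',
    optimizer='Adam',
    optimizer_lr=1e-2,
    alpha=0.6,        # moving-average weight across epochs
    c1=0.2,           # weight of the rule denoising loss
    c2=0.7,           # weight of the neural classifier loss
    hidden_size=100,  # hidden dimension of denoiser and classifier
)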

Sequence Tagging

Label Model