Model Parameters
A Unified End Model
For PyTorch-based models, wrench supports changing the optimizer, lr scheduler, and backbone via arguments; we take the EndClassifierModel as an example.
We adopt the optimizers and lr schedulers from PyTorch. For example, if we want to use the Adam optimizer with lr=1e-2 and weight_decay=1e-5, we could do
model = EndClassifierModel(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='MLP',
    optimizer='Adam',
    optimizer_lr=1e-2,
    optimizer_weight_decay=1e-5,
)
or
model = EndClassifierModel(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='MLP',
)
model.fit(
    dataset_train=train_data,
    y_train=aggregated_hard_labels,
    dataset_valid=valid_data,
    optimizer='Adam',
    optimizer_lr=1e-2,
    optimizer_weight_decay=1e-5,
)
Similarly, for the lr scheduler, if we want to use StepLR with step_size=10, we could do
model = EndClassifierModel(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='MLP',
    optimizer='Adam',
    optimizer_lr=1e-2,
    optimizer_weight_decay=0.0,
    use_lr_scheduler=True,
    lr_scheduler='StepLR',
    lr_scheduler_step_size=10,
)
or
model = EndClassifierModel(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='MLP',
    use_lr_scheduler=True,
)
model.fit(
    dataset_train=train_data,
    y_train=aggregated_hard_labels,
    dataset_valid=valid_data,
    optimizer='Adam',
    optimizer_lr=1e-2,
    optimizer_weight_decay=0.0,
    lr_scheduler='StepLR',
    lr_scheduler_step_size=10,
)
Note that the argument use_lr_scheduler should be set to True when initializing the model.
The EndClassifierModel supports three backbones, i.e., LogReg (Logistic Regression), MLP, and BERT. We take BERT as an example:
model = EndClassifierModel(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='BERT',
    backbone_model_name='bert-base-uncased',
)
We list the parameters for the three backbones below:

LogReg
- No parameters.

MLP
- hidden_size: the number of neurons in the hidden layer; default is 100.
- dropout: the dropout rate; default is 0.0.

BERT
Note that wrench will automatically decide which BERT model to use for different tasks, i.e., for text classification it uses BertTextClassifier, and for relation classification it uses BertRelationClassifier.
- model_name: the name of the pretrained model on HuggingFace; default is 'bert-base-cased'.
- fine_tune_layers: default is -1, which means fine-tuning all layers. If fine_tune_layers >= 0, only the last fine_tune_layers layers will be fine-tuned.
- max_length: for BertTextClassifier only; the maximum number of tokens to consider.
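These backbone parameters are passed to the model constructor. As a minimal sketch, assuming MLP's parameters follow the same backbone_ prefix convention as backbone_model_name above (the argument names backbone_hidden_size and backbone_dropout are assumptions, not confirmed by this page):

model = EndClassifierModel(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='MLP',
    backbone_hidden_size=100,  # assumed name: MLP hidden layer size, following the backbone_ prefix convention
    backbone_dropout=0.2,      # assumed name: MLP dropout rate
)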
No parameters.

- n_epochs: the maximum number of epochs.
- tolerance: learning terminates when the change in parameters is less than tolerance.

- n_epochs: the number of epochs.
- lr: the learning rate.
- l2: the weight of the regularization term.

- n_epochs: the number of epochs.
- lr: the learning rate.
- l2: the weight of the regularization term.

No parameters.
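These parameters are likewise passed to the model constructor. A minimal sketch, assuming the Snorkel label model from wrench.labelmodel, which takes n_epochs, lr, and l2 (this section does not name the models, so the choice of Snorkel here is an assumption):

from wrench.labelmodel import Snorkel

# Assumed example: configure a label model with the parameters listed above,
# then fit it on the weak labels of the training set.
label_model = Snorkel(
    n_epochs=10,
    lr=0.01,
    l2=0.0,
)
label_model.fit(
    dataset_train=train_data,
    dataset_valid=valid_data,
)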
Cosine is a PyTorch-based model, so the optimizer, lr scheduler, and backbone can be changed in the same way as for EndClassifierModel, but Cosine does not support LogReg as a backbone. Its model-specific parameters are:
- teacher_update: the number of steps between updates of the teacher model.
- margin: the distance threshold for calculating the contrastive loss.
- thresh: the threshold for sampling high-confidence data.
- lamda: the weight of the confidence regularization loss.
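A sketch of how these parameters can be combined with the shared arguments above, assuming the model-specific parameters are passed as constructor keyword arguments in the same way as for EndClassifierModel (the values are illustrative, not recommended settings):

model = Cosine(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='BERT',  # Cosine supports MLP and BERT, but not LogReg
    backbone_model_name='bert-base-cased',
    optimizer='Adam',
    optimizer_lr=1e-5,
    teacher_update=100,  # update the teacher model every 100 steps
    margin=1.0,          # distance threshold for the contrastive loss
    thresh=0.6,          # confidence threshold for sampling high-confidence data
    lamda=0.05,          # weight of the confidence regularization loss
)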
Denoise is a PyTorch-based model, so the optimizer, lr scheduler, and backbone can be changed in the same way as for EndClassifierModel, but Denoise does not support BERT as a backbone. Its model-specific parameters are:
- alpha: the moving-average weight for aggregating the predictions of the current epoch and the previous epoch.
- c1: the weight of the loss on the rule denoising module.
- c2: the weight of the loss on the feature-based neural classifier module.
- hidden_size: the dimension of the hidden layers in the label denoiser and the feature-based neural classifier.
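A corresponding sketch for Denoise, under the same assumption that model-specific parameters are constructor keyword arguments (values are illustrative only):

model = Denoise(
    batch_size=128,
    test_batch_size=512,
    n_steps=10000,
    backbone='MLP',  # Denoise does not support BERT as a backbone
    optimizer='Adam',
    optimizer_lr=1e-2,
    alpha=0.6,        # moving-average weight across epochs
    c1=0.2,           # weight of the rule denoising loss
    c2=0.7,           # weight of the feature-based classifier loss
    hidden_size=100,  # hidden dimension of the label denoiser and the classifier
)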