Hi all on the MPNet research team,
I am in the process of converting the fairseq training code for MPNet into a training loop that is compatible with Hugging Face. Although many of the convenience classes already exist in Hugging Face (like `MPNetForMaskedLM`), one thing that has become clear to us is that we will need to port over the collator function in `MaskedDataset` (under `tasks/masked_permutation_lm`).
In exploring how this collator works, I understand the logic as:

1. Permute the input IDs (based on whole-word spans or individual tokens, depending on an argument) along with their positions.
2. Create masked/corrupted tokens based on the final `n` indices of the permuted sequence, where `n` is the prediction size (i.e. `seq_len * 0.15` at default values).
3. Concatenate these together using `concat(seq, mask, mask)` and `concat(positions, predict_positions, predict_positions)`.
Using this logic, we might expect the collator function to perform the operation below on some dummy input IDs:
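(A minimal PyTorch sketch of this expectation; the permutation, `MASK_ID`, and `pred_size` values are illustrative choices of ours rather than values from the fairseq config, and the 80/10/10 corruption of the prediction targets is omitted for brevity.)

```python
import torch

MASK_ID = 103  # illustrative mask token id; the real id comes from the fairseq dictionary

def expected_collate(input_ids: torch.Tensor, perm: torch.Tensor, pred_size: int):
    """Sketch of the collator as we understand it: permute tokens and
    positions, then append two blocks of mask tokens whose position ids
    repeat the positions of the last `pred_size` permuted tokens."""
    seq = input_ids[perm]                       # permuted content, full length kept
    positions = perm.clone()                    # permuted position ids
    predict_positions = positions[-pred_size:]  # positions of the prediction targets
    masks = torch.full((pred_size,), MASK_ID, dtype=input_ids.dtype)
    tokens_out = torch.cat([seq, masks, masks])  # concat(seq, mask, mask)
    positions_out = torch.cat([positions, predict_positions, predict_positions])
    return tokens_out, positions_out

# Dummy input IDs and an arbitrary permutation; 15% of 6 tokens rounds to pred_size = 1
tokens = torch.tensor([10, 11, 12, 13, 14, 15])
perm = torch.tensor([2, 0, 5, 3, 1, 4])
out_tokens, out_positions = expected_collate(tokens, perm, pred_size=1)
print(out_tokens)     # tensor([ 12,  10,  15,  13,  11,  14, 103, 103])
print(out_positions)  # tensor([2, 0, 5, 3, 1, 4, 4, 4])
```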
However, after rereading the MPNet paper, especially Sections 2.2 and 2.3 with attention to Figure 2, it would SEEM that the output of the collator is incongruous with what is described in those sections.
Figure 2 shows that the content and query masks are built using a permuted sequence that looks like:
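(Transcribing the shape in the paper's notation, where `z` is the permutation, `n` is the sequence length, and `x_{z_{c+1}}, ..., x_{z_n}` are the predicted tokens:)

```
tokens:    ( x_{z_1}, ..., x_{z_c},   [M], ..., [M],       x_{z_{c+1}}, ..., x_{z_n} )
positions: ( z_1,     ..., z_c,       z_{c+1}, ..., z_n,   z_{c+1},     ..., z_n     )
```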
In this example within the paper, the `pred_len` target tokens are masked and their content is then appended to the end for the content stream. However, the collator output KEEPS the token content in the main sequence and then appends TWO blocks of mask tokens, which to me seems necessarily different from what's described in the paper. Referring back to our dummy example above, I can outline the discrepancies I'm seeing:
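Concretely:

1. Length: the paper's input is `seq_len + pred_len` tokens long (non-predicted content, then `pred_len` masks, then the `pred_len` predicted tokens), whereas the collator emits `seq_len + 2 * pred_len` tokens.
2. Content placement: in the paper, the predicted tokens' content is pulled out of the main sequence and appended once after the masks; the collator instead keeps that content in the main sequence and appends two mask blocks, both sharing the predicted tokens' position ids.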
My question, then, is this: am I correct in understanding that the collator implementation is different from what's described in the paper? If so, why?