The future is to combine MPNet with other language model innovations #15

LifeIsStrange commented Feb 13, 2022

For example, it could really make sense to adapt MPNet so that it preserves PLM but uses ELECTRA's replaced-token-detection approach in place of MLM.
SpanBERT has some potential too (e.g. on coreference resolution).
I believe this could really push the state-of-the-art accuracy on key tasks.
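
To make the ELECTRA part concrete, here is a rough sketch (in PyTorch, with made-up tensor names, not MPNet's actual code) of the replaced-token-detection loss that ELECTRA uses instead of MLM: a binary cross-entropy over every position, predicting whether the generator swapped the original token.

```python
import torch
import torch.nn.functional as F

def rtd_loss(logits, input_ids, corrupted_ids, attention_mask):
    """ELECTRA-style replaced-token detection: binary cross-entropy per position,
    where the label is 1 wherever the generator swapped the original token."""
    labels = (input_ids != corrupted_ids).float()
    loss = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
    # average only over real (non-padding) positions
    return (loss * attention_mask).sum() / attention_mask.sum()

# toy usage with random tensors (batch=2, seq_len=8)
input_ids = torch.randint(0, 100, (2, 8))
corrupted_ids = input_ids.clone()
corrupted_ids[:, ::3] = torch.randint(0, 100, (2, 3))   # fake "generator" corruptions
logits = torch.randn(2, 8)                               # per-token discriminator scores
attention_mask = torch.ones(2, 8)
print(rtd_loss(logits, input_ids, corrupted_ids, attention_mask))
```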

What do you think?
@StillKeepTry
@tan-xu

Moreover, there is important low-hanging fruit that has been consistently ignored by transformer researchers:

The activation function should probably be Mish (https://github.com/digantamisra98/Mish),
as it is the one that gives the most accuracy gains in general. It can give around 1% accuracy, which is huge.
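
For illustration, swapping in Mish in a transformer feed-forward block is a one-line change; torch.nn.Mish ships with PyTorch >= 1.9, and the layer sizes below are placeholders, not MPNet's actual config.

```python
import torch.nn as nn

# feed-forward sublayer with Mish instead of GELU/ReLU (sizes are illustrative)
ffn = nn.Sequential(
    nn.Linear(768, 3072),
    nn.Mish(),
    nn.Linear(3072, 768),
)
```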

Secondly, the optimizer you're using, Adam, has a known variance problem during warm-up, and you should use its rectified version, RAdam:
https://github.com/LiyuanLucasLiu/RAdam
Moreover, it can optionally be combined with a complementary optimizer wrapper, Lookahead:
https://github.com/michaelrzhang/lookahead
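
A rough sketch of how the two compose: torch.optim.RAdam is built into recent PyTorch, while the Lookahead import path and constructor below are assumptions based on the linked repo and may differ between versions.

```python
import torch
from lookahead import Lookahead  # wrapper from the linked repo; exact API may vary

model = torch.nn.Linear(768, 768)  # stand-in for the real encoder
base = torch.optim.RAdam(model.parameters(), lr=1e-4, weight_decay=0.01)
optimizer = Lookahead(base)        # slow/fast weight interpolation on top of RAdam
```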

Moreover, there are newer training techniques that yield significant accuracy gains, such as gradient centralization:
https://github.com/Yonghongwei/Gradient-Centralization
and gradient normalization.
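
Gradient centralization is simple enough to sketch by hand: the linked repo bakes it into the optimizer, but conceptually it just subtracts the per-filter mean from every multi-dimensional gradient right before the optimizer step. A rough sketch (not the repo's code):

```python
import torch

def centralize_gradients(model):
    """Subtract the mean over all non-output dims from each gradient of rank > 1."""
    for p in model.parameters():
        if p.grad is not None and p.grad.dim() > 1:
            dims = tuple(range(1, p.grad.dim()))
            p.grad.sub_(p.grad.mean(dim=dims, keepdim=True))

# illustrative usage inside a training step
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(4, 16)).pow(2).mean()
loss.backward()
centralize_gradients(model)
optimizer.step()
```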

There is a library, Ranger21, that integrates all of those advances and more:
https://github.com/lessw2020/Ranger21
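
A minimal usage sketch, assuming the constructor arguments from the repo's README (Ranger21 builds its internal warmup/warmdown schedule from the epoch and batch counts); the argument names may change between versions, so treat this as an assumption rather than a reference.

```python
import torch
from ranger21 import Ranger21  # pip install ranger21

model = torch.nn.Linear(768, 768)   # stand-in for the real network
optimizer = Ranger21(
    model.parameters(),
    lr=1e-3,
    num_epochs=10,                   # assumed README kwargs for the built-in LR schedule
    num_batches_per_epoch=1000,
)
```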

Accuracy gains in NLP/NLU have reached a plateau. The reason is that researchers work far too much in isolation: they produce N new innovations per year, but the number of researchers who attempt to use those innovations/optimizations together can be counted on the fingers of one hand.

XLNet has been consistently ignored by researchers; you are the ones who saw the opportunity to combine the best of both worlds, BERT and XLNet. Why stop there?
As I said, both on the transformer/language-model side and on the activation-function/optimizer side, there are a LOT of significant accuracy optimizations to integrate into the successor of MPNet.
Aggregating those optimizations could yield a revolutionary language model with 5-10% average accuracy gains over the existing SOTA. It would make history.
Few others will attempt to combine a wide range of those innovations; you are the best hope. If you do not do it, I'm afraid no one else will, and NLU will stagnate for the decade to come.
