Fix CI with change of name of nlp #7054
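For context, the fix is essentially a rename of the `nlp` library to `datasets` wherever the examples and tests import it. A purely illustrative sketch of the kind of change involved (the dataset name below is an assumption, not taken from the actual diff):

```python
# Before the rename the code imported the library under its old name:
#   import nlp
#   wmt = nlp.load_dataset("wmt16", "ro-en", split="train[:10]")
# After the rename the same call goes through the `datasets` package.
import datasets

wmt = datasets.load_dataset("wmt16", "ro-en", split="train[:10]")
print(wmt[0])
```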
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7054 +/- ##
==========================================
+ Coverage 78.74% 80.85% +2.11%
==========================================
Files 168 168
Lines 32172 32172
==========================================
+ Hits 25335 26014 +679
+ Misses 6837 6158 -679
Continue to review full report at Codecov.
Merging to make the CI green but happy to address any comment in a follow-up PR.
and after
I think you need an install from source.
Not sure what you mean? I did install from source:
in transformers
It's not in the dependencies of transformers and requires a separate install from source for now. It does work on the CI and on my machine:
I did what you suggested, same failures.
Are you sure you are in the same env? nlp was never in
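One quick way to confirm that the interpreter running the tests actually sees the right install (a hedged sketch, not a command from this thread):

```python
# Print which Python interpreter is active and where `datasets` is loaded from,
# to verify the test run and the manual install share the same environment.
import sys

import datasets

print("python interpreter:", sys.executable)
print("datasets version:", datasets.__version__)
print("datasets location:", datasets.__file__)
```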
I have 2 GPUs, you probably don't? Indeed, if I run:
it works.
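Presumably the run only passes when a single device is visible. As a hedged Python illustration (the actual test command is not shown in the thread), visibility can be restricted before CUDA is initialized:

```python
# Illustrative only: hide all but one GPU from the process. The environment
# variable must be set before torch initializes CUDA for it to take effect.
import os

os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

import torch  # imported after the env var is set

print("visible CUDA devices:", torch.cuda.device_count())  # at most 1 here
```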
Yup, it's multi-GPU that is the problem. It works if I do
Mmmm, why would multi-GPU not see a new module? That's weird.
Oh, this is a different error, not a missing module. Looks like those tests need a decorator to run on the CPU only.
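A minimal sketch, assuming PyTorch and unittest-style tests, of how such a guard could look; this is not the decorator that was ultimately added to transformers:

```python
# Skip a test when more than one CUDA device is visible, so it effectively
# runs only on CPU or single-GPU machines. Hypothetical helper, for illustration.
import unittest

import torch


def require_cpu_or_single_gpu(test_case):
    """Decorator that skips `test_case` on multi-GPU machines."""
    if torch.cuda.device_count() > 1:
        return unittest.skip("test not supported on multi-GPU machines")(test_case)
    return test_case


class ExampleTests(unittest.TestCase):
    @require_cpu_or_single_gpu
    def test_something_cpu_only(self):
        self.assertEqual(1 + 1, 2)
```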
* nlp -> datasets
* More nlp -> datasets
* Woopsie
* More nlp -> datasets
* One last
Yes, like I said, you need a separate source install of it. Installing from the dev extras won't give you a properly up-to-date source install, AFAIK. Documented this in #7058
* ready for PR * cleanup * correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST * fix * perfectionism * revert change from another PR * odd, already committed this one * non-interactive upload workaround * backup the failed experiment * store langs in config * workaround for localizing model path * doc clean up as in #6956 * style * back out debug mode * document: run_eval.py --num_beams 10 * remove unneeded constant * typo * re-use bart's Attention * re-use EncoderLayer, DecoderLayer from bart * refactor * send to cuda and fp16 * cleanup * revert (moved to another PR) * better error message * document run_eval --num_beams * solve the problem of tokenizer finding the right files when model is local * polish, remove hardcoded config * add a note that the file is autogenerated to avoid losing changes * prep for org change, remove unneeded code * switch to model4.pt, update scores * s/python/bash/ * missing init (but doesn't impact the finetuned model) * cleanup * major refactor (reuse-bart) * new model, new expected weights * cleanup * cleanup * full link * fix model type * merge porting notes * style * cleanup * have to create a DecoderConfig object to handle vocab_size properly * doc fix * add note (not a public class) * parametrize * - add bleu scores integration tests * skip test if sacrebleu is not installed * cache heavy models/tokenizers * some tweaks * remove tokens that aren't used * more purging * simplify code * switch to using decoder_start_token_id * add doc * Revert "major refactor (reuse-bart)" This reverts commit 226dad1. * decouple from bart * remove unused code #1 * remove unused code #2 * remove unused code #3 * update instructions * clean up * move bleu eval to examples * check import only once * move data+gen script into files * reuse via import * take less space * add prepare_seq2seq_batch (auto-tested) * cleanup * recode test to use json instead of yaml * ignore keys not needed * use the new -y in transformers-cli upload -y * [xlm tok] config dict: fix str into int to match definition (#7034) * [s2s] --eval_max_generate_length (#7018) * Fix CI with change of name of nlp (#7054) * nlp -> datasets * More nlp -> datasets * Woopsie * More nlp -> datasets * One last * extending to support allen_nlp wmt models - allow a specific checkpoint file to be passed - more arg settings - scripts for allen_nlp models * sync with changes * s/fsmt-wmt/wmt/ in model names * s/fsmt-wmt/wmt/ in model names (p2) * s/fsmt-wmt/wmt/ in model names (p3) * switch to a better checkpoint * typo * make non-optional args such - adjust tests where possible or skip when there is no other choice * consistency * style * adjust header * cards moved (model rename) * use best custom hparams * update info * remove old cards * cleanup * s/stas/facebook/ * update scores * s/allen_nlp/allenai/ * url maps aren't needed * typo * move all the doc / build /eval generators to their own scripts * cleanup * Apply suggestions from code review Co-authored-by: Lysandre Debut <[email protected]> * Apply suggestions from code review Co-authored-by: Lysandre Debut <[email protected]> * fix indent * duplicated line * style * use the correct add_start_docstrings * oops * resizing can't be done with the core approach, due to 2 dicts * check that the arg is a list * style * style Co-authored-by: Sam Shleifer <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Lysandre Debut <[email protected]>
* WIP flax bert * Initial commit Bert Jax/Flax implementation. * Embeddings working and equivalent to PyTorch. * Move embeddings in its own module BertEmbeddings * Added jax.jit annotation on forward call * BertEncoder on par with PyTorch ! :D * Add BertPooler on par with PyTorch !! * Working Jax+Flax implementation of BertModel with < 1e-5 differences on the last layer. * Fix pooled output to take only the first token of the sequence. * Refactoring to use BertConfig from transformers. * Renamed FXBertModel to FlaxBertModel * Model is now initialized in FlaxBertModel constructor and reused. * WIP JaxPreTrainedModel * Cleaning up the code of FlaxBertModel * Added ability to load Flax model saved through save_pretrained() * Added ability to convert Pytorch Bert model to FlaxBert * FlaxBert can now load every Pytorch Bert model with on-the-fly conversion * Fix hardcoded shape values in conversion scripts. * Improve the way we handle LayerNorm conversion from PyTorch to Flax. * Added positional embeddings as parameter of BertModel with default to np.arange. * Let's roll FlaxRoberta ! * Fix missing position_ids parameters on predict for Bert * Flax backend now supports batched inputs Signed-off-by: Morgan Funtowicz <[email protected]> * Make it possible to load msgpacked model on convert from pytorch in last resort. Signed-off-by: Morgan Funtowicz <[email protected]> * Moved save_pretrained to Jax base class along with more constructor parameters. * Use specialized, model dependent conversion functio. * Expose `is_flax_available` in file_utils. * Added unittest for Flax models. * Added run_tests_flax to the CI. * Introduce FlaxAutoModel * Added more unittests * Flax model reference the _MODEL_ARCHIVE_MAP from PyTorch model. * Addressing review comments. * Expose seed in both Bert and Roberta * Fix typo suggested by @stefan-it Co-Authored-By: Stefan Schweter <[email protected]> * Attempt to make style * Attempt to make style in tests too * Added jax & jaxlib to the flax optional dependencies. * Attempt to fix flake8 warnings ... * Redo black again and again * When black and flake8 fight each other for a space ... 💥 💥 💥 * Try removing trailing comma to make both black and flake happy! * Fix invalid is_<framework>_available call, thanks @LysandreJik 🎉 * Fix another invalid import in flax_roberta test * Bump and pin flax release to 0.1.0. * Make flake8 happy, remove unused jax import * Change the type of the catch for msgpack. * Remove unused import. * Put seed as optional constructor parameter. * trigger ci again * Fix too much parameters in BertAttention. * Formatting. * Simplify Flax unittests to avoid machine crashes. * Fix invalid number of arguments when raising issue for an unknown model. * Address @bastings comment in PR, moving jax.jit decorated outside of __call__ * Fix incorrect path to require_flax/require_pytorch functions. Signed-off-by: Morgan Funtowicz <[email protected]> * Attempt to make style. Signed-off-by: Morgan Funtowicz <[email protected]> * Correct rebasing of circle-ci dependencies Signed-off-by: Morgan Funtowicz <[email protected]> * Fix import sorting. Signed-off-by: Morgan Funtowicz <[email protected]> * Fix unused imports. Signed-off-by: Morgan Funtowicz <[email protected]> * Again import sorting... Signed-off-by: Morgan Funtowicz <[email protected]> * Installing missing nlp dependency for flax unittests. Signed-off-by: Morgan Funtowicz <[email protected]> * Fix laoding of model for Flax implementations. 
Signed-off-by: Morgan Funtowicz <[email protected]> * jit the inner function call to make JAX-compatible Signed-off-by: Morgan Funtowicz <[email protected]> * Format ! Signed-off-by: Morgan Funtowicz <[email protected]> * Flake one more time 🎶 Signed-off-by: Morgan Funtowicz <[email protected]> * Rewrites BERT in Flax to the new Linen API (#7211) * Rewrite Flax HuggingFace PR to Linen * Some fixes * Fix tests * Fix CI with change of name of nlp (#7054) * nlp -> datasets * More nlp -> datasets * Woopsie * More nlp -> datasets * One last * Expose `is_flax_available` in file_utils. * Added run_tests_flax to the CI. * Attempt to make style * trigger ci again * Fix import sorting. Signed-off-by: Morgan Funtowicz <[email protected]> * Revert "Rewrites BERT in Flax to the new Linen API (#7211)" This reverts commit 23703a5. * Remove jnp.lax references Signed-off-by: Morgan Funtowicz <[email protected]> * Make style. Signed-off-by: Morgan Funtowicz <[email protected]> * Reintroduce Linen changes ... Signed-off-by: Morgan Funtowicz <[email protected]> * Make style. Signed-off-by: Morgan Funtowicz <[email protected]> * Use jax native's gelu function. Signed-off-by: Morgan Funtowicz <[email protected]> * Renaming BertModel to BertModule to highlight the fact this is the Flax Module object. Signed-off-by: Morgan Funtowicz <[email protected]> * Rewrite FlaxAutoModel test to not rely on pretrained_model_archive_map Signed-off-by: Morgan Funtowicz <[email protected]> * Remove unused variable in BertModule. Signed-off-by: Morgan Funtowicz <[email protected]> * Remove unused variable in BertModule again Signed-off-by: Morgan Funtowicz <[email protected]> * Attempt to have is_flax_available working again. Signed-off-by: Morgan Funtowicz <[email protected]> * Introduce JAX TensorType Signed-off-by: Morgan Funtowicz <[email protected]> * Improve ImportError message when trying to convert to various TensorType format. Signed-off-by: Morgan Funtowicz <[email protected]> * Makes Flax model jittable. Signed-off-by: Morgan Funtowicz <[email protected]> * Ensure flax models are jittable in unittests. Signed-off-by: Morgan Funtowicz <[email protected]> * Remove unused imports. Signed-off-by: Morgan Funtowicz <[email protected]> * Ensure jax imports are guarded behind is_flax_available. Signed-off-by: Morgan Funtowicz <[email protected]> * Make style. Signed-off-by: Morgan Funtowicz <[email protected]> * Make style again Signed-off-by: Morgan Funtowicz <[email protected]> * Make style again again Signed-off-by: Morgan Funtowicz <[email protected]> * Make style again again again Signed-off-by: Morgan Funtowicz <[email protected]> * Update src/transformers/file_utils.py Co-authored-by: Marc van Zee <[email protected]> * Bump flax to it's latest version Co-authored-by: Marc van Zee <[email protected]> * Bump jax version to at least 0.2.0 Signed-off-by: Morgan Funtowicz <[email protected]> * Style. Signed-off-by: Morgan Funtowicz <[email protected]> * Update the unittest to use TensorType.JAX Signed-off-by: Morgan Funtowicz <[email protected]> * isort import in tests. Signed-off-by: Morgan Funtowicz <[email protected]> * Match new flax parameters name "params" Signed-off-by: Morgan Funtowicz <[email protected]> * Remove unused imports. Signed-off-by: Morgan Funtowicz <[email protected]> * Add flax models to transformers __init__ Signed-off-by: Morgan Funtowicz <[email protected]> * Attempt to address all CI related comments. Signed-off-by: Morgan Funtowicz <[email protected]> * Correct circle.yml indent. 
Signed-off-by: Morgan Funtowicz <[email protected]> * Correct circle.yml indent (2) Signed-off-by: Morgan Funtowicz <[email protected]> * Remove coverage from flax tests Signed-off-by: Morgan Funtowicz <[email protected]> * Addressing many naming suggestions from comments Signed-off-by: Morgan Funtowicz <[email protected]> * Simplify for loop logic to interate over layers in FlaxBertLayerCollection Signed-off-by: Morgan Funtowicz <[email protected]> * use f-string syntax for formatting logs. Signed-off-by: Morgan Funtowicz <[email protected]> * Use config property from FlaxPreTrainedModel. Signed-off-by: Morgan Funtowicz <[email protected]> * use "cls_token" instead of "first_token" variable name. Signed-off-by: Morgan Funtowicz <[email protected]> * use "hidden_state" instead of "h" variable name. Signed-off-by: Morgan Funtowicz <[email protected]> * Correct class reference in docstring to link to Flax related modules. Signed-off-by: Morgan Funtowicz <[email protected]> * Added HF + Google Flax team copyright. Signed-off-by: Morgan Funtowicz <[email protected]> * Make Roberta independent from Bert Signed-off-by: Morgan Funtowicz <[email protected]> * Move activation functions to flax_utils. Signed-off-by: Morgan Funtowicz <[email protected]> * Move activation functions to flax_utils for bert. Signed-off-by: Morgan Funtowicz <[email protected]> * Added docstring for BERT Signed-off-by: Morgan Funtowicz <[email protected]> * Update import for Bert and Roberta tokenizers Signed-off-by: Morgan Funtowicz <[email protected]> * Make style. Signed-off-by: Morgan Funtowicz <[email protected]> * fix-copies Signed-off-by: Morgan Funtowicz <[email protected]> * Correct FlaxRobertaLayer to match PyTorch. Signed-off-by: Morgan Funtowicz <[email protected]> * Use the same store_artifact for flax unittest Signed-off-by: Morgan Funtowicz <[email protected]> * Style. Signed-off-by: Morgan Funtowicz <[email protected]> * Make sure gradient are disabled only locally for flax unittest using torch equivalence. Signed-off-by: Morgan Funtowicz <[email protected]> * Use relative imports Signed-off-by: Morgan Funtowicz <[email protected]> Co-authored-by: Stefan Schweter <[email protected]> Co-authored-by: Marc van Zee <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]>
This reverts commit 31ed545.
Fixes #7055 (because yes, I can see the future)