
Fix CI with change of name of nlp #7054

Merged: sgugger merged 5 commits from fix_ci_nlp into master on Sep 10, 2020

Conversation

sgugger (Collaborator) commented on Sep 10, 2020

Fixes #7055 (because yes, I can see the future)

codecov bot commented on Sep 10, 2020

Codecov Report

Merging #7054 into master will increase coverage by 2.11%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #7054      +/-   ##
==========================================
+ Coverage   78.74%   80.85%   +2.11%     
==========================================
  Files         168      168              
  Lines       32172    32172              
==========================================
+ Hits        25335    26014     +679     
+ Misses       6837     6158     -679     
Impacted Files Coverage Δ
src/transformers/__init__.py 99.33% <ø> (ø)
src/transformers/tokenization_xlm.py 82.93% <ø> (ø)
src/transformers/file_utils.py 82.66% <100.00%> (+0.25%) ⬆️
src/transformers/trainer.py 54.68% <100.00%> (ø)
src/transformers/modeling_tf_funnel.py 18.53% <0.00%> (-75.51%) ⬇️
src/transformers/modeling_tf_flaubert.py 24.53% <0.00%> (-63.81%) ⬇️
src/transformers/modeling_marian.py 60.00% <0.00%> (-30.00%) ⬇️
src/transformers/activations.py 85.00% <0.00%> (-5.00%) ⬇️
src/transformers/configuration_bart.py 90.00% <0.00%> (-4.00%) ⬇️
src/transformers/modeling_bart.py 93.77% <0.00%> (-0.68%) ⬇️
... and 26 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update df4594a...f8b9682. Read the comment docs.

sgugger (Collaborator, Author) commented on Sep 10, 2020

Merging to make the CI green, but happy to address any comments in a follow-up PR.

sgugger merged commit 5144867 into master on Sep 10, 2020
sgugger deleted the fix_ci_nlp branch on September 10, 2020, 18:51
stas00 (Contributor) commented on Sep 10, 2020

_______________________________________________________ ERROR collecting tests/test_trainer.py _______________________________________________________
ImportError while importing test module '/mnt/nvme1/code/huggingface/transformers-master/tests/test_trainer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/stas/anaconda3/envs/main-37/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_trainer.py:3: in <module>
    import datasets
E   ModuleNotFoundError: No module named 'datasets'

stas00 (Contributor) commented on Sep 10, 2020

And after pip install datasets (which should probably be listed in setup.py), the failure is still the same as in #7055.

sgugger (Collaborator, Author) commented on Sep 10, 2020

I think you need an install from source.

stas00 (Contributor) commented on Sep 10, 2020

Not sure what you mean - install datasets from source?

I did:

git pull
pip install -e .[dev]

in transformers

sgugger (Collaborator, Author) commented on Sep 10, 2020

It's not in the dependencies of transformers and requires a separate install from source for now. It does work on the CI and my machine:

git clone https://github.com/huggingface/datasets
cd datasets
pip install -e .
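
A quick, generic sanity check - not from the thread, just standard Python - to confirm which interpreter and which datasets install the test environment actually picks up:

# Generic sanity check: print the active interpreter and the datasets install it resolves.
import datasets
import sys

print(sys.executable)
print(datasets.__version__, datasets.__file__)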

stas00 (Contributor) commented on Sep 10, 2020

I did what you suggested, same failures.

sgugger (Collaborator, Author) commented on Sep 10, 2020

Are you sure you are in the same env?

nlp was never in setup.py. For now it is an additional dependency, required for the full test suite as a source install; it will become a regular dependency once it's stable enough. I'll add that to CONTRIBUTING, but first I'm trying to understand why it's failing for you.

stas00 (Contributor) commented on Sep 10, 2020

I have 2 GPUs - you probably don't?

Indeed, if I run:

CUDA_VISIBLE_DEVICES="" pytest tests/test_trainer.py

it works.

stas00 (Contributor) commented on Sep 10, 2020

Yup, it's multi-GPU that is the problem. It works if I do CUDA_VISIBLE_DEVICES="0" pytest tests/test_trainer.py

sgugger (Collaborator, Author) commented on Sep 10, 2020

Mmmm, why would the multi-GPU run not see a new module? That's weird.

stas00 (Contributor) commented on Sep 10, 2020

I'm not sure you have looked at the errors in #7055 - they are numeric mismatches. Have a look?

12 != 24.0 looks like a 1 vs 2 GPU issue.

Let's move back to #7055 and continue there.
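
One plausible explanation for that doubling (an illustrative sketch, assuming the mismatch comes from the Trainer's effective batch size, which multiplies the per-device batch size by the number of visible GPUs):

# Illustrative only: why a value expected to be 12 on one GPU can show up as 24 on two.
per_device_train_batch_size = 12
n_gpu = 2  # what torch.cuda.device_count() reports on a 2-GPU box
effective_train_batch_size = per_device_train_batch_size * max(1, n_gpu)
print(effective_train_batch_size)  # 24 with 2 GPUs, 12 with CUDA_VISIBLE_DEVICES="0"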

sgugger (Collaborator, Author) commented on Sep 10, 2020

Oh, this is a different error, not a missing module. Looks like those tests need a decorator to run on CPU only.
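
A minimal sketch of such a decorator (the name require_non_multi_gpu is hypothetical, not an existing transformers test utility at the time):

import unittest

import torch


def require_non_multi_gpu(test_case):
    # Skip whenever more than one CUDA device is visible, so expectations computed
    # for CPU or a single GPU are not thrown off by DataParallel splitting the batch.
    if torch.cuda.is_available() and torch.cuda.device_count() > 1:
        return unittest.skip("test requires 0 or 1 GPU")(test_case)
    return test_case

It would then be applied as @require_non_multi_gpu on the affected tests in tests/test_trainer.py.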

stas00 pushed a commit to stas00/transformers that referenced this pull request Sep 10, 2020
* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last
stas00 (Contributor) commented on Sep 10, 2020

> nlp was never in setup.py. For now it is an additional dependency, required for the full test suite as a source install; it will become a regular dependency once it's stable enough. I'll add that to CONTRIBUTING, but first I'm trying to understand why it's failing for you.

datasets needs to be in the dev requirements - otherwise the test suite fails.

sgugger (Collaborator, Author) commented on Sep 10, 2020

Yes, as I said, you need a separate source install of it. You can't have a source install from the dev extras that stays properly up to date, AFAIK.

Documented this in #7058
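
For illustration, roughly what declaring datasets in the dev extras of setup.py could look like - a hypothetical sketch, not the approach taken here, where datasets stayed a separate source install:

# Hypothetical sketch only; the name and extras below are illustrative, not transformers' real setup.py.
from setuptools import setup

setup(
    name="example-dev-extras",
    version="0.0.0",
    extras_require={
        "testing": ["pytest", "pytest-xdist"],
        "dev": ["pytest", "pytest-xdist", "datasets"],  # assumption: pull datasets in for the full test suite
    },
)

With something like this, pip install -e .[dev] would bring datasets in automatically instead of requiring a separate install.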

LysandreJik added a commit that referenced this pull request Sep 17, 2020
* ready for PR

* cleanup

* correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST

* fix

* perfectionism

* revert change from another PR

* odd, already committed this one

* non-interactive upload workaround

* backup the failed experiment

* store langs in config

* workaround for localizing model path

* doc clean up as in #6956

* style

* back out debug mode

* document: run_eval.py --num_beams 10

* remove unneeded constant

* typo

* re-use bart's Attention

* re-use EncoderLayer, DecoderLayer from bart

* refactor

* send to cuda and fp16

* cleanup

* revert (moved to another PR)

* better error message

* document run_eval --num_beams

* solve the problem of tokenizer finding the right files when model is local

* polish, remove hardcoded config

* add a note that the file is autogenerated to avoid losing changes

* prep for org change, remove unneeded code

* switch to model4.pt, update scores

* s/python/bash/

* missing init (but doesn't impact the finetuned model)

* cleanup

* major refactor (reuse-bart)

* new model, new expected weights

* cleanup

* cleanup

* full link

* fix model type

* merge porting notes

* style

* cleanup

* have to create a DecoderConfig object to handle vocab_size properly

* doc fix

* add note (not a public class)

* parametrize

* - add bleu scores integration tests

* skip test if sacrebleu is not installed

* cache heavy models/tokenizers

* some tweaks

* remove tokens that aren't used

* more purging

* simplify code

* switch to using decoder_start_token_id

* add doc

* Revert "major refactor (reuse-bart)"

This reverts commit 226dad1.

* decouple from bart

* remove unused code #1

* remove unused code #2

* remove unused code #3

* update instructions

* clean up

* move bleu eval to examples

* check import only once

* move data+gen script into files

* reuse via import

* take less space

* add prepare_seq2seq_batch (auto-tested)

* cleanup

* recode test to use json instead of yaml

* ignore keys not needed

* use the new -y in transformers-cli upload -y

* [xlm tok] config dict: fix str into int to match definition (#7034)

* [s2s] --eval_max_generate_length (#7018)

* Fix CI with change of name of nlp (#7054)

* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last

* extending to support allen_nlp wmt models

- allow a specific checkpoint file to be passed
- more arg settings
- scripts for allen_nlp models

* sync with changes

* s/fsmt-wmt/wmt/ in model names

* s/fsmt-wmt/wmt/ in model names (p2)

* s/fsmt-wmt/wmt/ in model names (p3)

* switch to a better checkpoint

* typo

* make non-optional args such - adjust tests where possible or skip when there is no other choice

* consistency

* style

* adjust header

* cards moved (model rename)

* use best custom hparams

* update info

* remove old cards

* cleanup

* s/stas/facebook/

* update scores

* s/allen_nlp/allenai/

* url maps aren't needed

* typo

* move all the doc / build /eval generators to their own scripts

* cleanup

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <[email protected]>

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <[email protected]>

* fix indent

* duplicated line

* style

* use the correct add_start_docstrings

* oops

* resizing can't be done with the core approach, due to 2 dicts

* check that the arg is a list

* style

* style

Co-authored-by: Sam Shleifer <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
sshleifer added a commit to sshleifer/transformers_fork that referenced this pull request Sep 17, 2020
mfuntowicz pushed a commit that referenced this pull request Sep 18, 2020
mfuntowicz pushed a commit that referenced this pull request Sep 18, 2020
mfuntowicz pushed a commit that referenced this pull request Oct 5, 2020
mfuntowicz pushed a commit that referenced this pull request Oct 19, 2020
mfuntowicz pushed a commit that referenced this pull request Oct 19, 2020
LysandreJik pushed a commit that referenced this pull request Oct 19, 2020
* WIP flax bert

* Initial commit Bert Jax/Flax implementation.

* Embeddings working and equivalent to PyTorch.

* Move embeddings in its own module BertEmbeddings

* Added jax.jit annotation on forward call

* BertEncoder on par with PyTorch ! :D

* Add BertPooler on par with PyTorch !!

* Working Jax+Flax implementation of BertModel with < 1e-5 differences on the last layer.

* Fix pooled output to take only the first token of the sequence.

* Refactoring to use BertConfig from transformers.

* Renamed FXBertModel to FlaxBertModel

* Model is now initialized in FlaxBertModel constructor and reused.

* WIP JaxPreTrainedModel

* Cleaning up the code of FlaxBertModel

* Added ability to load Flax model saved through save_pretrained()

* Added ability to convert Pytorch Bert model to FlaxBert

* FlaxBert can now load every Pytorch Bert model with on-the-fly conversion

* Fix hardcoded shape values in conversion scripts.

* Improve the way we handle LayerNorm conversion from PyTorch to Flax.

* Added positional embeddings as parameter of BertModel with default to np.arange.

* Let's roll FlaxRoberta !

* Fix missing position_ids parameters on predict for Bert

* Flax backend now supports batched inputs

Signed-off-by: Morgan Funtowicz <[email protected]>

* Make it possible to load msgpacked model on convert from pytorch in last resort.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Moved save_pretrained to Jax base class along with more constructor parameters.

* Use specialized, model dependent conversion functio.

* Expose `is_flax_available` in file_utils.

* Added unittest for Flax models.

* Added run_tests_flax to the CI.

* Introduce FlaxAutoModel

* Added more unittests

* Flax model reference the _MODEL_ARCHIVE_MAP from PyTorch model.

* Addressing review comments.

* Expose seed in both Bert and Roberta

* Fix typo suggested by @stefan-it

Co-Authored-By: Stefan Schweter <[email protected]>

* Attempt to make style

* Attempt to make style in tests too

* Added jax & jaxlib to the flax optional dependencies.

* Attempt to fix flake8 warnings ...

* Redo black again and again

* When black and flake8 fight each other for a space ... 💥 💥 💥

* Try removing trailing comma to make both black and flake happy!

* Fix invalid is_<framework>_available call, thanks @LysandreJik 🎉

* Fix another invalid import in flax_roberta test

* Bump and pin flax release to 0.1.0.

* Make flake8 happy, remove unused jax import

* Change the type of the catch for msgpack.

* Remove unused import.

* Put seed as optional constructor parameter.

* trigger ci again

* Fix too much parameters in BertAttention.

* Formatting.

* Simplify Flax unittests to avoid machine crashes.

* Fix invalid number of arguments when raising issue for an unknown model.

* Address @bastings comment in PR, moving jax.jit decorated outside of __call__

* Fix incorrect path to require_flax/require_pytorch functions.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Attempt to make style.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Correct rebasing of circle-ci dependencies

Signed-off-by: Morgan Funtowicz <[email protected]>

* Fix import sorting.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Fix unused imports.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Again import sorting...

Signed-off-by: Morgan Funtowicz <[email protected]>

* Installing missing nlp dependency for flax unittests.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Fix laoding of model for Flax implementations.

Signed-off-by: Morgan Funtowicz <[email protected]>

* jit the inner function call to make JAX-compatible

Signed-off-by: Morgan Funtowicz <[email protected]>

* Format !

Signed-off-by: Morgan Funtowicz <[email protected]>

* Flake one more time 🎶

Signed-off-by: Morgan Funtowicz <[email protected]>

* Rewrites BERT in Flax to the new Linen API (#7211)

* Rewrite Flax HuggingFace PR to Linen

* Some fixes

* Fix tests

* Fix CI with change of name of nlp (#7054)

* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last

* Expose `is_flax_available` in file_utils.

* Added run_tests_flax to the CI.

* Attempt to make style

* trigger ci again

* Fix import sorting.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Revert "Rewrites BERT in Flax to the new Linen API (#7211)"

This reverts commit 23703a5.

* Remove jnp.lax references

Signed-off-by: Morgan Funtowicz <[email protected]>

* Make style.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Reintroduce Linen changes ...

Signed-off-by: Morgan Funtowicz <[email protected]>

* Make style.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Use jax native's gelu function.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Renaming BertModel to BertModule to highlight the fact this is the Flax Module object.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Rewrite FlaxAutoModel test to not rely on pretrained_model_archive_map

Signed-off-by: Morgan Funtowicz <[email protected]>

* Remove unused variable in BertModule.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Remove unused variable in BertModule again

Signed-off-by: Morgan Funtowicz <[email protected]>

* Attempt to have is_flax_available working again.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Introduce JAX TensorType

Signed-off-by: Morgan Funtowicz <[email protected]>

* Improve ImportError message when trying to convert to various TensorType format.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Makes Flax model jittable.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Ensure flax models are jittable in unittests.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Ensure jax imports are guarded behind is_flax_available.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Make style.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Make style again

Signed-off-by: Morgan Funtowicz <[email protected]>

* Make style again again

Signed-off-by: Morgan Funtowicz <[email protected]>

* Make style again again again

Signed-off-by: Morgan Funtowicz <[email protected]>

* Update src/transformers/file_utils.py

Co-authored-by: Marc van Zee <[email protected]>

* Bump flax to it's latest version

Co-authored-by: Marc van Zee <[email protected]>

* Bump jax version to at least 0.2.0

Signed-off-by: Morgan Funtowicz <[email protected]>

* Style.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Update the unittest to use TensorType.JAX

Signed-off-by: Morgan Funtowicz <[email protected]>

* isort import in tests.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Match new flax parameters name "params"

Signed-off-by: Morgan Funtowicz <[email protected]>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Add flax models to transformers __init__

Signed-off-by: Morgan Funtowicz <[email protected]>

* Attempt to address all CI related comments.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Correct circle.yml indent.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Correct circle.yml indent (2)

Signed-off-by: Morgan Funtowicz <[email protected]>

* Remove coverage from flax tests

Signed-off-by: Morgan Funtowicz <[email protected]>

* Addressing many naming suggestions from comments

Signed-off-by: Morgan Funtowicz <[email protected]>

* Simplify for loop logic to interate over layers in FlaxBertLayerCollection

Signed-off-by: Morgan Funtowicz <[email protected]>

* use f-string syntax for formatting logs.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Use config property from FlaxPreTrainedModel.

Signed-off-by: Morgan Funtowicz <[email protected]>

* use "cls_token" instead of "first_token" variable name.

Signed-off-by: Morgan Funtowicz <[email protected]>

* use "hidden_state" instead of "h" variable name.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Correct class reference in docstring to link to Flax related modules.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Added HF + Google Flax team copyright.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Make Roberta independent from Bert

Signed-off-by: Morgan Funtowicz <[email protected]>

* Move activation functions to flax_utils.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Move activation functions to flax_utils for bert.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Added docstring for BERT

Signed-off-by: Morgan Funtowicz <[email protected]>

* Update import for Bert and Roberta tokenizers

Signed-off-by: Morgan Funtowicz <[email protected]>

* Make style.

Signed-off-by: Morgan Funtowicz <[email protected]>

* fix-copies

Signed-off-by: Morgan Funtowicz <[email protected]>

* Correct FlaxRobertaLayer to match PyTorch.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Use the same store_artifact for flax unittest

Signed-off-by: Morgan Funtowicz <[email protected]>

* Style.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Make sure gradient are disabled only locally for flax unittest using torch equivalence.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Use relative imports

Signed-off-by: Morgan Funtowicz <[email protected]>

Co-authored-by: Stefan Schweter <[email protected]>
Co-authored-by: Marc van Zee <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Zigur pushed a commit to Zigur/transformers that referenced this pull request Oct 26, 2020
Zigur pushed a commit to Zigur/transformers that referenced this pull request Oct 26, 2020
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
Successfully merging this pull request may close these issues:

[testing] test_trainer.py is failing (#7055)