merge main (#23866)

* Debug example code for MegaForCausalLM (#23382) * Debug example code for MegaForCausalLM set ignore_mismatched_sizes=True in model loading code * Fix up * Remove erroneous `img` closing tag (#23646) See #23625 * Fix tensor device while attention_mask is not None (#23538) * Fix tensor device while attention_mask is not None * Fix tensor device while attention_mask is not None * Fix accelerate logger bug (#23650) * fix logger bug * Update tests/mixed_int8/test_mixed_int8.py Co-authored-by: Zachary Mueller <[email protected]> * import `PartialState` --------- Co-authored-by: Zachary Mueller <[email protected]> * Muellerzr fix deepspeed (#23657) * Fix deepspeed recursion * Better fix * Bugfix: LLaMA layer norm incorrectly changes input type and consumers lots of memory (#23535) * Fixed bug where LLaMA layer norm would change input type. * make fix-copies --------- Co-authored-by: younesbelkada <[email protected]> * Fix wav2vec2 is_batched check to include 2-D numpy arrays (#23223) * Fix wav2vec2 is_batched check to include 2-D numpy arrays * address comment * Add tests * oops * oops * Switch to np array Co-authored-by: Sanchit Gandhi <[email protected]> * Switch to np array * condition merge * Specify mono channel only in comment * oops, add other comment too * make style * Switch list check from falsiness to empty --------- Co-authored-by: Sanchit Gandhi <[email protected]> * changing the requirements to a cpu torch version that works (#23483) * Fix SAM tests and use smaller checkpoints (#23656) * Fix SAM tests and use smaller checkpoints * Override test_model_from_pretrained to use sam-vit-base as well * make fixup * Update all no_trainer with skip_first_batches (#23664) * Update workflow files (#23658) * fix * fix --------- Co-authored-by: ydshieh <[email protected]> * [image-to-text pipeline] Add conditional text support + GIT (#23362) * First draft * Remove print statements * Add conditional generation * Add more tests * Remove scripts * Remove BLIP specific 
linkes * Add support for pix2struct * Add fast test * Address comment * Fix style * small fix to remove unused eos in processor when it's not used. (#23408) * Bump requests from 2.27.1 to 2.31.0 in /examples/research_projects/decision_transformer (#23673) Bump requests in /examples/research_projects/decision_transformer Bumps [requests](https://github.com/psf/requests) from 2.27.1 to 2.31.0. - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.27.1...v2.31.0) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump requests from 2.22.0 to 2.31.0 in /examples/research_projects/visual_bert (#23670) Bump requests in /examples/research_projects/visual_bert Bumps [requests](https://github.com/psf/requests) from 2.22.0 to 2.31.0. - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.22.0...v2.31.0) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump requests from 2.22.0 to 2.31.0 in /examples/research_projects/lxmert (#23668) Bump requests in /examples/research_projects/lxmert Bumps [requests](https://github.com/psf/requests) from 2.22.0 to 2.31.0. - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.22.0...v2.31.0) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add PerSAM [bis] (#23659) * Add PerSAM args * Make attn_sim optional * Rename to attention_similarity * Add docstrigns * Improve docstrings * Fix typo in a parameter name for open llama model (#23637) * Update modeling_open_llama.py Fix typo in `use_memorry_efficient_attention` parameter name * Update configuration_open_llama.py Fix typo in `use_memorry_efficient_attention` parameter name * Update configuration_open_llama.py Take care of backwards compatibility ensuring that the previous parameter name is taken into account if used * Update configuration_open_llama.py format to adjust the line length * Update configuration_open_llama.py proper code formatting using `make fixup` * Update configuration_open_llama.py pop the argument not to let it be set later down the line * Fix PyTorch SAM tests (#23682) fix Co-authored-by: ydshieh <[email protected]> * Making `safetensors` a core dependency. (#23254) * Making `safetensors` a core dependency. To be merged later, I'm creating the PR so we can try it out. * Update setup.py * Remove duplicates. * Even more redundant. 
* 🌐 [i18n-KO] Translated `tasks/monocular_depth_estimation.mdx` to Korean (#23621) docs: ko: `tasks/monocular_depth_estimation` Co-authored-by: Hyeonseo Yun <[email protected]> Co-authored-by: Sohyun Sim <[email protected]> Co-authored-by: Gabriel Yang <[email protected]> Co-authored-by: Wonhyeong Seo <[email protected]> Co-authored-by: Jungnerd <[email protected]> * Fix a `BridgeTower` test (#23694) fix Co-authored-by: ydshieh <[email protected]> * [`SAM`] Fixes pipeline and adds a dummy pipeline test (#23684) * add a dummy pipeline test * change test name * TF version compatibility fixes (#23663) * New TF version compatibility fixes * Remove dummy print statement, move expand_1d * Make a proper framework inference function * Make a proper framework inference function * ValueError -> TypeError * [`Blip`] Fix blip doctest (#23698) fix blip doctest * is_batched fix for remaining 2-D numpy arrays (#23309) * Fix is_batched code to allow 2-D numpy arrays for audio * Tests * Fix typo * Incorporate comments from PR #23223 * Skip `TFCvtModelTest::test_keras_fit_mixed_precision` for now (#23699) fix Co-authored-by: ydshieh <[email protected]> * fix: load_best_model_at_end error when load_in_8bit is True (#23443) Ref: huggingface/peft#394 Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. call module.cuda() before module.load_state_dict() * Fix some docs what layerdrop does (#23691) * Fix some docs what layerdrop does * Update src/transformers/models/data2vec/configuration_data2vec_audio.py Co-authored-by: Sylvain Gugger <[email protected]> * Fix more docs --------- Co-authored-by: Sylvain Gugger <[email protected]> * add GPTJ/bloom/llama/opt into model list and enhance the jit support (#23291) Signed-off-by: Wang, Yi A <[email protected]> * 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) (#23479) * Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. 
* Added and fixed optimizer tests. * Style and quality checks. * Initial draft. Some tests fail. * Fixed dtype bug. * Fixed bug caused by torch_dtype='auto'. * All test green for 8-bit and 4-bit layers. * Added fix for fp32 layer norms and bf16 compute in LLaMA. * Initial draft. Some tests fail. * Fixed dtype bug. * Fixed bug caused by torch_dtype='auto'. * All test green for 8-bit and 4-bit layers. * Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. * Added and fixed optimizer tests. * Style and quality checks. * Fixing issues for PR #23479. * Added fix for fp32 layer norms and bf16 compute in LLaMA. * Reverted variable name change. * Initial draft. Some tests fail. * Fixed dtype bug. * Fixed bug caused by torch_dtype='auto'. * All test green for 8-bit and 4-bit layers. * Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. * Added and fixed optimizer tests. * Style and quality checks. * Added missing tests. * Fixup changes. * Added fixup changes. * Missed some variables to rename. 
* revert trainer tests * revert test trainer * another revert * fix tests and safety checkers * protect import * simplify a bit * Update src/transformers/trainer.py * few fixes * add warning * replace with `load_in_kbit = load_in_4bit or load_in_8bit` * fix test * fix tests * this time fix tests * safety checker * add docs * revert torch_dtype * Apply suggestions from code review Co-authored-by: Sylvain Gugger <[email protected]> * multiple fixes * update docs * version checks and multiple fixes * replace `is_loaded_in_kbit` * replace `load_in_kbit` * change methods names * better checks * oops * oops * address final comments --------- Co-authored-by: younesbelkada <[email protected]> Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]> * Paged Optimizer + Lion Optimizer for Trainer (#23217) * Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. * Added and fixed optimizer tests. * Style and quality checks. --------- Co-authored-by: younesbelkada <[email protected]> * Export to ONNX doc refocused on using optimum, added tflite (#23434) * doc refocused on using optimum, tflite * minor updates to fix checks * Apply suggestions from code review Co-authored-by: regisss <[email protected]> * TFLite to separate page, added links * Removed the onnx list builder * make style * Update docs/source/en/serialization.mdx Co-authored-by: regisss <[email protected]> --------- Co-authored-by: regisss <[email protected]> * fix: use bool instead of uint8/byte in Deberta/DebertaV2/SEW-D to make it compatible with TensorRT (#23683) * Use bool instead of uint8/byte in DebertaV2 to make it compatible with TensorRT TensorRT cannot accept onnx graph with uint8/byte intermediate tensors. This PR uses bool tensors instead of unit8/byte tensors to make the exported onnx file can work with TensorRT. 
* fix: use bool instead of uint8/byte in Deberta and SEW-D --------- Co-authored-by: Yuxian Qiu <[email protected]> * fix gptj could not jit.trace in GPU (#23317) Signed-off-by: Wang, Yi A <[email protected]> * Better TF docstring types (#23477) * Rework TF type hints to use | None instead of Optional[] for tf.Tensor * Rework TF type hints to use | None instead of Optional[] for tf.Tensor * Don't forget the imports * Add the imports to tests too * make fixup * Refactor tests that depended on get_type_hints * Better test refactor * Fix an old hidden bug in the test_keras_fit input creation code * Fix for the Deit tests * Minor awesome-transformers.md fixes (#23453) Minor docs fixes * TF SAM memory reduction (#23732) * Extremely small change to TF SAM dummies to reduce memory usage on build * remove debug breakpoint * Debug print statement to track array sizes * More debug shape printing * More debug shape printing * Now remove the debug shape printing * make fixup * make fixup * fix: delete duplicate sentences in `document_question_answering.mdx` (#23735) fix: delete duplicate sentence * fix: Whisper generate, move text_prompt_ids trim up for max_new_tokens calculation (#23724) move text_prompt_ids trimming to top * Overhaul TF serving signatures + dummy inputs (#23234) * Let's try autodetecting serving sigs * Don't clobber existing sigs * Change shapes for multiplechoice models * Make default dummy inputs smarter too * Fix missing f-string * Let's YOLO a serving output too * Read __class__.__name__ properly * Don't just pass naked lists in there and expect it to be okay * Code cleanup * Update default serving sig * Clearer error messages * Further updates to the default serving output * make fixup * Update the serving output a bit more * Cleanups and renames, raise errors appropriately when we can't infer inputs * More renames * we're building in a functional context again, yolo * import DUMMY_INPUTS from the right place * import DUMMY_INPUTS from the right place * 
Support cross-attention in the dummies * Support cross-attention in the dummies * Complete removal of dummy/serving overrides in BERT * Complete removal of dummy/serving overrides in RoBERTa * Obliterate lots and lots of serving sig and dummy overrides * merge type hint changes * Fix for token_type_ids with vocab_size 1 * Add missing property decorator * Fix T5 and hopefully some models that take conv inputs * More signature pruning * Fix T5's signature * Fix Wav2Vec2 signature * Fix LongformerForMultipleChoice input signature * Fix BLIP and LED * Better default serving output error handling * Fix BART dummies * Fix dummies for cross-attention, esp encoder-decoder models * Fix visionencoderdecoder signature * Fix BLIP serving output * Small tweak to BART dummies * Cleanup the ugly parameter inspection line that I used in a few places * committed a breakpoint again * Move the text_dims check * Remove blip_text serving_output * Add decoder_input_ids to the default input sig * Remove all the manual overrides for encoder-decoder model signatures * Tweak longformer/led input sigs * Tweak default serving output * output.keys() -> output * make fixup * [Whisper] Reduce batch size in tests (#23736) * Fix the regex in `get_imports` to support multiline try blocks and excepts with specific exception types (#23725) * fix and test get_imports for multiline try blocks, and excepts with specific errors * fixup * add some more tests * add license * Fix sagemaker DP/MP (#23681) * Check for use_sagemaker_dp * Add a check for is_sagemaker_mp when setting _n_gpu again. Should be last broken thing * Try explicit check? * Quality * Enable prompts on the Hub (#23662) * Enable prompts on the Hub * Update src/transformers/tools/prompts.py Co-authored-by: amyeroberts <[email protected]> * Address review comments --------- Co-authored-by: amyeroberts <[email protected]> * Remove the last few TF serving sigs (#23738) Remove some more serving methods that (I think?) 
turned up while this PR was open * Fix `pip install --upgrade accelerate` command in modeling_utils.py (#23747) Fix command in modeling_utils.py * Add LlamaIndex to awesome-transformers.md (#23484) * Fix psuh_to_hub in Trainer when nothing needs pushing (#23751) * Revamp test selection for the example tests (#23737) * Revamp test selection for the example tests * Rename old XLA test and fake modif in run_glue * Fixes * Fake Trainer modif * Remove fake modifs * [LongFormer] code nits, removed unused parameters (#23749) * remove unused parameters * remove unused parameters in config * Fix is_ninja_available() (#23752) * Fix is_ninja_available() search ninja using subprocess instead of importlib. * Fix style * Fix doc * Fix style * Bump tornado from 6.0.4 to 6.3.2 in /examples/research_projects/lxmert (#23766) Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.0.4 to 6.3.2. - [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst) - [Commits](tornadoweb/tornado@v6.0.4...v6.3.2) --- updated-dependencies: - dependency-name: tornado dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump tornado from 6.0.4 to 6.3.2 in /examples/research_projects/visual_bert (#23767) Bump tornado in /examples/research_projects/visual_bert Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.0.4 to 6.3.2. - [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst) - [Commits](tornadoweb/tornado@v6.0.4...v6.3.2) --- updated-dependencies: - dependency-name: tornado dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [`Nllb-Moe`] Fix nllb moe accelerate issue (#23758) fix nllb moe accelerate issue * [OPT] Doc nit, using fast is fine (#23789) small doc nit * Fix RWKV backward on GPU (#23774) * Update trainer.mdx class_weights example (#23787) class_weights tensor should follow model's device * no_cuda does not take effect in non distributed environment (#23795) Signed-off-by: Wang, Yi <[email protected]> * Fix no such file or directory error (#23783) * Fix no such file or directory error * Address comment * Fix formatting issue * Log the right train_batch_size if using auto_find_batch_size and also log the adjusted value seperately. (#23800) * Log right bs * Log * Diff message * Enable code-specific revision for code on the Hub (#23799) * Enable code-specific revision for code on the Hub * invalidate old revision * [Time-Series] Autoformer model (#21891) * ran `transformers-cli add-new-model-like` * added `AutoformerLayernorm` and `AutoformerSeriesDecomposition` * added `decomposition_layer` in `init` and `moving_avg` to config * added `AutoformerAutoCorrelation` to encoder & decoder * removed caninical self attention `AutoformerAttention` * added arguments in config and model tester. Init works! 😁 * WIP autoformer attention with autocorrlation * fixed `attn_weights` size * wip time_delay_agg_training * fixing sizes and debug time_delay_agg_training * aggregation in training works! 😁 * `top_k_delays` -> `top_k_delays_index` and added `contiguous()` * wip time_delay_agg_inference * finish time_delay_agg_inference 😎 * added resize to autocorrelation * bug fix: added the length of the output signal to `irfft` * `attention_mask = None` in the decoder * fixed test: changed attention expected size, `test_attention_outputs` works! 
* removed unnecessary code * apply AutoformerLayernorm in final norm in enc & dec * added series decomposition to the encoder * added series decomp to decoder, with inputs * added trend todos * added autoformer to README * added to index * added autoformer.mdx * remove scaling and init attention_mask in the decoder * make style * fix copies * make fix-copies * inital fix-copies * fix from #22076 * make style * fix class names * added trend * added d_model and projection layers * added `trend_projection` source, and decomp layer init * added trend & seasonal init for decoder input * AutoformerModel cannot be copied as it has the decomp layer too * encoder can be copied from time series transformer * fixed generation and made distrb. out more robust * use context window to calculate decomposition * use the context_window for decomposition * use output_params helper * clean up AutoformerAttention * subsequences_length off by 1 * make fix copies * fix test * added init for nn.Conv1d * fix IGNORE_NON_TESTED * added model_doc * fix ruff * ignore tests * remove dup * fix SPECIAL_CASES_TO_ALLOW * do not copy due to conv1d weight init * remove unused imports * added short summary * added label_length and made the model non-autoregressive * added params docs * better doc for `factor` * fix tests * renamed `moving_avg` to `moving_average` * renamed `factor` to `autocorrelation_factor` * make style * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: NielsRogge <[email protected]> * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: NielsRogge <[email protected]> * fix configurations * fix integration tests * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts <[email protected]> * fixing `lags_sequence` doc * Revert "fixing `lags_sequence` doc" This reverts commit 21e3491. 
* Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <[email protected]> * Apply suggestions from code review Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts <[email protected]> * model layers now take the config * added `layer_norm_eps` to the config * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <[email protected]> * added `config.layer_norm_eps` to AutoformerLayernorm * added `config.layer_norm_eps` to all layernorm layers * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts <[email protected]> * fix variable names * added inital pretrained model * added use_cache docstring * doc strings for trend and use_cache * fix order of args * imports on one line * fixed get_lagged_subsequences docs * add docstring for create_network_inputs * get rid of layer_norm_eps config * add back layernorm * update fixture location * fix signature * use AutoformerModelOutput dataclass * fix pretrain config * no need as default exists * subclass ModelOutput * remove layer_norm_eps config * fix test_model_outputs_equivalence test * test hidden_states_output * make fix-copies * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts 
<[email protected]> * removed unused attr * Update tests/models/autoformer/test_modeling_autoformer.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <[email protected]> * use AutoFormerDecoderOutput * fix formatting * fix formatting --------- Co-authored-by: Kashif Rasul <[email protected]> Co-authored-by: NielsRogge <[email protected]> Co-authored-by: amyeroberts <[email protected]> * add type hint in pipeline model argument (#23740) * add type hint in pipeline model argument * add pretrainedmodel and tfpretainedmodel type hint * make type hints string * TF SAM shape flexibility fixes (#23842) SAM shape flexibility fixes for compilation * fix Whisper tests on GPU (#23753) * move input features to GPU * skip these tests because undefined behavior * unskip tests * 🌐 [i18n-KO] Translated `fast_tokenizers.mdx` to Korean (#22956) * docs: ko: fast_tokenizer.mdx content - translated Co-Authored-By: Gabriel Yang <[email protected]> Co-Authored-By: Nayeon Han <[email protected]> Co-Authored-By: Hyeonseo Yun <[email protected]> Co-Authored-By: Sohyun Sim <[email protected]> Co-Authored-By: Jungnerd <[email protected]> Co-Authored-By: Wonhyeong Seo <[email protected]> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Sohyun Sim <[email protected]> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: 
Sohyun Sim <[email protected]> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Sohyun Sim <[email protected]> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Sohyun Sim <[email protected]> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Sohyun Sim <[email protected]> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Sohyun Sim <[email protected]> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Hyeonseo Yun <[email protected]> * Update fast_tokenizers.mdx * Update fast_tokenizers.mdx * Update fast_tokenizers.mdx * Update fast_tokenizers.mdx * Update _toctree.yml --------- Co-authored-by: Gabriel Yang <[email protected]> Co-authored-by: Nayeon Han <[email protected]> Co-authored-by: Hyeonseo Yun <[email protected]> Co-authored-by: Sohyun Sim <[email protected]> Co-authored-by: Jungnerd <[email protected]> Co-authored-by: Wonhyeong Seo <[email protected]> Co-authored-by: Hyeonseo Yun <[email protected]> * [i18n-KO] Translated video_classification.mdx to Korean (#23026) * task/video_classification translated Co-Authored-By: Hyeonseo Yun <[email protected]> Co-Authored-By: Gabriel Yang <[email protected]> Co-Authored-By: Sohyun Sim <[email protected]> Co-Authored-By: Nayeon Han <[email protected]> Co-Authored-By: Wonhyeong Seo <[email protected]> Co-Authored-By: Jungnerd <[email protected]> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <[email protected]> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <[email protected]> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <[email protected]> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <[email protected]> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <[email protected]> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <[email protected]> * Update 
docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <[email protected]> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <[email protected]> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Sohyun Sim <[email protected]> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Sohyun Sim <[email protected]> * Apply suggestions from code review Co-authored-by: Sohyun Sim <[email protected]> Co-authored-by: Hyeonseo Yun <[email protected]> Co-authored-by: Jungnerd <[email protected]> Co-authored-by: Gabriel Yang <[email protected]> * Update video_classification.mdx * Update _toctree.yml * Update _toctree.yml * Update _toctree.yml * Update _toctree.yml --------- Co-authored-by: Hyeonseo Yun <[email protected]> Co-authored-by: Gabriel Yang <[email protected]> Co-authored-by: Sohyun Sim <[email protected]> Co-authored-by: Nayeon Han <[email protected]> Co-authored-by: Wonhyeong Seo <[email protected]> Co-authored-by: Jungnerd <[email protected]> Co-authored-by: Hyeonseo Yun <[email protected]> * 🌐 [i18n-KO] Translated `troubleshooting.mdx` to Korean (#23166) * docs: ko: troubleshooting.mdx * revised: fix _toctree.yml #23112 * feat: nmt draft `troubleshooting.mdx` * fix: manual edits `troubleshooting.mdx` * revised: resolve suggestions troubleshooting.mdx Co-authored-by: Sohyun Sim <[email protected]> --------- Co-authored-by: Sohyun Sim <[email protected]> * Adds a FlyteCallback (#23759) * initial flyte callback * lint * logs should still be saved to Flyte even if pandas isn't install (unlikely) * cr - flyte team * add docs for Flytecallback * fix doc string - cr sgugger * Apply suggestions from code review cr - sgugger fix doc strings Co-authored-by: Sylvain Gugger <[email protected]> --------- Co-authored-by: Sylvain Gugger <[email protected]> * Update collating_graphormer.py (#23862) * [LlamaTokenizerFast] nit update `post_processor` on the fly (#23855) * Update the processor when 
changing add_eos and add_bos * fixup * update * add a test * fix failing tests * fixup * #23388 Issue: Update RoBERTa configuration (#23863) * [from_pretrained] imporve the error message when `_no_split_modules` is not defined (#23861) * Better warning * Update src/transformers/modeling_utils.py Co-authored-by: Sylvain Gugger <[email protected]> * format line --------- Co-authored-by: Sylvain Gugger <[email protected]> --------- Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Wang, Yi A <[email protected]> Signed-off-by: Wang, Yi <[email protected]> Co-authored-by: Tyler <[email protected]> Co-authored-by: Joshua Lochner <[email protected]> Co-authored-by: zspo <[email protected]> Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: Zachary Mueller <[email protected]> Co-authored-by: Tim Dettmers <[email protected]> Co-authored-by: younesbelkada <[email protected]> Co-authored-by: LWprogramming <[email protected]> Co-authored-by: Sanchit Gandhi <[email protected]> Co-authored-by: sshahrokhi <[email protected]> Co-authored-by: Matt <[email protected]> Co-authored-by: Yih-Dar <[email protected]> Co-authored-by: ydshieh <[email protected]> Co-authored-by: NielsRogge <[email protected]> Co-authored-by: Nicolas Patry <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alex <[email protected]> Co-authored-by: Nayeon Han <[email protected]> Co-authored-by: Hyeonseo Yun <[email protected]> Co-authored-by: Sohyun Sim <[email protected]> Co-authored-by: Gabriel Yang <[email protected]> Co-authored-by: Wonhyeong Seo <[email protected]> Co-authored-by: Jungnerd <[email protected]> Co-authored-by: 小桐桐 <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Wang, Yi <[email protected]> Co-authored-by: Maria Khalusova <[email protected]> Co-authored-by: regisss <[email protected]> Co-authored-by: uchuhimo <[email protected]> Co-authored-by: Yuxian Qiu 
<[email protected]> Co-authored-by: pagarsky <[email protected]> Co-authored-by: Connor Henderson <[email protected]> Co-authored-by: Daniel King <[email protected]> Co-authored-by: amyeroberts <[email protected]> Co-authored-by: Eric J. Wang <[email protected]> Co-authored-by: Ravi Theja <[email protected]> Co-authored-by: Arthur <[email protected]> Co-authored-by: 玩火 <[email protected]> Co-authored-by: amitportnoy <[email protected]> Co-authored-by: Ran Ran <[email protected]> Co-authored-by: Eli Simhayev <[email protected]> Co-authored-by: Kashif Rasul <[email protected]> Co-authored-by: Samin Yasar <[email protected]> Co-authored-by: Matthijs Hollemans <[email protected]> Co-authored-by: Kihoon Son <[email protected]> Co-authored-by: Hyeonseo Yun <[email protected]> Co-authored-by: peridotml <[email protected]> Co-authored-by: Clémentine Fourrier <[email protected]> Co-authored-by: Vijeth Moudgalya <[email protected]>
huggingface · May 30, 2023 · 2d0e384
1 parent c9f3cff
commit 2d0e384
Showing 320 changed files with 9,944 additions and 8,114 deletions.
diff --git a/.circleci/config.yml b/.circleci/config.yml
@@ -43,6 +43,12 @@ jobs:
                 else
                     touch test_preparation/test_list.txt
                 fi
+            - run: |
+                  if [ -f examples_test_list.txt ]; then
+                      mv examples_test_list.txt test_preparation/examples_test_list.txt
+                  else
+                      touch test_preparation/examples_test_list.txt
+                  fi
             - run: |
                 if [ -f doctest_list.txt ]; then
                     cp doctest_list.txt test_preparation/doctest_list.txt
@@ -62,19 +68,6 @@ jobs:
                 else
                     touch test_preparation/filtered_test_list.txt
                 fi
-            - run: python utils/tests_fetcher.py --filters tests examples | tee examples_tests_fetched_summary.txt
-            - run: |
-                  if [ -f test_list.txt ]; then
-                      mv test_list.txt test_preparation/examples_test_list.txt
-                  else
-                      touch test_preparation/examples_test_list.txt
-                  fi
-            - run: |
-                  if [ -f filtered_test_list_cross_tests.txt ]; then
-                      mv filtered_test_list_cross_tests.txt test_preparation/filtered_test_list_cross_tests.txt
-                  else
-                      touch test_preparation/filtered_test_list_cross_tests.txt
-                  fi
             - store_artifacts:
                   path: test_preparation/test_list.txt
             - store_artifacts:
@@ -111,7 +104,7 @@ jobs:
             - run: |
                   mkdir test_preparation
                   echo -n "tests" > test_preparation/test_list.txt
-                  echo -n "tests" > test_preparation/examples_test_list.txt
+                  echo -n "all" > test_preparation/examples_test_list.txt
                   echo -n "tests/repo_utils" > test_preparation/test_repo_utils.txt
             - run: |
                   echo -n "tests" > test_list.txt

diff --git a/.circleci/create_circleci_config.py b/.circleci/create_circleci_config.py
@@ -342,7 +342,6 @@ def job_name(self):
         "pip install .[sklearn,torch,sentencepiece,testing,torch-speech]",
         "pip install -r examples/pytorch/_tests_requirements.txt",
     ],
-    tests_to_run="./examples/pytorch/",
 )
 
 
@@ -355,7 +354,6 @@ def job_name(self):
         "pip install .[sklearn,tensorflow,sentencepiece,testing]",
         "pip install -r examples/tensorflow/_tests_requirements.txt",
     ],
-    tests_to_run="./examples/tensorflow/",
 )
 
 
@@ -367,7 +365,6 @@ def job_name(self):
         "pip install .[flax,testing,sentencepiece]",
         "pip install -r examples/flax/_tests_requirements.txt",
     ],
-    tests_to_run="./examples/flax/",
 )
 
 
@@ -551,7 +548,17 @@ def create_circleci_config(folder=None):
 
     example_file = os.path.join(folder, "examples_test_list.txt")
     if os.path.exists(example_file) and os.path.getsize(example_file) > 0:
-        jobs.extend(EXAMPLES_TESTS)
+        with open(example_file, "r", encoding="utf-8") as f:
+            example_tests = f.read().split(" ")
+        for job in EXAMPLES_TESTS:
+            framework = job.name.replace("examples_", "").replace("torch", "pytorch")
+            if example_tests == ["all"]:
+                job.tests_to_run = [f"examples/{framework}"]
+            else:
+                job.tests_to_run = [f for f in example_tests if f.startswith(f"examples/{framework}")]
+
+            if len(job.tests_to_run) > 0:
+                jobs.append(job)
 
     doctest_file = os.path.join(folder, "doctest_list.txt")
     if os.path.exists(doctest_file):
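
Review note: the per-framework selection in the hunk above can be sketched as a standalone function. This is a minimal sketch, not the code in `create_circleci_config.py`. The helper names `select_example_tests` and `framework_from_job_name` are hypothetical, and the job names used in the usage lines (`examples_torch`, `examples_flax`) are assumed from the `replace` calls in the diff. The sketch compares the parsed list against `["all"]`, since `str.split` returns a list and a list never equals the bare string `"all"`.

```python
from typing import List


def framework_from_job_name(job_name: str) -> str:
    """Map a job name such as "examples_torch" to its examples/ subfolder."""
    return job_name.replace("examples_", "").replace("torch", "pytorch")


def select_example_tests(raw: str, framework: str) -> List[str]:
    """Return the example test paths that one framework job should run.

    `raw` is the content of examples_test_list.txt: either the single word
    "all" or a space-separated list of test paths.
    """
    entries = raw.split()  # split on whitespace, dropping empty strings
    prefix = f"examples/{framework}"
    if entries == ["all"]:
        # Run the whole examples folder for this framework.
        return [prefix]
    # Otherwise keep only the paths that belong to this framework.
    return [path for path in entries if path.startswith(prefix)]


if __name__ == "__main__":
    fetched = "examples/pytorch/test_a.py examples/flax/test_b.py"
    for name in ("examples_torch", "examples_flax"):
        print(name, select_example_tests(fetched, framework_from_job_name(name)))
```

A job whose selection comes back empty would be skipped, which matches the `len(job.tests_to_run) > 0` check in the diff.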

diff --git a/.github/workflows/self-push.yml b/.github/workflows/self-push.yml
@@ -195,6 +195,10 @@ jobs:
           git checkout ${{ env.CI_SHA }}
           echo "log = $(git log -n 1)"
 
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
       - name: Echo folder ${{ matrix.folders }}
         shell: bash
         # For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
@@ -284,6 +288,10 @@ jobs:
           git checkout ${{ env.CI_SHA }}
           echo "log = $(git log -n 1)"
 
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
       - name: Echo folder ${{ matrix.folders }}
         shell: bash
         # For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
@@ -373,6 +381,10 @@ jobs:
           git checkout ${{ env.CI_SHA }}
           echo "log = $(git log -n 1)"
 
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /workspace/transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
       - name: Remove cached torch extensions
         run: rm -rf /github/home/.cache/torch_extensions/
 
@@ -459,6 +471,10 @@ jobs:
           git checkout ${{ env.CI_SHA }}
           echo "log = $(git log -n 1)"
 
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /workspace/transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
       - name: Remove cached torch extensions
         run: rm -rf /github/home/.cache/torch_extensions/
 

diff --git a/.github/workflows/self-scheduled.yml b/.github/workflows/self-scheduled.yml
@@ -119,6 +119,10 @@ jobs:
         working-directory: /transformers
         run: git fetch && git checkout ${{ github.sha }}
 
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
       - name: NVIDIA-SMI
         run: |
           nvidia-smi
@@ -176,6 +180,10 @@ jobs:
         working-directory: /transformers
         run: git fetch && git checkout ${{ github.sha }}
 
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
       - name: NVIDIA-SMI
         run: |
           nvidia-smi
@@ -221,6 +229,10 @@ jobs:
         working-directory: /transformers
         run: git fetch && git checkout ${{ github.sha }}
 
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
       - name: NVIDIA-SMI
         run: |
           nvidia-smi
@@ -268,6 +280,10 @@ jobs:
         working-directory: /transformers
         run: git fetch && git checkout ${{ github.sha }}
 
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
       - name: NVIDIA-SMI
         run: |
           nvidia-smi
@@ -315,6 +331,10 @@ jobs:
         run: |
           git fetch && git checkout ${{ github.sha }}
 
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
       - name: NVIDIA-SMI
         run: |
           nvidia-smi
@@ -361,6 +381,10 @@ jobs:
         working-directory: /workspace/transformers
         run: git fetch && git checkout ${{ github.sha }}
 
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /workspace/transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
       - name: Remove cached torch extensions
         run: rm -rf /github/home/.cache/torch_extensions/
 

diff --git a/README.md b/README.md
@@ -292,6 +292,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
 1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
 1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
 1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass.
+1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
 1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
 1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
 1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.

diff --git a/README_es.md b/README_es.md
@@ -267,6 +267,7 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt
 1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
 1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
 1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass.
+1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
 1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
 1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
 1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.

diff --git a/README_hd.md b/README_hd.md
@@ -239,6 +239,7 @@ conda install -c huggingface transformers
 1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (Google Research से) Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. द्वाराअनुसंधान पत्र [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) के साथ जारी किया गया
 1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
 1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass.
+1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
 1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (फेसबुक) साथ थीसिस [बार्ट: प्राकृतिक भाषा निर्माण, अनुवाद के लिए अनुक्रम-से-अनुक्रम पूर्व प्रशिक्षण , और समझ] (https://arxiv.org/pdf/1910.13461.pdf) पर निर्भर माइक लुईस, यिनहान लियू, नमन गोयल, मार्जन ग़ज़विनिनेजाद, अब्देलरहमान मोहम्मद, ओमर लेवी, वेस स्टोयानोव और ल्यूक ज़ेटलमॉयर
 1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (से École polytechnique) साथ थीसिस [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) पर निर्भर Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis रिहाई।
 1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (VinAI Research से) साथ में पेपर [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701)गुयेन लुओंग ट्रान, डुओंग मिन्ह ले और डाट क्वोक गुयेन द्वारा पोस्ट किया गया।

diff --git a/README_ja.md b/README_ja.md
@@ -301,6 +301,7 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ
 1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (Google Research から) Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. から公開された研究論文 [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918)
 1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (BAAI から) Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell から公開された研究論文: [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679)
 1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (MIT から) Yuan Gong, Yu-An Chung, James Glass から公開された研究論文: [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778)
+1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
 1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (Facebook から) Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer から公開された研究論文: [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)
 1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (École polytechnique から) Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis から公開された研究論文: [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321)
 1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (VinAI Research から) Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen から公開された研究論文: [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701)