From 2d0e3842a09d68d40eb81d744940a0d93499ba55 Mon Sep 17 00:00:00 2001 From: Mishig Date: Tue, 30 May 2023 18:05:35 +0200 Subject: [PATCH] merge main (#23866) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Debug example code for MegaForCausalLM (#23382) * Debug example code for MegaForCausalLM set ignore_mismatched_sizes=True in model loading code * Fix up * Remove erroneous `img` closing tag (#23646) See https://github.com/huggingface/transformers/pull/23625 * Fix tensor device while attention_mask is not None (#23538) * Fix tensor device while attention_mask is not None * Fix tensor device while attention_mask is not None * Fix accelerate logger bug (#23650) * fix logger bug * Update tests/mixed_int8/test_mixed_int8.py Co-authored-by: Zachary Mueller * import `PartialState` --------- Co-authored-by: Zachary Mueller * Muellerzr fix deepspeed (#23657) * Fix deepspeed recursion * Better fix * Bugfix: LLaMA layer norm incorrectly changes input type and consumers lots of memory (#23535) * Fixed bug where LLaMA layer norm would change input type. * make fix-copies --------- Co-authored-by: younesbelkada * Fix wav2vec2 is_batched check to include 2-D numpy arrays (#23223) * Fix wav2vec2 is_batched check to include 2-D numpy arrays * address comment * Add tests * oops * oops * Switch to np array Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Switch to np array * condition merge * Specify mono channel only in comment * oops, add other comment too * make style * Switch list check from falsiness to empty --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * changing the requirements to a cpu torch version that works (#23483) * Fix SAM tests and use smaller checkpoints (#23656) * Fix SAM tests and use smaller checkpoints * Override test_model_from_pretrained to use sam-vit-base as well * make fixup * Update all no_trainer with skip_first_batches (#23664) * Update workflow files (#23658) * fix * fix --------- Co-authored-by: ydshieh * [image-to-text pipeline] Add conditional text support + GIT (#23362) * First draft * Remove print statements * Add conditional generation * Add more tests * Remove scripts * Remove BLIP specific linkes * Add support for pix2struct * Add fast test * Address comment * Fix style * small fix to remove unused eos in processor when it's not used. (#23408) * Bump requests from 2.27.1 to 2.31.0 in /examples/research_projects/decision_transformer (#23673) Bump requests in /examples/research_projects/decision_transformer Bumps [requests](https://github.com/psf/requests) from 2.27.1 to 2.31.0. - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](https://github.com/psf/requests/compare/v2.27.1...v2.31.0) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump requests from 2.22.0 to 2.31.0 in /examples/research_projects/visual_bert (#23670) Bump requests in /examples/research_projects/visual_bert Bumps [requests](https://github.com/psf/requests) from 2.22.0 to 2.31.0. 
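The LLaMA layer-norm fix (#23535) listed above comes down to doing the variance reduction in float32 but casting the result back to the caller's dtype, so fp16/bf16 models are no longer silently upcast (and memory-bloated) by every norm. A minimal sketch of that pattern — an illustration, not the exact lines from the patch:

```python
import torch
from torch import nn


class RMSNormSketch(nn.Module):
    """Illustrative RMSNorm that preserves the input dtype (cf. #23535)."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        input_dtype = hidden_states.dtype
        # do the reduction in fp32 for numerical stability ...
        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states.to(torch.float32) * torch.rsqrt(variance + self.variance_epsilon)
        # ... but hand back the caller's dtype instead of leaking fp32 activations
        return self.weight * hidden_states.to(input_dtype)
```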
- [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](https://github.com/psf/requests/compare/v2.22.0...v2.31.0) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump requests from 2.22.0 to 2.31.0 in /examples/research_projects/lxmert (#23668) Bump requests in /examples/research_projects/lxmert Bumps [requests](https://github.com/psf/requests) from 2.22.0 to 2.31.0. - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](https://github.com/psf/requests/compare/v2.22.0...v2.31.0) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add PerSAM [bis] (#23659) * Add PerSAM args * Make attn_sim optional * Rename to attention_similarity * Add docstrigns * Improve docstrings * Fix typo in a parameter name for open llama model (#23637) * Update modeling_open_llama.py Fix typo in `use_memorry_efficient_attention` parameter name * Update configuration_open_llama.py Fix typo in `use_memorry_efficient_attention` parameter name * Update configuration_open_llama.py Take care of backwards compatibility ensuring that the previous parameter name is taken into account if used * Update configuration_open_llama.py format to adjust the line length * Update configuration_open_llama.py proper code formatting using `make fixup` * Update configuration_open_llama.py pop the argument not to let it be set later down the line * Fix PyTorch SAM tests (#23682) fix Co-authored-by: ydshieh * Making `safetensors` a core dependency. (#23254) * Making `safetensors` a core dependency. To be merged later, I'm creating the PR so we can try it out. * Update setup.py * Remove duplicates. * Even more redundant. * 🌐 [i18n-KO] Translated `tasks/monocular_depth_estimation.mdx` to Korean (#23621) docs: ko: `tasks/monocular_depth_estimation` Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com> Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> Co-authored-by: Gabriel Yang Co-authored-by: Wonhyeong Seo Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> * Fix a `BridgeTower` test (#23694) fix Co-authored-by: ydshieh * [`SAM`] Fixes pipeline and adds a dummy pipeline test (#23684) * add a dummy pipeline test * change test name * TF version compatibility fixes (#23663) * New TF version compatibility fixes * Remove dummy print statement, move expand_1d * Make a proper framework inference function * Make a proper framework inference function * ValueError -> TypeError * [`Blip`] Fix blip doctest (#23698) fix blip doctest * is_batched fix for remaining 2-D numpy arrays (#23309) * Fix is_batched code to allow 2-D numpy arrays for audio * Tests * Fix typo * Incorporate comments from PR #23223 * Skip `TFCvtModelTest::test_keras_fit_mixed_precision` for now (#23699) fix Co-authored-by: ydshieh * fix: load_best_model_at_end error when load_in_8bit is True (#23443) Ref: https://github.com/huggingface/peft/issues/394 Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. 
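As a rough illustration of the ordering that the load_in_8bit fix (#23443) relies on — a hypothetical helper, not the actual `Trainer` code — the point is that bitsandbytes' `Linear8bitLt` only switches to its int8 layout when the module is moved to the GPU, so the checkpoint has to be restored afterwards:

```python
from torch import nn


def restore_into_8bit(module: nn.Module, state_dict: dict) -> nn.Module:
    """Hypothetical helper sketching the fix described in #23443."""
    module = module.cuda()              # triggers the 8-bit conversion first
    module.load_state_dict(state_dict)  # only then apply the saved weights
    return module
```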
call module.cuda() before module.load_state_dict() * Fix some docs what layerdrop does (#23691) * Fix some docs what layerdrop does * Update src/transformers/models/data2vec/configuration_data2vec_audio.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix more docs --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * add GPTJ/bloom/llama/opt into model list and enhance the jit support (#23291) Signed-off-by: Wang, Yi A * 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) (#23479) * Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. * Added and fixed optimizer tests. * Style and quality checks. * Initial draft. Some tests fail. * Fixed dtype bug. * Fixed bug caused by torch_dtype='auto'. * All test green for 8-bit and 4-bit layers. * Added fix for fp32 layer norms and bf16 compute in LLaMA. * Initial draft. Some tests fail. * Fixed dtype bug. * Fixed bug caused by torch_dtype='auto'. * All test green for 8-bit and 4-bit layers. * Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. * Added and fixed optimizer tests. * Style and quality checks. * Fixing issues for PR #23479. * Added fix for fp32 layer norms and bf16 compute in LLaMA. * Reverted variable name change. * Initial draft. Some tests fail. * Fixed dtype bug. * Fixed bug caused by torch_dtype='auto'. * All test green for 8-bit and 4-bit layers. * Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. * Added and fixed optimizer tests. * Style and quality checks. * Added missing tests. * Fixup changes. * Added fixup changes. * Missed some variables to rename. * revert trainer tests * revert test trainer * another revert * fix tests and safety checkers * protect import * simplify a bit * Update src/transformers/trainer.py * few fixes * add warning * replace with `load_in_kbit = load_in_4bit or load_in_8bit` * fix test * fix tests * this time fix tests * safety checker * add docs * revert torch_dtype * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * multiple fixes * update docs * version checks and multiple fixes * replace `is_loaded_in_kbit` * replace `load_in_kbit` * change methods names * better checks * oops * oops * address final comments --------- Co-authored-by: younesbelkada Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Paged Optimizer + Lion Optimizer for Trainer (#23217) * Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. * Added and fixed optimizer tests. * Style and quality checks. 
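The 4-bit QLoRA entry (#23479) above adds `load_in_4bit` support together with the NF4, double-quantization and compute-dtype knobs documented in the quantization docs touched by this merge. A short usage sketch (the checkpoint name is just a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit weights, double quantization, bf16 compute — the combination the QLoRA PR documents
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",          # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```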
--------- Co-authored-by: younesbelkada * Export to ONNX doc refocused on using optimum, added tflite (#23434) * doc refocused on using optimum, tflite * minor updates to fix checks * Apply suggestions from code review Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> * TFLite to separate page, added links * Removed the onnx list builder * make style * Update docs/source/en/serialization.mdx Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> --------- Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> * fix: use bool instead of uint8/byte in Deberta/DebertaV2/SEW-D to make it compatible with TensorRT (#23683) * Use bool instead of uint8/byte in DebertaV2 to make it compatible with TensorRT TensorRT cannot accept onnx graph with uint8/byte intermediate tensors. This PR uses bool tensors instead of unit8/byte tensors to make the exported onnx file can work with TensorRT. * fix: use bool instead of uint8/byte in Deberta and SEW-D --------- Co-authored-by: Yuxian Qiu * fix gptj could not jit.trace in GPU (#23317) Signed-off-by: Wang, Yi A * Better TF docstring types (#23477) * Rework TF type hints to use | None instead of Optional[] for tf.Tensor * Rework TF type hints to use | None instead of Optional[] for tf.Tensor * Don't forget the imports * Add the imports to tests too * make fixup * Refactor tests that depended on get_type_hints * Better test refactor * Fix an old hidden bug in the test_keras_fit input creation code * Fix for the Deit tests * Minor awesome-transformers.md fixes (#23453) Minor docs fixes * TF SAM memory reduction (#23732) * Extremely small change to TF SAM dummies to reduce memory usage on build * remove debug breakpoint * Debug print statement to track array sizes * More debug shape printing * More debug shape printing * Now remove the debug shape printing * make fixup * make fixup * fix: delete duplicate sentences in `document_question_answering.mdx` (#23735) fix: delete duplicate sentence * fix: Whisper generate, move text_prompt_ids trim up for max_new_tokens calculation (#23724) move text_prompt_ids trimming to top * Overhaul TF serving signatures + dummy inputs (#23234) * Let's try autodetecting serving sigs * Don't clobber existing sigs * Change shapes for multiplechoice models * Make default dummy inputs smarter too * Fix missing f-string * Let's YOLO a serving output too * Read __class__.__name__ properly * Don't just pass naked lists in there and expect it to be okay * Code cleanup * Update default serving sig * Clearer error messages * Further updates to the default serving output * make fixup * Update the serving output a bit more * Cleanups and renames, raise errors appropriately when we can't infer inputs * More renames * we're building in a functional context again, yolo * import DUMMY_INPUTS from the right place * import DUMMY_INPUTS from the right place * Support cross-attention in the dummies * Support cross-attention in the dummies * Complete removal of dummy/serving overrides in BERT * Complete removal of dummy/serving overrides in RoBERTa * Obliterate lots and lots of serving sig and dummy overrides * merge type hint changes * Fix for token_type_ids with vocab_size 1 * Add missing property decorator * Fix T5 and hopefully some models that take conv inputs * More signature pruning * Fix T5's signature * Fix Wav2Vec2 signature * Fix LongformerForMultipleChoice input signature * Fix BLIP and LED * Better default serving output error handling * Fix BART dummies * Fix dummies for cross-attention, esp 
encoder-decoder models * Fix visionencoderdecoder signature * Fix BLIP serving output * Small tweak to BART dummies * Cleanup the ugly parameter inspection line that I used in a few places * committed a breakpoint again * Move the text_dims check * Remove blip_text serving_output * Add decoder_input_ids to the default input sig * Remove all the manual overrides for encoder-decoder model signatures * Tweak longformer/led input sigs * Tweak default serving output * output.keys() -> output * make fixup * [Whisper] Reduce batch size in tests (#23736) * Fix the regex in `get_imports` to support multiline try blocks and excepts with specific exception types (#23725) * fix and test get_imports for multiline try blocks, and excepts with specific errors * fixup * add some more tests * add license * Fix sagemaker DP/MP (#23681) * Check for use_sagemaker_dp * Add a check for is_sagemaker_mp when setting _n_gpu again. Should be last broken thing * Try explicit check? * Quality * Enable prompts on the Hub (#23662) * Enable prompts on the Hub * Update src/transformers/tools/prompts.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Address review comments --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove the last few TF serving sigs (#23738) Remove some more serving methods that (I think?) turned up while this PR was open * Fix `pip install --upgrade accelerate` command in modeling_utils.py (#23747) Fix command in modeling_utils.py * Add LlamaIndex to awesome-transformers.md (#23484) * Fix psuh_to_hub in Trainer when nothing needs pushing (#23751) * Revamp test selection for the example tests (#23737) * Revamp test selection for the example tests * Rename old XLA test and fake modif in run_glue * Fixes * Fake Trainer modif * Remove fake modifs * [LongFormer] code nits, removed unused parameters (#23749) * remove unused parameters * remove unused parameters in config * Fix is_ninja_available() (#23752) * Fix is_ninja_available() search ninja using subprocess instead of importlib. * Fix style * Fix doc * Fix style * Bump tornado from 6.0.4 to 6.3.2 in /examples/research_projects/lxmert (#23766) Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.0.4 to 6.3.2. - [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst) - [Commits](https://github.com/tornadoweb/tornado/compare/v6.0.4...v6.3.2) --- updated-dependencies: - dependency-name: tornado dependency-type: direct:production ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump tornado from 6.0.4 to 6.3.2 in /examples/research_projects/visual_bert (#23767) Bump tornado in /examples/research_projects/visual_bert Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.0.4 to 6.3.2. - [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst) - [Commits](https://github.com/tornadoweb/tornado/compare/v6.0.4...v6.3.2) --- updated-dependencies: - dependency-name: tornado dependency-type: direct:production ... 
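The DeBERTa/DebertaV2/SEW-D TensorRT fix (#23683) above rests on a simple substitution: keep intermediate mask tensors boolean instead of uint8/byte so the exported ONNX graph contains no byte tensors. A toy illustration of the pattern, not the model code itself:

```python
import torch

scores = torch.randn(2, 4, 16, 16)                     # toy attention scores
attention_mask = torch.randint(0, 2, (2, 1, 16, 16))   # 1 = keep, 0 = mask

# before: a uint8 intermediate that TensorRT rejects in the ONNX graph
# masked = scores.masked_fill((1 - attention_mask).byte(), torch.finfo(scores.dtype).min)

# after: the same masking with a bool intermediate
masked = scores.masked_fill(~attention_mask.bool(), torch.finfo(scores.dtype).min)
```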
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [`Nllb-Moe`] Fix nllb moe accelerate issue (#23758) fix nllb moe accelerate issue * [OPT] Doc nit, using fast is fine (#23789) small doc nit * Fix RWKV backward on GPU (#23774) * Update trainer.mdx class_weights example (#23787) class_weights tensor should follow model's device * no_cuda does not take effect in non distributed environment (#23795) Signed-off-by: Wang, Yi * Fix no such file or directory error (#23783) * Fix no such file or directory error * Address comment * Fix formatting issue * Log the right train_batch_size if using auto_find_batch_size and also log the adjusted value seperately. (#23800) * Log right bs * Log * Diff message * Enable code-specific revision for code on the Hub (#23799) * Enable code-specific revision for code on the Hub * invalidate old revision * [Time-Series] Autoformer model (#21891) * ran `transformers-cli add-new-model-like` * added `AutoformerLayernorm` and `AutoformerSeriesDecomposition` * added `decomposition_layer` in `init` and `moving_avg` to config * added `AutoformerAutoCorrelation` to encoder & decoder * removed caninical self attention `AutoformerAttention` * added arguments in config and model tester. Init works! 😁 * WIP autoformer attention with autocorrlation * fixed `attn_weights` size * wip time_delay_agg_training * fixing sizes and debug time_delay_agg_training * aggregation in training works! 😁 * `top_k_delays` -> `top_k_delays_index` and added `contiguous()` * wip time_delay_agg_inference * finish time_delay_agg_inference 😎 * added resize to autocorrelation * bug fix: added the length of the output signal to `irfft` * `attention_mask = None` in the decoder * fixed test: changed attention expected size, `test_attention_outputs` works! * removed unnecessary code * apply AutoformerLayernorm in final norm in enc & dec * added series decomposition to the encoder * added series decomp to decoder, with inputs * added trend todos * added autoformer to README * added to index * added autoformer.mdx * remove scaling and init attention_mask in the decoder * make style * fix copies * make fix-copies * inital fix-copies * fix from https://github.com/huggingface/transformers/pull/22076 * make style * fix class names * added trend * added d_model and projection layers * added `trend_projection` source, and decomp layer init * added trend & seasonal init for decoder input * AutoformerModel cannot be copied as it has the decomp layer too * encoder can be copied from time series transformer * fixed generation and made distrb. 
out more robust * use context window to calculate decomposition * use the context_window for decomposition * use output_params helper * clean up AutoformerAttention * subsequences_length off by 1 * make fix copies * fix test * added init for nn.Conv1d * fix IGNORE_NON_TESTED * added model_doc * fix ruff * ignore tests * remove dup * fix SPECIAL_CASES_TO_ALLOW * do not copy due to conv1d weight init * remove unused imports * added short summary * added label_length and made the model non-autoregressive * added params docs * better doc for `factor` * fix tests * renamed `moving_avg` to `moving_average` * renamed `factor` to `autocorrelation_factor` * make style * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * fix configurations * fix integration tests * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fixing `lags_sequence` doc * Revert "fixing `lags_sequence` doc" This reverts commit 21e34911e36a6f8f45f25cbf43584a49e5316c55. * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * model layers now take the config * added `layer_norm_eps` to the config * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * added `config.layer_norm_eps` to AutoformerLayernorm * added `config.layer_norm_eps` to all layernorm layers * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix variable names * added inital pretrained model * added use_cache docstring * doc strings for trend and use_cache * fix order of args * imports on one line * fixed get_lagged_subsequences docs * add docstring for create_network_inputs * get rid of layer_norm_eps config * add back layernorm * update fixture location * fix signature * use AutoformerModelOutput dataclass * fix pretrain config * no need as default exists * subclass ModelOutput * remove layer_norm_eps config * fix test_model_outputs_equivalence test * test hidden_states_output * make fix-copies * Update src/transformers/models/autoformer/configuration_autoformer.py Co-authored-by: 
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * removed unused attr * Update tests/models/autoformer/test_modeling_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/autoformer/modeling_autoformer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * use AutoFormerDecoderOutput * fix formatting * fix formatting --------- Co-authored-by: Kashif Rasul Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add type hint in pipeline model argument (#23740) * add type hint in pipeline model argument * add pretrainedmodel and tfpretainedmodel type hint * make type hints string * TF SAM shape flexibility fixes (#23842) SAM shape flexibility fixes for compilation * fix Whisper tests on GPU (#23753) * move input features to GPU * skip these tests because undefined behavior * unskip tests * 🌐 [i18n-KO] Translated `fast_tokenizers.mdx` to Korean (#22956) * docs: ko: fast_tokenizer.mdx content - translated Co-Authored-By: Gabriel Yang Co-Authored-By: Nayeon Han Co-Authored-By: Hyeonseo Yun <0525_hhgus@naver.com> Co-Authored-By: Sohyun Sim <96299403+sim-so@users.noreply.github.com> Co-Authored-By: Jungnerd <46880056+jungnerd@users.noreply.github.com> Co-Authored-By: Wonhyeong Seo * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> * Update docs/source/ko/fast_tokenizers.mdx Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com> * Update fast_tokenizers.mdx * Update fast_tokenizers.mdx * Update fast_tokenizers.mdx * Update fast_tokenizers.mdx * Update _toctree.yml --------- Co-authored-by: Gabriel Yang Co-authored-by: Nayeon Han Co-authored-by: Hyeonseo Yun <0525_hhgus@naver.com> Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> Co-authored-by: Wonhyeong Seo Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com> * [i18n-KO] Translated video_classification.mdx to Korean (#23026) * task/video_classification translated Co-Authored-By: Hyeonseo Yun <0525_hhgus@naver.com> Co-Authored-By: Gabriel Yang 
Co-Authored-By: Sohyun Sim <96299403+sim-so@users.noreply.github.com> Co-Authored-By: Nayeon Han Co-Authored-By: Wonhyeong Seo Co-Authored-By: Jungnerd <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> * Update docs/source/ko/tasks/video_classification.mdx Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com> Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> Co-authored-by: Gabriel Yang * Update video_classification.mdx * Update _toctree.yml * Update _toctree.yml * Update _toctree.yml * Update _toctree.yml --------- Co-authored-by: Hyeonseo Yun <0525_hhgus@naver.com> Co-authored-by: Gabriel Yang Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> Co-authored-by: Nayeon Han Co-authored-by: Wonhyeong Seo Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com> * 🌐 [i18n-KO] Translated `troubleshooting.mdx` to Korean (#23166) * docs: ko: troubleshooting.mdx * revised: fix _toctree.yml #23112 * feat: nmt draft `troubleshooting.mdx` * fix: manual edits `troubleshooting.mdx` * revised: resolve suggestions troubleshooting.mdx Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> --------- Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> * Adds a FlyteCallback (#23759) * initial flyte callback * lint * logs should still be saved to Flyte even if pandas isn't install (unlikely) * cr - flyte team * add docs for Flytecallback * fix doc string - cr sgugger * Apply suggestions from code review cr - sgugger fix doc strings Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update collating_graphormer.py (#23862) * [LlamaTokenizerFast] nit update `post_processor` on the fly (#23855) * Update the processor when changing add_eos and add_bos * fixup * update * add a test * fix failing tests * fixup * #23388 Issue: Update RoBERTa configuration (#23863) * [from_pretrained] imporve the error message when `_no_split_modules` is not defined (#23861) * Better warning * Update src/transformers/modeling_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * format line 
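The Autoformer entry above (#21891) leans on two ideas: decomposing the series into trend and seasonal parts with a moving average, and replacing canonical self-attention with auto-correlation. A self-contained sketch of the decomposition step — an illustration under the assumption that the `moving_average` config value plays the role of `kernel_size`, not the file added by the patch:

```python
import torch
from torch import nn


class SeriesDecomposition(nn.Module):
    """Moving-average trend/seasonal split in the spirit of the Autoformer blocks."""

    def __init__(self, kernel_size: int):
        super().__init__()
        self.kernel_size = kernel_size
        self.avg = nn.AvgPool1d(kernel_size, stride=1, padding=0)

    def forward(self, x: torch.Tensor):
        # x: (batch, time, channels); pad by repeating the edge values so the
        # moving average keeps the original sequence length
        front = x[:, :1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
        end = x[:, -1:, :].repeat(1, self.kernel_size // 2, 1)
        padded = torch.cat([front, x, end], dim=1)
        trend = self.avg(padded.permute(0, 2, 1)).permute(0, 2, 1)
        seasonal = x - trend
        return seasonal, trend


seasonal, trend = SeriesDecomposition(kernel_size=25)(torch.randn(4, 96, 1))
```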
--------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> --------- Signed-off-by: dependabot[bot] Signed-off-by: Wang, Yi A Signed-off-by: Wang, Yi Co-authored-by: Tyler <41713505+Tylersuard@users.noreply.github.com> Co-authored-by: Joshua Lochner Co-authored-by: zspo Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Zachary Mueller Co-authored-by: Tim Dettmers Co-authored-by: younesbelkada Co-authored-by: LWprogramming Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: sshahrokhi Co-authored-by: Matt Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: ydshieh Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Nicolas Patry Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alex <116374290+aaalexlit@users.noreply.github.com> Co-authored-by: Nayeon Han Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com> Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com> Co-authored-by: Gabriel Yang Co-authored-by: Wonhyeong Seo Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> Co-authored-by: 小桐桐 <32215330+dkqkxx@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Wang, Yi Co-authored-by: Maria Khalusova Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> Co-authored-by: uchuhimo Co-authored-by: Yuxian Qiu Co-authored-by: pagarsky <36376725+pagarsky@users.noreply.github.com> Co-authored-by: Connor Henderson Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Eric J. 
Wang Co-authored-by: Ravi Theja Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: 玩火 Co-authored-by: amitportnoy <113588658+amitportnoy@users.noreply.github.com> Co-authored-by: Ran Ran Co-authored-by: Eli Simhayev Co-authored-by: Kashif Rasul Co-authored-by: Samin Yasar Co-authored-by: Matthijs Hollemans Co-authored-by: Kihoon Son <75935546+KIHOON71@users.noreply.github.com> Co-authored-by: Hyeonseo Yun <0525_hhgus@naver.com> Co-authored-by: peridotml <106936600+peridotml@users.noreply.github.com> Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com> Co-authored-by: Vijeth Moudgalya <33093576+vijethmoudgalya@users.noreply.github.com> --- .circleci/config.yml | 21 +- .circleci/create_circleci_config.py | 15 +- .github/workflows/self-push.yml | 16 + .github/workflows/self-scheduled.yml | 24 + README.md | 1 + README_es.md | 1 + README_hd.md | 1 + README_ja.md | 1 + README_ko.md | 1 + README_zh-hans.md | 1 + README_zh-hant.md | 1 + awesome-transformers.md | 18 +- docs/source/en/_toctree.yml | 4 + docs/source/en/custom_tools.mdx | 7 + docs/source/en/index.mdx | 2 + docs/source/en/main_classes/callback.mdx | 3 + docs/source/en/main_classes/quantization.mdx | 85 +- docs/source/en/main_classes/trainer.mdx | 2 +- docs/source/en/model_doc/autoformer.mdx | 42 + docs/source/en/model_doc/opt.mdx | 2 +- docs/source/en/perf_infer_gpu_one.mdx | 54 + docs/source/en/serialization.mdx | 556 +---- .../en/tasks/document_question_answering.mdx | 3 - docs/source/en/tflite.mdx | 58 + docs/source/ko/_toctree.yml | 18 +- docs/source/ko/fast_tokenizers.mdx | 67 + .../ko/tasks/monocular_depth_estimation.mdx | 145 ++ docs/source/ko/tasks/video_classification.mdx | 494 ++++ docs/source/ko/troubleshooting.mdx | 194 ++ docs/source/pt/index.mdx | 2 +- examples/flax/vision/requirements.txt | 4 +- .../run_image_classification_no_trainer.py | 18 +- .../image-pretraining/run_mim_no_trainer.py | 20 +- .../language-modeling/run_clm_no_trainer.py | 20 +- .../language-modeling/run_mlm_no_trainer.py | 20 +- .../multiple-choice/run_swag_no_trainer.py | 18 +- ...a_examples.py => old_test_xla_examples.py} | 0 .../run_qa_beam_search_no_trainer.py | 18 +- .../question-answering/run_qa_no_trainer.py | 18 +- .../run_semantic_segmentation_no_trainer.py | 20 +- .../run_summarization_no_trainer.py | 18 +- .../run_glue_no_trainer.py | 12 +- examples/pytorch/text-generation/README.md | 2 +- .../pytorch/text-generation/run_generation.py | 95 +- .../run_ner_no_trainer.py | 12 +- .../translation/run_translation_no_trainer.py | 19 +- .../pabee/modeling_pabee_albert.py | 2 +- .../decision_transformer/requirements.txt | 2 +- .../research_projects/lxmert/requirements.txt | 4 +- .../visual_bert/requirements.txt | 4 +- .../run_image_classification.py | 1 + setup.py | 2 +- src/transformers/__init__.py | 22 + src/transformers/convert_slow_tokenizer.py | 6 +- src/transformers/dynamic_module_utils.py | 15 +- .../feature_extraction_sequence_utils.py | 2 +- src/transformers/generation/logits_process.py | 2 +- src/transformers/integrations.py | 76 +- src/transformers/modeling_outputs.py | 4 +- src/transformers/modeling_tf_outputs.py | 190 +- src/transformers/modeling_tf_utils.py | 166 +- src/transformers/modeling_utils.py | 142 +- src/transformers/models/__init__.py | 1 + .../models/albert/modeling_tf_albert.py | 204 +- .../models/align/configuration_align.py | 2 +- ...xtraction_audio_spectrogram_transformer.py | 11 +- src/transformers/models/auto/auto_factory.py | 18 + 
.../models/auto/configuration_auto.py | 4 + .../models/auto/feature_extraction_auto.py | 1 + .../models/auto/image_processing_auto.py | 1 + src/transformers/models/auto/modeling_auto.py | 3 + .../models/auto/processing_auto.py | 1 + .../models/auto/tokenization_auto.py | 1 + .../models/autoformer/__init__.py | 63 + .../autoformer/configuration_autoformer.py | 245 ++ .../models/autoformer/modeling_autoformer.py | 2178 +++++++++++++++++ .../models/bart/modeling_tf_bart.py | 176 +- .../models/bert/modeling_tf_bert.py | 295 +-- .../blenderbot/modeling_tf_blenderbot.py | 101 +- .../modeling_tf_blenderbot_small.py | 103 +- src/transformers/models/blip/modeling_blip.py | 4 + .../models/blip/modeling_blip_text.py | 2 +- .../models/blip/modeling_tf_blip.py | 308 +-- .../models/blip/modeling_tf_blip_text.py | 83 +- .../models/blip_2/modeling_blip_2.py | 2 +- .../models/bloom/modeling_bloom.py | 2 +- .../models/camembert/modeling_tf_camembert.py | 266 +- .../models/clap/feature_extraction_clap.py | 11 +- .../models/clip/modeling_tf_clip.py | 159 +- .../models/convbert/modeling_tf_convbert.py | 142 +- .../models/convnext/modeling_tf_convnext.py | 61 +- .../models/ctrl/modeling_tf_ctrl.py | 78 +- .../models/cvt/modeling_tf_cvt.py | 53 +- .../data2vec/configuration_data2vec_audio.py | 3 + .../data2vec/modeling_tf_data2vec_vision.py | 90 +- .../models/deberta/modeling_deberta.py | 5 +- .../models/deberta/modeling_tf_deberta.py | 104 +- .../models/deberta_v2/modeling_deberta_v2.py | 7 +- .../deberta_v2/modeling_tf_deberta_v2.py | 104 +- .../configuration_deformable_detr.py | 2 +- .../models/deit/modeling_tf_deit.py | 109 +- .../models/deta/configuration_deta.py | 2 +- .../distilbert/modeling_tf_distilbert.py | 143 +- .../models/dpr/modeling_tf_dpr.py | 91 +- .../configuration_efficientnet.py | 2 +- .../models/electra/modeling_tf_electra.py | 243 +- .../modeling_tf_encoder_decoder.py | 62 +- .../models/esm/modeling_tf_esm.py | 174 +- .../models/flaubert/modeling_tf_flaubert.py | 187 +- .../models/funnel/modeling_tf_funnel.py | 110 +- .../models/gpt2/modeling_tf_gpt2.py | 196 +- src/transformers/models/gptj/modeling_gptj.py | 2 +- .../models/gptj/modeling_tf_gptj.py | 137 +- .../models/graphormer/collating_graphormer.py | 2 +- .../models/groupvit/modeling_tf_groupvit.py | 173 +- .../models/hubert/configuration_hubert.py | 3 + .../models/hubert/modeling_tf_hubert.py | 99 +- .../models/informer/configuration_informer.py | 3 - .../models/informer/modeling_informer.py | 8 +- .../models/layoutlm/modeling_tf_layoutlm.py | 164 +- .../layoutlmv3/modeling_tf_layoutlmv3.py | 203 +- .../models/led/modeling_tf_led.py | 115 +- .../models/llama/modeling_llama.py | 11 +- .../models/llama/tokenization_llama_fast.py | 44 + .../longformer/configuration_longformer.py | 8 - .../models/longformer/modeling_longformer.py | 2 - .../longformer/modeling_tf_longformer.py | 265 +- .../models/lxmert/modeling_lxmert.py | 14 +- .../models/lxmert/modeling_tf_lxmert.py | 163 +- .../models/marian/modeling_tf_marian.py | 151 +- .../mask2former/modeling_mask2former.py | 4 +- .../models/mbart/modeling_tf_mbart.py | 139 +- .../models/mctct/feature_extraction_mctct.py | 11 +- src/transformers/models/mega/modeling_mega.py | 4 +- .../mobilebert/modeling_tf_mobilebert.py | 217 +- .../models/mobilevit/modeling_tf_mobilevit.py | 63 +- .../models/mpnet/modeling_tf_mpnet.py | 141 +- .../models/mpnet/tokenization_mpnet.py | 2 +- .../models/mpnet/tokenization_mpnet_fast.py | 2 +- .../models/nllb_moe/modeling_nllb_moe.py | 2 +- 
.../open_llama/configuration_open_llama.py | 6 +- .../models/open_llama/modeling_open_llama.py | 17 +- .../models/openai/modeling_tf_openai.py | 133 +- .../models/opt/configuration_opt.py | 2 +- src/transformers/models/opt/modeling_opt.py | 4 +- .../models/opt/modeling_tf_opt.py | 76 +- .../models/pegasus/modeling_tf_pegasus.py | 141 +- .../pegasus_x/configuration_pegasus_x.py | 4 +- src/transformers/models/rag/modeling_rag.py | 2 +- .../models/rag/modeling_tf_rag.py | 121 +- src/transformers/models/rag/retrieval_rag.py | 4 +- .../models/realm/modeling_realm.py | 2 +- .../models/regnet/modeling_tf_regnet.py | 45 +- .../models/rembert/modeling_tf_rembert.py | 241 +- .../models/resnet/modeling_tf_resnet.py | 36 +- .../models/roberta/configuration_roberta.py | 4 +- .../models/roberta/modeling_tf_roberta.py | 266 +- .../configuration_roberta_prelayernorm.py | 4 +- .../modeling_tf_roberta_prelayernorm.py | 266 +- .../models/roformer/modeling_tf_roformer.py | 166 +- src/transformers/models/rwkv/modeling_rwkv.py | 9 +- .../models/sam/image_processing_sam.py | 2 +- src/transformers/models/sam/modeling_sam.py | 39 +- .../models/sam/modeling_tf_sam.py | 95 +- .../models/segformer/modeling_tf_segformer.py | 61 +- .../models/sew/configuration_sew.py | 3 + .../models/sew_d/modeling_sew_d.py | 7 +- .../feature_extraction_speech_to_text.py | 11 +- .../modeling_tf_speech_to_text.py | 110 +- .../speecht5/feature_extraction_speecht5.py | 11 +- .../models/swin/modeling_tf_swin.py | 123 +- src/transformers/models/t5/modeling_tf_t5.py | 120 +- .../models/tapas/modeling_tf_tapas.py | 152 +- .../configuration_time_series_transformer.py | 3 - .../modeling_time_series_transformer.py | 8 +- .../transfo_xl/modeling_tf_transfo_xl.py | 96 +- .../models/tvlt/feature_extraction_tvlt.py | 11 +- .../unispeech/configuration_unispeech.py | 3 + .../configuration_unispeech_sat.py | 3 + .../modeling_tf_vision_encoder_decoder.py | 54 +- .../modeling_tf_vision_text_dual_encoder.py | 12 +- .../models/vit/modeling_tf_vit.py | 67 +- .../models/vit_mae/modeling_tf_vit_mae.py | 87 +- .../models/wav2vec2/configuration_wav2vec2.py | 3 + .../wav2vec2/feature_extraction_wav2vec2.py | 11 +- .../models/wav2vec2/modeling_tf_wav2vec2.py | 137 +- .../models/wav2vec2/tokenization_wav2vec2.py | 11 +- .../configuration_wav2vec2_conformer.py | 3 + .../models/wavlm/configuration_wavlm.py | 3 + .../whisper/feature_extraction_whisper.py | 11 +- .../models/whisper/modeling_tf_whisper.py | 73 +- .../models/whisper/modeling_whisper.py | 7 +- .../models/xglm/modeling_tf_xglm.py | 141 +- src/transformers/models/xglm/modeling_xglm.py | 4 +- .../models/xlm/modeling_tf_xlm.py | 155 +- .../xlm_roberta/modeling_tf_xlm_roberta.py | 266 +- .../models/xlnet/modeling_tf_xlnet.py | 276 +-- src/transformers/optimization_tf.py | 4 +- src/transformers/pipelines/__init__.py | 2 +- src/transformers/pipelines/audio_utils.py | 2 +- src/transformers/pipelines/base.py | 9 +- src/transformers/pipelines/image_to_text.py | 49 +- src/transformers/tf_utils.py | 87 + src/transformers/time_series_utils.py | 4 +- src/transformers/tokenization_utils_base.py | 2 +- src/transformers/tools/agents.py | 39 +- src/transformers/tools/prompts.py | 180 +- src/transformers/trainer.py | 85 +- src/transformers/training_args.py | 24 +- src/transformers/training_args_tf.py | 2 +- src/transformers/utils/__init__.py | 1 + src/transformers/utils/bitsandbytes.py | 92 +- src/transformers/utils/dummy_pt_objects.py | 24 + src/transformers/utils/generic.py | 32 +- src/transformers/utils/import_utils.py 
| 13 +- src/transformers/utils/quantization_config.py | 92 +- ...on_{{cookiecutter.lowercase_modelname}}.py | 4 +- ...tf_{{cookiecutter.lowercase_modelname}}.py | 329 +-- tests/{mixed_int8 => bitsandbytes}/README.md | 0 .../{mixed_int8 => bitsandbytes}/__init__.py | 0 tests/bitsandbytes/test_4bit.py | 460 ++++ .../test_mixed_int8.py | 26 + tests/generation/test_tf_logits_process.py | 2 + tests/generation/test_tf_utils.py | 2 + .../models/albert/test_modeling_tf_albert.py | 2 + ...xtraction_audio_spectrogram_transformer.py | 8 + tests/models/auto/test_modeling_tf_auto.py | 2 + tests/models/auto/test_modeling_tf_pytorch.py | 2 + tests/models/autoformer/__init__.py | 0 .../autoformer/test_modeling_autoformer.py | 449 ++++ tests/models/bart/test_modeling_tf_bart.py | 2 + tests/models/bert/test_modeling_tf_bert.py | 2 + .../blenderbot/test_modeling_tf_blenderbot.py | 2 + .../test_modeling_tf_blenderbot_small.py | 2 + tests/models/blip/test_modeling_tf_blip.py | 2 + .../models/blip/test_modeling_tf_blip_text.py | 2 + tests/models/bort/test_modeling_tf_bort.py | 2 + .../bridgetower/test_modeling_bridgetower.py | 3 +- .../camembert/test_modeling_tf_camembert.py | 2 + .../clap/test_feature_extraction_clap.py | 8 + tests/models/clip/test_modeling_tf_clip.py | 2 + .../convbert/test_modeling_tf_convbert.py | 2 + .../convnext/test_modeling_tf_convnext.py | 2 + tests/models/ctrl/test_modeling_tf_ctrl.py | 2 + tests/models/cvt/test_modeling_tf_cvt.py | 3 + .../test_modeling_tf_data2vec_vision.py | 2 + .../deberta/test_modeling_tf_deberta.py | 2 + .../deberta_v2/test_modeling_tf_deberta_v2.py | 2 + tests/models/deit/test_modeling_tf_deit.py | 4 +- .../distilbert/test_modeling_tf_distilbert.py | 2 + tests/models/dpr/test_modeling_tf_dpr.py | 2 + .../electra/test_modeling_tf_electra.py | 2 + .../test_modeling_tf_encoder_decoder.py | 2 + tests/models/esm/test_modeling_tf_esm.py | 2 + .../flaubert/test_modeling_tf_flaubert.py | 2 + .../models/funnel/test_modeling_tf_funnel.py | 2 + tests/models/gpt2/test_modeling_tf_gpt2.py | 2 + tests/models/gptj/test_modeling_tf_gptj.py | 2 + .../groupvit/test_modeling_tf_groupvit.py | 2 + .../models/hubert/test_modeling_tf_hubert.py | 2 + .../models/informer/test_modeling_informer.py | 2 +- .../layoutlm/test_modeling_tf_layoutlm.py | 2 + .../layoutlmv3/test_modeling_tf_layoutlmv3.py | 2 + tests/models/led/test_modeling_tf_led.py | 2 + tests/models/llama/test_tokenization_llama.py | 33 + .../longformer/test_modeling_tf_longformer.py | 2 + .../models/lxmert/test_modeling_tf_lxmert.py | 2 + .../models/marian/test_modeling_tf_marian.py | 2 + tests/models/mbart/test_modeling_tf_mbart.py | 2 + .../mctct/test_feature_extraction_mctct.py | 8 + .../mobilebert/test_modeling_tf_mobilebert.py | 2 + .../mobilevit/test_modeling_tf_mobilevit.py | 2 + tests/models/mpnet/test_modeling_tf_mpnet.py | 2 + tests/models/mt5/test_modeling_tf_mt5.py | 2 + .../models/openai/test_modeling_tf_openai.py | 2 + tests/models/opt/test_modeling_tf_opt.py | 2 + .../pegasus/test_modeling_tf_pegasus.py | 2 + tests/models/rag/test_modeling_tf_rag.py | 2 + .../models/regnet/test_modeling_tf_regnet.py | 2 + .../rembert/test_modeling_tf_rembert.py | 2 + .../models/resnet/test_modeling_tf_resnet.py | 2 + .../roberta/test_modeling_tf_roberta.py | 2 + .../test_modeling_tf_roberta_prelayernorm.py | 2 + .../roformer/test_modeling_tf_roformer.py | 2 + tests/models/sam/test_modeling_sam.py | 112 +- tests/models/sam/test_modeling_tf_sam.py | 102 +- .../segformer/test_modeling_tf_segformer.py | 2 + 
.../test_feature_extraction_speech_to_text.py | 8 + .../test_modeling_tf_speech_to_text.py | 2 + .../test_feature_extraction_speecht5.py | 8 + tests/models/swin/test_modeling_tf_swin.py | 2 + tests/models/t5/test_modeling_tf_t5.py | 2 + tests/models/tapas/test_modeling_tf_tapas.py | 2 + .../test_modeling_time_series_transformer.py | 2 +- .../transfo_xl/test_modeling_tf_transfo_xl.py | 2 + .../tvlt/test_feature_extraction_tvlt.py | 9 + ...test_modeling_tf_vision_encoder_decoder.py | 2 + ...st_modeling_tf_vision_text_dual_encoder.py | 2 + tests/models/vit/test_modeling_tf_vit.py | 2 + .../vit_mae/test_modeling_tf_vit_mae.py | 2 + .../test_feature_extraction_wav2vec2.py | 8 + .../wav2vec2/test_modeling_tf_wav2vec2.py | 2 + .../wav2vec2/test_tokenization_wav2vec2.py | 8 + .../test_feature_extraction_whisper.py | 8 + .../whisper/test_modeling_tf_whisper.py | 2 + tests/models/whisper/test_modeling_whisper.py | 10 +- tests/models/xglm/test_modeling_tf_xglm.py | 2 + tests/models/xlm/test_modeling_tf_xlm.py | 2 + .../test_modeling_tf_xlm_roberta.py | 2 + tests/models/xlnet/test_modeling_tf_xlnet.py | 2 + .../pipelines/test_pipelines_image_to_text.py | 76 + tests/repo_utils/test_tests_fetcher.py | 113 +- tests/test_modeling_tf_common.py | 53 +- tests/trainer/test_trainer.py | 193 +- tests/utils/test_dynamic_module_utils.py | 129 + tests/utils/test_modeling_tf_core.py | 2 + utils/check_config_attributes.py | 2 + utils/check_repo.py | 3 + utils/check_table.py | 47 - utils/tests_fetcher.py | 63 +- 320 files changed, 9944 insertions(+), 8114 deletions(-) create mode 100644 docs/source/en/model_doc/autoformer.mdx create mode 100644 docs/source/en/tflite.mdx create mode 100644 docs/source/ko/fast_tokenizers.mdx create mode 100644 docs/source/ko/tasks/monocular_depth_estimation.mdx create mode 100644 docs/source/ko/tasks/video_classification.mdx create mode 100644 docs/source/ko/troubleshooting.mdx rename examples/pytorch/{test_xla_examples.py => old_test_xla_examples.py} (100%) create mode 100644 src/transformers/models/autoformer/__init__.py create mode 100644 src/transformers/models/autoformer/configuration_autoformer.py create mode 100644 src/transformers/models/autoformer/modeling_autoformer.py rename tests/{mixed_int8 => bitsandbytes}/README.md (100%) rename tests/{mixed_int8 => bitsandbytes}/__init__.py (100%) create mode 100644 tests/bitsandbytes/test_4bit.py rename tests/{mixed_int8 => bitsandbytes}/test_mixed_int8.py (96%) create mode 100644 tests/models/autoformer/__init__.py create mode 100644 tests/models/autoformer/test_modeling_autoformer.py create mode 100644 tests/utils/test_dynamic_module_utils.py diff --git a/.circleci/config.yml b/.circleci/config.yml index 66678d0d4a0f5d..0a5060b3490043 100644 --- a/.circleci/config.yml +++ b/.circleci/config.yml @@ -43,6 +43,12 @@ jobs: else touch test_preparation/test_list.txt fi + - run: | + if [ -f examples_test_list.txt ]; then + mv examples_test_list.txt test_preparation/examples_test_list.txt + else + touch test_preparation/examples_test_list.txt + fi - run: | if [ -f doctest_list.txt ]; then cp doctest_list.txt test_preparation/doctest_list.txt @@ -62,19 +68,6 @@ jobs: else touch test_preparation/filtered_test_list.txt fi - - run: python utils/tests_fetcher.py --filters tests examples | tee examples_tests_fetched_summary.txt - - run: | - if [ -f test_list.txt ]; then - mv test_list.txt test_preparation/examples_test_list.txt - else - touch test_preparation/examples_test_list.txt - fi - - run: | - if [ -f filtered_test_list_cross_tests.txt ]; 
then - mv filtered_test_list_cross_tests.txt test_preparation/filtered_test_list_cross_tests.txt - else - touch test_preparation/filtered_test_list_cross_tests.txt - fi - store_artifacts: path: test_preparation/test_list.txt - store_artifacts: @@ -111,7 +104,7 @@ jobs: - run: | mkdir test_preparation echo -n "tests" > test_preparation/test_list.txt - echo -n "tests" > test_preparation/examples_test_list.txt + echo -n "all" > test_preparation/examples_test_list.txt echo -n "tests/repo_utils" > test_preparation/test_repo_utils.txt - run: | echo -n "tests" > test_list.txt diff --git a/.circleci/create_circleci_config.py b/.circleci/create_circleci_config.py index 4bc5ce17d08cf9..c9d56754a9407f 100644 --- a/.circleci/create_circleci_config.py +++ b/.circleci/create_circleci_config.py @@ -342,7 +342,6 @@ def job_name(self): "pip install .[sklearn,torch,sentencepiece,testing,torch-speech]", "pip install -r examples/pytorch/_tests_requirements.txt", ], - tests_to_run="./examples/pytorch/", ) @@ -355,7 +354,6 @@ def job_name(self): "pip install .[sklearn,tensorflow,sentencepiece,testing]", "pip install -r examples/tensorflow/_tests_requirements.txt", ], - tests_to_run="./examples/tensorflow/", ) @@ -367,7 +365,6 @@ def job_name(self): "pip install .[flax,testing,sentencepiece]", "pip install -r examples/flax/_tests_requirements.txt", ], - tests_to_run="./examples/flax/", ) @@ -551,7 +548,17 @@ def create_circleci_config(folder=None): example_file = os.path.join(folder, "examples_test_list.txt") if os.path.exists(example_file) and os.path.getsize(example_file) > 0: - jobs.extend(EXAMPLES_TESTS) + with open(example_file, "r", encoding="utf-8") as f: + example_tests = f.read().split(" ") + for job in EXAMPLES_TESTS: + framework = job.name.replace("examples_", "").replace("torch", "pytorch") + if example_tests == "all": + job.tests_to_run = [f"examples/{framework}"] + else: + job.tests_to_run = [f for f in example_tests if f.startswith(f"examples/{framework}")] + + if len(job.tests_to_run) > 0: + jobs.append(job) doctest_file = os.path.join(folder, "doctest_list.txt") if os.path.exists(doctest_file): diff --git a/.github/workflows/self-push.yml b/.github/workflows/self-push.yml index 603a148358d9b5..878ab4f18c0b09 100644 --- a/.github/workflows/self-push.yml +++ b/.github/workflows/self-push.yml @@ -195,6 +195,10 @@ jobs: git checkout ${{ env.CI_SHA }} echo "log = $(git log -n 1)" + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + - name: Echo folder ${{ matrix.folders }} shell: bash # For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to @@ -284,6 +288,10 @@ jobs: git checkout ${{ env.CI_SHA }} echo "log = $(git log -n 1)" + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + - name: Echo folder ${{ matrix.folders }} shell: bash # For folders like `models/bert`, set an env. var. 
(`matrix_folders`) to `models_bert`, which will be used to @@ -373,6 +381,10 @@ jobs: git checkout ${{ env.CI_SHA }} echo "log = $(git log -n 1)" + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /workspace/transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + - name: Remove cached torch extensions run: rm -rf /github/home/.cache/torch_extensions/ @@ -459,6 +471,10 @@ jobs: git checkout ${{ env.CI_SHA }} echo "log = $(git log -n 1)" + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /workspace/transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + - name: Remove cached torch extensions run: rm -rf /github/home/.cache/torch_extensions/ diff --git a/.github/workflows/self-scheduled.yml b/.github/workflows/self-scheduled.yml index e4a8cecb448ec7..a0a9d3a5de4e9f 100644 --- a/.github/workflows/self-scheduled.yml +++ b/.github/workflows/self-scheduled.yml @@ -119,6 +119,10 @@ jobs: working-directory: /transformers run: git fetch && git checkout ${{ github.sha }} + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + - name: NVIDIA-SMI run: | nvidia-smi @@ -176,6 +180,10 @@ jobs: working-directory: /transformers run: git fetch && git checkout ${{ github.sha }} + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + - name: NVIDIA-SMI run: | nvidia-smi @@ -221,6 +229,10 @@ jobs: working-directory: /transformers run: git fetch && git checkout ${{ github.sha }} + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + - name: NVIDIA-SMI run: | nvidia-smi @@ -268,6 +280,10 @@ jobs: working-directory: /transformers run: git fetch && git checkout ${{ github.sha }} + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + - name: NVIDIA-SMI run: | nvidia-smi @@ -315,6 +331,10 @@ jobs: run: | git fetch && git checkout ${{ github.sha }} + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + - name: NVIDIA-SMI run: | nvidia-smi @@ -361,6 +381,10 @@ jobs: working-directory: /workspace/transformers run: git fetch && git checkout ${{ github.sha }} + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /workspace/transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + - name: Remove cached torch extensions run: rm -rf /github/home/.cache/torch_extensions/ diff --git a/README.md b/README.md index 35ede94067a419..e8e3f26d0ad6fb 100644 --- a/README.md +++ b/README.md @@ -292,6 +292,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h 1. 
**[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. 1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell. 1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. +1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. 1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer. 1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis. 1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen. diff --git a/README_es.md b/README_es.md index 2f611fb09dfa4b..8d971e6f304745 100644 --- a/README_es.md +++ b/README_es.md @@ -267,6 +267,7 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt 1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. 1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell. 1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. +1. 
**[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. 1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer. 1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis. 1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen. diff --git a/README_hd.md b/README_hd.md index 8245082beadea0..bb94c9961a6048 100644 --- a/README_hd.md +++ b/README_hd.md @@ -239,6 +239,7 @@ conda install -c huggingface transformers 1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (Google Research से) Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. द्वाराअनुसंधान पत्र [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) के साथ जारी किया गया 1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell. 1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. +1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. 1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (फेसबुक) साथ थीसिस [बार्ट: प्राकृतिक भाषा निर्माण, अनुवाद के लिए अनुक्रम-से-अनुक्रम पूर्व प्रशिक्षण , और समझ] (https://arxiv.org/pdf/1910.13461.pdf) पर निर्भर माइक लुईस, यिनहान लियू, नमन गोयल, मार्जन ग़ज़विनिनेजाद, अब्देलरहमान मोहम्मद, ओमर लेवी, वेस स्टोयानोव और ल्यूक ज़ेटलमॉयर 1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (से École polytechnique) साथ थीसिस [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) पर निर्भर Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis रिहाई। 1. 
**[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (VinAI Research से) साथ में पेपर [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701)गुयेन लुओंग ट्रान, डुओंग मिन्ह ले और डाट क्वोक गुयेन द्वारा पोस्ट किया गया। diff --git a/README_ja.md b/README_ja.md index 8a75ec530c75df..25a2cdb1d53856 100644 --- a/README_ja.md +++ b/README_ja.md @@ -301,6 +301,7 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ 1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (Google Research から) Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. から公開された研究論文 [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) 1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (BAAI から) Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell から公開された研究論文: [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) 1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (MIT から) Yuan Gong, Yu-An Chung, James Glass から公開された研究論文: [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) +1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. 1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (Facebook から) Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer から公開された研究論文: [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) 1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (École polytechnique から) Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis から公開された研究論文: [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) 1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (VinAI Research から) Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen から公開された研究論文: [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) diff --git a/README_ko.md b/README_ko.md index c7e128ad98d0ee..7c160536f19bd9 100644 --- a/README_ko.md +++ b/README_ko.md @@ -216,6 +216,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는 1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (Google Research 에서 제공)은 Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.의 [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918)논문과 함께 발표했습니다. 1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell. 1. 
**[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. +1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. 1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer. 1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis. 1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen. diff --git a/README_zh-hans.md b/README_zh-hans.md index 1a4b647ca14498..75353804d30b39 100644 --- a/README_zh-hans.md +++ b/README_zh-hans.md @@ -240,6 +240,7 @@ conda install -c huggingface transformers 1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (来自 Google Research) 伴随论文 [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) 由 Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig 发布。 1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (来自 BAAI) 伴随论文 [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) 由 Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell 发布。 1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (来自 MIT) 伴随论文 [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) 由 Yuan Gong, Yu-An Chung, James Glass 发布。 +1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. 1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (来自 Facebook) 伴随论文 [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) 由 Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer 发布。 1. 
**[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (来自 École polytechnique) 伴随论文 [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) 由 Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis 发布。 1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (来自 VinAI Research) 伴随论文 [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) 由 Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen 发布。 diff --git a/README_zh-hant.md b/README_zh-hant.md index 9e915cde44ddd9..066604362c13a7 100644 --- a/README_zh-hant.md +++ b/README_zh-hant.md @@ -252,6 +252,7 @@ conda install -c huggingface transformers 1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. 1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell. 1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. +1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. 1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer. 1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis. 1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen. diff --git a/awesome-transformers.md b/awesome-transformers.md index 36059eafa62a7f..446b30bdada967 100644 --- a/awesome-transformers.md +++ b/awesome-transformers.md @@ -29,7 +29,7 @@ Keywords: inpainting, SD, Stable Diffusion ## [flair](https://github.com/flairNLP/flair) -FLAIR is a powerful PyTorch NLP framework, convering several important tasks: NER, sentiment-analysis, part-of-speech tagging, text and ducoment embeddings, among other things. 
+FLAIR is a powerful PyTorch NLP framework, covering several important tasks: NER, sentiment-analysis, part-of-speech tagging, text and document embeddings, among other things. Keywords: NLP, text embedding, document embedding, biomedical, NER, PoS, sentiment-analysis @@ -45,6 +45,12 @@ Keywords: Database, low-code, AI table Keywords: LLMs, Large Language Models, Agents, Chains +## [LlamaIndex](https://github.com/jerryjliu/llama_index) + +[LlamaIndex](https://github.com/jerryjliu/llama_index) is a project that provides a central interface to connect your LLMs with external data. It provides various kinds of indices and retrieval mechanisms to perform different LLM tasks and obtain knowledge-augmented results. + +Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation + ## [ParlAI](https://github.com/facebookresearch/ParlAI) [ParlAI](https://github.com/facebookresearch/ParlAI) is a python framework for sharing, training and testing dialogue models, from open-domain chitchat, to task-oriented dialogue, to visual question answering. It provides more than 100 datasets under the same API, a large zoo of pretrained models, a set of agents, and has several integrations. @@ -360,7 +366,7 @@ Keywords: Model inspection, Model interpretation, Black-box, White-box ## [tortoise-tts](https://github.com/neonbjb/tortoise-tts) -Tortoise is a text-to-speech program built with the following priorities: strong multi-voice capabilities., and highly realistic prosody and intonation. +Tortoise is a text-to-speech program built with the following priorities: strong multi-voice capabilities, and highly realistic prosody and intonation. Keywords: Text-to-speech @@ -405,7 +411,7 @@ Keywords: Training, Generation Diffgram aims to integrate human supervision into platforms. We support your team programmatically changing the UI (Schema, layout, etc.) like in Streamlit. This means that you can collect and annotate timely data from users. In other words, we are the platform behind your platform, an integrated part of your application, to ship new & better AI products faster. -Keywords: Human supervision, Platfor, +Keywords: Human supervision, Platform ## [ecco](https://github.com/jalammar/ecco) @@ -431,11 +437,11 @@ Keywords: DALL-E, Russian Keywords: Knowledge Extraction, Knowledge Graphs -## [nebullvm](https://github.com/nebuly-ai/nebullvm) +## [Nebuly](https://github.com/nebuly-ai/nebuly) -Nebullvm is an ecosystem of plug and play modules to optimize the performances of your AI systems. The optimization modules are stack-agnostic and work with any library. They are designed to be easily integrated into your system, providing a quick and seamless boost to its performance. Simply plug and play to start realizing the benefits of optimized performance right away. +Nebuly is the next-generation platform to monitor and optimize your AI costs in one place. The platform connects to all your AI cost sources (compute, API providers, AI software licenses, etc.) and centralizes them in one place to give you full visibility on a model basis. The platform also provides optimization recommendations and a co-pilot model that can guide you during the optimization process. The platform builds on top of the open-source tools allowing you to optimize the different steps of your AI stack to squeeze out the best possible cost performances.
-Keywords: Optimization, Performance +Keywords: Optimization, Performance, Monitoring ## [imaginAIry](https://github.com/brycedrennan/imaginAIry) diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index a9113c5e740029..dfe899032ea866 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -93,6 +93,8 @@ title: Run training on Amazon SageMaker - local: serialization title: Export to ONNX + - local: tflite + title: Export to TFLite - local: torchscript title: Export to TorchScript - local: benchmarks @@ -654,6 +656,8 @@ title: Reinforcement learning models - isExpanded: false sections: + - local: model_doc/autoformer + title: Autoformer - local: model_doc/informer title: Informer - local: model_doc/time_series_transformer diff --git a/docs/source/en/custom_tools.mdx b/docs/source/en/custom_tools.mdx index 23ca4bccbadca4..feb4c7ac46d779 100644 --- a/docs/source/en/custom_tools.mdx +++ b/docs/source/en/custom_tools.mdx @@ -414,6 +414,13 @@ of the tools, it has available to it. +In both cases, you can pass a repo ID instead of the prompt template if you would like to use a template hosted by someone in the community. The default prompts live in [this repo](https://huggingface.co/datasets/huggingface-tools/default-prompts) as an example. + +To upload your custom prompt on a repo on the Hub and share it with the community just make sure: +- to use a dataset repository +- to put the prompt template for the `run` command in a file named `run_prompt_template.txt` +- to put the prompt template for the `chat` command in a file named `chat_prompt_template.txt` + ## Using custom tools In this section, we'll be leveraging two existing custom tools that are specific to image generation: diff --git a/docs/source/en/index.mdx b/docs/source/en/index.mdx index 497803322f8953..7ee35c613669f8 100644 --- a/docs/source/en/index.mdx +++ b/docs/source/en/index.mdx @@ -53,6 +53,7 @@ The documentation is organized into five sections: 1. **[ALIGN](model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. 1. **[AltCLIP](model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell. 1. **[Audio Spectrogram Transformer](model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. +1. **[Autoformer](model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. 1. **[BART](model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer. 1. 
**[BARThez](model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis. 1. **[BARTpho](model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen. @@ -268,6 +269,7 @@ Flax), PyTorch, and/or TensorFlow. | ALIGN | ❌ | ❌ | ✅ | ❌ | ❌ | | AltCLIP | ❌ | ❌ | ✅ | ❌ | ❌ | | Audio Spectrogram Transformer | ❌ | ❌ | ✅ | ❌ | ❌ | +| Autoformer | ❌ | ❌ | ✅ | ❌ | ❌ | | BART | ✅ | ✅ | ✅ | ✅ | ✅ | | BEiT | ❌ | ❌ | ✅ | ❌ | ✅ | | BERT | ✅ | ✅ | ✅ | ✅ | ✅ | diff --git a/docs/source/en/main_classes/callback.mdx b/docs/source/en/main_classes/callback.mdx index 33ae17c66df2bb..2636130473c706 100644 --- a/docs/source/en/main_classes/callback.mdx +++ b/docs/source/en/main_classes/callback.mdx @@ -39,6 +39,7 @@ By default a [`Trainer`] will use the following callbacks: installed. - [`~integrations.ClearMLCallback`] if [clearml](https://github.com/allegroai/clearml) is installed. - [`~integrations.DagsHubCallback`] if [dagshub](https://dagshub.com/) is installed. +- [`~integrations.FlyteCallback`] if [flyte](https://flyte.org/) is installed. The main class that implements callbacks is [`TrainerCallback`]. It gets the [`TrainingArguments`] used to instantiate the [`Trainer`], can access that @@ -79,6 +80,8 @@ Here is the list of the available [`TrainerCallback`] in the library: [[autodoc]] integrations.DagsHubCallback +[[autodoc]] integrations.FlyteCallback + ## TrainerCallback [[autodoc]] TrainerCallback diff --git a/docs/source/en/main_classes/quantization.mdx b/docs/source/en/main_classes/quantization.mdx index 3dd6d36ee497d8..c168b11a8302f5 100644 --- a/docs/source/en/main_classes/quantization.mdx +++ b/docs/source/en/main_classes/quantization.mdx @@ -19,8 +19,45 @@ This is supported by most of the GPU hardwares since the `0.37.0` release of `bi Learn more about the quantization method in the [LLM.int8()](https://arxiv.org/abs/2208.07339) paper, or the [blogpost](https://huggingface.co/blog/hf-bitsandbytes-integration) about the collaboration. +Since its `0.39.0` release, you can load any model that supports `device_map` using 4-bit quantization, leveraging FP4 data type. + Here are the things you can do using `bitsandbytes` integration +### FP4 quantization + +#### Requirements + +Make sure that you have installed the requirements below before running any of the code snippets below. + +- Latest `bitsandbytes` library +`pip install bitsandbytes>=0.39.0` + +- Install latest `accelerate` from source +`pip install git+https://github.com/huggingface/accelerate.git` + +- Install latest `transformers` from source +`pip install git+https://github.com/huggingface/transformers.git` + +#### Load a large model in 4bit + +By using `load_in_4bit=True` when calling the `.from_pretrained` method, you can divide your memory use by 4 (roughly). + +```python +# pip install transformers accelerate bitsandbytes +from transformers import AutoModelForCausalLM, AutoTokenizer + +model_id = "bigscience/bloom-1b7" + +tokenizer = AutoTokenizer.from_pretrained(model_id) +model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_4bit=True) +``` + + + +Note that once a model has been loaded in 4-bit it is currently not possible to push the quantized weights on the Hub. 
Note also that you cannot train 4-bit weights as this is not supported yet. However you can use 4-bit models to train extra parameters, this will be covered in the next section. + + + ### Load a large model in 8bit You can load a model by roughly halving the memory requirements by using `load_in_8bit=True` argument when calling `.from_pretrained` method @@ -48,10 +85,56 @@ With this integration we were able to load large models on smaller devices and r -Note that once a model has been loaded in 8-bit it is currently not possible to push the quantized weights on the Hub. Note also that you cannot train 8-bit weights as this is not supported yet. However you can use 8-bit models to train extra parameters, this will be covered in the next section. +Note that once a model has been loaded in 8-bit it is currently not possible to push the quantized weights on the Hub except if you use the latest `transformers` and `bitsandbytes`. Note also that you cannot train 8-bit weights as this is not supported yet. However you can use 8-bit models to train extra parameters, this will be covered in the next section. +#### Advanced use cases + +Here we will cover some advanced use cases you can perform with FP4 quantization. + +##### Change the compute dtype + +The compute dtype is used to change the dtype that will be used during computation. For example, hidden states could be in `float32` but computation can be set to bf16 for speedups. By default, the compute dtype is set to `float32`. + +```python +import torch +from transformers import BitsAndBytesConfig + +quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16) +``` + +##### Using NF4 (Normal Float 4) data type + +You can also use the NF4 data type, which is a new 4bit datatype adapted for weights that have been initialized using a normal distribution. For that run: + +```python +from transformers import BitsAndBytesConfig + +nf4_config = BitsAndBytesConfig( + load_in_4bit=True, + bnb_4bit_quant_type="nf4", +) + +model_nf4 = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=nf4_config) +``` + +##### Use nested quantization for more memory efficient inference + +We also advise users to use the nested quantization technique. This saves more memory at no additional performance cost - from our empirical observations, this enables fine-tuning a llama-13b model on an NVIDIA-T4 16GB with a sequence length of 1024, batch size of 1 and gradient accumulation steps of 4. + +```python +from transformers import BitsAndBytesConfig + +double_quant_config = BitsAndBytesConfig( + load_in_4bit=True, + bnb_4bit_use_double_quant=True, +) + +model_double_quant = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=double_quant_config) +``` + + ### Push quantized models on the 🤗 Hub You can push a quantized model on the Hub by naively using `push_to_hub` method. This will first push the quantization configuration file, then push the quantized model weights.
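As a rough sketch of that flow (the checkpoint name and target repo below are illustrative placeholders, not values taken from this guide), pushing a quantized model could look like this:

```python
# Sketch only: "bigscience/bloom-560m" is a small example checkpoint and
# "my-username/bloom-560m-8bit" is a placeholder target repo. Pushing quantized
# weights assumes recent `transformers` and `bitsandbytes` versions, as noted above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m", device_map="auto", load_in_8bit=True
)
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

# The quantization configuration is pushed first, then the quantized weights.
model.push_to_hub("my-username/bloom-560m-8bit")
tokenizer.push_to_hub("my-username/bloom-560m-8bit")
```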
diff --git a/docs/source/en/main_classes/trainer.mdx b/docs/source/en/main_classes/trainer.mdx index 67ab6aba42ef4e..409a6c6d33afa1 100644 --- a/docs/source/en/main_classes/trainer.mdx +++ b/docs/source/en/main_classes/trainer.mdx @@ -61,7 +61,7 @@ class CustomTrainer(Trainer): outputs = model(**inputs) logits = outputs.get("logits") # compute custom loss (suppose one has 3 labels with different weights) - loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0])) + loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0], device=model.device)) loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1)) return (loss, outputs) if return_outputs else loss ``` diff --git a/docs/source/en/model_doc/autoformer.mdx b/docs/source/en/model_doc/autoformer.mdx new file mode 100644 index 00000000000000..c1bd30555cdbcc --- /dev/null +++ b/docs/source/en/model_doc/autoformer.mdx @@ -0,0 +1,42 @@ + + +# Autoformer + +## Overview + +The Autoformer model was proposed in [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. + +This model augments the Transformer as a deep decomposition architecture, which can progressively decompose the trend and seasonal components during the forecasting process. + +The abstract from the paper is the following: + +*Extending the forecasting time is a critical demand for real applications, such as extreme weather early warning and long-term energy consumption planning. This paper studies the long-term forecasting problem of time series. Prior Transformer-based models adopt various self-attention mechanisms to discover the long-range dependencies. However, intricate temporal patterns of the long-term future prohibit the model from finding reliable dependencies. Also, Transformers have to adopt the sparse versions of point-wise self-attentions for long series efficiency, resulting in the information utilization bottleneck. Going beyond Transformers, we design Autoformer as a novel decomposition architecture with an Auto-Correlation mechanism. We break with the pre-processing convention of series decomposition and renovate it as a basic inner block of deep models. This design empowers Autoformer with progressive decomposition capacities for complex time series. Further, inspired by the stochastic process theory, we design the Auto-Correlation mechanism based on the series periodicity, which conducts the dependencies discovery and representation aggregation at the sub-series level. Auto-Correlation outperforms self-attention in both efficiency and accuracy. In long-term forecasting, Autoformer yields state-of-the-art accuracy, with a 38% relative improvement on six benchmarks, covering five practical applications: energy, traffic, economics, weather and disease.* + +This model was contributed by [elisim](https://huggingface.co/elisim) and [kashif](https://huggingface.co/kashif). +The original code can be found [here](https://github.com/thuml/Autoformer). 
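For orientation, here is a minimal instantiation sketch; the configuration values below are illustrative assumptions rather than settings taken from the paper or from a released checkpoint:

```python
# Minimal sketch: the prediction/context lengths and the single time feature are
# illustrative assumptions, not values prescribed by the Autoformer paper.
from transformers import AutoformerConfig, AutoformerForPrediction

config = AutoformerConfig(
    prediction_length=24,  # forecast horizon
    context_length=48,     # conditioning window seen by the encoder
    num_time_features=1,   # e.g. a single time/"age" feature per step
)
model = AutoformerForPrediction(config)
print(f"parameters: {sum(p.numel() for p in model.parameters()):,}")

# For training or forecasting, `forward`/`generate` additionally expect tensors such as
# `past_values`, `past_time_features`, `past_observed_mask` and `future_time_features`.
```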
+ +## AutoformerConfig + +[[autodoc]] AutoformerConfig + + +## AutoformerModel + +[[autodoc]] AutoformerModel + - forward + + +## AutoformerForPrediction + +[[autodoc]] AutoformerForPrediction + - forward \ No newline at end of file diff --git a/docs/source/en/model_doc/opt.mdx index 0c041c5ecb8794..073c7c3bfc68b5 100644 --- a/docs/source/en/model_doc/opt.mdx +++ b/docs/source/en/model_doc/opt.mdx @@ -23,7 +23,7 @@ The abstract from the paper is the following: Tips: - OPT has the same architecture as [`BartDecoder`]. -- Contrary to GPT2, OPT adds the EOS token `` to the beginning of every prompt. **Note**: Make sure to pass `use_fast=False` when loading OPT's tokenizer with [`AutoTokenizer`] to get the correct tokenizer. +- Contrary to GPT2, OPT adds the EOS token `` to the beginning of every prompt. This model was contributed by [Arthur Zucker](https://huggingface.co/ArthurZ), [Younes Belkada](https://huggingface.co/ybelkada), and [Patrick Von Platen](https://huggingface.co/patrickvonplaten). The original code can be found [here](https://github.com/facebookresearch/metaseq). diff --git a/docs/source/en/perf_infer_gpu_one.mdx index 3403e81fb38451..4bcf3c1111161c 100644 --- a/docs/source/en/perf_infer_gpu_one.mdx +++ b/docs/source/en/perf_infer_gpu_one.mdx @@ -34,6 +34,60 @@ model.save_pretrained("saved_model") As of PyTorch 2.0, the attention fastpath is supported for both encoders and decoders. The list of supported architectures can be found [here](https://huggingface.co/docs/optimum/bettertransformer/overview#supported-models). +## `bitsandbytes` integration for FP4 mixed-precision inference + +You can install `bitsandbytes` and benefit from easy model compression on GPUs. Using FP4 quantization you can expect to reduce up to 8x the model size compared to its native full precision version. Check out below how to get started. + + + +Note that this feature can also be used in a multi GPU setup. + + + +### Requirements + +- Latest `bitsandbytes` library +`pip install bitsandbytes>=0.39.0` + +- Install latest `accelerate` from source +`pip install git+https://github.com/huggingface/accelerate.git` + +- Install latest `transformers` from source +`pip install git+https://github.com/huggingface/transformers.git` + +### Running FP4 models - single GPU setup - Quickstart + +You can quickly run an FP4 model on a single GPU by running the following code: + +```py +from transformers import AutoModelForCausalLM + +model_name = "bigscience/bloom-2b5" +model_4bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_4bit=True) +``` + +### Running FP4 models - multi GPU setup + +The way to load your mixed 4-bit model in multiple GPUs is as follows (same command as single GPU setup): +```py +model_name = "bigscience/bloom-2b5" +model_4bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_4bit=True) +``` +But you can control the GPU RAM you want to allocate on each GPU using `accelerate`. Use the `max_memory` argument as follows: + +```py +max_memory_mapping = {0: "600MB", 1: "1GB"} +model_name = "bigscience/bloom-3b" +model_4bit = AutoModelForCausalLM.from_pretrained( + model_name, device_map="auto", load_in_4bit=True, max_memory=max_memory_mapping +) +``` +In this example, the first GPU will use 600MB of memory and the second 1GB.
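As a quick, optional sanity check under the same assumptions as the snippets above, you can inspect how the layers were actually placed across devices, since `from_pretrained` records the placement computed by `accelerate` on the model:

```py
# `model_4bit` is the model loaded with device_map="auto" in the snippets above.
# `hf_device_map` maps submodule names to the device each one was assigned to.
print(model_4bit.hf_device_map)
```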
+ +### Advanced usage + +For more advanced usage of this method, please have a look at the [quantization](main_classes/quantization) documentation page. + ## `bitsandbytes` integration for Int8 mixed-precision matrix decomposition diff --git a/docs/source/en/serialization.mdx index cc429dea08a5ab..022cf460f808bf 100644 --- a/docs/source/en/serialization.mdx +++ b/docs/source/en/serialization.mdx @@ -12,13 +12,20 @@ specific language governing permissions and limitations under the License. # Export to ONNX -If you need to deploy 🤗 Transformers models in production environments, we recommend -exporting them to a serialized format that can be loaded and executed on specialized -runtimes and hardware. In this guide, we'll show you how to export 🤗 Transformers -models to [ONNX (Open Neural Network eXchange)](http://onnx.ai). +Deploying 🤗 Transformers models in production environments often requires, or can benefit from, exporting the models into +a serialized format that can be loaded and executed on specialized runtimes and hardware. -ONNX is an open standard that defines a common set of operators and a common file format -to represent deep learning models in a wide variety of frameworks, including PyTorch and +🤗 Optimum is an extension of Transformers that enables exporting models from PyTorch or TensorFlow to serialized formats +such as ONNX and TFLite through its `exporters` module. 🤗 Optimum also provides a set of performance optimization tools to train +and run models on targeted hardware with maximum efficiency. + +This guide demonstrates how you can export 🤗 Transformers models to ONNX with 🤗 Optimum; for the guide on exporting models to TFLite, +please refer to the [Export to TFLite page](tflite). + +## Export to ONNX + +[ONNX (Open Neural Network eXchange)](http://onnx.ai) is an open standard that defines a common set of operators and a +common file format to represent deep learning models in a wide variety of frameworks, including PyTorch and TensorFlow. When a model is exported to the ONNX format, these operators are used to construct a computational graph (often called an _intermediate representation_) which represents the flow of data through the neural network. @@ -27,166 +34,67 @@ By exposing a graph with standardized operators and data types, ONNX makes it ea switch between frameworks. For example, a model trained in PyTorch can be exported to ONNX format and then imported in TensorFlow (and vice versa). -🤗 Transformers provides a [`transformers.onnx`](main_classes/onnx) package that enables -you to convert model checkpoints to an ONNX graph by leveraging configuration objects. -These configuration objects come ready made for a number of model architectures, and are -designed to be easily extendable to other architectures. - - - -You can also export 🤗 Transformers models with the [`optimum.exporters.onnx` package](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model) -from 🤗 Optimum. - -Once exported, a model can be: - -- Optimized for inference via techniques such as quantization and graph optimization.
-- Run with ONNX Runtime via [`ORTModelForXXX` classes](https://huggingface.co/docs/optimum/onnxruntime/package_reference/modeling_ort), +Once exported to ONNX format, a model can be: +- optimized for inference via techniques such as [graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization) and [quantization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization). +- run with ONNX Runtime via [`ORTModelForXXX` classes](https://huggingface.co/docs/optimum/onnxruntime/package_reference/modeling_ort), which follow the same `AutoModel` API as the one you are used to in 🤗 Transformers. -- Run with [optimized inference pipelines](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines), -which has the same API as the [`pipeline`] function in 🤗 Transformers. +- run with [optimized inference pipelines](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines), +which has the same API as the [`pipeline`] function in 🤗 Transformers. -To explore all these features, check out the [🤗 Optimum library](https://github.com/huggingface/optimum). +🤗 Optimum provides support for the ONNX export by leveraging configuration objects. These configuration objects come +ready-made for a number of model architectures, and are designed to be easily extendable to other architectures. - +For the list of ready-made configurations, please refer to [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/overview). -Ready-made configurations include the following architectures: - - - -- ALBERT -- BART -- BEiT -- BERT -- BigBird -- BigBird-Pegasus -- Blenderbot -- BlenderbotSmall -- BLOOM -- CamemBERT -- Chinese-CLIP -- CLIP -- CodeGen -- Conditional DETR -- ConvBERT -- ConvNeXT -- Data2VecText -- Data2VecVision -- DeBERTa -- DeBERTa-v2 -- DeiT -- DETR -- DistilBERT -- EfficientNet -- ELECTRA -- ERNIE -- FlauBERT -- GPT Neo -- GPT-J -- GPT-Sw3 -- GroupViT -- I-BERT -- ImageGPT -- LayoutLM -- LayoutLMv3 -- LeViT -- Longformer -- LongT5 -- M2M100 -- Marian -- mBART -- MEGA -- MobileBERT -- MobileNetV1 -- MobileNetV2 -- MobileViT -- MT5 -- OpenAI GPT-2 -- OWL-ViT -- Perceiver -- PLBart -- PoolFormer -- RemBERT -- ResNet -- RoBERTa -- RoBERTa-PreLayerNorm -- RoFormer -- SegFormer -- SqueezeBERT -- SwiftFormer -- Swin Transformer -- T5 -- Table Transformer -- Vision Encoder decoder -- ViT -- Whisper -- X-MOD -- XLM -- XLM-RoBERTa -- XLM-RoBERTa-XL -- YOLOS - -In the next two sections, we'll show you how to: - -* Export a supported model using the `transformers.onnx` package. -* Export a custom model for an unsupported architecture. - -## Exporting a model to ONNX - - - -The recommended way of exporting a model is now to use -[`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli), -do not worry it is very similar to `transformers.onnx`! +There are two ways to export a 🤗 Transformers model to ONNX, here we show both: - +- export with 🤗 Optimum via CLI. +- export with 🤗 Optimum with `optimum.onnxruntime`. 
-To export a 🤗 Transformers model to ONNX, you'll first need to install some extra -dependencies: +### Exporting a 🤗 Transformers model to ONNX with CLI + +To export a 🤗 Transformers model to ONNX, first install an extra dependency: ```bash -pip install transformers[onnx] +pip install optimum[exporters] ``` -The `transformers.onnx` package can then be used as a Python module: +To check out all available arguments, refer to the [🤗 Optimum docs](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli), +or view help in command line: ```bash -python -m transformers.onnx --help - -usage: Hugging Face Transformers ONNX exporter [-h] -m MODEL [--feature {causal-lm, ...}] [--opset OPSET] [--atol ATOL] output - -positional arguments: - output Path indicating where to store generated ONNX model. - -optional arguments: - -h, --help show this help message and exit - -m MODEL, --model MODEL - Model ID on huggingface.co or path on disk to load model from. - --feature {causal-lm, ...} - The type of features to export the model with. - --opset OPSET ONNX opset version to export the model with. - --atol ATOL Absolute difference tolerance when validating the model. +optimum-cli export onnx --help ``` -Exporting a checkpoint using a ready-made configuration can be done as follows: +To export a model's checkpoint from the 🤗 Hub, for example, `distilbert-base-uncased-distilled-squad`, run the following command: ```bash -python -m transformers.onnx --model=distilbert-base-uncased onnx/ +optimum-cli export onnx --model distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/ ``` -You should see the following logs: +You should see the logs indicating progress and showing where the resulting `model.onnx` is saved, like this: ```bash -Validating ONNX model... - -[✓] ONNX model output names match reference model ({'last_hidden_state'}) - - Validating ONNX Model output "last_hidden_state": - -[✓] (2, 8, 768) matches (2, 8, 768) - -[✓] all values close (atol: 1e-05) -All good, model saved at: onnx/model.onnx -``` +Validating ONNX model distilbert_base_uncased_squad_onnx/model.onnx... + -[✓] ONNX model output names match reference model (start_logits, end_logits) + - Validating ONNX Model output "start_logits": + -[✓] (2, 16) matches (2, 16) + -[✓] all values close (atol: 0.0001) + - Validating ONNX Model output "end_logits": + -[✓] (2, 16) matches (2, 16) + -[✓] all values close (atol: 0.0001) +The ONNX export succeeded and the exported model was saved at: distilbert_base_uncased_squad_onnx +``` + +The example above illustrates exporting a checkpoint from 🤗 Hub. When exporting a local model, first make sure that you +saved both the model's weights and tokenizer files in the same directory (`local_path`). When using CLI, pass the +`local_path` to the `model` argument instead of the checkpoint name on 🤗 Hub and provide the `--task` argument. +You can review the list of supported tasks in the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/task_manager). +If `task` argument is not provided, it will default to the model architecture without any task specific head. -This exports an ONNX graph of the checkpoint defined by the `--model` argument. In this -example, it is `distilbert-base-uncased`, but it can be any checkpoint on the Hugging -Face Hub or one that's stored locally. 
+```bash +optimum-cli export onnx --model local_path --task question-answering distilbert_base_uncased_squad_onnx/ +``` The resulting `model.onnx` file can then be run on one of the [many accelerators](https://onnx.ai/supported-tools.html#deployModel) that support the ONNX @@ -195,348 +103,104 @@ Runtime](https://onnxruntime.ai/) as follows: ```python >>> from transformers import AutoTokenizer ->>> from onnxruntime import InferenceSession - ->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased") ->>> session = InferenceSession("onnx/model.onnx") ->>> # ONNX Runtime expects NumPy arrays as input ->>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np") ->>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs)) -``` - -The required output names (like `["last_hidden_state"]`) can be obtained by taking a -look at the ONNX configuration of each model. For example, for DistilBERT we have: - -```python ->>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig - ->>> config = DistilBertConfig() ->>> onnx_config = DistilBertOnnxConfig(config) ->>> print(list(onnx_config.outputs.keys())) -["last_hidden_state"] -``` +>>> from optimum.onnxruntime import ORTModelForQuestionAnswering -The process is identical for TensorFlow checkpoints on the Hub. For example, we can -export a pure TensorFlow checkpoint from the [Keras -organization](https://huggingface.co/keras-io) as follows: - -```bash -python -m transformers.onnx --model=keras-io/transformers-qa onnx/ -``` - -To export a model that's stored locally, you'll need to have the model's weights and -tokenizer files stored in a directory. For example, we can load and save a checkpoint as -follows: - - -```python ->>> from transformers import AutoTokenizer, AutoModelForSequenceClassification - ->>> # Load tokenizer and PyTorch weights form the Hub ->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased") ->>> pt_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased") ->>> # Save to disk ->>> tokenizer.save_pretrained("local-pt-checkpoint") ->>> pt_model.save_pretrained("local-pt-checkpoint") -``` - -Once the checkpoint is saved, we can export it to ONNX by pointing the `--model` -argument of the `transformers.onnx` package to the desired directory: - -```bash -python -m transformers.onnx --model=local-pt-checkpoint onnx/ -``` - -```python ->>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification - ->>> # Load tokenizer and TensorFlow weights from the Hub ->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased") ->>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased") ->>> # Save to disk ->>> tokenizer.save_pretrained("local-tf-checkpoint") ->>> tf_model.save_pretrained("local-tf-checkpoint") +>>> tokenizer = AutoTokenizer.from_pretrained("distilbert_base_uncased_squad_onnx") +>>> model = ORTModelForQuestionAnswering.from_pretrained("distilbert_base_uncased_squad_onnx") +>>> inputs = tokenizer("What am I using?", "Using DistilBERT with ONNX Runtime!", return_tensors="pt") +>>> outputs = model(**inputs) ``` -Once the checkpoint is saved, we can export it to ONNX by pointing the `--model` -argument of the `transformers.onnx` package to the desired directory: +The process is identical for TensorFlow checkpoints on the Hub. 
For instance, here's how you would +export a pure TensorFlow checkpoint from the [Keras organization](https://huggingface.co/keras-io): ```bash -python -m transformers.onnx --model=local-tf-checkpoint onnx/ +optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_squad_onnx/ ``` - -## Selecting features for different model tasks +### Exporting a 🤗 Transformers model to ONNX with `optimum.onnxruntime` - - -The recommended way of exporting a model is now to use `optimum.exporters.onnx`. -You can check the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#selecting-a-task) -to learn how to select a task. - - - -Each ready-made configuration comes with a set of _features_ that enable you to export -models for different types of tasks. As shown in the table below, each feature is -associated with a different `AutoClass`: - -| Feature | Auto Class | -| ------------------------------------ | ------------------------------------ | -| `causal-lm`, `causal-lm-with-past` | `AutoModelForCausalLM` | -| `default`, `default-with-past` | `AutoModel` | -| `masked-lm` | `AutoModelForMaskedLM` | -| `question-answering` | `AutoModelForQuestionAnswering` | -| `seq2seq-lm`, `seq2seq-lm-with-past` | `AutoModelForSeq2SeqLM` | -| `sequence-classification` | `AutoModelForSequenceClassification` | -| `token-classification` | `AutoModelForTokenClassification` | - -For each configuration, you can find the list of supported features via the -[`~transformers.onnx.FeaturesManager`]. For example, for DistilBERT we have: +Alternative to CLI, you can export a 🤗 Transformers model to ONNX programmatically like so: ```python ->>> from transformers.onnx.features import FeaturesManager - ->>> distilbert_features = list(FeaturesManager.get_supported_features_for_model_type("distilbert").keys()) ->>> print(distilbert_features) -["default", "masked-lm", "causal-lm", "sequence-classification", "token-classification", "question-answering"] -``` - -You can then pass one of these features to the `--feature` argument in the -`transformers.onnx` package. For example, to export a text-classification model we can -pick a fine-tuned model from the Hub and run: +>>> from optimum.onnxruntime import ORTModelForSequenceClassification +>>> from transformers import AutoTokenizer -```bash -python -m transformers.onnx --model=distilbert-base-uncased-finetuned-sst-2-english \ - --feature=sequence-classification onnx/ -``` +>>> model_checkpoint = "distilbert_base_uncased_squad" +>>> save_directory = "onnx/" -This displays the following logs: +>>> # Load a model from transformers and export it to ONNX +>>> ort_model = ORTModelForSequenceClassification.from_pretrained(model_checkpoint, export=True) +>>> tokenizer = AutoTokenizer.from_pretrained(model_checkpoint) -```bash -Validating ONNX model... - -[✓] ONNX model output names match reference model ({'logits'}) - - Validating ONNX Model output "logits": - -[✓] (2, 2) matches (2, 2) - -[✓] all values close (atol: 1e-05) -All good, model saved at: onnx/model.onnx +>>> # Save the onnx model and tokenizer +>>> ort_model.save_pretrained(save_directory) +>>> tokenizer.save_pretrained(save_directory) ``` -Notice that in this case, the output names from the fine-tuned model are `logits` -instead of the `last_hidden_state` we saw with the `distilbert-base-uncased` checkpoint -earlier. This is expected since the fine-tuned model has a sequence classification head. 
- - - -The features that have a `with-past` suffix (like `causal-lm-with-past`) correspond to -model classes with precomputed hidden states (key and values in the attention blocks) -that can be used for fast autoregressive decoding. - - - - - -For `VisionEncoderDecoder` type models, the encoder and decoder parts are -exported separately as two ONNX files named `encoder_model.onnx` and `decoder_model.onnx` respectively. - - - - -## Exporting a model for an unsupported architecture - - +### Exporting a model for an unsupported architecture If you wish to contribute by adding support for a model that cannot be currently exported, you should first check if it is -supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/main/en/exporters/onnx/package_reference/configuration#supported-architectures), -and if it is not, [contribute to 🤗 Optimum](https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/contribute) +supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview), +and if it is not, [contribute to 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute) directly. - - -If you wish to export a model whose architecture is not natively supported by the -library, there are three main steps to follow: - -1. Implement a custom ONNX configuration. -2. Export the model to ONNX. -3. Validate the outputs of the PyTorch and exported models. - -In this section, we'll look at how DistilBERT was implemented to show what's involved -with each step. - -### Implementing a custom ONNX configuration - -Let's start with the ONNX configuration object. We provide three abstract classes that -you should inherit from, depending on the type of model architecture you wish to export: - -* Encoder-based models inherit from [`~onnx.config.OnnxConfig`] -* Decoder-based models inherit from [`~onnx.config.OnnxConfigWithPast`] -* Encoder-decoder models inherit from [`~onnx.config.OnnxSeq2SeqConfigWithPast`] - - - -A good way to implement a custom ONNX configuration is to look at the existing -implementation in the `configuration_.py` file of a similar architecture. - - - -Since DistilBERT is an encoder-based model, its configuration inherits from -`OnnxConfig`: +### Exporting a model with `transformers.onnx` -```python ->>> from typing import Mapping, OrderedDict ->>> from transformers.onnx import OnnxConfig - - ->>> class DistilBertOnnxConfig(OnnxConfig): -... @property -... def inputs(self) -> Mapping[str, Mapping[int, str]]: -... return OrderedDict( -... [ -... ("input_ids", {0: "batch", 1: "sequence"}), -... ("attention_mask", {0: "batch", 1: "sequence"}), -... ] -... ) -``` - -Every configuration object must implement the `inputs` property and return a mapping, -where each key corresponds to an expected input, and each value indicates the axis of -that input. For DistilBERT, we can see that two inputs are required: `input_ids` and -`attention_mask`. These inputs have the same shape of `(batch_size, sequence_length)` -which is why we see the same axes used in the configuration. + - - -Notice that `inputs` property for `DistilBertOnnxConfig` returns an `OrderedDict`. This -ensures that the inputs are matched with their relative position within the -`PreTrainedModel.forward()` method when tracing the graph. We recommend using an -`OrderedDict` for the `inputs` and `outputs` properties when implementing custom ONNX -configurations. 
+`transformers.onnx` is no longer maintained, please export models with 🤗 Optimum as described above. This section will be removed in future versions. -Once you have implemented an ONNX configuration, you can instantiate it by providing the -base model's configuration as follows: - -```python ->>> from transformers import AutoConfig - ->>> config = AutoConfig.from_pretrained("distilbert-base-uncased") ->>> onnx_config = DistilBertOnnxConfig(config) -``` - -The resulting object has several useful properties. For example, you can view the ONNX -operator set that will be used during the export: - -```python ->>> print(onnx_config.default_onnx_opset) -11 -``` - -You can also view the outputs associated with the model as follows: +To export a 🤗 Transformers model to ONNX with `transformers.onnx`, install extra dependencies: -```python ->>> print(onnx_config.outputs) -OrderedDict([("last_hidden_state", {0: "batch", 1: "sequence"})]) +```bash +pip install transformers[onnx] ``` -Notice that the outputs property follows the same structure as the inputs; it returns an -`OrderedDict` of named outputs and their shapes. The output structure is linked to the -choice of feature that the configuration is initialised with. By default, the ONNX -configuration is initialized with the `default` feature that corresponds to exporting a -model loaded with the `AutoModel` class. If you want to export a model for another task, -just provide a different feature to the `task` argument when you initialize the ONNX -configuration. For example, if we wished to export DistilBERT with a sequence -classification head, we could use: +Use the `transformers.onnx` package as a Python module to export a checkpoint using a ready-made configuration: -```python ->>> from transformers import AutoConfig - ->>> config = AutoConfig.from_pretrained("distilbert-base-uncased") ->>> onnx_config_for_seq_clf = DistilBertOnnxConfig(config, task="sequence-classification") ->>> print(onnx_config_for_seq_clf.outputs) -OrderedDict([('logits', {0: 'batch'})]) +```bash +python -m transformers.onnx --model=distilbert-base-uncased onnx/ ``` - -All of the base properties and methods associated with [`~onnx.config.OnnxConfig`] and -the other configuration classes can be overridden if needed. Check out [`BartOnnxConfig`] -for an advanced example. - - -### Exporting the model - -Once you have implemented the ONNX configuration, the next step is to export the model. -Here we can use the `export()` function provided by the `transformers.onnx` package. -This function expects the ONNX configuration, along with the base model and tokenizer, -and the path to save the exported file: +This exports an ONNX graph of the checkpoint defined by the `--model` argument. Pass any checkpoint on the 🤗 Hub or one that's stored locally. +The resulting `model.onnx` file can then be run on one of the many accelerators that support the ONNX standard.
For example, +load and run the model with ONNX Runtime as follows: ```python ->>> from pathlib import Path ->>> from transformers.onnx import export ->>> from transformers import AutoTokenizer, AutoModel - ->>> onnx_path = Path("model.onnx") ->>> model_ckpt = "distilbert-base-uncased" ->>> base_model = AutoModel.from_pretrained(model_ckpt) ->>> tokenizer = AutoTokenizer.from_pretrained(model_ckpt) +>>> from transformers import AutoTokenizer +>>> from onnxruntime import InferenceSession ->>> onnx_inputs, onnx_outputs = export(tokenizer, base_model, onnx_config, onnx_config.default_onnx_opset, onnx_path) +>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased") +>>> session = InferenceSession("onnx/model.onnx") +>>> # ONNX Runtime expects NumPy arrays as input +>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np") +>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs)) ``` -The `onnx_inputs` and `onnx_outputs` returned by the `export()` function are lists of -the keys defined in the `inputs` and `outputs` properties of the configuration. Once the -model is exported, you can test that the model is well formed as follows: +The required output names (like `["last_hidden_state"]`) can be obtained by taking a look at the ONNX configuration of +each model. For example, for DistilBERT we have: ```python ->>> import onnx +>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig ->>> onnx_model = onnx.load("model.onnx") ->>> onnx.checker.check_model(onnx_model) +>>> config = DistilBertConfig() +>>> onnx_config = DistilBertOnnxConfig(config) +>>> print(list(onnx_config.outputs.keys())) +["last_hidden_state"] ``` - +The process is identical for TensorFlow checkpoints on the Hub. For example, export a pure TensorFlow checkpoint like so: -If your model is larger than 2GB, you will see that many additional files are created -during the export. This is _expected_ because ONNX uses [Protocol -Buffers](https://developers.google.com/protocol-buffers/) to store the model and these -have a size limit of 2GB. See the [ONNX -documentation](https://github.com/onnx/onnx/blob/master/docs/ExternalData.md) for -instructions on how to load models with external data. - - - -### Validating the model outputs - -The final step is to validate that the outputs from the base and exported model agree -within some absolute tolerance. Here we can use the `validate_model_outputs()` function -provided by the `transformers.onnx` package as follows: - -```python ->>> from transformers.onnx import validate_model_outputs - ->>> validate_model_outputs( -... onnx_config, tokenizer, base_model, onnx_path, onnx_outputs, onnx_config.atol_for_validation -... ) +```bash +python -m transformers.onnx --model=keras-io/transformers-qa onnx/ ``` -This function uses the [`~transformers.onnx.OnnxConfig.generate_dummy_inputs`] method to -generate inputs for the base and exported model, and the absolute tolerance can be -defined in the configuration. We generally find numerical agreement in the 1e-6 to 1e-4 -range, although anything smaller than 1e-3 is likely to be OK. - -## Contributing a new configuration to 🤗 Transformers +To export a model that's stored locally, save the model's weights and tokenizer files in the same directory (e.g. 
`local-pt-checkpoint`), +then export it to ONNX by pointing the `--model` argument of the `transformers.onnx` package to the desired directory: -We are looking to expand the set of ready-made configurations and welcome contributions -from the community! If you would like to contribute your addition to the library, you -will need to: - -* Implement the ONNX configuration in the corresponding `configuration_.py` -file -* Include the model architecture and corresponding features in - [`~onnx.features.FeatureManager`] -* Add your model architecture to the tests in `test_onnx_v2.py` - -Check out how the configuration for [IBERT was -contributed](https://github.com/huggingface/transformers/pull/14868/files) to get an -idea of what's involved. +```bash +python -m transformers.onnx --model=local-pt-checkpoint onnx/ +``` \ No newline at end of file diff --git a/docs/source/en/tasks/document_question_answering.mdx b/docs/source/en/tasks/document_question_answering.mdx index 4c5208820642f7..7294e9f8fda2d0 100644 --- a/docs/source/en/tasks/document_question_answering.mdx +++ b/docs/source/en/tasks/document_question_answering.mdx @@ -40,9 +40,6 @@ LayoutLMv2 solves the document question-answering task by adding a question-answ states of the tokens, to predict the positions of the start and end tokens of the answer. In other words, the problem is treated as extractive question answering: given the context, extract which piece of information answers the question. The context comes from the output of an OCR engine, here it is Google's Tesseract. -states of the tokens, in order to predict which token is at the start of the answer and which token is at the end of the -answer. In other words, the problem is treated as extractive question answering: given the context, extract which piece -of information answers the question. The context comes from the output of an OCR engine, here it is Google's Tesseract. Before you begin, make sure you have all the necessary libraries installed. LayoutLMv2 depends on detectron2, torchvision and tesseract. diff --git a/docs/source/en/tflite.mdx b/docs/source/en/tflite.mdx new file mode 100644 index 00000000000000..23e08478ba82af --- /dev/null +++ b/docs/source/en/tflite.mdx @@ -0,0 +1,58 @@ + + +# Export to TFLite + +[TensorFlow Lite](https://www.tensorflow.org/lite/guide) is a lightweight framework for deploying machine learning models +on resource-constrained devices, such as mobile phones, embedded systems, and Internet of Things (IoT) devices. +TFLite is designed to optimize and run models efficiently on these devices with limited computational power, memory, and +power consumption. +A TensorFlow Lite model is represented in a special efficient portable format identified by the `.tflite` file extension. + +🤗 Optimum offers functionality to export 🤗 Transformers models to TFLite through the `exporters.tflite` module. +For the list of supported model architectures, please refer to [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/tflite/overview). 
+ +To export a model to TFLite, install the required dependencies: + +```bash +pip install optimum[exporters-tf] +``` + +To check out all available arguments, refer to the [🤗 Optimum docs](https://huggingface.co/docs/optimum/main/en/exporters/tflite/usage_guides/export_a_model), +or view help in command line: + +```bash +optimum-cli export tflite --help +``` + +To export a model's checkpoint from the 🤗 Hub, for example, `bert-base-uncased`, run the following command: + +```bash +optimum-cli export tflite --model bert-base-uncased --sequence_length 128 bert_tflite/ +``` + +You should see the logs indicating progress and showing where the resulting `model.tflite` is saved, like this: + +```bash +Validating TFLite model... + -[✓] TFLite model output names match reference model (logits) + - Validating TFLite Model output "logits": + -[✓] (1, 128, 30522) matches (1, 128, 30522) + -[x] values not close enough, max diff: 5.817413330078125e-05 (atol: 1e-05) +The TensorFlow Lite export succeeded with the warning: The maximum absolute difference between the output of the reference model and the TFLite exported model is not within the set tolerance 1e-05: +- logits: max diff = 5.817413330078125e-05. + The exported model was saved at: bert_tflite + ``` + +The example above illustrates exporting a checkpoint from 🤗 Hub. When exporting a local model, first make sure that you +saved both the model's weights and tokenizer files in the same directory (`local_path`). When using CLI, pass the +`local_path` to the `model` argument instead of the checkpoint name on 🤗 Hub. \ No newline at end of file diff --git a/docs/source/ko/_toctree.yml b/docs/source/ko/_toctree.yml index 386723497f0c86..fd921229c36b20 100644 --- a/docs/source/ko/_toctree.yml +++ b/docs/source/ko/_toctree.yml @@ -54,16 +54,16 @@ title: 이미지 분류 - local: in_translation title: (번역중) Semantic segmentation - - local: in_translation - title: (번역중) Video classification + - local: tasks/video_classification + title: 영상 분류 - local: in_translation title: (번역중) Object detection - local: tasks/zero_shot_object_detection title: 제로샷(zero-shot) 객체 탐지 - local: tasks/zero_shot_image_classification title: 제로샷(zero-shot) 이미지 분류 - - local: in_translation - title: (번역중) Depth estimation + - local: tasks/monocular_depth_estimation + title: 단일 영상 기반 깊이 추정 title: (번역중) 컴퓨터 비전 isExpanded: false - sections: @@ -75,8 +75,8 @@ isExpanded: false title: 태스크 가이드 - sections: - - local: in_translation - title: (번역중) Use fast tokenizers from 🤗 Tokenizers + - local: fast_tokenizers + title: 🤗 Tokenizers 라이브러리에서 토크나이저 사용하기 - local: multilingual title: 다국어 모델 추론하기 - local: in_translation @@ -97,8 +97,8 @@ title: (번역중) Notebooks with examples - local: in_translation title: (번역중) Community resources - - local: in_translation - title: (번역중) Troubleshoot + - local: troubleshooting + title: 문제 해결 title: (번역중) 개발자 가이드 - sections: - local: in_translation @@ -673,4 +673,4 @@ - local: in_translation title: (번역중) Utilities for Time Series title: (번역중) Internal Helpers - title: (번역중) API \ No newline at end of file + title: (번역중) API diff --git a/docs/source/ko/fast_tokenizers.mdx b/docs/source/ko/fast_tokenizers.mdx new file mode 100644 index 00000000000000..bef75686ecb0c4 --- /dev/null +++ b/docs/source/ko/fast_tokenizers.mdx @@ -0,0 +1,67 @@ + + +# 🤗 Tokenizers 라이브러리의 토크나이저 사용하기[[use-tokenizers-from-tokenizers]] + +[`PreTrainedTokenizerFast`]는 [🤗 Tokenizers](https://huggingface.co/docs/tokenizers) 라이브러리에 기반합니다. 🤗 Tokenizers 라이브러리의 토크나이저는 +🤗 Transformers로 매우 간단하게 불러올 수 있습니다. 
+ +구체적인 내용에 들어가기 전에, 몇 줄의 코드로 더미 토크나이저를 만들어 보겠습니다: + +```python +>>> from tokenizers import Tokenizer +>>> from tokenizers.models import BPE +>>> from tokenizers.trainers import BpeTrainer +>>> from tokenizers.pre_tokenizers import Whitespace + +>>> tokenizer = Tokenizer(BPE(unk_token="[UNK]")) +>>> trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"]) + +>>> tokenizer.pre_tokenizer = Whitespace() +>>> files = [...] +>>> tokenizer.train(files, trainer) +``` + +우리가 정의한 파일을 통해 이제 학습된 토크나이저를 갖게 되었습니다. 이 런타임에서 계속 사용하거나 JSON 파일로 저장하여 나중에 사용할 수 있습니다. + +## 토크나이저 객체로부터 직접 불러오기[[loading-directly-from-the-tokenizer-object]] + +🤗 Transformers 라이브러리에서 이 토크나이저 객체를 활용하는 방법을 살펴보겠습니다. +[`PreTrainedTokenizerFast`] 클래스는 인스턴스화된 *토크나이저* 객체를 인수로 받아 쉽게 인스턴스화할 수 있습니다: + +```python +>>> from transformers import PreTrainedTokenizerFast + +>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer) +``` + +이제 `fast_tokenizer` 객체는 🤗 Transformers 토크나이저에서 공유하는 모든 메소드와 함께 사용할 수 있습니다! 자세한 내용은 [토크나이저 페이지](main_classes/tokenizer)를 참조하세요. + +## JSON 파일에서 불러오기[[loading-from-a-JSON-file]] + + + +JSON 파일에서 토크나이저를 불러오기 위해, 먼저 토크나이저를 저장해 보겠습니다: + +```python +>>> tokenizer.save("tokenizer.json") +``` + +JSON 파일을 저장한 경로는 `tokenizer_file` 매개변수를 사용하여 [`PreTrainedTokenizerFast`] 초기화 메소드에 전달할 수 있습니다: + +```python +>>> from transformers import PreTrainedTokenizerFast + +>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json") +``` + +이제 `fast_tokenizer` 객체는 🤗 Transformers 토크나이저에서 공유하는 모든 메소드와 함께 사용할 수 있습니다! 자세한 내용은 [토크나이저 페이지](main_classes/tokenizer)를 참조하세요. diff --git a/docs/source/ko/tasks/monocular_depth_estimation.mdx b/docs/source/ko/tasks/monocular_depth_estimation.mdx new file mode 100644 index 00000000000000..2ccadd2fd3cb24 --- /dev/null +++ b/docs/source/ko/tasks/monocular_depth_estimation.mdx @@ -0,0 +1,145 @@ + + +# 단일 영상 기반 깊이 추정[[depth-estimation-pipeline]] + +단일 영상 기반 깊이 추정은 한 장면의 단일 이미지에서 장면의 깊이 정보를 예측하는 컴퓨터 비전 작업입니다. +즉, 단일 카메라 시점의 장면에 있는 물체의 거리를 예측하는 과정입니다. + +단일 영상 기반 깊이 추정은 3D 재구성, 증강 현실, 자율 주행, 로봇 공학 등 다양한 분야에서 응용됩니다. +조명 조건, 가려짐, 텍스처와 같은 요소의 영향을 받을 수 있는 장면 내 물체와 해당 깊이 정보 간의 복잡한 관계를 모델이 이해해야 하므로 까다로운 작업입니다. + + + +이 튜토리얼에서 다루는 작업은 다음 모델 아키텍처에서 지원됩니다: + + + +[DPT](../model_doc/dpt), [GLPN](../model_doc/glpn) + + + + + +이번 가이드에서 배울 내용은 다음과 같습니다: + +* 깊이 추정 파이프라인 만들기 +* 직접 깊이 추정 추론하기 + +시작하기 전에, 필요한 모든 라이브러리가 설치되어 있는지 확인하세요: + +```bash +pip install -q transformers +``` + +## 깊이 추정 파이프라인[[depth-estimation-inference-by-hand]] + +깊이 추정을 추론하는 가장 간단한 방법은 해당 기능을 제공하는 [`pipeline`]을 사용하는 것입니다. +[Hugging Face Hub 체크포인트](https://huggingface.co/models?pipeline_tag=depth-estimation&sort=downloads)에서 파이프라인을 초기화합니다: + +```py +>>> from transformers import pipeline + +>>> checkpoint = "vinvino02/glpn-nyu" +>>> depth_estimator = pipeline("depth-estimation", model=checkpoint) +``` + + +다음으로, 분석할 이미지를 한 장 선택하세요: + +```py +>>> from PIL import Image +>>> import requests + +>>> url = "https://unsplash.com/photos/HwBAsSbPBDU/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MzR8fGNhciUyMGluJTIwdGhlJTIwc3RyZWV0fGVufDB8MHx8fDE2Nzg5MDEwODg&force=true&w=640" +>>> image = Image.open(requests.get(url, stream=True).raw) +>>> image +``` + +
+ Photo of a busy street +
+ +이미지를 파이프라인으로 전달합니다. + +```py +>>> predictions = depth_estimator(image) +``` + +파이프라인은 두 개의 항목을 가지는 딕셔너리를 반환합니다. +첫 번째는 `predicted_depth`로 각 픽셀의 깊이를 미터로 표현한 값을 가지는 텐서입니다. +두 번째는 `depth`로 깊이 추정 결과를 시각화하는 PIL 이미지입니다. + +이제 시각화한 결과를 살펴보겠습니다: + +```py +>>> predictions["depth"] +``` + +
+ Depth estimation visualization +
+ +## 직접 깊이 추정 추론하기[[depth-estimation-inference-by-hand]] + +이제 깊이 추정 파이프라인 사용법을 살펴보았으니 동일한 결과를 복제하는 방법을 살펴보겠습니다. +[Hugging Face Hub 체크포인트](https://huggingface.co/models?pipeline_tag=depth-estimation&sort=downloads)에서 모델과 관련 프로세서를 가져오는 것부터 시작합니다. +여기서 이전에 사용한 체크포인트와 동일한 것을 사용합니다: + +```py +>>> from transformers import AutoImageProcessor, AutoModelForDepthEstimation + +>>> checkpoint = "vinvino02/glpn-nyu" + +>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint) +>>> model = AutoModelForDepthEstimation.from_pretrained(checkpoint) +``` + +필요한 이미지 변환을 처리하는 `image_processor`를 사용하여 모델에 대한 이미지 입력을 준비합니다. +`image_processor`는 크기 조정 및 정규화 등 필요한 이미지 변환을 처리합니다: + +```py +>>> pixel_values = image_processor(image, return_tensors="pt").pixel_values +``` + +준비한 입력을 모델로 전달합니다: + +```py +>>> import torch + +>>> with torch.no_grad(): +... outputs = model(pixel_values) +... predicted_depth = outputs.predicted_depth +``` + +결과를 시각화합니다: + +```py +>>> import numpy as np + +>>> # 원본 사이즈로 복원 +>>> prediction = torch.nn.functional.interpolate( +... predicted_depth.unsqueeze(1), +... size=image.size[::-1], +... mode="bicubic", +... align_corners=False, +... ).squeeze() +>>> output = prediction.numpy() + +>>> formatted = (output * 255 / np.max(output)).astype("uint8") +>>> depth = Image.fromarray(formatted) +>>> depth +``` + +
+ Depth estimation visualization +
diff --git a/docs/source/ko/tasks/video_classification.mdx b/docs/source/ko/tasks/video_classification.mdx new file mode 100644 index 00000000000000..4d185b0aa765a7 --- /dev/null +++ b/docs/source/ko/tasks/video_classification.mdx @@ -0,0 +1,494 @@ + + +# 영상 분류 [[video-classification]] + +[[open-in-colab]] + + +영상 분류는 영상 전체에 레이블 또는 클래스를 지정하는 작업입니다. 각 영상에는 하나의 클래스가 있을 것으로 예상됩니다. 영상 분류 모델은 영상을 입력으로 받아 어느 클래스에 속하는지에 대한 예측을 반환합니다. 이러한 모델은 영상이 어떤 내용인지 분류하는 데 사용될 수 있습니다. 영상 분류의 실제 응용 예는 피트니스 앱에서 유용한 동작 / 운동 인식 서비스가 있습니다. 이는 또한 시각 장애인이 이동할 때 보조하는데 사용될 수 있습니다 + +이 가이드에서는 다음을 수행하는 방법을 보여줍니다: + +1. [UCF101](https://www.crcv.ucf.edu/data/UCF101.php) 데이터 세트의 하위 집합을 통해 [VideoMAE](https://huggingface.co/docs/transformers/main/en/model_doc/videomae) 모델을 미세 조정하기. +2. 미세 조정한 모델을 추론에 사용하기. + + + +이 튜토리얼에서 설명하는 작업은 다음 모델 아키텍처에서 지원됩니다: + + + +[TimeSformer](../model_doc/timesformer), [VideoMAE](../model_doc/videomae) + + + + + + +시작하기 전에 필요한 모든 라이브러리가 설치되었는지 확인하세요: +```bash +pip install -q pytorchvideo transformers evaluate +``` + +영상을 처리하고 준비하기 위해 [PyTorchVideo](https://pytorchvideo.org/)(이하 `pytorchvideo`)를 사용합니다. + +커뮤니티에 모델을 업로드하고 공유할 수 있도록 Hugging Face 계정에 로그인하는 것을 권장합니다. 프롬프트가 나타나면 토큰을 입력하여 로그인하세요: + +```py +>>> from huggingface_hub import notebook_login + +>>> notebook_login() +``` + +## UCF101 데이터셋 불러오기 [[load-ufc101-dataset]] + +[UCF-101](https://www.crcv.ucf.edu/data/UCF101.php) 데이터 세트의 하위 집합(subset)을 불러오는 것으로 시작할 수 있습니다. 전체 데이터 세트를 학습하는데 더 많은 시간을 할애하기 전에 데이터의 하위 집합을 불러와 모든 것이 잘 작동하는지 실험하고 확인할 수 있습니다. + +```py +>>> from huggingface_hub import hf_hub_download + +>>> hf_dataset_identifier = "sayakpaul/ucf101-subset" +>>> filename = "UCF101_subset.tar.gz" +>>> file_path = hf_hub_download(repo_id=hf_dataset_identifier, filename=filename, repo_type="dataset") +``` + +데이터 세트의 하위 집합이 다운로드 되면, 압축된 파일의 압축을 해제해야 합니다: +```py +>>> import tarfile + +>>> with tarfile.open(file_path) as t: +... t.extractall(".") +``` + +전체 데이터 세트는 다음과 같이 구성되어 있습니다. + +```bash +UCF101_subset/ + train/ + BandMarching/ + video_1.mp4 + video_2.mp4 + ... + Archery + video_1.mp4 + video_2.mp4 + ... + ... + val/ + BandMarching/ + video_1.mp4 + video_2.mp4 + ... + Archery + video_1.mp4 + video_2.mp4 + ... + ... + test/ + BandMarching/ + video_1.mp4 + video_2.mp4 + ... + Archery + video_1.mp4 + video_2.mp4 + ... + ... +``` + + +정렬된 영상의 경로는 다음과 같습니다: + +```bash +... +'UCF101_subset/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g07_c04.avi', +'UCF101_subset/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g07_c06.avi', +'UCF101_subset/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi', +'UCF101_subset/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g09_c02.avi', +'UCF101_subset/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g09_c06.avi' +... +``` + +동일한 그룹/장면에 속하는 영상 클립은 파일 경로에서 `g`로 표시되어 있습니다. 예를 들면, `v_ApplyEyeMakeup_g07_c04.avi`와 `v_ApplyEyeMakeup_g07_c06.avi` 이 있습니다. 이 둘은 같은 그룹입니다. + +검증 및 평가 데이터 분할을 할 때, [데이터 누출(data leakage)](https://www.kaggle.com/code/alexisbcook/data-leakage)을 방지하기 위해 동일한 그룹 / 장면의 영상 클립을 사용하지 않아야 합니다. 이 튜토리얼에서 사용하는 하위 집합은 이러한 정보를 고려하고 있습니다. + +그 다음으로, 데이터 세트에 존재하는 라벨을 추출합니다. 또한, 모델을 초기화할 때 도움이 될 딕셔너리(dictionary data type)를 생성합니다. + +* `label2id`: 클래스 이름을 정수에 매핑합니다. +* `id2label`: 정수를 클래스 이름에 매핑합니다. 
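+
+아래 코드에서 사용하는 `all_video_file_paths`와 `dataset_root_path`는 위에서 본 `UCF101_subset/` 디렉터리 구조를 기준으로 다음과 같이 준비되어 있다고 가정합니다. 경로를 수집하는 구체적인 방식은 하나의 예시 스케치일 뿐입니다:
+
+```py
+>>> import os  # os.path.join은 아래에서 데이터 세트 경로를 만들 때 사용합니다
+>>> import pathlib
+
+>>> # 압축을 해제한 하위 집합의 루트 디렉터리 (현재 작업 디렉터리에 있다고 가정)
+>>> dataset_root_path = pathlib.Path("UCF101_subset")
+
+>>> # train / val / test 폴더에 있는 모든 .avi 영상 경로를 수집합니다
+>>> all_video_file_paths = (
+...     list(dataset_root_path.glob("train/*/*.avi"))
+...     + list(dataset_root_path.glob("val/*/*.avi"))
+...     + list(dataset_root_path.glob("test/*/*.avi"))
+... )
+```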
+ +```py +>>> class_labels = sorted({str(path).split("/")[2] for path in all_video_file_paths}) +>>> label2id = {label: i for i, label in enumerate(class_labels)} +>>> id2label = {i: label for label, i in label2id.items()} + +>>> print(f"Unique classes: {list(label2id.keys())}.") + +# Unique classes: ['ApplyEyeMakeup', 'ApplyLipstick', 'Archery', 'BabyCrawling', 'BalanceBeam', 'BandMarching', 'BaseballPitch', 'Basketball', 'BasketballDunk', 'BenchPress']. +``` + +이 데이터 세트에는 총 10개의 고유한 클래스가 있습니다. 각 클래스마다 30개의 영상이 훈련 세트에 있습니다 + +## 미세 조정하기 위해 모델 가져오기 [[load-a-model-to-fine-tune]] + +사전 훈련된 체크포인트와 체크포인트에 연관된 이미지 프로세서를 사용하여 영상 분류 모델을 인스턴스화합니다. 모델의 인코더에는 미리 학습된 매개변수가 제공되며, 분류 헤드(데이터를 분류하는 마지막 레이어)는 무작위로 초기화됩니다. 데이터 세트의 전처리 파이프라인을 작성할 때는 이미지 프로세서가 유용합니다. + +```py +>>> from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification + +>>> model_ckpt = "MCG-NJU/videomae-base" +>>> image_processor = VideoMAEImageProcessor.from_pretrained(model_ckpt) +>>> model = VideoMAEForVideoClassification.from_pretrained( +... model_ckpt, +... label2id=label2id, +... id2label=id2label, +... ignore_mismatched_sizes=True, # provide this in case you're planning to fine-tune an already fine-tuned checkpoint +... ) +``` + +모델을 가져오는 동안, 다음과 같은 경고를 마주칠 수 있습니다: + +```bash +Some weights of the model checkpoint at MCG-NJU/videomae-base were not used when initializing VideoMAEForVideoClassification: [..., 'decoder.decoder_layers.1.attention.output.dense.bias', 'decoder.decoder_layers.2.attention.attention.key.weight'] +- This IS expected if you are initializing VideoMAEForVideoClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). +- This IS NOT expected if you are initializing VideoMAEForVideoClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). +Some weights of VideoMAEForVideoClassification were not initialized from the model checkpoint at MCG-NJU/videomae-base and are newly initialized: ['classifier.bias', 'classifier.weight'] +You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. +``` + + +위 경고는 우리가 일부 가중치(예: `classifier` 층의 가중치와 편향)를 버리고 새로운 `classifier` 층의 가중치와 편향을 무작위로 초기화하고 있다는 것을 알려줍니다. 이 경우에는 미리 학습된 가중치가 없는 새로운 헤드를 추가하고 있으므로, 라이브러리가 모델을 추론에 사용하기 전에 미세 조정하라고 경고를 보내는 것은 당연합니다. 그리고 이제 우리는 이 모델을 미세 조정할 예정입니다. + +**참고** 이 [체크포인트](https://huggingface.co/MCG-NJU/videomae-base-finetuned-kinetics)는 도메인이 많이 중첩된 유사한 다운스트림 작업에 대해 미세 조정하여 얻은 체크포인트이므로 이 작업에서 더 나은 성능을 보일 수 있습니다. `MCG-NJU/videomae-base-finetuned-kinetics` 데이터 세트를 미세 조정하여 얻은 [체크포인트](https://huggingface.co/sayakpaul/videomae-base-finetuned-kinetics-finetuned-ucf101-subset)도 있습니다. + +## 훈련을 위한 데이터 세트 준비하기[[prepare-the-datasets-for-training]] + +영상 전처리를 위해 [PyTorchVideo 라이브러리](https://pytorchvideo.org/)를 활용할 것입니다. 필요한 종속성을 가져오는 것으로 시작하세요. + +```py +>>> import pytorchvideo.data + +>>> from pytorchvideo.transforms import ( +... ApplyTransformToKey, +... Normalize, +... RandomShortSideScale, +... RemoveKey, +... ShortSideScale, +... UniformTemporalSubsample, +... ) + +>>> from torchvision.transforms import ( +... Compose, +... Lambda, +... RandomCrop, +... RandomHorizontalFlip, +... Resize, +... 
) +``` + +학습 데이터 세트 변환에는 '균일한 시간 샘플링(uniform temporal subsampling)', '픽셀 정규화(pixel normalization)', '랜덤 잘라내기(random cropping)' 및 '랜덤 수평 뒤집기(random horizontal flipping)'의 조합을 사용합니다. 검증 및 평가 데이터 세트 변환에는 '랜덤 잘라내기'와 '랜덤 뒤집기'를 제외한 동일한 변환 체인을 유지합니다. 이러한 변환에 대해 자세히 알아보려면 [PyTorchVideo 공식 문서](https://pytorchvideo.org)를 확인하세요. + +사전 훈련된 모델과 관련된 이미지 프로세서를 사용하여 다음 정보를 얻을 수 있습니다: + +* 영상 프레임 픽셀을 정규화하는 데 사용되는 이미지 평균과 표준 편차 +* 영상 프레임이 조정될 공간 해상도 + + +먼저, 몇 가지 상수를 정의합니다. + +```py +>>> mean = image_processor.image_mean +>>> std = image_processor.image_std +>>> if "shortest_edge" in image_processor.size: +... height = width = image_processor.size["shortest_edge"] +>>> else: +... height = image_processor.size["height"] +... width = image_processor.size["width"] +>>> resize_to = (height, width) + +>>> num_frames_to_sample = model.config.num_frames +>>> sample_rate = 4 +>>> fps = 30 +>>> clip_duration = num_frames_to_sample * sample_rate / fps +``` + +이제 데이터 세트에 특화된 전처리(transform)과 데이터 세트 자체를 정의합니다. 먼저 훈련 데이터 세트로 시작합니다: + +```py +>>> train_transform = Compose( +... [ +... ApplyTransformToKey( +... key="video", +... transform=Compose( +... [ +... UniformTemporalSubsample(num_frames_to_sample), +... Lambda(lambda x: x / 255.0), +... Normalize(mean, std), +... RandomShortSideScale(min_size=256, max_size=320), +... RandomCrop(resize_to), +... RandomHorizontalFlip(p=0.5), +... ] +... ), +... ), +... ] +... ) + +>>> train_dataset = pytorchvideo.data.Ucf101( +... data_path=os.path.join(dataset_root_path, "train"), +... clip_sampler=pytorchvideo.data.make_clip_sampler("random", clip_duration), +... decode_audio=False, +... transform=train_transform, +... ) +``` + +같은 방식의 작업 흐름을 검증과 평가 세트에도 적용할 수 있습니다. + +```py +>>> val_transform = Compose( +... [ +... ApplyTransformToKey( +... key="video", +... transform=Compose( +... [ +... UniformTemporalSubsample(num_frames_to_sample), +... Lambda(lambda x: x / 255.0), +... Normalize(mean, std), +... Resize(resize_to), +... ] +... ), +... ), +... ] +... ) + +>>> val_dataset = pytorchvideo.data.Ucf101( +... data_path=os.path.join(dataset_root_path, "val"), +... clip_sampler=pytorchvideo.data.make_clip_sampler("uniform", clip_duration), +... decode_audio=False, +... transform=val_transform, +... ) + +>>> test_dataset = pytorchvideo.data.Ucf101( +... data_path=os.path.join(dataset_root_path, "test"), +... clip_sampler=pytorchvideo.data.make_clip_sampler("uniform", clip_duration), +... decode_audio=False, +... transform=val_transform, +... ) +``` + + +**참고**: 위의 데이터 세트의 파이프라인은 [공식 파이토치 예제](https://pytorchvideo.org/docs/tutorial_classification#dataset)에서 가져온 것입니다. 우리는 UCF-101 데이터셋에 맞게 [`pytorchvideo.data.Ucf101()`](https://pytorchvideo.readthedocs.io/en/latest/api/data/data.html#pytorchvideo.data.Ucf101) 함수를 사용하고 있습니다. 내부적으로 이 함수는 [`pytorchvideo.data.labeled_video_dataset.LabeledVideoDataset`](https://pytorchvideo.readthedocs.io/en/latest/api/data/data.html#pytorchvideo.data.LabeledVideoDataset) 객체를 반환합니다. `LabeledVideoDataset` 클래스는 PyTorchVideo 데이터셋에서 모든 영상 관련 작업의 기본 클래스입니다. 따라서 PyTorchVideo에서 미리 제공하지 않는 사용자 지정 데이터 세트를 사용하려면, 이 클래스를 적절하게 확장하면 됩니다. 더 자세한 사항이 알고 싶다면 `data` API [문서](https://pytorchvideo.readthedocs.io/en/latest/api/data/data.html) 를 참고하세요. 또한 위의 예시와 유사한 구조를 갖는 데이터 세트를 사용하고 있다면, `pytorchvideo.data.Ucf101()` 함수를 사용하는 데 문제가 없을 것입니다. + +데이터 세트에 영상의 개수를 알기 위해 `num_videos` 인수에 접근할 수 있습니다. 
+ +```py +>>> print(train_dataset.num_videos, val_dataset.num_videos, test_dataset.num_videos) +# (300, 30, 75) +``` + +## 더 나은 디버깅을 위해 전처리 영상 시각화하기[[visualize-the-preprocessed-video-for-better-debugging]] + +```py +>>> import imageio +>>> import numpy as np +>>> from IPython.display import Image + +>>> def unnormalize_img(img): +... """Un-normalizes the image pixels.""" +... img = (img * std) + mean +... img = (img * 255).astype("uint8") +... return img.clip(0, 255) + +>>> def create_gif(video_tensor, filename="sample.gif"): +... """Prepares a GIF from a video tensor. +... +... The video tensor is expected to have the following shape: +... (num_frames, num_channels, height, width). +... """ +... frames = [] +... for video_frame in video_tensor: +... frame_unnormalized = unnormalize_img(video_frame.permute(1, 2, 0).numpy()) +... frames.append(frame_unnormalized) +... kargs = {"duration": 0.25} +... imageio.mimsave(filename, frames, "GIF", **kargs) +... return filename + +>>> def display_gif(video_tensor, gif_name="sample.gif"): +... """Prepares and displays a GIF from a video tensor.""" +... video_tensor = video_tensor.permute(1, 0, 2, 3) +... gif_filename = create_gif(video_tensor, gif_name) +... return Image(filename=gif_filename) + +>>> sample_video = next(iter(train_dataset)) +>>> video_tensor = sample_video["video"] +>>> display_gif(video_tensor) +``` + +
+ Person playing basketball +
+ +## 모델 훈련하기[[train-the-model]] + +🤗 Transformers의 [`Trainer`](https://huggingface.co/docs/transformers/main_classes/trainer)를 사용하여 모델을 훈련시켜보세요. `Trainer`를 인스턴스화하려면 훈련 설정과 평가 지표를 정의해야 합니다. 가장 중요한 것은 [`TrainingArguments`](https://huggingface.co/transformers/main_classes/trainer.html#transformers.TrainingArguments)입니다. 이 클래스는 훈련을 구성하는 모든 속성을 포함하며, 훈련 중 체크포인트를 저장할 출력 폴더 이름을 필요로 합니다. 또한 🤗 Hub의 모델 저장소의 모든 정보를 동기화하는 데 도움이 됩니다. + +대부분의 훈련 인수는 따로 설명할 필요는 없습니다. 하지만 여기에서 중요한 인수는 `remove_unused_columns=False` 입니다. 이 인자는 모델의 호출 함수에서 사용되지 않는 모든 속성 열(columns)을 삭제합니다. 기본값은 일반적으로 True입니다. 이는 사용되지 않는 기능 열을 삭제하는 것이 이상적이며, 입력을 모델의 호출 함수로 풀기(unpack)가 쉬워지기 때문입니다. 하지만 이 경우에는 `pixel_values`(모델의 입력으로 필수적인 키)를 생성하기 위해 사용되지 않는 기능('video'가 특히 그렇습니다)이 필요합니다. 따라서 remove_unused_columns을 False로 설정해야 합니다. + +```py +>>> from transformers import TrainingArguments, Trainer + +>>> model_name = model_ckpt.split("/")[-1] +>>> new_model_name = f"{model_name}-finetuned-ucf101-subset" +>>> num_epochs = 4 + +>>> args = TrainingArguments( +... new_model_name, +... remove_unused_columns=False, +... evaluation_strategy="epoch", +... save_strategy="epoch", +... learning_rate=5e-5, +... per_device_train_batch_size=batch_size, +... per_device_eval_batch_size=batch_size, +... warmup_ratio=0.1, +... logging_steps=10, +... load_best_model_at_end=True, +... metric_for_best_model="accuracy", +... push_to_hub=True, +... max_steps=(train_dataset.num_videos // batch_size) * num_epochs, +... ) +``` + +`pytorchvideo.data.Ucf101()` 함수로 반환되는 데이터 세트는 `__len__` 메소드가 이식되어 있지 않습니다. 따라서, `TrainingArguments`를 인스턴스화할 때 `max_steps`를 정의해야 합니다. + +다음으로, 평가지표를 불러오고, 예측값에서 평가지표를 계산할 함수를 정의합니다. 필요한 전처리 작업은 예측된 로짓(logits)에 argmax 값을 취하는 것뿐입니다: + +```py +import evaluate + +metric = evaluate.load("accuracy") + + +def compute_metrics(eval_pred): + predictions = np.argmax(eval_pred.predictions, axis=1) + return metric.compute(predictions=predictions, references=eval_pred.label_ids) +``` + +**평가에 대한 참고사항**: + +[VideoMAE 논문](https://arxiv.org/abs/2203.12602)에서 저자는 다음과 같은 평가 전략을 사용합니다. 테스트 영상에서 여러 클립을 선택하고 그 클립에 다양한 크롭을 적용하여 집계 점수를 보고합니다. 그러나 이번 튜토리얼에서는 간단함과 간결함을 위해 해당 전략을 고려하지 않습니다. + +또한, 예제를 묶어서 배치를 형성하는 `collate_fn`을 정의해야합니다. 각 배치는 `pixel_values`와 `labels`라는 2개의 키로 구성됩니다. + +```py +>>> def collate_fn(examples): +... # permute to (num_frames, num_channels, height, width) +... pixel_values = torch.stack( +... [example["video"].permute(1, 0, 2, 3) for example in examples] +... ) +... labels = torch.tensor([example["label"] for example in examples]) +... return {"pixel_values": pixel_values, "labels": labels} +``` + +그런 다음 이 모든 것을 데이터 세트와 함께 `Trainer`에 전달하기만 하면 됩니다: + +```py +>>> trainer = Trainer( +... model, +... args, +... train_dataset=train_dataset, +... eval_dataset=val_dataset, +... tokenizer=image_processor, +... compute_metrics=compute_metrics, +... data_collator=collate_fn, +... ) +``` + +데이터를 이미 처리했는데도 불구하고 `image_processor`를 토크나이저 인수로 넣은 이유는 JSON으로 저장되는 이미지 프로세서 구성 파일이 Hub의 저장소에 업로드되도록 하기 위함입니다. + +`train` 메소드를 호출하여 모델을 미세 조정하세요: + +```py +>>> train_results = trainer.train() +``` + +학습이 완료되면, 모델을 [`~transformers.Trainer.push_to_hub`] 메소드를 사용하여 허브에 공유하여 누구나 모델을 사용할 수 있도록 합니다: +```py +>>> trainer.push_to_hub() +``` + +## 추론하기[[inference]] + +좋습니다. 이제 미세 조정된 모델을 추론하는 데 사용할 수 있습니다. + +추론에 사용할 영상을 불러오세요: +```py +>>> sample_test_video = next(iter(test_dataset)) +``` + +
+ Teams playing basketball +
+ +미세 조정된 모델을 추론에 사용하는 가장 간단한 방법은 [`pipeline`](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.VideoClassificationPipeline)에서 모델을 사용하는 것입니다. 모델로 영상 분류를 하기 위해 `pipeline`을 인스턴스화하고 영상을 전달하세요: + +```py +>>> from transformers import pipeline + +>>> video_cls = pipeline(model="my_awesome_video_cls_model") +>>> video_cls("https://huggingface.co/datasets/sayakpaul/ucf101-subset/resolve/main/v_BasketballDunk_g14_c06.avi") +[{'score': 0.9272987842559814, 'label': 'BasketballDunk'}, + {'score': 0.017777055501937866, 'label': 'BabyCrawling'}, + {'score': 0.01663011871278286, 'label': 'BalanceBeam'}, + {'score': 0.009560945443809032, 'label': 'BandMarching'}, + {'score': 0.0068979403004050255, 'label': 'BaseballPitch'}] +``` + +만약 원한다면 수동으로 `pipeline`의 결과를 재현할 수 있습니다: + + +```py +>>> def run_inference(model, video): +... # (num_frames, num_channels, height, width) +... perumuted_sample_test_video = video.permute(1, 0, 2, 3) +... inputs = { +... "pixel_values": perumuted_sample_test_video.unsqueeze(0), +... "labels": torch.tensor( +... [sample_test_video["label"]] +... ), # this can be skipped if you don't have labels available. +... } + +... device = torch.device("cuda" if torch.cuda.is_available() else "cpu") +... inputs = {k: v.to(device) for k, v in inputs.items()} +... model = model.to(device) + +... # forward pass +... with torch.no_grad(): +... outputs = model(**inputs) +... logits = outputs.logits + +... return logits +``` + +모델에 입력값을 넣고 `logits`을 반환받으세요: + +``` +>>> logits = run_inference(trained_model, sample_test_video["video"]) +``` + +`logits`을 디코딩하면, 우리는 다음 결과를 얻을 수 있습니다: + +```py +>>> predicted_class_idx = logits.argmax(-1).item() +>>> print("Predicted class:", model.config.id2label[predicted_class_idx]) +# Predicted class: BasketballDunk +``` diff --git a/docs/source/ko/troubleshooting.mdx b/docs/source/ko/troubleshooting.mdx new file mode 100644 index 00000000000000..56c27df9fcc074 --- /dev/null +++ b/docs/source/ko/troubleshooting.mdx @@ -0,0 +1,194 @@ + + +# 문제 해결[[troubleshoot]] + +때때로 오류가 발생할 수 있지만, 저희가 도와드리겠습니다! 이 가이드는 현재까지 확인된 가장 일반적인 문제 몇 가지와 그것들을 해결하는 방법에 대해 다룹니다. 그러나 이 가이드는 모든 🤗 Transformers 문제를 포괄적으로 다루고 있지 않습니다. 문제 해결에 더 많은 도움을 받으려면 다음을 시도해보세요: + + + +1. [포럼](https://discuss.huggingface.co/)에서 도움을 요청하세요. [Beginners](https://discuss.huggingface.co/c/beginners/5) 또는 [🤗 Transformers](https://discuss.huggingface.co/c/transformers/9)와 같은 특정 카테고리에 질문을 게시할 수 있습니다. 재현 가능한 코드와 함께 잘 서술된 포럼 게시물을 작성하여 여러분의 문제가 해결될 가능성을 극대화하세요! + + + +2. 라이브러리와 관련된 버그이면 🤗 Transformers 저장소에서 [이슈](https://github.com/huggingface/transformers/issues/new/choose)를 생성하세요. 버그에 대해 설명하는 정보를 가능한 많이 포함하려고 노력하여, 무엇이 잘못 되었는지와 어떻게 수정할 수 있는지 더 잘 파악할 수 있도록 도와주세요. + +3. 이전 버전의 🤗 Transformers을 사용하는 경우 중요한 변경 사항이 버전 사이에 도입되었기 때문에 [마이그레이션](migration) 가이드를 확인하세요. + +문제 해결 및 도움 매뉴얼에 대한 자세한 내용은 Hugging Face 강좌의 [8장](https://huggingface.co/course/chapter8/1?fw=pt)을 참조하세요. + + +## 방화벽 환경[[firewalled-environments]] + +클라우드 및 내부망(intranet) 설정의 일부 GPU 인스턴스는 외부 연결에 대한 방화벽으로 차단되어 연결 오류가 발생할 수 있습니다. 스크립트가 모델 가중치나 데이터를 다운로드하려고 할 때, 다운로드가 중단되고 다음 메시지와 함께 시간 초과됩니다: + +``` +ValueError: Connection error, and we cannot find the requested files in the cached path. +Please try again or make sure your Internet connection is on. +``` + +이 경우에는 연결 오류를 피하기 위해 🤗 Transformers를 [오프라인 모드](installation#offline-mode)로 실행해야 합니다. + +## CUDA 메모리 부족(CUDA out of memory)[[cuda-out-of-memory]] + +수백만 개의 매개변수로 대규모 모델을 훈련하는 것은 적절한 하드웨어 없이 어려울 수 있습니다. GPU 메모리가 부족한 경우 발생할 수 있는 일반적인 오류는 다음과 같습니다: + +``` +CUDA out of memory. 
Tried to allocate 256.00 MiB (GPU 0; 11.17 GiB total capacity; 9.70 GiB already allocated; 179.81 MiB free; 9.85 GiB reserved in total by PyTorch)
+```
+
+다음은 메모리 사용을 줄이기 위해 시도해 볼 수 있는 몇 가지 잠재적인 해결책입니다:
+
+- [`TrainingArguments`]의 [`per_device_train_batch_size`](main_classes/trainer#transformers.TrainingArguments.per_device_train_batch_size) 값을 줄이세요.
+- [`TrainingArguments`]의 [`gradient_accumulation_steps`](main_classes/trainer#transformers.TrainingArguments.gradient_accumulation_steps)를 사용하여 전체 배치 크기를 효과적으로 늘리세요.
+
+
+메모리 절약 기술에 대한 자세한 내용은 성능 [가이드](performance)를 참조하세요.
+
+
+## 저장된 TensorFlow 모델을 가져올 수 없습니다(Unable to load a saved TensorFlow model)[[unable-to-load-a-saved-tensorflow-model]]
+
+TensorFlow의 [model.save](https://www.tensorflow.org/tutorials/keras/save_and_load#save_the_entire_model) 메소드는 아키텍처, 가중치, 훈련 구성 등 전체 모델을 단일 파일에 저장합니다. 그러나 모델 파일을 다시 가져올 때 🤗 Transformers는 모델 파일에 있는 모든 TensorFlow 관련 객체를 가져오지 않을 수 있기 때문에 오류가 발생할 수 있습니다. TensorFlow 모델 저장 및 가져오기 문제를 피하려면 다음을 권장합니다:
+
+- 모델 가중치를 `h5` 파일 확장자로 [`model.save_weights`](https://www.tensorflow.org/tutorials/keras/save_and_load#save_the_entire_model)로 저장한 다음 [`~TFPreTrainedModel.from_pretrained`]로 모델을 다시 가져옵니다:
+
+```py
+>>> from transformers import TFPreTrainedModel
+>>> from tensorflow import keras
+
+>>> model.save_weights("some_folder/tf_model.h5")
+>>> model = TFPreTrainedModel.from_pretrained("some_folder")
+```
+
+- 모델을 [`~TFPreTrainedModel.save_pretrained`]로 저장하고 [`~TFPreTrainedModel.from_pretrained`]로 다시 가져옵니다:
+
+```py
+>>> from transformers import TFPreTrainedModel
+
+>>> model.save_pretrained("path_to/model")
+>>> model = TFPreTrainedModel.from_pretrained("path_to/model")
+```
+
+## ImportError[[importerror]]
+
+특히 최신 모델인 경우 만날 수 있는 다른 일반적인 오류는 `ImportError`입니다:
+
+```
+ImportError: cannot import name 'ImageGPTImageProcessor' from 'transformers' (unknown location)
+```
+
+이러한 오류 유형의 경우 최신 모델에 액세스할 수 있도록 최신 버전의 🤗 Transformers가 설치되어 있는지 확인하세요:
+
+```bash
+pip install transformers --upgrade
+```
+
+## CUDA error: device-side assert triggered[[cuda-error-deviceside-assert-triggered]]
+
+때때로 장치 코드 오류에 대한 일반적인 CUDA 오류가 발생할 수 있습니다.
+
+```
+RuntimeError: CUDA error: device-side assert triggered
+```
+
+더 자세한 오류 메시지를 얻으려면 우선 코드를 CPU에서 실행해 보세요. 다음 환경 변수를 코드의 시작 부분에 추가하여 CPU로 전환하세요:
+
+```py
+>>> import os
+
+>>> os.environ["CUDA_VISIBLE_DEVICES"] = ""
+```
+
+또 다른 옵션은 GPU에서 더 나은 역추적(traceback)을 얻는 것입니다. 다음 환경 변수를 코드의 시작 부분에 추가하여 역추적이 오류가 발생한 소스를 가리키도록 하세요:
+
+```py
+>>> import os
+
+>>> os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
+```
+
+## 패딩 토큰이 마스킹되지 않은 경우 잘못된 출력(Incorrect output when padding tokens aren't masked)[[incorrect-output-when-padding-tokens-arent-masked]]
+
+경우에 따라 `input_ids`에 패딩 토큰이 포함되어 있으면 `hidden_state` 출력이 올바르지 않을 수 있습니다. 데모를 위해 모델과 토크나이저를 가져오세요. 모델의 `pad_token_id`에 액세스하여 해당 값을 확인할 수 있습니다. 일부 모델의 경우 `pad_token_id`가 `None`일 수 있지만 언제든지 수동으로 설정할 수 있습니다.
+ +```py +>>> from transformers import AutoModelForSequenceClassification +>>> import torch + +>>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased") +>>> model.config.pad_token_id +0 +``` + +다음 예제는 패딩 토큰을 마스킹하지 않은 출력을 보여줍니다: + +```py +>>> input_ids = torch.tensor([[7592, 2057, 2097, 2393, 9611, 2115], [7592, 0, 0, 0, 0, 0]]) +>>> output = model(input_ids) +>>> print(output.logits) +tensor([[ 0.0082, -0.2307], + [ 0.1317, -0.1683]], grad_fn=) +``` + +다음은 두 번째 시퀀스의 실제 출력입니다: + +```py +>>> input_ids = torch.tensor([[7592]]) +>>> output = model(input_ids) +>>> print(output.logits) +tensor([[-0.1008, -0.4061]], grad_fn=) +``` + +대부분의 경우 모델에 `attention_mask`를 제공하여 패딩 토큰을 무시해야 이러한 조용한 오류를 방지할 수 있습니다. 이제 두 번째 시퀀스의 출력이 실제 출력과 일치합니다: + + + +일반적으로 토크나이저는 특정 토크나이저의 기본 값을 기준으로 사용자에 대한 'attention_mask'를 만듭니다. + + + +```py +>>> attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 0]]) +>>> output = model(input_ids, attention_mask=attention_mask) +>>> print(output.logits) +tensor([[ 0.0082, -0.2307], + [-0.1008, -0.4061]], grad_fn=) +``` + +🤗 Transformers는 패딩 토큰이 제공된 경우 패딩 토큰을 마스킹하기 위한 `attention_mask`를 자동으로 생성하지 않습니다. 그 이유는 다음과 같습니다: + +- 일부 모델에는 패딩 토큰이 없습니다. +- 일부 사용 사례의 경우 사용자가 모델이 패딩 토큰을 관리하기를 원합니다. + +## ValueError: 이 유형의 AutoModel에 대해 인식할 수 없는 XYZ 구성 클래스(ValueError: Unrecognized configuration class XYZ for this kind of AutoModel)[[valueerror-unrecognized-configuration-class-xyz-for-this-kind-of-automodel]] + +일반적으로, 사전 학습된 모델의 인스턴스를 가져오기 위해 [`AutoModel`] 클래스를 사용하는 것이 좋습니다. +이 클래스는 구성에 따라 주어진 체크포인트에서 올바른 아키텍처를 자동으로 추론하고 가져올 수 있습니다. +모델을 체크포인트에서 가져올 때 이 `ValueError`가 발생하면, 이는 Auto 클래스가 주어진 체크포인트의 구성에서 +가져오려는 모델 유형과 매핑을 찾을 수 없다는 것을 의미합니다. 가장 흔하게 발생하는 경우는 +체크포인트가 주어진 태스크를 지원하지 않을 때입니다. +예를 들어, 다음 예제에서 질의응답에 대한 GPT2가 없기 때문에 오류가 발생합니다: + +```py +>>> from transformers import AutoProcessor, AutoModelForQuestionAnswering + +>>> processor = AutoProcessor.from_pretrained("gpt2-medium") +>>> model = AutoModelForQuestionAnswering.from_pretrained("gpt2-medium") +ValueError: Unrecognized configuration class for this kind of AutoModel: AutoModelForQuestionAnswering. +Model type should be one of AlbertConfig, BartConfig, BertConfig, BigBirdConfig, BigBirdPegasusConfig, BloomConfig, ... 
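+
+>>> # (참고) 아래 두 줄은 원문에 없는 예시입니다. GPT-2 체크포인트는 질의응답 헤드는 없지만
+>>> # 인과적 언어 모델링(causal LM)은 지원하므로, 해당 태스크의 Auto 클래스로는 정상적으로 가져올 수 있습니다.
+>>> from transformers import AutoModelForCausalLM
+>>> model = AutoModelForCausalLM.from_pretrained("gpt2-medium")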
+``` diff --git a/docs/source/pt/index.mdx b/docs/source/pt/index.mdx index 9b5cbc12e61000..e9de6f464dd1b1 100644 --- a/docs/source/pt/index.mdx +++ b/docs/source/pt/index.mdx @@ -34,7 +34,7 @@ Cada arquitetura 🤗 Transformers é definida em um módulo individual do Pytho ## Se você estiver procurando suporte do time da Hugging Face, acesse - HuggingFace Expert Acceleration Program + HuggingFace Expert Acceleration Program ## Conteúdo diff --git a/examples/flax/vision/requirements.txt b/examples/flax/vision/requirements.txt index cf1859d7549477..539ffdc6fa9f74 100644 --- a/examples/flax/vision/requirements.txt +++ b/examples/flax/vision/requirements.txt @@ -3,6 +3,6 @@ jaxlib>=0.1.59 flax>=0.3.5 optax>=0.0.8 -f https://download.pytorch.org/whl/torch_stable.html -torch==1.9.0+cpu +torch==1.11.0+cpu -f https://download.pytorch.org/whl/torch_stable.html -torchvision==0.10.0+cpu \ No newline at end of file +torchvision==0.12.0+cpu diff --git a/examples/pytorch/image-classification/run_image_classification_no_trainer.py b/examples/pytorch/image-classification/run_image_classification_no_trainer.py index 6a900ff76137f9..9dd373e1a56344 100644 --- a/examples/pytorch/image-classification/run_image_classification_no_trainer.py +++ b/examples/pytorch/image-classification/run_image_classification_no_trainer.py @@ -451,22 +451,26 @@ def collate_fn(examples): if "epoch" in training_difference: starting_epoch = int(training_difference.replace("epoch_", "")) + 1 resume_step = None + completed_steps = starting_epoch * num_update_steps_per_epoch else: resume_step = int(training_difference.replace("step_", "")) starting_epoch = resume_step // len(train_dataloader) resume_step -= starting_epoch * len(train_dataloader) + completed_steps = resume_step + + # update the progress_bar if load from checkpoint + progress_bar.update(completed_steps) for epoch in range(starting_epoch, args.num_train_epochs): model.train() if args.with_tracking: total_loss = 0 - for step, batch in enumerate(train_dataloader): - # We need to skip steps until we reach the resumed step - if args.resume_from_checkpoint and epoch == starting_epoch: - if resume_step is not None and step < resume_step: - completed_steps += 1 - continue - + if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None: + # We skip the first `n` batches in the dataloader when resuming from a checkpoint + active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step) + else: + active_dataloader = train_dataloader + for step, batch in enumerate(active_dataloader): with accelerator.accumulate(model): outputs = model(**batch) loss = outputs.loss diff --git a/examples/pytorch/image-pretraining/run_mim_no_trainer.py b/examples/pytorch/image-pretraining/run_mim_no_trainer.py index 126029150b9e28..18d5c15638c5af 100644 --- a/examples/pytorch/image-pretraining/run_mim_no_trainer.py +++ b/examples/pytorch/image-pretraining/run_mim_no_trainer.py @@ -660,29 +660,27 @@ def preprocess_images(examples): if "epoch" in training_difference: starting_epoch = int(training_difference.replace("epoch_", "")) + 1 resume_step = None + completed_steps = starting_epoch * num_update_steps_per_epoch else: # need to multiply `gradient_accumulation_steps` to reflect real steps resume_step = int(training_difference.replace("step_", "")) * args.gradient_accumulation_steps starting_epoch = resume_step // len(train_dataloader) resume_step -= starting_epoch * len(train_dataloader) + completed_steps = resume_step # update the progress_bar if load from 
checkpoint - progress_bar.update(starting_epoch * num_update_steps_per_epoch) - completed_steps = starting_epoch * num_update_steps_per_epoch + progress_bar.update(completed_steps) for epoch in range(starting_epoch, args.num_train_epochs): model.train() if args.with_tracking: total_loss = 0 - for step, batch in enumerate(train_dataloader): - # We need to skip steps until we reach the resumed step - if args.resume_from_checkpoint and epoch == starting_epoch: - if resume_step is not None and step < resume_step: - if step % args.gradient_accumulation_steps == 0: - progress_bar.update(1) - completed_steps += 1 - continue - + if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None: + # We skip the first `n` batches in the dataloader when resuming from a checkpoint + active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step) + else: + active_dataloader = train_dataloader + for step, batch in enumerate(active_dataloader): with accelerator.accumulate(model): outputs = model(**batch) loss = outputs.loss diff --git a/examples/pytorch/language-modeling/run_clm_no_trainer.py b/examples/pytorch/language-modeling/run_clm_no_trainer.py index c31f2867c37f6f..fb07ad9392d138 100755 --- a/examples/pytorch/language-modeling/run_clm_no_trainer.py +++ b/examples/pytorch/language-modeling/run_clm_no_trainer.py @@ -566,29 +566,27 @@ def group_texts(examples): if "epoch" in training_difference: starting_epoch = int(training_difference.replace("epoch_", "")) + 1 resume_step = None + completed_steps = starting_epoch * num_update_steps_per_epoch else: # need to multiply `gradient_accumulation_steps` to reflect real steps resume_step = int(training_difference.replace("step_", "")) * args.gradient_accumulation_steps starting_epoch = resume_step // len(train_dataloader) resume_step -= starting_epoch * len(train_dataloader) + completed_steps = resume_step # update the progress_bar if load from checkpoint - progress_bar.update(starting_epoch * num_update_steps_per_epoch) - completed_steps = starting_epoch * num_update_steps_per_epoch + progress_bar.update(completed_steps) for epoch in range(starting_epoch, args.num_train_epochs): model.train() if args.with_tracking: total_loss = 0 - for step, batch in enumerate(train_dataloader): - # We need to skip steps until we reach the resumed step - if args.resume_from_checkpoint and epoch == starting_epoch: - if resume_step is not None and step < resume_step: - if step % args.gradient_accumulation_steps == 0: - progress_bar.update(1) - completed_steps += 1 - continue - + if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None: + # We skip the first `n` batches in the dataloader when resuming from a checkpoint + active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step) + else: + active_dataloader = train_dataloader + for step, batch in enumerate(active_dataloader): with accelerator.accumulate(model): outputs = model(**batch) loss = outputs.loss diff --git a/examples/pytorch/language-modeling/run_mlm_no_trainer.py b/examples/pytorch/language-modeling/run_mlm_no_trainer.py index 29a2559851d069..593dae4628d9ae 100755 --- a/examples/pytorch/language-modeling/run_mlm_no_trainer.py +++ b/examples/pytorch/language-modeling/run_mlm_no_trainer.py @@ -610,29 +610,27 @@ def group_texts(examples): if "epoch" in training_difference: starting_epoch = int(training_difference.replace("epoch_", "")) + 1 resume_step = None + completed_steps = starting_epoch * num_update_steps_per_epoch else: # 
need to multiply `gradient_accumulation_steps` to reflect real steps resume_step = int(training_difference.replace("step_", "")) * args.gradient_accumulation_steps starting_epoch = resume_step // len(train_dataloader) resume_step -= starting_epoch * len(train_dataloader) + completed_steps = resume_step # update the progress_bar if load from checkpoint - progress_bar.update(starting_epoch * num_update_steps_per_epoch) - completed_steps = starting_epoch * num_update_steps_per_epoch + progress_bar.update(completed_steps) for epoch in range(starting_epoch, args.num_train_epochs): model.train() if args.with_tracking: total_loss = 0 - for step, batch in enumerate(train_dataloader): - # We need to skip steps until we reach the resumed step - if args.resume_from_checkpoint and epoch == starting_epoch: - if resume_step is not None and step < resume_step: - if step % args.gradient_accumulation_steps == 0: - progress_bar.update(1) - completed_steps += 1 - continue - + if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None: + # We skip the first `n` batches in the dataloader when resuming from a checkpoint + active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step) + else: + active_dataloader = train_dataloader + for step, batch in enumerate(active_dataloader): with accelerator.accumulate(model): outputs = model(**batch) loss = outputs.loss diff --git a/examples/pytorch/multiple-choice/run_swag_no_trainer.py b/examples/pytorch/multiple-choice/run_swag_no_trainer.py index 21f2e1bf04bcc2..7492f31ddbf940 100755 --- a/examples/pytorch/multiple-choice/run_swag_no_trainer.py +++ b/examples/pytorch/multiple-choice/run_swag_no_trainer.py @@ -557,22 +557,26 @@ def preprocess_function(examples): if "epoch" in training_difference: starting_epoch = int(training_difference.replace("epoch_", "")) + 1 resume_step = None + completed_steps = starting_epoch * num_update_steps_per_epoch else: resume_step = int(training_difference.replace("step_", "")) starting_epoch = resume_step // len(train_dataloader) resume_step -= starting_epoch * len(train_dataloader) + completed_steps = resume_step + + # update the progress_bar if load from checkpoint + progress_bar.update(completed_steps) for epoch in range(starting_epoch, args.num_train_epochs): model.train() if args.with_tracking: total_loss = 0 - for step, batch in enumerate(train_dataloader): - # We need to skip steps until we reach the resumed step - if args.resume_from_checkpoint and epoch == starting_epoch: - if resume_step is not None and step < resume_step: - completed_steps += 1 - continue - + if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None: + # We skip the first `n` batches in the dataloader when resuming from a checkpoint + active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step) + else: + active_dataloader = train_dataloader + for step, batch in enumerate(active_dataloader): with accelerator.accumulate(model): outputs = model(**batch) loss = outputs.loss diff --git a/examples/pytorch/test_xla_examples.py b/examples/pytorch/old_test_xla_examples.py similarity index 100% rename from examples/pytorch/test_xla_examples.py rename to examples/pytorch/old_test_xla_examples.py diff --git a/examples/pytorch/question-answering/run_qa_beam_search_no_trainer.py b/examples/pytorch/question-answering/run_qa_beam_search_no_trainer.py index 05c85bdf50e3f9..ed69e8b396feb0 100644 --- a/examples/pytorch/question-answering/run_qa_beam_search_no_trainer.py +++ 
b/examples/pytorch/question-answering/run_qa_beam_search_no_trainer.py @@ -809,22 +809,26 @@ def create_and_fill_np_array(start_or_end_logits, dataset, max_len): if "epoch" in training_difference: starting_epoch = int(training_difference.replace("epoch_", "")) + 1 resume_step = None + completed_steps = starting_epoch * num_update_steps_per_epoch else: resume_step = int(training_difference.replace("step_", "")) starting_epoch = resume_step // len(train_dataloader) resume_step -= starting_epoch * len(train_dataloader) + completed_steps = resume_step + + # update the progress_bar if load from checkpoint + progress_bar.update(completed_steps) for epoch in range(starting_epoch, args.num_train_epochs): model.train() if args.with_tracking: total_loss = 0 - for step, batch in enumerate(train_dataloader): - # We need to skip steps until we reach the resumed step - if args.resume_from_checkpoint and epoch == starting_epoch: - if resume_step is not None and step < resume_step: - completed_steps += 1 - continue - + if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None: + # We skip the first `n` batches in the dataloader when resuming from a checkpoint + active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step) + else: + active_dataloader = train_dataloader + for step, batch in enumerate(active_dataloader): with accelerator.accumulate(model): outputs = model(**batch) loss = outputs.loss diff --git a/examples/pytorch/question-answering/run_qa_no_trainer.py b/examples/pytorch/question-answering/run_qa_no_trainer.py index 2e363ae9709850..8054779cc7b9af 100755 --- a/examples/pytorch/question-answering/run_qa_no_trainer.py +++ b/examples/pytorch/question-answering/run_qa_no_trainer.py @@ -825,22 +825,26 @@ def create_and_fill_np_array(start_or_end_logits, dataset, max_len): if "epoch" in training_difference: starting_epoch = int(training_difference.replace("epoch_", "")) + 1 resume_step = None + completed_steps = starting_epoch * num_update_steps_per_epoch else: resume_step = int(training_difference.replace("step_", "")) starting_epoch = resume_step // len(train_dataloader) resume_step -= starting_epoch * len(train_dataloader) + completed_steps = resume_step + + # update the progress_bar if load from checkpoint + progress_bar.update(completed_steps) for epoch in range(starting_epoch, args.num_train_epochs): model.train() if args.with_tracking: total_loss = 0 - for step, batch in enumerate(train_dataloader): - # We need to skip steps until we reach the resumed step - if args.resume_from_checkpoint and epoch == starting_epoch: - if resume_step is not None and step < resume_step: - completed_steps += 1 - continue - + if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None: + # We skip the first `n` batches in the dataloader when resuming from a checkpoint + active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step) + else: + active_dataloader = train_dataloader + for step, batch in enumerate(active_dataloader): with accelerator.accumulate(model): outputs = model(**batch) loss = outputs.loss diff --git a/examples/pytorch/semantic-segmentation/run_semantic_segmentation_no_trainer.py b/examples/pytorch/semantic-segmentation/run_semantic_segmentation_no_trainer.py index de997529de81cf..5087a4186a1774 100644 --- a/examples/pytorch/semantic-segmentation/run_semantic_segmentation_no_trainer.py +++ b/examples/pytorch/semantic-segmentation/run_semantic_segmentation_no_trainer.py @@ -554,22 +554,26 @@ def 
preprocess_val(example_batch): if "epoch" in training_difference: starting_epoch = int(training_difference.replace("epoch_", "")) + 1 resume_step = None + completed_steps = starting_epoch * num_update_steps_per_epoch else: resume_step = int(training_difference.replace("step_", "")) starting_epoch = resume_step // len(train_dataloader) resume_step -= starting_epoch * len(train_dataloader) + completed_steps = resume_step + + # update the progress_bar if load from checkpoint + progress_bar.update(completed_steps) for epoch in range(starting_epoch, args.num_train_epochs): + model.train() if args.with_tracking: total_loss = 0 - model.train() - for step, batch in enumerate(train_dataloader): - # We need to skip steps until we reach the resumed step - if args.resume_from_checkpoint and epoch == starting_epoch: - if resume_step is not None and step < resume_step: - completed_steps += 1 - continue - + if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None: + # We skip the first `n` batches in the dataloader when resuming from a checkpoint + active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step) + else: + active_dataloader = train_dataloader + for step, batch in enumerate(active_dataloader): with accelerator.accumulate(model): outputs = model(**batch) loss = outputs.loss diff --git a/examples/pytorch/summarization/run_summarization_no_trainer.py b/examples/pytorch/summarization/run_summarization_no_trainer.py index ea09c6b89e065c..bef349fdb36981 100644 --- a/examples/pytorch/summarization/run_summarization_no_trainer.py +++ b/examples/pytorch/summarization/run_summarization_no_trainer.py @@ -626,22 +626,26 @@ def postprocess_text(preds, labels): if "epoch" in training_difference: starting_epoch = int(training_difference.replace("epoch_", "")) + 1 resume_step = None + completed_steps = starting_epoch * num_update_steps_per_epoch else: resume_step = int(training_difference.replace("step_", "")) starting_epoch = resume_step // len(train_dataloader) resume_step -= starting_epoch * len(train_dataloader) + completed_steps = resume_step + + # update the progress_bar if load from checkpoint + progress_bar.update(completed_steps) for epoch in range(starting_epoch, args.num_train_epochs): model.train() if args.with_tracking: total_loss = 0 - for step, batch in enumerate(train_dataloader): - # We need to skip steps until we reach the resumed step - if args.resume_from_checkpoint and epoch == starting_epoch: - if resume_step is not None and step < resume_step: - completed_steps += 1 - continue - + if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None: + # We skip the first `n` batches in the dataloader when resuming from a checkpoint + active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step) + else: + active_dataloader = train_dataloader + for step, batch in enumerate(active_dataloader): with accelerator.accumulate(model): outputs = model(**batch) loss = outputs.loss diff --git a/examples/pytorch/text-classification/run_glue_no_trainer.py b/examples/pytorch/text-classification/run_glue_no_trainer.py index 2fbacade06c630..2ba3c44498368f 100644 --- a/examples/pytorch/text-classification/run_glue_no_trainer.py +++ b/examples/pytorch/text-classification/run_glue_no_trainer.py @@ -510,12 +510,12 @@ def preprocess_function(examples): model.train() if args.with_tracking: total_loss = 0 - for step, batch in enumerate(train_dataloader): - # We need to skip steps until we reach the resumed step - if 
args.resume_from_checkpoint and epoch == starting_epoch: - if resume_step is not None and step < resume_step: - completed_steps += 1 - continue + if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None: + # We skip the first `n` batches in the dataloader when resuming from a checkpoint + active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step) + else: + active_dataloader = train_dataloader + for step, batch in enumerate(active_dataloader): outputs = model(**batch) loss = outputs.loss # We keep track of the loss at each epoch diff --git a/examples/pytorch/text-generation/README.md b/examples/pytorch/text-generation/README.md index 2177c45c3b884a..fce4aef86b14ea 100644 --- a/examples/pytorch/text-generation/README.md +++ b/examples/pytorch/text-generation/README.md @@ -18,7 +18,7 @@ limitations under the License. Based on the script [`run_generation.py`](https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-generation/run_generation.py). -Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, Transformer-XL, XLNet, CTRL. +Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, GPTJ, Transformer-XL, XLNet, CTRL, BLOOM, LLAMA, OPT. A similar script is used for our official demo [Write With Transfomer](https://transformer.huggingface.co), where you can try out the different models available in the library. diff --git a/examples/pytorch/text-generation/run_generation.py b/examples/pytorch/text-generation/run_generation.py index e0dda0ec0c2fa2..75221934da85d0 100755 --- a/examples/pytorch/text-generation/run_generation.py +++ b/examples/pytorch/text-generation/run_generation.py @@ -19,6 +19,7 @@ import argparse +import inspect import logging from typing import Tuple @@ -26,13 +27,20 @@ import torch from transformers import ( + AutoTokenizer, + BloomForCausalLM, + BloomTokenizerFast, CTRLLMHeadModel, CTRLTokenizer, GenerationMixin, GPT2LMHeadModel, GPT2Tokenizer, + GPTJForCausalLM, + LlamaForCausalLM, + LlamaTokenizer, OpenAIGPTLMHeadModel, OpenAIGPTTokenizer, + OPTForCausalLM, TransfoXLLMHeadModel, TransfoXLTokenizer, XLMTokenizer, @@ -59,6 +67,10 @@ "xlnet": (XLNetLMHeadModel, XLNetTokenizer), "transfo-xl": (TransfoXLLMHeadModel, TransfoXLTokenizer), "xlm": (XLMWithLMHeadModel, XLMTokenizer), + "gptj": (GPTJForCausalLM, AutoTokenizer), + "bloom": (BloomForCausalLM, BloomTokenizerFast), + "llama": (LlamaForCausalLM, LlamaTokenizer), + "opt": (OPTForCausalLM, GPT2Tokenizer), } # Padding text to help Transformer-XL and XLNet with short prompts as proposed by Aman Rusia @@ -173,23 +185,26 @@ def sparse_model_config(model_config): raise ValueError("Check the model config") num_embedding_size_per_head = int(embedding_size / num_head) - num_layer = model_config.n_layer + if hasattr(model_config, "n_layer"): + num_layer = model_config.n_layer + elif hasattr(model_config, "num_hidden_layers"): + num_layer = model_config.num_hidden_layers + else: + raise ValueError("Number of hidden layers couldn't be determined from the model config") return num_layer, num_head, num_embedding_size_per_head -def prepare_jit_inputs(inputs, model, tokenizer): - num_batch = len(inputs) - dummy_input = tokenizer.batch_encode_plus(inputs, return_tensors="pt", padding=True) +def generate_past_key_values(model, batch_size, seq_len): num_block_layers, num_attention_heads, num_embedding_size_per_head = sparse_model_config(model.config) if model.config.model_type == "bloom": 
past_key_values = tuple( ( - torch.zeros(int(num_attention_heads * num_batch), num_embedding_size_per_head, 1) - .to(model.config.torch_dtype) + torch.empty(int(num_attention_heads * batch_size), num_embedding_size_per_head, seq_len) + .to(model.dtype) .to(model.device), - torch.zeros(int(num_attention_heads * num_batch), 1, num_embedding_size_per_head) - .to(model.config.torch_dtype) + torch.empty(int(num_attention_heads * batch_size), seq_len, num_embedding_size_per_head) + .to(model.dtype) .to(model.device), ) for _ in range(num_block_layers) @@ -197,37 +212,34 @@ def prepare_jit_inputs(inputs, model, tokenizer): else: past_key_values = tuple( ( - torch.zeros(num_batch, num_attention_heads, 1, num_embedding_size_per_head) - .to(model.config.torch_dtype) + torch.empty(batch_size, num_attention_heads, seq_len, num_embedding_size_per_head) + .to(model.dtype) .to(model.device), - torch.zeros(num_batch, num_attention_heads, 1, num_embedding_size_per_head) - .to(model.config.torch_dtype) + torch.empty(batch_size, num_attention_heads, seq_len, num_embedding_size_per_head) + .to(model.dtype) .to(model.device), ) for _ in range(num_block_layers) ) + return past_key_values + +def prepare_jit_inputs(inputs, model, tokenizer): + batch_size = len(inputs) + dummy_input = tokenizer.batch_encode_plus(inputs, return_tensors="pt") + dummy_input = dummy_input.to(model.device) + if model.config.use_cache: + dummy_input["past_key_values"] = generate_past_key_values(model, batch_size, 1) dummy_input["attention_mask"] = torch.cat( [ - torch.zeros(dummy_input["attention_mask"].shape[0], 1).to(dummy_input["attention_mask"].dtype), + torch.zeros(dummy_input["attention_mask"].shape[0], 1) + .to(dummy_input["attention_mask"].dtype) + .to(model.device), dummy_input["attention_mask"], ], -1, ) - - if model.config.use_cache: - jit_inputs = ( - dummy_input["input_ids"].to(model.device), - past_key_values, - dummy_input["attention_mask"].to(model.device), - ) - else: - jit_inputs = ( - dummy_input["input_ids"].to(model.device), - dummy_input["attention_mask"].to(model.device), - ) - - return jit_inputs + return dummy_input class _ModelFallbackWrapper(GenerationMixin): @@ -238,15 +250,13 @@ def __init__(self, optimized, default): self._default = default def __call__(self, *args, **kwargs): - if kwargs["past_key_values"] is None: - return self._default(*args, **kwargs) - trace_graph_inputs = [] + if kwargs["past_key_values"] is None and self._default.config.use_cache: + kwargs["past_key_values"] = generate_past_key_values(self._default, kwargs["input_ids"].shape[0], 0) kwargs.pop("position_ids", None) - for k, v in kwargs.items(): - if v is not None and not isinstance(v, bool): - trace_graph_inputs.append(v) - trace_graph_inputs = tuple(trace_graph_inputs) - outputs = self._optimized(*trace_graph_inputs) + for k in list(kwargs.keys()): + if kwargs[k] is None or isinstance(kwargs[k], bool): + kwargs.pop(k) + outputs = self._optimized(**kwargs) lm_logits = outputs[0] past_key_values = outputs[1] fixed_output = CausalLMOutputWithPast( @@ -324,9 +334,7 @@ def main(): action="store_true", help="Whether to use 16-bit (mixed) precision (through NVIDIA apex) instead of 32-bit", ) - parser.add_argument( - "--jit", type=bool, default=False, help="Whether or not to use jit trace to accelerate inference" - ) + parser.add_argument("--jit", action="store_true", help="Whether or not to use jit trace to accelerate inference") args = parser.parse_args() args.device = torch.device("cuda" if torch.cuda.is_available() and not args.no_cuda 
else "cpu") @@ -351,8 +359,8 @@ def main(): if args.fp16: model.half() - - args.length = adjust_length_to_model(args.length, max_sequence_length=model.config.max_position_embeddings) + max_seq_length = getattr(model.config, "max_position_embeddings", 0) + args.length = adjust_length_to_model(args.length, max_sequence_length=max_seq_length) logger.info(args) prompt_text = args.prompt if args.prompt else input("Model prompt >>> ") @@ -382,10 +390,15 @@ def main(): input_ids = encoded_prompt if args.jit: - jit_input_texts = ["jit"] + jit_input_texts = ["enable jit"] jit_inputs = prepare_jit_inputs(jit_input_texts, model, tokenizer) torch._C._jit_set_texpr_fuser_enabled(False) model.config.return_dict = False + if hasattr(model, "forward"): + sig = inspect.signature(model.forward) + else: + sig = inspect.signature(model.__call__) + jit_inputs = tuple(jit_inputs[key] for key in sig.parameters if jit_inputs.get(key, None) is not None) traced_model = torch.jit.trace(model, jit_inputs, strict=False) traced_model = torch.jit.freeze(traced_model.eval()) traced_model(*jit_inputs) diff --git a/examples/pytorch/token-classification/run_ner_no_trainer.py b/examples/pytorch/token-classification/run_ner_no_trainer.py index bc51fab14e0452..82aea5d4d6cd4a 100755 --- a/examples/pytorch/token-classification/run_ner_no_trainer.py +++ b/examples/pytorch/token-classification/run_ner_no_trainer.py @@ -668,12 +668,12 @@ def compute_metrics(): model.train() if args.with_tracking: total_loss = 0 - for step, batch in enumerate(train_dataloader): - # We need to skip steps until we reach the resumed step - if args.resume_from_checkpoint and epoch == starting_epoch: - if resume_step is not None and step < resume_step: - completed_steps += 1 - continue + if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None: + # We skip the first `n` batches in the dataloader when resuming from a checkpoint + active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step) + else: + active_dataloader = train_dataloader + for step, batch in enumerate(active_dataloader): outputs = model(**batch) loss = outputs.loss # We keep track of the loss at each epoch diff --git a/examples/pytorch/translation/run_translation_no_trainer.py b/examples/pytorch/translation/run_translation_no_trainer.py index e52050308ab247..5267cea4e26931 100644 --- a/examples/pytorch/translation/run_translation_no_trainer.py +++ b/examples/pytorch/translation/run_translation_no_trainer.py @@ -607,28 +607,27 @@ def postprocess_text(preds, labels): if "epoch" in training_difference: starting_epoch = int(training_difference.replace("epoch_", "")) + 1 resume_step = None + completed_steps = starting_epoch * num_update_steps_per_epoch else: # need to multiply `gradient_accumulation_steps` to reflect real steps resume_step = int(training_difference.replace("step_", "")) * args.gradient_accumulation_steps starting_epoch = resume_step // len(train_dataloader) resume_step -= starting_epoch * len(train_dataloader) + completed_steps = resume_step # update the progress_bar if load from checkpoint - progress_bar.update(starting_epoch * num_update_steps_per_epoch) - completed_steps = starting_epoch * num_update_steps_per_epoch + progress_bar.update(completed_steps) for epoch in range(starting_epoch, args.num_train_epochs): model.train() if args.with_tracking: total_loss = 0 - for step, batch in enumerate(train_dataloader): - # We need to skip steps until we reach the resumed step - if args.resume_from_checkpoint and epoch == 
starting_epoch: - if resume_step is not None and step < resume_step: - if step % args.gradient_accumulation_steps == 0: - progress_bar.update(1) - completed_steps += 1 - continue + if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None: + # We skip the first `n` batches in the dataloader when resuming from a checkpoint + active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step) + else: + active_dataloader = train_dataloader + for step, batch in enumerate(active_dataloader): outputs = model(**batch) loss = outputs.loss # We keep track of the loss at each epoch diff --git a/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_albert.py b/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_albert.py index 5e17352dc19b54..57b649ec067bc3 100644 --- a/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_albert.py +++ b/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_albert.py @@ -253,7 +253,7 @@ def forward( Returns: :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.AlbertConfig`) and inputs: - loss: (`optional`, returned when ``labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``: + loss (`optional`, returned when ``labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``: Classification (or regression if config.num_labels==1) loss. logits ``torch.FloatTensor`` of shape ``(batch_size, config.num_labels)`` Classification (or regression if config.num_labels==1) scores (before SoftMax). diff --git a/examples/research_projects/decision_transformer/requirements.txt b/examples/research_projects/decision_transformer/requirements.txt index 3cf50951975ca5..8c0ba1cc64a3e4 100644 --- a/examples/research_projects/decision_transformer/requirements.txt +++ b/examples/research_projects/decision_transformer/requirements.txt @@ -177,7 +177,7 @@ PyYAML==6.0 ray==1.11.0 redis==4.5.4 regex==2022.3.15 -requests==2.27.1 +requests==2.31.0 requests-oauthlib==1.3.1 resampy==0.2.2 responses==0.18.0 diff --git a/examples/research_projects/lxmert/requirements.txt b/examples/research_projects/lxmert/requirements.txt index 0d483b6d18923d..52e3878fac24de 100644 --- a/examples/research_projects/lxmert/requirements.txt +++ b/examples/research_projects/lxmert/requirements.txt @@ -75,7 +75,7 @@ pyzmq==19.0.2 qtconsole==4.7.7 QtPy==1.9.0 regex==2020.7.14 -requests==2.22.0 +requests==2.31.0 retrying==1.3.3 sacremoses==0.0.43 Send2Trash==1.5.0 @@ -86,7 +86,7 @@ testpath==0.4.4 tokenizers==0.8.1rc2 torch==1.6.0 torchvision==0.7.0 -tornado==6.0.4 +tornado==6.3.2 tqdm==4.48.2 traitlets git+https://github.com/huggingface/transformers.git diff --git a/examples/research_projects/visual_bert/requirements.txt b/examples/research_projects/visual_bert/requirements.txt index 0d483b6d18923d..52e3878fac24de 100644 --- a/examples/research_projects/visual_bert/requirements.txt +++ b/examples/research_projects/visual_bert/requirements.txt @@ -75,7 +75,7 @@ pyzmq==19.0.2 qtconsole==4.7.7 QtPy==1.9.0 regex==2020.7.14 -requests==2.22.0 +requests==2.31.0 retrying==1.3.3 sacremoses==0.0.43 Send2Trash==1.5.0 @@ -86,7 +86,7 @@ testpath==0.4.4 tokenizers==0.8.1rc2 torch==1.6.0 torchvision==0.7.0 -tornado==6.0.4 +tornado==6.3.2 tqdm==4.48.2 traitlets git+https://github.com/huggingface/transformers.git diff --git a/examples/tensorflow/image-classification/run_image_classification.py b/examples/tensorflow/image-classification/run_image_classification.py index 
61c6cea2cd9443..6a4b7df4d0a05c 100644 --- a/examples/tensorflow/image-classification/run_image_classification.py +++ b/examples/tensorflow/image-classification/run_image_classification.py @@ -543,6 +543,7 @@ def compute_metrics(p): logging.info(f"{metric_name}: {value:.3f}") if training_args.output_dir is not None: + os.makedirs(training_args.output_dir, exist_ok=True) with open(os.path.join(training_args.output_dir, "all_results.json"), "w") as f: f.write(json.dumps(eval_metrics)) diff --git a/setup.py b/setup.py index 952cfcf510aad9..19754eaf3f212b 100644 --- a/setup.py +++ b/setup.py @@ -321,7 +321,6 @@ def run(self): "protobuf", # Can be removed once we can unpin protobuf "sacremoses", "rjieba", - "safetensors", "beautifulsoup4", ) + extras["retrieval"] @@ -425,6 +424,7 @@ def run(self): deps["regex"], # for OpenAI GPT deps["requests"], # for downloading models over HTTPS deps["tokenizers"], + deps["safetensors"], deps["tqdm"], # progress bars in model download and training scripts ] diff --git a/src/transformers/__init__.py b/src/transformers/__init__.py index 10bf93633abdd4..37310c34b98640 100644 --- a/src/transformers/__init__.py +++ b/src/transformers/__init__.py @@ -155,6 +155,10 @@ "AutoProcessor", "AutoTokenizer", ], + "models.autoformer": [ + "AUTOFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP", + "AutoformerConfig", + ], "models.bart": ["BartConfig", "BartTokenizer"], "models.barthez": [], "models.bartpho": [], @@ -1082,6 +1086,14 @@ "AutoModelWithLMHead", ] ) + _import_structure["models.autoformer"].extend( + [ + "AUTOFORMER_PRETRAINED_MODEL_ARCHIVE_LIST", + "AutoformerForPrediction", + "AutoformerModel", + "AutoformerPreTrainedModel", + ] + ) _import_structure["models.bart"].extend( [ "BART_PRETRAINED_MODEL_ARCHIVE_LIST", @@ -3946,6 +3958,10 @@ AutoProcessor, AutoTokenizer, ) + from .models.autoformer import ( + AUTOFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, + AutoformerConfig, + ) from .models.bart import BartConfig, BartTokenizer from .models.beit import BEIT_PRETRAINED_CONFIG_ARCHIVE_MAP, BeitConfig from .models.bert import ( @@ -4784,6 +4800,12 @@ AutoModelForZeroShotObjectDetection, AutoModelWithLMHead, ) + from .models.autoformer import ( + AUTOFORMER_PRETRAINED_MODEL_ARCHIVE_LIST, + AutoformerForPrediction, + AutoformerModel, + AutoformerPreTrainedModel, + ) from .models.bart import ( BART_PRETRAINED_MODEL_ARCHIVE_LIST, BartForCausalLM, diff --git a/src/transformers/convert_slow_tokenizer.py b/src/transformers/convert_slow_tokenizer.py index 195d09ecd89185..1934e35a575d14 100644 --- a/src/transformers/convert_slow_tokenizer.py +++ b/src/transformers/convert_slow_tokenizer.py @@ -1175,7 +1175,11 @@ def post_processor(self): single = f"{(bos+':0 ') * add_bos}$A:0{(' '+eos+':0') * add_eos}" pair = f"{single}{(' '+bos+':1') * add_bos} $B:1{(' '+eos+':1') * add_eos}" - special_tokens = [(bos, bos_token_id), (eos, eos_token_id)] + special_tokens = [] + if add_bos: + special_tokens.append((bos, bos_token_id)) + if add_eos: + special_tokens.append((eos, eos_token_id)) return processors.TemplateProcessing(single=single, pair=pair, special_tokens=special_tokens) else: diff --git a/src/transformers/dynamic_module_utils.py b/src/transformers/dynamic_module_utils.py index e7ee18a278fa9a..ae76e4ae1fa574 100644 --- a/src/transformers/dynamic_module_utils.py +++ b/src/transformers/dynamic_module_utils.py @@ -123,7 +123,7 @@ def get_imports(filename): content = f.read() # filter out try/except block so in custom code we can have try/except imports - content = re.sub(r"\s*try\s*:\s*.*?\s*except\s*:", 
"", content, flags=re.MULTILINE) + content = re.sub(r"\s*try\s*:\s*.*?\s*except\s*.*?:", "", content, flags=re.MULTILINE | re.DOTALL) # Imports of the form `import xxx` imports = re.findall(r"^\s*import\s+(\S+)\s*$", content, flags=re.MULTILINE) @@ -316,7 +316,7 @@ def get_cached_module_file( ) new_files.append(f"{module_needed}.py") - if len(new_files) > 0: + if len(new_files) > 0 and revision is None: new_files = "\n".join([f"- {f}" for f in new_files]) repo_type_str = "" if repo_type is None else f"{repo_type}s/" url = f"https://huggingface.co/{repo_type_str}{pretrained_model_name_or_path}" @@ -340,6 +340,7 @@ def get_class_from_dynamic_module( revision: Optional[str] = None, local_files_only: bool = False, repo_type: Optional[str] = None, + code_revision: Optional[str] = None, **kwargs, ): """ @@ -391,6 +392,10 @@ def get_class_from_dynamic_module( If `True`, will only try to load the tokenizer configuration from local files. repo_type (`str`, *optional*): Specify the repo type (useful when downloading from a space for instance). + code_revision (`str`, *optional*, defaults to `"main"`): + The specific revision to use for the code on the Hub, if the code leaves in a different repository than the + rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for + storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git. @@ -415,12 +420,12 @@ def get_class_from_dynamic_module( # Catch the name of the repo if it's specified in `class_reference` if "--" in class_reference: repo_id, class_reference = class_reference.split("--") - # Invalidate revision since it's not relevant for this repo - revision = "main" else: repo_id = pretrained_model_name_or_path module_file, class_name = class_reference.split(".") + if code_revision is None and pretrained_model_name_or_path == repo_id: + code_revision = revision # And lastly we get the class inside our newly created module final_module = get_cached_module_file( repo_id, @@ -430,7 +435,7 @@ def get_class_from_dynamic_module( resume_download=resume_download, proxies=proxies, use_auth_token=use_auth_token, - revision=revision, + revision=code_revision, local_files_only=local_files_only, repo_type=repo_type, ) diff --git a/src/transformers/feature_extraction_sequence_utils.py b/src/transformers/feature_extraction_sequence_utils.py index 2121261be0565f..40717d99318500 100644 --- a/src/transformers/feature_extraction_sequence_utils.py +++ b/src/transformers/feature_extraction_sequence_utils.py @@ -140,7 +140,7 @@ def pad( return_attention_mask if return_attention_mask is not None else self.return_attention_mask ) - if not required_input: + if len(required_input) == 0: if return_attention_mask: processed_features["attention_mask"] = [] return processed_features diff --git a/src/transformers/generation/logits_process.py b/src/transformers/generation/logits_process.py index 95c8064ee40445..dd51610afd4371 100644 --- a/src/transformers/generation/logits_process.py +++ b/src/transformers/generation/logits_process.py @@ -678,7 +678,7 @@ class PrefixConstrainedLogitsProcessor(LogitsProcessor): generation. See [Autoregressive Entity Retrieval](https://arxiv.org/abs/2010.00904) for more information. Args: - prefix_allowed_tokens_fn: (`Callable[[int, torch.Tensor], List[int]]`): + prefix_allowed_tokens_fn (`Callable[[int, torch.Tensor], List[int]]`): This function constraints the beam search to allowed tokens only at each step. 
This function takes 2 arguments `inputs_ids` and the batch ID `batch_id`. It has to return a list with the allowed tokens for the next generation step conditioned on the previously generated tokens `inputs_ids` and the batch ID diff --git a/src/transformers/integrations.py b/src/transformers/integrations.py index 4b0e1c590d8245..0a85cef6981c0d 100644 --- a/src/transformers/integrations.py +++ b/src/transformers/integrations.py @@ -30,7 +30,7 @@ import numpy as np from . import __version__ as version -from .utils import flatten_dict, is_datasets_available, is_torch_available, logging +from .utils import flatten_dict, is_datasets_available, is_pandas_available, is_torch_available, logging from .utils.versions import importlib_metadata @@ -146,6 +146,16 @@ def is_codecarbon_available(): return importlib.util.find_spec("codecarbon") is not None +def is_flytekit_available(): + return importlib.util.find_spec("flytekit") is not None + + +def is_flyte_deck_standard_available(): + if not is_flytekit_available(): + return False + return importlib.util.find_spec("flytekitplugins.deck") is not None + + def hp_params(trial): if is_optuna_available(): import optuna @@ -1537,6 +1547,69 @@ def on_save(self, args, state, control, **kwargs): self._clearml_task.update_output_model(artifact_path, iteration=state.global_step, auto_delete_file=False) +class FlyteCallback(TrainerCallback): + """A [`TrainerCallback`] that sends the logs to [Flyte](https://flyte.org/). + NOTE: This callback only works within a Flyte task. + + Args: + save_log_history (`bool`, *optional*, defaults to `True`): + When set to True, the training logs are saved as a Flyte Deck. + + sync_checkpoints (`bool`, *optional*, defaults to `True`): + When set to True, checkpoints are synced with Flyte and can be used to resume training in the case of an + interruption. + + Example: + + ```python + # Note: This example skips over some setup steps for brevity. + from flytekit import current_context, task + + + @task + def train_hf_transformer(): + cp = current_context().checkpoint + trainer = Trainer(..., callbacks=[FlyteCallback()]) + output = trainer.train(resume_from_checkpoint=cp.restore()) + ``` + """ + + def __init__(self, save_log_history: bool = True, sync_checkpoints: bool = True): + super().__init__() + if not is_flytekit_available(): + raise ImportError("FlyteCallback requires flytekit to be installed. Run `pip install flytekit`.") + + if not is_flyte_deck_standard_available() or not is_pandas_available(): + logger.warning( + "Syncing log history requires both flytekitplugins-deck-standard and pandas to be installed. " + "Run `pip install flytekitplugins-deck-standard pandas` to enable this feature." + ) + save_log_history = False + + from flytekit import current_context + + self.cp = current_context().checkpoint + self.save_log_history = save_log_history + self.sync_checkpoints = sync_checkpoints + + def on_save(self, args, state, control, **kwargs): + if self.sync_checkpoints and state.is_world_process_zero: + ckpt_dir = f"checkpoint-{state.global_step}" + artifact_path = os.path.join(args.output_dir, ckpt_dir) + + logger.info(f"Syncing checkpoint in {ckpt_dir} to Flyte. 
This may take time.") + self.cp.save(artifact_path) + + def on_train_end(self, args, state, control, **kwargs): + if self.save_log_history: + import pandas as pd + from flytekit import Deck + from flytekitplugins.deck.renderer import TableRenderer + + log_history_df = pd.DataFrame(state.log_history) + Deck("Log History", TableRenderer().to_html(log_history_df)) + + INTEGRATION_TO_CALLBACK = { "azure_ml": AzureMLCallback, "comet_ml": CometCallback, @@ -1547,6 +1620,7 @@ def on_save(self, args, state, control, **kwargs): "codecarbon": CodeCarbonCallback, "clearml": ClearMLCallback, "dagshub": DagsHubCallback, + "flyte": FlyteCallback, } diff --git a/src/transformers/modeling_outputs.py b/src/transformers/modeling_outputs.py index c69e426ab53100..aceec7abd40643 100755 --- a/src/transformers/modeling_outputs.py +++ b/src/transformers/modeling_outputs.py @@ -1522,7 +1522,7 @@ class Seq2SeqTSModelOutput(ModelOutput): scale (`torch.FloatTensor` of shape `(batch_size,)` or `(batch_size, input_size)`, *optional*): Scaling values of each time series' context window which is used to give the model inputs of the same magnitude and then used to rescale back to the original magnitude. - static_features: (`torch.FloatTensor` of shape `(batch_size, feature size)`, *optional*): + static_features (`torch.FloatTensor` of shape `(batch_size, feature size)`, *optional*): Static features of each time series' in a batch which are copied to the covariates at inference time. """ @@ -1593,7 +1593,7 @@ class Seq2SeqTSPredictionOutput(ModelOutput): scale (`torch.FloatTensor` of shape `(batch_size,)` or `(batch_size, input_size)`, *optional*): Scaling values of each time series' context window which is used to give the model inputs of the same magnitude and then used to rescale back to the original magnitude. - static_features: (`torch.FloatTensor` of shape `(batch_size, feature size)`, *optional*): + static_features (`torch.FloatTensor` of shape `(batch_size, feature size)`, *optional*): Static features of each time series' in a batch which are copied to the covariates at inference time. """ diff --git a/src/transformers/modeling_tf_outputs.py b/src/transformers/modeling_tf_outputs.py index f8148b169543fa..357c34bc1f25fc 100644 --- a/src/transformers/modeling_tf_outputs.py +++ b/src/transformers/modeling_tf_outputs.py @@ -12,6 +12,8 @@ # See the License for the specific language governing permissions and # limitations under the License. 
+from __future__ import annotations + import warnings from dataclasses import dataclass from typing import List, Optional, Tuple @@ -43,8 +45,8 @@ class TFBaseModelOutput(ModelOutput): """ last_hidden_state: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -96,8 +98,8 @@ class TFBaseModelOutputWithPooling(ModelOutput): last_hidden_state: tf.Tensor = None pooler_output: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -164,10 +166,10 @@ class TFBaseModelOutputWithPoolingAndCrossAttentions(ModelOutput): last_hidden_state: tf.Tensor = None pooler_output: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - cross_attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + cross_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -201,9 +203,9 @@ class TFBaseModelOutputWithPast(ModelOutput): """ last_hidden_state: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -234,9 +236,9 @@ class TFBaseModelOutputWithCrossAttentions(ModelOutput): """ last_hidden_state: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - cross_attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + cross_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -276,10 +278,10 @@ class TFBaseModelOutputWithPastAndCrossAttentions(ModelOutput): """ last_hidden_state: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - cross_attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + cross_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -333,13 +335,13 @@ class TFSeq2SeqModelOutput(ModelOutput): """ last_hidden_state: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - decoder_hidden_states: Optional[Tuple[tf.Tensor]] = None - decoder_attentions: Optional[Tuple[tf.Tensor]] = None - cross_attentions: Optional[Tuple[tf.Tensor]] = None - encoder_last_hidden_state: Optional[tf.Tensor] = None - encoder_hidden_states: Optional[Tuple[tf.Tensor]] = None - encoder_attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + decoder_hidden_states: Tuple[tf.Tensor] | None = None + decoder_attentions: Tuple[tf.Tensor] | None = None + cross_attentions: Tuple[tf.Tensor] | None = None + encoder_last_hidden_state: tf.Tensor | None = None + encoder_hidden_states: Tuple[tf.Tensor] | None = None + encoder_attentions: Tuple[tf.Tensor] 
| None = None @dataclass @@ -365,10 +367,10 @@ class TFCausalLMOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -400,11 +402,11 @@ class TFCausalLMOutputWithPast(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -442,12 +444,12 @@ class TFCausalLMOutputWithCrossAttentions(ModelOutput): `past_key_values` input) to speed up sequential decoding. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - cross_attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + cross_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -473,10 +475,10 @@ class TFMaskedLMOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -527,15 +529,15 @@ class TFSeq2SeqLMOutput(ModelOutput): self-attention heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - decoder_hidden_states: Optional[Tuple[tf.Tensor]] = None - decoder_attentions: Optional[Tuple[tf.Tensor]] = None - cross_attentions: Optional[Tuple[tf.Tensor]] = None - encoder_last_hidden_state: Optional[tf.Tensor] = None - encoder_hidden_states: Optional[Tuple[tf.Tensor]] = None - encoder_attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + decoder_hidden_states: Tuple[tf.Tensor] | None = None + decoder_attentions: Tuple[tf.Tensor] | None = None + cross_attentions: Tuple[tf.Tensor] | None = None + encoder_last_hidden_state: tf.Tensor | None = None + encoder_hidden_states: Tuple[tf.Tensor] | None = None + encoder_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -562,10 +564,10 @@ class TFNextSentencePredictorOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -591,10 +593,10 @@ class TFSequenceClassifierOutput(ModelOutput): heads. 
""" - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -642,15 +644,15 @@ class TFSeq2SeqSequenceClassifierOutput(ModelOutput): self-attention heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - decoder_hidden_states: Optional[Tuple[tf.Tensor]] = None - decoder_attentions: Optional[Tuple[tf.Tensor]] = None - cross_attentions: Optional[Tuple[tf.Tensor]] = None - encoder_last_hidden_state: Optional[tf.Tensor] = None - encoder_hidden_states: Optional[Tuple[tf.Tensor]] = None - encoder_attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + decoder_hidden_states: Tuple[tf.Tensor] | None = None + decoder_attentions: Tuple[tf.Tensor] | None = None + cross_attentions: Tuple[tf.Tensor] | None = None + encoder_last_hidden_state: tf.Tensor | None = None + encoder_hidden_states: Tuple[tf.Tensor] | None = None + encoder_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -684,10 +686,10 @@ class TFSemanticSegmenterOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -716,9 +718,9 @@ class TFSemanticSegmenterOutputWithNoAttention(ModelOutput): Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None @dataclass @@ -742,10 +744,10 @@ class TFImageClassifierOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -773,10 +775,10 @@ class TFMultipleChoiceModelOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -802,10 +804,10 @@ class TFTokenClassifierOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -833,11 +835,11 @@ class TFQuestionAnsweringModelOutput(ModelOutput): heads. 
""" - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None start_logits: tf.Tensor = None end_logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -884,15 +886,15 @@ class TFSeq2SeqQuestionAnsweringModelOutput(ModelOutput): self-attention heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None start_logits: tf.Tensor = None end_logits: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - decoder_hidden_states: Optional[Tuple[tf.Tensor]] = None - decoder_attentions: Optional[Tuple[tf.Tensor]] = None - encoder_last_hidden_state: Optional[tf.Tensor] = None - encoder_hidden_states: Optional[Tuple[tf.Tensor]] = None - encoder_attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + decoder_hidden_states: Tuple[tf.Tensor] | None = None + decoder_attentions: Tuple[tf.Tensor] | None = None + encoder_last_hidden_state: tf.Tensor | None = None + encoder_hidden_states: Tuple[tf.Tensor] | None = None + encoder_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -924,11 +926,11 @@ class TFSequenceClassifierOutputWithPast(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -947,7 +949,7 @@ class TFImageClassifierOutputWithNoAttention(ModelOutput): feature maps) of the model at the output of each stage. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None hidden_states: Optional[Tuple[tf.Tensor, ...]] = None @@ -974,10 +976,10 @@ class TFMaskedImageModelingOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None reconstruction: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @property def logits(self): diff --git a/src/transformers/modeling_tf_utils.py b/src/transformers/modeling_tf_utils.py index 630290d9216193..bac575e249df47 100644 --- a/src/transformers/modeling_tf_utils.py +++ b/src/transformers/modeling_tf_utils.py @@ -15,6 +15,8 @@ # limitations under the License. 
"""TF general model utils.""" +from __future__ import annotations + import functools import gc import inspect @@ -38,9 +40,8 @@ from .configuration_utils import PretrainedConfig from .dynamic_module_utils import custom_object_save from .generation import GenerationConfig, TFGenerationMixin -from .tf_utils import shape_list +from .tf_utils import expand_1d, load_attributes_from_hdf5_group, save_attributes_to_hdf5_group, shape_list from .utils import ( - DUMMY_INPUTS, SAFE_WEIGHTS_INDEX_NAME, SAFE_WEIGHTS_NAME, TF2_WEIGHTS_INDEX_NAME, @@ -65,16 +66,15 @@ from .utils.hub import convert_file_size_to_int, get_checkpoint_shard_files -if parse(tf.__version__) >= parse("2.11.0"): +if parse(tf.__version__).minor >= 13: + from keras import backend as K + from keras.__internal__ import KerasTensor +elif parse(tf.__version__).minor >= 11: from keras import backend as K - from keras.engine import data_adapter from keras.engine.keras_tensor import KerasTensor - from keras.saving.legacy import hdf5_format else: from tensorflow.python.keras import backend as K - from tensorflow.python.keras.engine import data_adapter from tensorflow.python.keras.engine.keras_tensor import KerasTensor - from tensorflow.python.keras.saving import hdf5_format if is_safetensors_available(): @@ -797,9 +797,7 @@ def load_tf_shard(model, model_layer_map, resolved_archive_file, ignore_mismatch try: with h5py.File(resolved_archive_file, "r") as sharded_checkpoint_file: # Retrieve the name of each layer from the H5 file - saved_h5_model_layers_name = set( - hdf5_format.load_attributes_from_hdf5_group(sharded_checkpoint_file, "layer_names") - ) + saved_h5_model_layers_name = set(load_attributes_from_hdf5_group(sharded_checkpoint_file, "layer_names")) weight_value_tuples = [] # Compute missing and unexpected sub layers @@ -898,9 +896,7 @@ def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_size # Read the H5 file with h5py.File(resolved_archive_file, "r") as sharded_checkpoint_file: # Retrieve the name of each layer from the H5 file - saved_h5_model_layers_name = set( - hdf5_format.load_attributes_from_hdf5_group(sharded_checkpoint_file, "layer_names") - ) + saved_h5_model_layers_name = set(load_attributes_from_hdf5_group(sharded_checkpoint_file, "layer_names")) # Find the missing layers from the high level list of layers missing_layers = list({layer.name for layer in model.layers} - saved_h5_model_layers_name) @@ -924,7 +920,7 @@ def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_size # Create a dict from the H5 saved model that looks like {"weight_name": weight_value} # And a set with only the names - for weight_name in hdf5_format.load_attributes_from_hdf5_group(h5_layer_object, "weight_names"): + for weight_name in load_attributes_from_hdf5_group(h5_layer_object, "weight_names"): # TF names always start with the model name so we ignore it name = "/".join(weight_name.split("/")[1:]) @@ -1117,9 +1113,25 @@ def dummy_inputs(self) -> Dict[str, tf.Tensor]: Returns: `Dict[str, tf.Tensor]`: The dummy inputs. """ - return { - "input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32), - } + dummies = {} + sig = self._prune_signature(self.input_signature) + for key, spec in sig.items(): + # 3 is the most correct arbitrary size. 
I will not be taking questions + dummies[key] = tf.ones(shape=[dim if dim is not None else 3 for dim in spec.shape], dtype=spec.dtype) + if key == "token_type_ids": + # Some models have token_type_ids but with a vocab_size of 1 + dummies[key] = tf.zeros_like(dummies[key]) + if self.config.add_cross_attention and "encoder_hidden_states" in inspect.signature(self.call).parameters: + if "encoder_hidden_states" not in dummies: + if self.main_input_name == "input_ids": + dummies["encoder_hidden_states"] = tf.ones( + shape=(3, 3, self.config.hidden_size), dtype=tf.float32, name="encoder_hidden_states" + ) + else: + raise NotImplementedError( + "Model has cross-attention but we couldn't infer the shape for the encoder hidden states. Please manually override dummy_inputs!" + ) + return dummies @property def framework(self) -> str: @@ -1140,6 +1152,10 @@ def __init__(self, config, *inputs, **kwargs): self.config = config self.name_or_path = config.name_or_path self.generation_config = GenerationConfig.from_model_config(config) if self.can_generate() else None + if not hasattr(self, "serving"): # Don't overwrite existing serving signatures + self.serving = tf.function( + self.eager_serving, input_signature=[self._prune_signature(self.input_signature)] + ) # Set the serving spec quickly to ensure that Keras doesn't use the specific dummy input shapes as the spec self._set_save_spec(self.serving.input_signature[0]) @@ -1159,7 +1175,7 @@ def _from_config(cls, config, **kwargs): """ return cls(config, **kwargs) - def get_head_mask(self, head_mask: Optional[tf.Tensor], num_hidden_layers: int) -> tf.Tensor: + def get_head_mask(self, head_mask: tf.Tensor | None, num_hidden_layers: int) -> tf.Tensor: """ Prepare the head mask if needed. @@ -1204,36 +1220,82 @@ def eager_serving(self, inputs): return self.serving_output(output) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs): + @property + def input_signature(self) -> Dict[str, tf.TensorSpec]: """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. + This property should return a dict mapping input names to tf.TensorSpec objects, representing the expected + shape and dtype for model inputs. It is used for both serving and for generating the dummy inputs used to build + the model. """ - output = self.call(inputs) + model_inputs = list(inspect.signature(self.call).parameters) + sig = {} + if "input_ids" in model_inputs: + if self.__class__.__name__.endswith("ForMultipleChoice"): + text_dims = 3 + else: + text_dims = 2 + for input_name in ( + "input_ids", + "attention_mask", + "token_type_ids", + "decoder_input_ids", + "decoder_attention_mask", + ): + if input_name in model_inputs: + sig[input_name] = tf.TensorSpec([None] * text_dims, tf.int32, name=input_name) + if "pixel_values" in model_inputs: + pixel_values_shape = [None, None, None, None] + if hasattr(self.config, "vision_config"): + vision_config = self.config.vision_config + else: + vision_config = self.config + if hasattr(vision_config, "num_channels"): + pixel_values_shape[1] = vision_config.num_channels + else: + raise NotImplementedError( + "Could not infer number of channels from config, please override input_signature to specify input shapes." 
+ ) + if hasattr(vision_config, "image_size"): + pixel_values_shape[2] = pixel_values_shape[3] = vision_config.image_size + elif hasattr(vision_config, "input_size"): + pixel_values_shape[2] = pixel_values_shape[3] = vision_config.input_size + else: + raise NotImplementedError( + "Could not infer input image shape from config, please override input_signature to specify input shapes." + ) + sig["pixel_values"] = tf.TensorSpec(pixel_values_shape, tf.float32, name="pixel_values") + if "input_features" in model_inputs: + raise NotImplementedError("Audio models need a manually defined input_signature") + return sig - return self.serving_output(output) + def _prune_signature(self, signature): + """Keeps only the keys of a given input signature that are valid for this model.""" + model_inputs = list(inspect.signature(self.call).parameters) + return {key: val for key, val in signature.items() if key in model_inputs} def serving_output(self, output): """ - Prepare the output of the saved model. Each model must implement this function. - - Args: - output ([`TFBaseModelOutput`]): - The output returned by the model. - """ - raise NotImplementedError + Prepare the output of the saved model. Can be overridden if specific serving modifications are required. + """ + if not isinstance(output, ModelOutput): + return output + for key in output: + if key.endswith("hidden_states") and not getattr(self.config, "output_hidden_states", False): + output[key] = None + elif key.endswith("attentions") and not getattr(self.config, "output_attentions", False): + output[key] = None + elif key == "past_key_values" and not getattr(self.config, "use_cache", False): + output[key] = None + elif key == "cross_attentions" and not ( + getattr(self.config, "output_attentions", False) and getattr(self.config, "add_cross_attention", False) + ): + output[key] = None + if isinstance(output[key], (tuple, list)): + try: + output[key] = tf.convert_to_tensor(output[key]) + except (ValueError, tf.errors.InvalidArgumentError): + pass # Layers may not have the same dimensions + return output def can_generate(self) -> bool: """ @@ -1387,7 +1449,7 @@ def prepare_tf_dataset( if not isinstance(dataset, datasets.Dataset): raise TypeError("Dataset argument should be a datasets.Dataset!") - model_inputs = list(dict(inspect.signature(self.call).parameters).keys()) + model_inputs = list(inspect.signature(self.call).parameters) model_labels = find_labels(self.__class__) if "cols_to_retain" in list(inspect.signature(dataset._get_output_signature).parameters.keys()): output_signature, _ = dataset._get_output_signature( @@ -1499,7 +1561,7 @@ def compute_loss(self, *args, **kwargs): return self.hf_compute_loss(*args, **kwargs) def get_label_to_output_name_mapping(self): - arg_names = list(dict(inspect.signature(self.call).parameters).keys()) + arg_names = list(inspect.signature(self.call).parameters) if self._label_to_output_map is not None: return self._label_to_output_map elif "start_positions" in arg_names: @@ -1522,14 +1584,14 @@ def train_step(self, data): """ # We hardcode the most common renamings; models with weirder names can set `self._label_to_output_map` - arg_names = list(dict(inspect.signature(self.call).parameters).keys()) + arg_names = list(inspect.signature(self.call).parameters) label_kwargs = find_labels(self.__class__) label_to_output = self.get_label_to_output_name_mapping() output_to_label = {val: key for key, val in label_to_output.items()} if not self._using_dummy_loss and parse(tf.__version__) < parse("2.11.0"): # Newer TF train 
steps leave this out - data = data_adapter.expand_1d(data) - x, y, sample_weight = data_adapter.unpack_x_y_sample_weight(data) + data = expand_1d(data) + x, y, sample_weight = tf.keras.utils.unpack_x_y_sample_weight(data) # If the inputs are mutable dictionaries, make a shallow copy of them because we will modify # them during input/label pre-processing. This avoids surprising the user by wrecking their data. # In addition, modifying mutable Python inputs makes XLA compilation impossible. @@ -1629,14 +1691,14 @@ def test_step(self, data): that they are available to the model during the forward pass. """ # We hardcode the most common renamings; models with weirder names can set `self._label_to_output_map` - arg_names = list(dict(inspect.signature(self.call).parameters).keys()) + arg_names = list(inspect.signature(self.call).parameters) label_kwargs = find_labels(self.__class__) label_to_output = self.get_label_to_output_name_mapping() output_to_label = {val: key for key, val in label_to_output.items()} if not self._using_dummy_loss and parse(tf.__version__) < parse("2.11.0"): # Newer versions leave this out - data = data_adapter.expand_1d(data) - x, y, sample_weight = data_adapter.unpack_x_y_sample_weight(data) + data = expand_1d(data) + x, y, sample_weight = tf.keras.utils.unpack_x_y_sample_weight(data) # If the inputs are mutable dictionaries, make a shallow copy of them because we will modify # them during input/label pre-processing. This avoids surprising the user by wrecking their data. # In addition, modifying mutable Python inputs makes XLA compilation impossible. @@ -1648,7 +1710,7 @@ def test_step(self, data): # When using a dummy loss, we ensure that separate labels are copied to the correct model arguments, # if those keys are not already present in the input dict if self._using_dummy_loss and y is not None: - arg_names = list(dict(inspect.signature(self.call).parameters).keys()) + arg_names = list(inspect.signature(self.call).parameters) # If y is a tensor and the model only has one label-like input, map y to that input if len(label_kwargs) == 1 and isinstance(y, tf.Tensor): if isinstance(x, tf.Tensor): @@ -2402,7 +2464,7 @@ def save_pretrained( ) param_dset[:] = layer.numpy() layers.append(layer_name.encode("utf8")) - hdf5_format.save_attributes_to_hdf5_group(shard_file, "layer_names", layers) + save_attributes_to_hdf5_group(shard_file, "layer_names", layers) if push_to_hub: self._upload_modified_files( diff --git a/src/transformers/modeling_utils.py b/src/transformers/modeling_utils.py index a8e8e4b2e24146..2dbe5d43829b99 100644 --- a/src/transformers/modeling_utils.py +++ b/src/transformers/modeling_utils.py @@ -606,7 +606,7 @@ def _load_state_dict_into_meta_model( state_dict_folder=None, state_dict_index=None, dtype=None, - load_in_8bit=False, + is_quantized=False, is_safetensors=False, keep_in_fp32_modules=None, ): @@ -627,8 +627,8 @@ def _load_state_dict_into_meta_model( # - Is there a situation where some keys aren't in `loaded_state_dict_keys` and in which case # they won't get loaded. - if load_in_8bit: - from .utils.bitsandbytes import set_module_8bit_tensor_to_device + if is_quantized: + from .utils.bitsandbytes import set_module_quantized_tensor_to_device error_msgs = [] @@ -699,12 +699,13 @@ def _load_state_dict_into_meta_model( # TODO: group all errors and raise at the end. 
raise ValueError(f"{param_name} doesn't have any device set.") param_device = device_map[module_name] + if param_device == "disk": if not is_safetensors: offload_index = offload_weight(param, param_name, offload_folder, offload_index) elif param_device == "cpu" and state_dict_index is not None: state_dict_index = offload_weight(param, param_name, state_dict_folder, state_dict_index) - elif not load_in_8bit: + elif not is_quantized: # For backward compatibility with older versions of `accelerate` set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs) else: @@ -714,7 +715,7 @@ def _load_state_dict_into_meta_model( fp16_statistics = None if "SCB" not in param_name: - set_module_8bit_tensor_to_device( + set_module_quantized_tensor_to_device( model, param_name, param_device, value=param, fp16_statistics=fp16_statistics ) @@ -912,7 +913,7 @@ def get_head_mask( The mask indicating if we should keep the heads or not (1.0 for keep, 0.0 for discard). num_hidden_layers (`int`): The number of hidden layers in the model. - is_attention_chunked: (`bool`, *optional*, defaults to `False`): + is_attention_chunked (`bool`, *optional*, defaults to `False`): Whether or not the attentions scores are computed by chunks or not. Returns: @@ -1700,6 +1701,11 @@ def save_pretrained( UserWarning, ) + if getattr(self, "is_loaded_in_4bit", False): + raise NotImplementedError( + "You are calling `save_pretrained` on a 4-bit converted model. This is currently not supported" + ) + if "save_config" in kwargs: warnings.warn( "`save_config` is deprecated and will be removed in v5 of Transformers. Use `is_main_process` instead." @@ -1876,9 +1882,9 @@ def get_memory_footprint(self, return_buffers=True): def to(self, *args, **kwargs): # Checks if the model has been loaded in 8-bit - if getattr(self, "is_loaded_in_8bit", False): + if getattr(self, "is_quantized", False): raise ValueError( - "`.to` is not supported for `8-bit` models. Please use the model as it is, since the" + "`.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the" " model has already been set to the correct devices and casted to the correct `dtype`." ) else: @@ -1886,9 +1892,9 @@ def to(self, *args, **kwargs): def half(self, *args): # Checks if the model has been loaded in 8-bit - if getattr(self, "is_loaded_in_8bit", False): + if getattr(self, "is_quantized", False): raise ValueError( - "`.half()` is not supported for `8-bit` models. Please use the model as it is, since the" + "`.half()` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the" " model has already been casted to the correct `dtype`." ) else: @@ -1896,9 +1902,9 @@ def half(self, *args): def float(self, *args): # Checks if the model has been loaded in 8-bit - if getattr(self, "is_loaded_in_8bit", False): + if getattr(self, "is_quantized", False): raise ValueError( - "`.float()` is not supported for `8-bit` models. Please use the model as it is, since the" + "`.float()` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the" " model has already been casted to the correct `dtype`." 
) else: @@ -2156,6 +2162,7 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P offload_folder = kwargs.pop("offload_folder", None) offload_state_dict = kwargs.pop("offload_state_dict", False) load_in_8bit = kwargs.pop("load_in_8bit", False) + load_in_4bit = kwargs.pop("load_in_4bit", False) quantization_config = kwargs.pop("quantization_config", None) subfolder = kwargs.pop("subfolder", "") commit_hash = kwargs.pop("_commit_hash", None) @@ -2194,10 +2201,13 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P if quantization_config is None: quantization_config, kwargs = BitsAndBytesConfig.from_dict( - config_dict={"load_in_8bit": load_in_8bit}, return_unused_kwargs=True, **kwargs + config_dict={"load_in_8bit": load_in_8bit, "load_in_4bit": load_in_4bit}, + return_unused_kwargs=True, + **kwargs, ) elif quantization_config is not None: load_in_8bit = quantization_config.load_in_8bit + load_in_4bit = quantization_config.load_in_4bit quantization_config_kwargs = { k: v for k, v in kwargs.items() if k in inspect.signature(BitsAndBytesConfig).parameters @@ -2215,30 +2225,32 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P if low_cpu_mem_usage is None: low_cpu_mem_usage = True - if load_in_8bit: + if load_in_8bit or load_in_4bit: if not (is_accelerate_available() and is_bitsandbytes_available()): raise ImportError( "Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of" " bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or" " pip install bitsandbytes` " ) - if torch_dtype != torch.float16: + + if torch_dtype is None: # We force the `dtype` to be float16, this is a requirement from `bitsandbytes` - logger.warning( + logger.info( f"Overriding torch_dtype={torch_dtype} with `torch_dtype=torch.float16` due to " - "requirements of `bitsandbytes` to enable model loading in mixed int8. " - "Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning." + "requirements of `bitsandbytes` to enable model loading in mixed kbit. " + "Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass" + " torch_dtype=torch.float16 to remove this warning." ) torch_dtype = torch.float16 if device_map is None: raise ValueError( - "A device map needs to be passed to run convert models into mixed-int8 format. Please run" + "A device map needs to be passed to run convert models into 8-bit and 4-bit formats. Please run" "`.from_pretrained` with `device_map='auto'`" ) if from_tf or from_flax: raise ValueError( - "Converting into mixed 8-bit weights from tf/flax weights is currently not supported, please make" + "Converting into 4-bit or 8-bit weights from tf/flax weights is currently not supported, please make" " sure the weights are in PyTorch format." 
) @@ -2296,8 +2308,8 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P load_in_8bit = quantization_config.load_in_8bit if load_in_8bit: - torch_dtype = torch.float16 - + if torch_dtype is None: + torch_dtype = torch.float16 if device_map is None: device_map = "auto" @@ -2582,7 +2594,9 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P # Check if `_keep_in_fp32_modules` is not None use_keep_in_fp32_modules = ( - (cls._keep_in_fp32_modules is not None) and is_accelerate_available() and torch_dtype == torch.float16 + (cls._keep_in_fp32_modules is not None) + and is_accelerate_available() + and (torch_dtype == torch.float16 or load_in_4bit or load_in_8bit) ) if ( (cls._keep_in_fp32_modules is not None) @@ -2611,7 +2625,7 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P logger.info("Detected DeepSpeed ZeRO-3: activating zero.init() for this model") init_contexts = [deepspeed.zero.Init(config_dict_or_path=deepspeed_config())] + init_contexts - elif load_in_8bit or low_cpu_mem_usage: + elif load_in_8bit or load_in_4bit or low_cpu_mem_usage: init_contexts.append(init_empty_weights()) with ContextManagers(init_contexts): @@ -2624,20 +2638,19 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P else: keep_in_fp32_modules = [] - if load_in_8bit: - from .utils.bitsandbytes import get_keys_to_not_convert, replace_8bit_linear + if load_in_8bit or load_in_4bit: + from .utils.bitsandbytes import get_keys_to_not_convert, replace_with_bnb_linear - load_in_8bit_skip_modules = quantization_config.llm_int8_skip_modules - load_in_8bit_threshold = quantization_config.llm_int8_threshold + llm_int8_skip_modules = quantization_config.llm_int8_skip_modules load_in_8bit_fp32_cpu_offload = quantization_config.llm_int8_enable_fp32_cpu_offload logger.info("Detected 8-bit loading: activating 8-bit loading for this model") # We keep some modules such as the lm_head in their original dtype for numerical stability reasons - if load_in_8bit_skip_modules is None: + if llm_int8_skip_modules is None: modules_to_not_convert = get_keys_to_not_convert(model) else: - modules_to_not_convert = load_in_8bit_skip_modules + modules_to_not_convert = llm_int8_skip_modules if not isinstance(modules_to_not_convert, list): modules_to_not_convert = [modules_to_not_convert] @@ -2657,21 +2670,36 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P modules_to_not_convert.extend(keys_on_cpu) - model = replace_8bit_linear( - model, threshold=load_in_8bit_threshold, modules_to_not_convert=modules_to_not_convert + supports_4bit = version.parse(importlib_metadata.version("bitsandbytes")) >= version.parse("0.39.0") + + if load_in_4bit and not supports_4bit: + raise ValueError( + "You have a version of `bitsandbytes` that is not compatible with 4bit inference and training" + " make sure you have the latest version of `bitsandbytes` installed" + ) + + model = replace_with_bnb_linear( + model, modules_to_not_convert=modules_to_not_convert, quantization_config=quantization_config ) # training in 8-bit is only available in 0.37.0+ - model._is_int8_training_enabled = version.parse( + model._is_kbit_training_enabled = version.parse( importlib_metadata.version("bitsandbytes") ) >= version.parse("0.37.0") model.config.quantization_config = quantization_config model.is_8bit_serializable = is_8bit_serializable + if load_in_8bit and torch_dtype is None: + logger.warning( + "You are loading your model 
in 8bit but you did not specify a `torch_dtype` attribute." + "All non-linear modules will be loaded in full precision.", + " If you want to load the other modules in other precision, please specify a `torch_dtype` attribute.", + ) + if isinstance(device_map, str): special_dtypes = {} - if load_in_8bit: + if load_in_8bit or load_in_4bit: special_dtypes.update( { name: torch_dtype @@ -2688,8 +2716,28 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P } ) + target_dtype = torch_dtype + + if load_in_4bit: + if version.parse(importlib_metadata.version("accelerate")) > version.parse("0.19.0"): + from accelerate.utils import CustomDtype + + target_dtype = CustomDtype.INT4 + else: + raise ValueError( + "You are using `device_map='auto'` on a 4bit loaded version of the model. To automatically compute" + " the appropriate device map, you should upgrade your `accelerate` library," + "`pip install --upgrade accelerate` or install it from source to support fp4 auto device map" + "calculation. You may encounter unexpected behavior, or pass your own device map" + ) + elif load_in_8bit: + target_dtype = torch.int8 + if model._no_split_modules is None: - raise ValueError(f"{model.__class__.__name__} does not support `device_map='{device_map}'` yet.") + raise ValueError( + f"{model.__class__.__name__} does not support `device_map='{device_map}'`. To implement support, the model" + "class needs to implement the `_no_split_modules` attribute." + ) no_split_modules = model._no_split_modules if device_map not in ["auto", "balanced", "balanced_low_0", "sequential"]: raise ValueError( @@ -2710,7 +2758,7 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P if device_map != "sequential" and get_balanced_memory is not None: max_memory = get_balanced_memory( model, - dtype=torch_dtype if not load_in_8bit else torch.int8, + dtype=target_dtype, low_zero=(device_map == "balanced_low_0"), max_memory=max_memory, **kwargs, @@ -2718,9 +2766,9 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P kwargs["max_memory"] = max_memory # Make sure tied weights are tied before creating the device map. 
model.tie_weights() - device_map = infer_auto_device_map(model, dtype=torch_dtype if not load_in_8bit else torch.int8, **kwargs) + device_map = infer_auto_device_map(model, dtype=target_dtype, **kwargs) - if load_in_8bit: + if load_in_8bit or load_in_4bit: # The LM head / tied weights or any last module can stay on disk / CPU device_map_without_lm_head = { key: device_map[key] for key in device_map.keys() if key not in modules_to_not_convert @@ -2795,11 +2843,13 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P offload_folder=offload_folder, offload_state_dict=offload_state_dict, dtype=torch_dtype, - load_in_8bit=load_in_8bit, + is_quantized=(load_in_8bit or load_in_4bit), keep_in_fp32_modules=keep_in_fp32_modules, ) + model.is_loaded_in_4bit = load_in_4bit model.is_loaded_in_8bit = load_in_8bit + model.is_quantized = load_in_8bit or load_in_4bit # make sure token embedding weights are still tied if needed model.tie_weights() @@ -2862,12 +2912,12 @@ def _load_pretrained_model( offload_folder=None, offload_state_dict=None, dtype=None, - load_in_8bit=False, + is_quantized=False, keep_in_fp32_modules=None, ): is_safetensors = False - if load_in_8bit: - from .utils.bitsandbytes import set_module_8bit_tensor_to_device + if is_quantized: + from .utils.bitsandbytes import set_module_quantized_tensor_to_device if device_map is not None and "disk" in device_map.values(): archive_file = ( @@ -2973,10 +3023,10 @@ def _fix_key(key): target_dtype = torch.float32 if param.device == torch.device("meta"): - if not load_in_8bit: + if not (is_quantized): set_module_tensor_to_device(model, key, "cpu", torch.empty(*param.size(), dtype=target_dtype)) else: - set_module_8bit_tensor_to_device( + set_module_quantized_tensor_to_device( model, key, "cpu", torch.empty(*param.size(), dtype=target_dtype) ) @@ -3134,7 +3184,7 @@ def _find_mismatched_keys( state_dict_folder=state_dict_folder, state_dict_index=state_dict_index, dtype=dtype, - load_in_8bit=load_in_8bit, + is_quantized=is_quantized, is_safetensors=is_safetensors, keep_in_fp32_modules=keep_in_fp32_modules, ) @@ -3174,7 +3224,7 @@ def _find_mismatched_keys( ) raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}") - if load_in_8bit: + if is_quantized: unexpected_keys = [elem for elem in unexpected_keys if "SCB" not in elem] missing_keys = [elem for elem in missing_keys if "SCB" not in elem] diff --git a/src/transformers/models/__init__.py b/src/transformers/models/__init__.py index 91a2d3ed5dab8a..1aa2a049aa1bf1 100644 --- a/src/transformers/models/__init__.py +++ b/src/transformers/models/__init__.py @@ -18,6 +18,7 @@ altclip, audio_spectrogram_transformer, auto, + autoformer, bart, barthez, bartpho, diff --git a/src/transformers/models/albert/modeling_tf_albert.py b/src/transformers/models/albert/modeling_tf_albert.py index c7f76b175b0b28..ad35b6182a4e21 100644 --- a/src/transformers/models/albert/modeling_tf_albert.py +++ b/src/transformers/models/albert/modeling_tf_albert.py @@ -15,6 +15,9 @@ # limitations under the License. 
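The `modeling_utils.py` hunks above funnel 8-bit and 4-bit loading through one quantized path: `torch_dtype` falls back to `torch.float16`, `replace_with_bnb_linear` swaps the eligible linear layers, device-map inference uses `target_dtype` (`torch.int8` or `CustomDtype.INT4`), and the loaded model gains `is_loaded_in_4bit` / `is_quantized` flags. A minimal usage sketch of that path, assuming `load_in_4bit` is exposed on `from_pretrained` the same way `load_in_8bit` already is, and that `bitsandbytes >= 0.39.0` and `accelerate > 0.19.0` are installed (the checkpoint name is only an example):

```python
# Illustrative sketch, not part of the patch: exercising the 4-bit path added above.
from transformers import AutoModelForCausalLM

# Example checkpoint; any checkpoint that supports a device map works the same way.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    load_in_4bit=True,   # assumed to be routed into quantization_config like load_in_8bit
    device_map="auto",   # triggers the target_dtype = CustomDtype.INT4 branch above
)

print(model.is_loaded_in_4bit)  # True, set at the end of from_pretrained in the diff
print(model.is_quantized)       # True for both 4-bit and 8-bit loading
```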
""" TF 2.0 ALBERT model.""" + +from __future__ import annotations + import math from dataclasses import dataclass from typing import Dict, Optional, Tuple, Union @@ -46,7 +49,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - MULTIPLE_CHOICE_DUMMY_INPUTS, ModelOutput, add_code_sample_docstrings, add_start_docstrings, @@ -561,12 +563,12 @@ class PreTrainedModel @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -676,8 +678,8 @@ class TFAlbertForPreTrainingOutput(ModelOutput): loss: tf.Tensor = None prediction_logits: tf.Tensor = None sop_logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None ALBERT_START_DOCSTRING = r""" @@ -797,12 +799,12 @@ def __init__(self, config: AlbertConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -823,17 +825,6 @@ def call( return outputs - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=hs, - attentions=attns, - ) - @add_start_docstrings( """ @@ -863,17 +854,17 @@ def get_lm_head(self) -> tf.keras.layers.Layer: @replace_return_docstrings(output_type=TFAlbertForPreTrainingOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, 
tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, - sentence_order_label: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, + sentence_order_label: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFAlbertForPreTrainingOutput, Tuple[tf.Tensor]]: r""" @@ -930,17 +921,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFAlbertForPreTrainingOutput) -> TFAlbertForPreTrainingOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFAlbertForPreTrainingOutput( - prediction_logits=output.prediction_logits, - sop_logits=output.sop_logits, - hidden_states=hs, - attentions=attns, - ) - class TFAlbertSOPHead(tf.keras.layers.Layer): def __init__(self, config: AlbertConfig, **kwargs): @@ -979,16 +959,16 @@ def get_lm_head(self) -> tf.keras.layers.Layer: @replace_return_docstrings(output_type=TFMaskedLMOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1055,13 +1035,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM.serving_output - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1097,16 +1070,16 @@ def __init__(self, config: AlbertConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: 
Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1144,13 +1117,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1189,16 +1155,16 @@ def __init__(self, config: AlbertConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1234,13 +1200,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1276,17 +1235,17 @@ def __init__(self, config: AlbertConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | 
tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1336,15 +1295,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) - @add_start_docstrings( """ @@ -1367,16 +1317,6 @@ def __init__(self, config: AlbertConfig, *inputs, **kwargs): units=1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward(ALBERT_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length")) @add_code_sample_docstrings( @@ -1386,16 +1326,16 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -1454,25 +1394,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None, None), tf.int32, name="token_type_ids"), - } - ] - ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFMultipleChoiceModelOutput: - output 
= self.call(input_ids=inputs) - - return self.serving_output(output) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving_output - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) diff --git a/src/transformers/models/align/configuration_align.py b/src/transformers/models/align/configuration_align.py index 488b7f6fe458ac..0436b278f0a737 100644 --- a/src/transformers/models/align/configuration_align.py +++ b/src/transformers/models/align/configuration_align.py @@ -184,7 +184,7 @@ class AlignVisionConfig(PretrainedConfig): List of output channel sizes to be used in each block for convolutional layers. depthwise_padding (`List[int]`, *optional*, defaults to `[]`): List of block indices with square padding. - strides: (`List[int]`, *optional*, defaults to `[1, 2, 2, 2, 1, 2, 1]`): + strides (`List[int]`, *optional*, defaults to `[1, 2, 2, 2, 1, 2, 1]`): List of stride sizes to be used in each block for convolutional layers. num_block_repeats (`List[int]`, *optional*, defaults to `[1, 2, 2, 3, 3, 4, 1]`): List of the number of times each block is to repeated. diff --git a/src/transformers/models/audio_spectrogram_transformer/feature_extraction_audio_spectrogram_transformer.py b/src/transformers/models/audio_spectrogram_transformer/feature_extraction_audio_spectrogram_transformer.py index deda2fc7781b28..786548fd2336e9 100644 --- a/src/transformers/models/audio_spectrogram_transformer/feature_extraction_audio_spectrogram_transformer.py +++ b/src/transformers/models/audio_spectrogram_transformer/feature_extraction_audio_spectrogram_transformer.py @@ -135,7 +135,8 @@ def __call__( Args: raw_speech (`np.ndarray`, `List[float]`, `List[np.ndarray]`, `List[List[float]]`): The sequence or batch of sequences to be padded. Each sequence can be a numpy array, a list of float - values, a list of numpy arrays or a list of list of float values. + values, a list of numpy arrays or a list of list of float values. Must be mono channel audio, not + stereo, i.e. single float per timestep. sampling_rate (`int`, *optional*): The sampling rate at which the `raw_speech` input was sampled. It is strongly recommended to pass `sampling_rate` at the forward call to prevent silent errors. @@ -160,9 +161,11 @@ def __call__( "Failing to do so can result in silent errors that might be hard to debug." 
) - is_batched = bool( - isinstance(raw_speech, (list, tuple)) - and (isinstance(raw_speech[0], np.ndarray) or isinstance(raw_speech[0], (tuple, list))) + is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1 + if is_batched_numpy and len(raw_speech.shape) > 2: + raise ValueError(f"Only mono-channel audio is supported for input to {self}") + is_batched = is_batched_numpy or ( + isinstance(raw_speech, (list, tuple)) and (isinstance(raw_speech[0], (np.ndarray, tuple, list))) ) if is_batched: diff --git a/src/transformers/models/auto/auto_factory.py b/src/transformers/models/auto/auto_factory.py index aad113d454428b..eedecb0da9c7a8 100644 --- a/src/transformers/models/auto/auto_factory.py +++ b/src/transformers/models/auto/auto_factory.py @@ -128,6 +128,11 @@ Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to `True` for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. + code_revision (`str`, *optional*, defaults to `"main"`): + The specific revision to use for the code on the Hub, if the code lives in a different repository than + the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based + system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier + allowed by git. kwargs (additional keyword arguments, *optional*): Can be used to update the configuration object (after it being loaded) and initiate the model (e.g., `output_attentions=True`). Behaves differently depending on whether a `config` is provided or @@ -224,6 +229,11 @@ Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to `True` for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. + code_revision (`str`, *optional*, defaults to `"main"`): + The specific revision to use for the code on the Hub, if the code lives in a different repository than + the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based + system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier + allowed by git. kwargs (additional keyword arguments, *optional*): Can be used to update the configuration object (after it being loaded) and initiate the model (e.g., `output_attentions=True`). Behaves differently depending on whether a `config` is provided or @@ -320,6 +330,11 @@ Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to `True` for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. + code_revision (`str`, *optional*, defaults to `"main"`): + The specific revision to use for the code on the Hub, if the code lives in a different repository than + the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based + system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier + allowed by git. kwargs (additional keyword arguments, *optional*): Can be used to update the configuration object (after it being loaded) and initiate the model (e.g., `output_attentions=True`).
Behaves differently depending on whether a `config` is provided or @@ -408,6 +423,7 @@ def from_config(cls, config, **kwargs): else: repo_id = config.name_or_path model_class = get_class_from_dynamic_module(class_ref, repo_id, **kwargs) + _ = kwargs.pop("code_revision", None) return model_class._from_config(config, **kwargs) elif type(config) in cls._model_mapping.keys(): model_class = _get_model_class(config, cls._model_mapping) @@ -425,6 +441,7 @@ def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs): kwargs["_from_auto"] = True hub_kwargs_names = [ "cache_dir", + "code_revision", "force_download", "local_files_only", "proxies", @@ -464,6 +481,7 @@ def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs): model_class = get_class_from_dynamic_module( class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs ) + _ = hub_kwargs.pop("code_revision", None) return model_class.from_pretrained( pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs ) diff --git a/src/transformers/models/auto/configuration_auto.py b/src/transformers/models/auto/configuration_auto.py index 80f7af5caad39d..fbffe838226658 100755 --- a/src/transformers/models/auto/configuration_auto.py +++ b/src/transformers/models/auto/configuration_auto.py @@ -33,6 +33,7 @@ ("align", "AlignConfig"), ("altclip", "AltCLIPConfig"), ("audio-spectrogram-transformer", "ASTConfig"), + ("autoformer", "AutoformerConfig"), ("bart", "BartConfig"), ("beit", "BeitConfig"), ("bert", "BertConfig"), @@ -225,6 +226,7 @@ ("align", "ALIGN_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("altclip", "ALTCLIP_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("audio-spectrogram-transformer", "AUDIO_SPECTROGRAM_TRANSFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP"), + ("autoformer", "AUTOFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("bart", "BART_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("beit", "BEIT_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("bert", "BERT_PRETRAINED_CONFIG_ARCHIVE_MAP"), @@ -399,6 +401,7 @@ ("align", "ALIGN"), ("altclip", "AltCLIP"), ("audio-spectrogram-transformer", "Audio Spectrogram Transformer"), + ("autoformer", "Autoformer"), ("bart", "BART"), ("barthez", "BARThez"), ("bartpho", "BARTpho"), @@ -938,6 +941,7 @@ def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ) class_ref = config_dict["auto_map"]["AutoConfig"] config_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs) + _ = kwargs.pop("code_revision", None) return config_class.from_pretrained(pretrained_model_name_or_path, **kwargs) elif "model_type" in config_dict: config_class = CONFIG_MAPPING[config_dict["model_type"]] diff --git a/src/transformers/models/auto/feature_extraction_auto.py b/src/transformers/models/auto/feature_extraction_auto.py index ff9dec171f43c4..588f4b4d3d59b5 100644 --- a/src/transformers/models/auto/feature_extraction_auto.py +++ b/src/transformers/models/auto/feature_extraction_auto.py @@ -337,6 +337,7 @@ def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): feature_extractor_class = get_class_from_dynamic_module( feature_extractor_auto_map, pretrained_model_name_or_path, **kwargs ) + _ = kwargs.pop("code_revision", None) else: feature_extractor_class = feature_extractor_class_from_name(feature_extractor_class) diff --git a/src/transformers/models/auto/image_processing_auto.py b/src/transformers/models/auto/image_processing_auto.py index d3d2944ff823f8..d3c6615527f35c 100644 --- a/src/transformers/models/auto/image_processing_auto.py +++ 
b/src/transformers/models/auto/image_processing_auto.py @@ -361,6 +361,7 @@ def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): image_processor_class = get_class_from_dynamic_module( image_processor_auto_map, pretrained_model_name_or_path, **kwargs ) + _ = kwargs.pop("code_revision", None) else: image_processor_class = image_processor_class_from_name(image_processor_class) diff --git a/src/transformers/models/auto/modeling_auto.py b/src/transformers/models/auto/modeling_auto.py index 2fb8f1172f39f2..d6581f94d8630a 100755 --- a/src/transformers/models/auto/modeling_auto.py +++ b/src/transformers/models/auto/modeling_auto.py @@ -32,6 +32,7 @@ ("align", "AlignModel"), ("altclip", "AltCLIPModel"), ("audio-spectrogram-transformer", "ASTModel"), + ("autoformer", "AutoformerModel"), ("bart", "BartModel"), ("beit", "BeitModel"), ("bert", "BertModel"), @@ -529,6 +530,8 @@ [ ("blip", "BlipForConditionalGeneration"), ("blip-2", "Blip2ForConditionalGeneration"), + ("git", "GitForCausalLM"), + ("pix2struct", "Pix2StructForConditionalGeneration"), ("vision-encoder-decoder", "VisionEncoderDecoderModel"), ] ) diff --git a/src/transformers/models/auto/processing_auto.py b/src/transformers/models/auto/processing_auto.py index d96757bc13ad5e..e72815747fa333 100644 --- a/src/transformers/models/auto/processing_auto.py +++ b/src/transformers/models/auto/processing_auto.py @@ -259,6 +259,7 @@ def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): processor_class = get_class_from_dynamic_module( processor_auto_map, pretrained_model_name_or_path, **kwargs ) + _ = kwargs.pop("code_revision", None) else: processor_class = processor_class_from_name(processor_class) diff --git a/src/transformers/models/auto/tokenization_auto.py b/src/transformers/models/auto/tokenization_auto.py index cb6c91521de91b..aa4d5860a14770 100644 --- a/src/transformers/models/auto/tokenization_auto.py +++ b/src/transformers/models/auto/tokenization_auto.py @@ -678,6 +678,7 @@ def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): else: class_ref = tokenizer_auto_map[0] tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs) + _ = kwargs.pop("code_revision", None) elif use_fast and not config_tokenizer_class.endswith("Fast"): tokenizer_class_candidate = f"{config_tokenizer_class}Fast" diff --git a/src/transformers/models/autoformer/__init__.py b/src/transformers/models/autoformer/__init__.py new file mode 100644 index 00000000000000..f87bfdea532d61 --- /dev/null +++ b/src/transformers/models/autoformer/__init__.py @@ -0,0 +1,63 @@ +# Copyright 2023 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
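The `auto/*.py` changes above register Autoformer in the auto mappings and thread a new `code_revision` hub kwarg through every dynamic-module loader, popping it before it can leak into the downstream `from_pretrained` call. A short sketch of the intended usage with `trust_remote_code` (the repository id is a placeholder, not a real checkpoint):

```python
# Illustrative sketch: pinning the remote modeling code independently of the weights.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "org/custom-model",        # placeholder repo that ships its own modeling code
    trust_remote_code=True,    # required to execute code downloaded from the Hub
    revision="main",           # revision used for the weights and config
    code_revision="main",      # revision used for the modeling code (added in this patch)
)
```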
+from typing import TYPE_CHECKING + +# rely on isort to merge the imports +from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available + + +_import_structure = { + "configuration_autoformer": [ + "AUTOFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP", + "AutoformerConfig", + ], +} + +try: + if not is_torch_available(): + raise OptionalDependencyNotAvailable() +except OptionalDependencyNotAvailable: + pass +else: + _import_structure["modeling_autoformer"] = [ + "AUTOFORMER_PRETRAINED_MODEL_ARCHIVE_LIST", + "AutoformerForPrediction", + "AutoformerModel", + "AutoformerPreTrainedModel", + ] + + +if TYPE_CHECKING: + from .configuration_autoformer import ( + AUTOFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, + AutoformerConfig, + ) + + try: + if not is_torch_available(): + raise OptionalDependencyNotAvailable() + except OptionalDependencyNotAvailable: + pass + else: + from .modeling_autoformer import ( + AUTOFORMER_PRETRAINED_MODEL_ARCHIVE_LIST, + AutoformerForPrediction, + AutoformerModel, + AutoformerPreTrainedModel, + ) + +else: + import sys + + sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__) diff --git a/src/transformers/models/autoformer/configuration_autoformer.py b/src/transformers/models/autoformer/configuration_autoformer.py new file mode 100644 index 00000000000000..ced76448cd1e5d --- /dev/null +++ b/src/transformers/models/autoformer/configuration_autoformer.py @@ -0,0 +1,245 @@ +# coding=utf-8 +# Copyright 2023 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" Autoformer model configuration""" + +from typing import List, Optional + +from ...configuration_utils import PretrainedConfig +from ...utils import logging + + +logger = logging.get_logger(__name__) + +AUTOFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP = { + "huggingface/autoformer-tourism-monthly": "https://huggingface.co/huggingface/autoformer-tourism-monthly/resolve/main/config.json", +} + + +class AutoformerConfig(PretrainedConfig): + r""" + This is the configuration class to store the configuration of an [`AutoformerModel`]. It is used to instantiate an + Autoformer model according to the specified arguments, defining the model architecture. Instantiating a + configuration with the defaults will yield a similar configuration to that of the Autoformer + [huggingface/autoformer-tourism-monthly](https://huggingface.co/huggingface/autoformer-tourism-monthly) + architecture. + + Configuration objects inherit from [`PretrainedConfig`] can be used to control the model outputs. Read the + documentation from [`PretrainedConfig`] for more information. + + Args: + prediction_length (`int`): + The prediction length for the decoder. In other words, the prediction horizon of the model. + context_length (`int`, *optional*, defaults to `prediction_length`): + The context length for the encoder. If unset, the context length will be the same as the + `prediction_length`. 
+ distribution_output (`string`, *optional*, defaults to `"student_t"`): + The distribution emission head for the model. Could be either "student_t", "normal" or "negative_binomial". + loss (`string`, *optional*, defaults to `"nll"`): + The loss function for the model corresponding to the `distribution_output` head. For parametric + distributions it is the negative log likelihood (nll) - which currently is the only supported one. + input_size (`int`, *optional*, defaults to 1): + The size of the target variable which by default is 1 for univariate targets. Would be > 1 in case of + multivariate targets. + lags_sequence (`list[int]`, *optional*, defaults to `[1, 2, 3, 4, 5, 6, 7]`): + The lags of the input time series as covariates often dictated by the frequency. Default is `[1, 2, 3, 4, + 5, 6, 7]`. + scaling (`bool`, *optional* defaults to `True`): + Whether to scale the input targets. + num_time_features (`int`, *optional*, defaults to 0): + The number of time features in the input time series. + num_dynamic_real_features (`int`, *optional*, defaults to 0): + The number of dynamic real valued features. + num_static_categorical_features (`int`, *optional*, defaults to 0): + The number of static categorical features. + num_static_real_features (`int`, *optional*, defaults to 0): + The number of static real valued features. + cardinality (`list[int]`, *optional*): + The cardinality (number of different values) for each of the static categorical features. Should be a list + of integers, having the same length as `num_static_categorical_features`. Cannot be `None` if + `num_static_categorical_features` is > 0. + embedding_dimension (`list[int]`, *optional*): + The dimension of the embedding for each of the static categorical features. Should be a list of integers, + having the same length as `num_static_categorical_features`. Cannot be `None` if + `num_static_categorical_features` is > 0. + d_model (`int`, *optional*, defaults to 64): + Dimensionality of the transformer layers. + encoder_layers (`int`, *optional*, defaults to 2): + Number of encoder layers. + decoder_layers (`int`, *optional*, defaults to 2): + Number of decoder layers. + encoder_attention_heads (`int`, *optional*, defaults to 2): + Number of attention heads for each attention layer in the Transformer encoder. + decoder_attention_heads (`int`, *optional*, defaults to 2): + Number of attention heads for each attention layer in the Transformer decoder. + encoder_ffn_dim (`int`, *optional*, defaults to 32): + Dimension of the "intermediate" (often named feed-forward) layer in encoder. + decoder_ffn_dim (`int`, *optional*, defaults to 32): + Dimension of the "intermediate" (often named feed-forward) layer in decoder. + activation_function (`str` or `function`, *optional*, defaults to `"gelu"`): + The non-linear activation function (function or string) in the encoder and decoder. If string, `"gelu"` and + `"relu"` are supported. + dropout (`float`, *optional*, defaults to 0.1): + The dropout probability for all fully connected layers in the encoder, and decoder. + encoder_layerdrop (`float`, *optional*, defaults to 0.1): + The dropout probability for the attention and fully connected layers for each encoder layer. + decoder_layerdrop (`float`, *optional*, defaults to 0.1): + The dropout probability for the attention and fully connected layers for each decoder layer. + attention_dropout (`float`, *optional*, defaults to 0.1): + The dropout probability for the attention probabilities. 
+ activation_dropout (`float`, *optional*, defaults to 0.1): + The dropout probability used between the two layers of the feed-forward networks. + num_parallel_samples (`int`, *optional*, defaults to 100): + The number of samples to generate in parallel for each time step of inference. + init_std (`float`, *optional*, defaults to 0.02): + The standard deviation of the truncated normal weight initialization distribution. + use_cache (`bool`, *optional*, defaults to `True`): + Whether to use the past key/values attentions (if applicable to the model) to speed up decoding. + label_length (`int`, *optional*, defaults to 10): + Start token length of the Autoformer decoder, which is used for direct multi-step prediction (i.e. + non-autoregressive generation). + moving_average (`int`, defaults to 25): + The window size of the moving average. In practice, it's the kernel size in AvgPool1d of the Decomposition + Layer. + autocorrelation_factor (`int`, defaults to 3): + "Attention" (i.e. AutoCorrelation mechanism) factor which is used to find top k autocorrelations delays. + It's recommended in the paper to set it to a number between 1 and 5. + + + Example: + + ```python + >>> from transformers import AutoformerConfig, AutoformerModel + + >>> # Initializing a default Autoformer configuration + >>> configuration = AutoformerConfig() + + >>> # Randomly initializing a model (with random weights) from the configuration + >>> model = AutoformerModel(configuration) + + >>> # Accessing the model configuration + >>> configuration = model.config + ```""" + model_type = "autoformer" + attribute_map = { + "hidden_size": "d_model", + "num_attention_heads": "encoder_attention_heads", + "num_hidden_layers": "encoder_layers", + } + + def __init__( + self, + prediction_length: Optional[int] = None, + context_length: Optional[int] = None, + distribution_output: str = "student_t", + loss: str = "nll", + input_size: int = 1, + lags_sequence: List[int] = [1, 2, 3, 4, 5, 6, 7], + scaling: bool = True, + num_time_features: int = 0, + num_dynamic_real_features: int = 0, + num_static_categorical_features: int = 0, + num_static_real_features: int = 0, + cardinality: Optional[List[int]] = None, + embedding_dimension: Optional[List[int]] = None, + d_model: int = 64, + encoder_attention_heads: int = 2, + decoder_attention_heads: int = 2, + encoder_layers: int = 2, + decoder_layers: int = 2, + encoder_ffn_dim: int = 32, + decoder_ffn_dim: int = 32, + activation_function: str = "gelu", + dropout: float = 0.1, + encoder_layerdrop: float = 0.1, + decoder_layerdrop: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + num_parallel_samples: int = 100, + init_std: float = 0.02, + use_cache: bool = True, + is_encoder_decoder=True, + # Autoformer arguments + label_length: int = 10, + moving_average: int = 25, + autocorrelation_factor: int = 3, + **kwargs, + ): + # time series specific configuration + self.prediction_length = prediction_length + self.context_length = context_length if context_length is not None else prediction_length + self.distribution_output = distribution_output + self.loss = loss + self.input_size = input_size + self.num_time_features = num_time_features + self.lags_sequence = lags_sequence + self.scaling = scaling + self.num_dynamic_real_features = num_dynamic_real_features + self.num_static_real_features = num_static_real_features + self.num_static_categorical_features = num_static_categorical_features + if cardinality is not None and num_static_categorical_features > 0: + if 
len(cardinality) != num_static_categorical_features: + raise ValueError( + "The cardinality should be a list of the same length as `num_static_categorical_features`" + ) + self.cardinality = cardinality + else: + self.cardinality = [0] + if embedding_dimension is not None and num_static_categorical_features > 0: + if len(embedding_dimension) != num_static_categorical_features: + raise ValueError( + "The embedding dimension should be a list of the same length as `num_static_categorical_features`" + ) + self.embedding_dimension = embedding_dimension + else: + self.embedding_dimension = [min(50, (cat + 1) // 2) for cat in self.cardinality] + self.num_parallel_samples = num_parallel_samples + + # Transformer architecture configuration + self.feature_size = input_size * len(self.lags_sequence) + self._number_of_features + self.d_model = d_model + self.encoder_attention_heads = encoder_attention_heads + self.decoder_attention_heads = decoder_attention_heads + self.encoder_ffn_dim = encoder_ffn_dim + self.decoder_ffn_dim = decoder_ffn_dim + self.encoder_layers = encoder_layers + self.decoder_layers = decoder_layers + + self.dropout = dropout + self.attention_dropout = attention_dropout + self.activation_dropout = activation_dropout + self.encoder_layerdrop = encoder_layerdrop + self.decoder_layerdrop = decoder_layerdrop + + self.activation_function = activation_function + self.init_std = init_std + + self.use_cache = use_cache + + # Autoformer + self.label_length = label_length + self.moving_average = moving_average + self.autocorrelation_factor = autocorrelation_factor + + super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs) + + @property + def _number_of_features(self) -> int: + return ( + sum(self.embedding_dimension) + + self.num_dynamic_real_features + + self.num_time_features + + self.num_static_real_features + + self.input_size * 2 # the log1p(abs(loc)) and log(scale) features + ) diff --git a/src/transformers/models/autoformer/modeling_autoformer.py b/src/transformers/models/autoformer/modeling_autoformer.py new file mode 100644 index 00000000000000..a77920fb9d6e2a --- /dev/null +++ b/src/transformers/models/autoformer/modeling_autoformer.py @@ -0,0 +1,2178 @@ +# coding=utf-8 +# Copyright (c) 2021 THUML @ Tsinghua University +# Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. +# Copyright 2023 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
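The `AutoformerConfig.__init__` above derives `feature_size` from the lagged inputs plus `_number_of_features` (embeddings, dynamic real features, time features, static real features, and the two log-scale features). A small sketch of that arithmetic with made-up values, using the public import shown in the config docstring:

```python
# Illustrative sketch: how feature_size is derived by the config code above.
from transformers import AutoformerConfig

config = AutoformerConfig(
    prediction_length=24,
    context_length=48,
    input_size=1,                       # univariate target
    lags_sequence=[1, 2, 3, 4, 5, 6, 7],
    num_time_features=2,
    num_static_categorical_features=1,
    cardinality=[366],                  # must match num_static_categorical_features
    embedding_dimension=[2],
)

# _number_of_features = sum(embedding_dimension) + num_dynamic_real_features
#                       + num_time_features + num_static_real_features + input_size * 2
#                     = 2 + 0 + 2 + 0 + 2 = 6
# feature_size = input_size * len(lags_sequence) + _number_of_features = 7 + 6 = 13
print(config.feature_size)  # 13
```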
+""" PyTorch Autoformer model.""" + +import math +import random +from dataclasses import dataclass +from typing import List, Optional, Tuple, Union + +import numpy as np +import torch +from torch import nn + +from ...activations import ACT2FN +from ...modeling_outputs import ( + BaseModelOutput, + ModelOutput, + SampleTSPredictionOutput, + Seq2SeqTSPredictionOutput, +) +from ...modeling_utils import PreTrainedModel +from ...time_series_utils import NegativeBinomialOutput, NormalOutput, StudentTOutput +from ...utils import add_start_docstrings, add_start_docstrings_to_model_forward, logging, replace_return_docstrings +from .configuration_autoformer import AutoformerConfig + + +logger = logging.get_logger(__name__) + +_CONFIG_FOR_DOC = "AutoformerConfig" + + +@dataclass +class AutoFormerDecoderOutput(ModelOutput): + """ + Base class for model's outputs that may also contain a past key/values (to speed up sequential decoding). + + Args: + last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`): + Sequence of hidden-states at the output of the last layer of the model. + + If `past_key_values` is used only the last hidden-state of the sequences of shape `(batch_size, 1, + hidden_size)` is output. + trend (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`): + Trend tensor for each time series. + past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`): + Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape + `(batch_size, num_heads, sequence_length, embed_size_per_head)`) and optionally if + `config.is_encoder_decoder=True` 2 additional tensors of shape `(batch_size, num_heads, + encoder_sequence_length, embed_size_per_head)`. + + Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if + `config.is_encoder_decoder=True` in the cross-attention blocks) that can be used (see `past_key_values` + input) to speed up sequential decoding. + hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): + Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. + + Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. + attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): + Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, + sequence_length)`. + + Attentions weights after the attention softmax, used to compute the weighted average in the self-attention + heads. + cross_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` and `config.add_cross_attention=True` is passed or when `config.output_attentions=True`): + Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, + sequence_length)`. + + Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the + weighted average in the cross-attention heads. 
+ """ + + last_hidden_state: torch.FloatTensor = None + trend: torch.FloatTensor = None + past_key_values: Optional[Tuple[Tuple[torch.FloatTensor]]] = None + hidden_states: Optional[Tuple[torch.FloatTensor]] = None + attentions: Optional[Tuple[torch.FloatTensor]] = None + cross_attentions: Optional[Tuple[torch.FloatTensor]] = None + + +@dataclass +class AutoformerModelOutput(ModelOutput): + """ + Autoformer model output that contains the additional trend output. + + Args: + last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`): + Sequence of hidden-states at the output of the last layer of the decoder of the model. + + If `past_key_values` is used only the last hidden-state of the sequences of shape `(batch_size, 1, + hidden_size)` is output. + trend (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`): + Trend tensor for each time series. + past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`): + Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape + `(batch_size, num_heads, sequence_length, embed_size_per_head)`) and 2 additional tensors of shape + `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. + + Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention + blocks) that can be used (see `past_key_values` input) to speed up sequential decoding. + decoder_hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): + Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. + + Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs. + decoder_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): + Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, + sequence_length)`. + + Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the + self-attention heads. + cross_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): + Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, + sequence_length)`. + + Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the + weighted average in the cross-attention heads. + encoder_last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): + Sequence of hidden-states at the output of the last layer of the encoder of the model. + encoder_hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): + Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. + + Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs. 
+ encoder_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): + Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, + sequence_length)`. + + Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the + self-attention heads. + loc (`torch.FloatTensor` of shape `(batch_size,)` or `(batch_size, input_size)`, *optional*): + Shift values of each time series' context window which is used to give the model inputs of the same + magnitude and then used to shift back to the original magnitude. + scale (`torch.FloatTensor` of shape `(batch_size,)` or `(batch_size, input_size)`, *optional*): + Scaling values of each time series' context window which is used to give the model inputs of the same + magnitude and then used to rescale back to the original magnitude. + static_features: (`torch.FloatTensor` of shape `(batch_size, feature size)`, *optional*): + Static features of each time series' in a batch which are copied to the covariates at inference time. + """ + + last_hidden_state: torch.FloatTensor = None + trend: torch.FloatTensor = None + past_key_values: Optional[Tuple[Tuple[torch.FloatTensor]]] = None + decoder_hidden_states: Optional[Tuple[torch.FloatTensor]] = None + decoder_attentions: Optional[Tuple[torch.FloatTensor]] = None + cross_attentions: Optional[Tuple[torch.FloatTensor]] = None + encoder_last_hidden_state: Optional[torch.FloatTensor] = None + encoder_hidden_states: Optional[Tuple[torch.FloatTensor]] = None + encoder_attentions: Optional[Tuple[torch.FloatTensor]] = None + loc: Optional[torch.FloatTensor] = None + scale: Optional[torch.FloatTensor] = None + static_features: Optional[torch.FloatTensor] = None + + +AUTOFORMER_PRETRAINED_MODEL_ARCHIVE_LIST = [ + "huggingface/autoformer-tourism-monthly", + # See all Autoformer models at https://huggingface.co/models?filter=autoformer +] + + +# Copied from transformers.models.time_series_transformer.modeling_time_series_transformer.TimeSeriesFeatureEmbedder with TimeSeries->Autoformer +class AutoformerFeatureEmbedder(nn.Module): + """ + Embed a sequence of categorical features. + + Args: + cardinalities (`list[int]`): + List of cardinalities of the categorical features. + embedding_dims (`list[int]`): + List of embedding dimensions of the categorical features. + """ + + def __init__(self, cardinalities: List[int], embedding_dims: List[int]) -> None: + super().__init__() + + self.num_features = len(cardinalities) + self.embedders = nn.ModuleList([nn.Embedding(c, d) for c, d in zip(cardinalities, embedding_dims)]) + + def forward(self, features: torch.Tensor) -> torch.Tensor: + if self.num_features > 1: + # we slice the last dimension, giving an array of length + # self.num_features with shape (N,T) or (N) + cat_feature_slices = torch.chunk(features, self.num_features, dim=-1) + else: + cat_feature_slices = [features] + + return torch.cat( + [ + embed(cat_feature_slice.squeeze(-1)) + for embed, cat_feature_slice in zip(self.embedders, cat_feature_slices) + ], + dim=-1, + ) + + +# Copied from transformers.models.time_series_transformer.modeling_time_series_transformer.TimeSeriesStdScaler with TimeSeries->Autoformer +class AutoformerStdScaler(nn.Module): + """ + Standardize features by calculating the mean and scaling along some given dimension `dim`, and then normalizes it + by subtracting from the mean and dividing by the standard deviation. 
+ + Args: + dim (`int`): + Dimension along which to calculate the mean and standard deviation. + keepdim (`bool`, *optional*, defaults to `False`): + Controls whether to retain dimension `dim` (of length 1) in the scale tensor, or suppress it. + minimum_scale (`float`, *optional*, defaults to 1e-5): + Default scale that is used for elements that are constantly zero along dimension `dim`. + """ + + def __init__(self, dim: int, keepdim: bool = False, minimum_scale: float = 1e-5): + super().__init__() + if not dim > 0: + raise ValueError("Cannot compute scale along dim = 0 (batch dimension), please provide dim > 0") + self.dim = dim + self.keepdim = keepdim + self.minimum_scale = minimum_scale + + @torch.no_grad() + def forward(self, data: torch.Tensor, weights: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: + denominator = weights.sum(self.dim, keepdim=self.keepdim) + denominator = denominator.clamp_min(1.0) + loc = (data * weights).sum(self.dim, keepdim=self.keepdim) / denominator + + variance = (((data - loc) * weights) ** 2).sum(self.dim, keepdim=self.keepdim) / denominator + scale = torch.sqrt(variance + self.minimum_scale) + return (data - loc) / scale, loc, scale + + +# Copied from transformers.models.time_series_transformer.modeling_time_series_transformer.TimeSeriesMeanScaler with TimeSeries->Autoformer +class AutoformerMeanScaler(nn.Module): + """ + Computes a scaling factor as the weighted average absolute value along dimension `dim`, and scales the data + accordingly. + + Args: + dim (`int`): + Dimension along which to compute the scale. + keepdim (`bool`, *optional*, defaults to `False`): + Controls whether to retain dimension `dim` (of length 1) in the scale tensor, or suppress it. + default_scale (`float`, *optional*, defaults to `None`): + Default scale that is used for elements that are constantly zero. If `None`, we use the scale of the batch. + minimum_scale (`float`, *optional*, defaults to 1e-10): + Default minimum possible scale that is used for any item. + """ + + def __init__( + self, dim: int = -1, keepdim: bool = True, default_scale: Optional[float] = None, minimum_scale: float = 1e-10 + ): + super().__init__() + self.dim = dim + self.keepdim = keepdim + self.minimum_scale = minimum_scale + self.default_scale = default_scale + + @torch.no_grad() + def forward( + self, data: torch.Tensor, observed_indicator: torch.Tensor + ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: + # shape: (N, [C], T=1) + ts_sum = (data * observed_indicator).abs().sum(self.dim, keepdim=True) + num_observed = observed_indicator.sum(self.dim, keepdim=True) + + scale = ts_sum / torch.clamp(num_observed, min=1) + + # If `default_scale` is provided, we use it, otherwise we use the scale + # of the batch. 
+ if self.default_scale is None: + batch_sum = ts_sum.sum(dim=0) + batch_observations = torch.clamp(num_observed.sum(0), min=1) + default_scale = torch.squeeze(batch_sum / batch_observations) + else: + default_scale = self.default_scale * torch.ones_like(scale) + + # apply default scale where there are no observations + scale = torch.where(num_observed > 0, scale, default_scale) + + # ensure the scale is at least `self.minimum_scale` + scale = torch.clamp(scale, min=self.minimum_scale) + scaled_data = data / scale + + if not self.keepdim: + scale = scale.squeeze(dim=self.dim) + + return scaled_data, torch.zeros_like(scale), scale + + +# Copied from transformers.models.time_series_transformer.modeling_time_series_transformer.TimeSeriesNOPScaler with TimeSeries->Autoformer +class AutoformerNOPScaler(nn.Module): + """ + Assigns a scaling factor equal to 1 along dimension `dim`, and therefore applies no scaling to the input data. + + Args: + dim (`int`): + Dimension along which to compute the scale. + keepdim (`bool`, *optional*, defaults to `False`): + Controls whether to retain dimension `dim` (of length 1) in the scale tensor, or suppress it. + """ + + def __init__(self, dim: int, keepdim: bool = False): + super().__init__() + self.dim = dim + self.keepdim = keepdim + + def forward( + self, data: torch.Tensor, observed_indicator: torch.Tensor + ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: + scale = torch.ones_like(data, requires_grad=False).mean(dim=self.dim, keepdim=self.keepdim) + loc = torch.zeros_like(data, requires_grad=False).mean(dim=self.dim, keepdim=self.keepdim) + return data, loc, scale + + +# Copied from transformers.models.time_series_transformer.modeling_time_series_transformer.weighted_average +def weighted_average(input_tensor: torch.Tensor, weights: Optional[torch.Tensor] = None, dim=None) -> torch.Tensor: + """ + Computes the weighted average of a given tensor across a given `dim`, masking values associated with weight zero, + meaning instead of `nan * 0 = nan` you will get `0 * 0 = 0`. + + Args: + input_tensor (`torch.FloatTensor`): + Input tensor, of which the average must be computed. + weights (`torch.FloatTensor`, *optional*): + Weights tensor, of the same shape as `input_tensor`. + dim (`int`, *optional*): + The dim along which to average `input_tensor`. + + Returns: + `torch.FloatTensor`: The tensor with values averaged along the specified `dim`. + """ + if weights is not None: + weighted_tensor = torch.where(weights != 0, input_tensor * weights, torch.zeros_like(input_tensor)) + sum_weights = torch.clamp(weights.sum(dim=dim) if dim else weights.sum(), min=1.0) + return (weighted_tensor.sum(dim=dim) if dim else weighted_tensor.sum()) / sum_weights + else: + return input_tensor.mean(dim=dim) + + +# Copied from transformers.models.time_series_transformer.modeling_time_series_transformer.nll +def nll(input: torch.distributions.Distribution, target: torch.Tensor) -> torch.Tensor: + """ + Computes the negative log likelihood loss from input distribution with respect to target. + """ + return -input.log_prob(target) + + +# Copied from transformers.models.bart.modeling_bart._make_causal_mask +def _make_causal_mask( + input_ids_shape: torch.Size, dtype: torch.dtype, device: torch.device, past_key_values_length: int = 0 +): + """ + Make causal mask used for bi-directional self-attention. 
+ """ + bsz, tgt_len = input_ids_shape + mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device) + mask_cond = torch.arange(mask.size(-1), device=device) + mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0) + mask = mask.to(dtype) + + if past_key_values_length > 0: + mask = torch.cat([torch.zeros(tgt_len, past_key_values_length, dtype=dtype, device=device), mask], dim=-1) + return mask[None, None, :, :].expand(bsz, 1, tgt_len, tgt_len + past_key_values_length) + + +# Copied from transformers.models.bart.modeling_bart._expand_mask +def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None): + """ + Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`. + """ + bsz, src_len = mask.size() + tgt_len = tgt_len if tgt_len is not None else src_len + + expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype) + + inverted_mask = 1.0 - expanded_mask + + return inverted_mask.masked_fill(inverted_mask.to(torch.bool), torch.finfo(dtype).min) + + +# Copied from transformers.models.marian.modeling_marian.MarianSinusoidalPositionalEmbedding with Marian->Autoformer +class AutoformerSinusoidalPositionalEmbedding(nn.Embedding): + """This module produces sinusoidal positional embeddings of any length.""" + + def __init__(self, num_positions: int, embedding_dim: int, padding_idx: Optional[int] = None) -> None: + super().__init__(num_positions, embedding_dim) + self.weight = self._init_weight(self.weight) + + @staticmethod + def _init_weight(out: nn.Parameter) -> nn.Parameter: + """ + Identical to the XLM create_sinusoidal_embeddings except features are not interleaved. The cos features are in + the 2nd half of the vector. [dim // 2:] + """ + n_pos, dim = out.shape + position_enc = np.array( + [[pos / np.power(10000, 2 * (j // 2) / dim) for j in range(dim)] for pos in range(n_pos)] + ) + out.requires_grad = False # set early to avoid an error in pytorch-1.8+ + sentinel = dim // 2 if dim % 2 == 0 else (dim // 2) + 1 + out[:, 0:sentinel] = torch.FloatTensor(np.sin(position_enc[:, 0::2])) + out[:, sentinel:] = torch.FloatTensor(np.cos(position_enc[:, 1::2])) + out.detach_() + return out + + @torch.no_grad() + def forward(self, input_ids_shape: torch.Size, past_key_values_length: int = 0) -> torch.Tensor: + """`input_ids_shape` is expected to be [bsz x seqlen].""" + bsz, seq_len = input_ids_shape[:2] + positions = torch.arange( + past_key_values_length, past_key_values_length + seq_len, dtype=torch.long, device=self.weight.device + ) + return super().forward(positions) + + +# Copied from transformers.models.time_series_transformer.modeling_time_series_transformer.TimeSeriesValueEmbedding with TimeSeries->Autoformer +class AutoformerValueEmbedding(nn.Module): + def __init__(self, feature_size, d_model): + super().__init__() + self.value_projection = nn.Linear(in_features=feature_size, out_features=d_model, bias=False) + + def forward(self, x): + return self.value_projection(x) + + +# Class based on +# https://github.com/thuml/Autoformer/blob/c6a0694ff484753f2d986cc0bb1f99ee850fc1a8/layers/Autoformer_EncDec.py#L39 +# where AutoformerSeriesDecompositionLayer is series_decomp + moving_average +class AutoformerSeriesDecompositionLayer(nn.Module): + """ + Returns the trend and the seasonal parts of the time series. 
Calculated as:
+
+        x_trend = AvgPool(Padding(X)) and x_seasonal = X - x_trend
+    """
+
+    def __init__(self, config: AutoformerConfig):
+        super().__init__()
+        self.kernel_size = config.moving_average
+        self.avg = nn.AvgPool1d(kernel_size=self.kernel_size, stride=1, padding=0)
+
+    def forward(self, x):
+        """Input shape: Batch x Time x EMBED_DIM"""
+        # padding on both ends of the time series
+        num_of_pads = (self.kernel_size - 1) // 2
+        front = x[:, 0:1, :].repeat(1, num_of_pads, 1)
+        end = x[:, -1:, :].repeat(1, num_of_pads, 1)
+        x_padded = torch.cat([front, x, end], dim=1)
+
+        # calculate the trend and seasonal part of the series
+        x_trend = self.avg(x_padded.permute(0, 2, 1)).permute(0, 2, 1)
+        x_seasonal = x - x_trend
+        return x_seasonal, x_trend
+
+
+# Class based on
+# https://github.com/thuml/Autoformer/blob/c6a0694ff484753f2d986cc0bb1f99ee850fc1a8/layers/Autoformer_EncDec.py#L6
+# where AutoformerLayernorm is my_Layernorm
+class AutoformerLayernorm(nn.Module):
+    """
+    Specially designed layer normalization for the seasonal part, calculated as: AutoformerLayernorm(x) = nn.LayerNorm(x)
+    - torch.mean(nn.LayerNorm(x))
+    """
+
+    def __init__(self, config: AutoformerConfig):
+        super().__init__()
+        self.layernorm = nn.LayerNorm(config.d_model)
+
+    def forward(self, x):
+        x_hat = self.layernorm(x)
+        bias = torch.mean(x_hat, dim=1).unsqueeze(1).repeat(1, x.shape[1], 1)
+        return x_hat - bias
+
+
+class AutoformerAttention(nn.Module):
+    """
+    AutoCorrelation Mechanism with the following two phases:
+        (1) period-based dependencies discovery (2) time delay aggregation
+    This block replaces the canonical self-attention mechanism.
+    """
+
+    def __init__(
+        self,
+        embed_dim: int,
+        num_heads: int,
+        dropout: float = 0.0,
+        is_decoder: bool = False,
+        bias: bool = True,
+        autocorrelation_factor: int = 3,
+    ):
+        super().__init__()
+        self.embed_dim = embed_dim
+        self.num_heads = num_heads
+        self.dropout = dropout
+        self.head_dim = embed_dim // num_heads
+
+        if (self.head_dim * num_heads) != self.embed_dim:
+            raise ValueError(
+                f"embed_dim must be divisible by num_heads (got `embed_dim`: {self.embed_dim}"
+                f" and `num_heads`: {num_heads})."
+ ) + self.scaling = self.head_dim**-0.5 + self.is_decoder = is_decoder + + self.k_proj = nn.Linear(embed_dim, embed_dim, bias=bias) + self.v_proj = nn.Linear(embed_dim, embed_dim, bias=bias) + self.q_proj = nn.Linear(embed_dim, embed_dim, bias=bias) + self.out_proj = nn.Linear(embed_dim, embed_dim, bias=bias) + + self.autocorrelation_factor = autocorrelation_factor + + def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int): + return tensor.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2).contiguous() + + def forward( + self, + hidden_states: torch.Tensor, + key_value_states: Optional[torch.Tensor] = None, + past_key_value: Optional[Tuple[torch.Tensor]] = None, + attention_mask: Optional[torch.Tensor] = None, + layer_head_mask: Optional[torch.Tensor] = None, + output_attentions: bool = False, + ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]: + """Input shape: Batch x Time x Channel""" + + # if key_value_states are provided this layer is used as a cross-attention layer + # for the decoder + is_cross_attention = key_value_states is not None + + bsz, tgt_len, _ = hidden_states.size() + + # get query proj + query_states = self.q_proj(hidden_states) + # get key, value proj + # `past_key_value[0].shape[2] == key_value_states.shape[1]` + # is checking that the `sequence_length` of the `past_key_value` is the same as + # the provided `key_value_states` to support prefix tuning + if ( + is_cross_attention + and past_key_value is not None + and past_key_value[0].shape[2] == key_value_states.shape[1] + ): + # reuse k,v, cross_attentions + key_states = past_key_value[0] + value_states = past_key_value[1] + elif is_cross_attention: + # cross_attentions + key_states = self._shape(self.k_proj(key_value_states), -1, bsz) + value_states = self._shape(self.v_proj(key_value_states), -1, bsz) + elif past_key_value is not None: + # reuse k, v, self_attention + key_states = self._shape(self.k_proj(hidden_states), -1, bsz) + value_states = self._shape(self.v_proj(hidden_states), -1, bsz) + key_states = torch.cat([past_key_value[0], key_states], dim=2) + value_states = torch.cat([past_key_value[1], value_states], dim=2) + else: + # self_attention + key_states = self._shape(self.k_proj(hidden_states), -1, bsz) + value_states = self._shape(self.v_proj(hidden_states), -1, bsz) + + if self.is_decoder: + # if cross_attention save Tuple(torch.Tensor, torch.Tensor) of all cross attention key/value_states. + # Further calls to cross_attention layer can then reuse all cross-attention + # key/value_states (first "if" case) + # if uni-directional self-attention (decoder) save Tuple(torch.Tensor, torch.Tensor) of + # all previous decoder key/value_states. 
Further calls to uni-directional self-attention + # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case) + # if encoder bi-directional self-attention `past_key_value` is always `None` + past_key_value = (key_states, value_states) + + proj_shape = (bsz * self.num_heads, -1, self.head_dim) + query_states = self._shape(query_states, tgt_len, bsz).view(*proj_shape) + key_states = key_states.view(*proj_shape) + value_states = value_states.view(*proj_shape) + + # (1) period-based dependencies discovery + # Resize (truncation or zero filling) + queries_time_length = query_states.size(1) + values_time_length = value_states.size(1) + if queries_time_length > values_time_length: + query_states = query_states[:, : (queries_time_length - values_time_length), :] + zeros = torch.zeros_like(query_states).float() + value_states = torch.cat([value_states, zeros], dim=1) + key_states = torch.cat([key_states, zeros], dim=1) + else: + value_states = value_states[:, :queries_time_length, :] + key_states = key_states[:, :queries_time_length, :] + + query_states_fft = torch.fft.rfft(query_states, n=tgt_len, dim=1) + key_states_fft = torch.fft.rfft(key_states, n=tgt_len, dim=1) + attn_weights = query_states_fft * torch.conj(key_states_fft) + attn_weights = torch.fft.irfft(attn_weights, n=tgt_len, dim=1) # Autocorrelation(Q,K) + + src_len = key_states.size(1) + channel = key_states.size(2) + + if attn_weights.size() != (bsz * self.num_heads, tgt_len, channel): + raise ValueError( + f"Attention weights should be of size {(bsz * self.num_heads, tgt_len, channel)}, but is" + f" {attn_weights.size()}" + ) + + if attention_mask is not None: + if attention_mask.size() != (bsz, 1, tgt_len, src_len): + raise ValueError( + f"Attention mask should be of size {(bsz, 1, tgt_len, src_len)}, but is {attention_mask.size()}" + ) + attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + attention_mask + attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len) + + if layer_head_mask is not None: + if layer_head_mask.size() != (self.num_heads,): + raise ValueError( + f"Head mask for a single layer should be of size {(self.num_heads,)}, but is" + f" {layer_head_mask.size()}" + ) + attn_weights = layer_head_mask.view(1, -1, 1, 1) * attn_weights.view(bsz, self.num_heads, tgt_len, channel) + attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, channel) + + if output_attentions: + # this operation is a bit awkward, but it's required to + # make sure that attn_weights keeps its gradient. 
+ # In order to do so, attn_weights have to be reshaped + # twice and have to be reused in the following + attn_weights_reshaped = attn_weights.view(bsz, self.num_heads, tgt_len, channel) + attn_weights = attn_weights_reshaped.view(bsz * self.num_heads, tgt_len, channel) + else: + attn_weights_reshaped = None + + # time delay aggregation + time_length = value_states.size(1) + autocorrelations = attn_weights.view(bsz, self.num_heads, tgt_len, channel) + + # find top k autocorrelations delays + top_k = int(self.autocorrelation_factor * math.log(time_length)) + autocorrelations_mean_on_head_channel = torch.mean(autocorrelations, dim=(1, -1)) # bsz x tgt_len + if self.training: + autocorrelations_mean_on_bsz = torch.mean(autocorrelations_mean_on_head_channel, dim=0) + _, top_k_delays_index = torch.topk(autocorrelations_mean_on_bsz, top_k) + top_k_autocorrelations = torch.stack( + [autocorrelations_mean_on_head_channel[:, top_k_delays_index[i]] for i in range(top_k)], dim=-1 + ) + else: + top_k_autocorrelations, top_k_delays_index = torch.topk( + autocorrelations_mean_on_head_channel, top_k, dim=1 + ) + + top_k_autocorrelations = torch.softmax(top_k_autocorrelations, dim=-1) # bsz x top_k + + # compute aggregation: value_states.roll(delay) * top_k_autocorrelations(delay) + if not self.training: + # used for compute values_states.roll(delay) in inference + tmp_values = value_states.repeat(1, 2, 1) + init_index = ( + torch.arange(time_length) + .view(1, -1, 1) + .repeat(bsz * self.num_heads, 1, channel) + .to(value_states.device) + ) + + delays_agg = torch.zeros_like(value_states).float() # bsz x time_length x channel + for i in range(top_k): + # compute value_states roll delay + if not self.training: + tmp_delay = init_index + top_k_delays_index[:, i].view(-1, 1, 1).repeat( + self.num_heads, tgt_len, channel + ) + value_states_roll_delay = torch.gather(tmp_values, dim=1, index=tmp_delay) + else: + value_states_roll_delay = value_states.roll(shifts=-int(top_k_delays_index[i]), dims=1) + + # aggregation + top_k_autocorrelations_at_delay = ( + top_k_autocorrelations[:, i].view(-1, 1, 1).repeat(self.num_heads, tgt_len, channel) + ) + delays_agg += value_states_roll_delay * top_k_autocorrelations_at_delay + + attn_output = delays_agg.contiguous() + + if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim): + raise ValueError( + f"`attn_output` should be of size {(bsz * self.num_heads, tgt_len, self.head_dim)}, but is" + f" {attn_output.size()}" + ) + + attn_output = attn_output.view(bsz, self.num_heads, tgt_len, self.head_dim) + attn_output = attn_output.transpose(1, 2) + + # Use the `embed_dim` from the config (stored in the class) rather than `hidden_state` because `attn_output` can be + # partitioned across GPUs when using tensor-parallelism. 
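+        # i.e. merge the attention heads back together: (bsz, tgt_len, num_heads, head_dim) -> (bsz, tgt_len, embed_dim)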
+        attn_output = attn_output.reshape(bsz, tgt_len, self.embed_dim)
+
+        attn_output = self.out_proj(attn_output)
+
+        return attn_output, attn_weights_reshaped, past_key_value
+
+
+class AutoformerEncoderLayer(nn.Module):
+    def __init__(self, config: AutoformerConfig):
+        super().__init__()
+        self.embed_dim = config.d_model
+        self.self_attn = AutoformerAttention(
+            embed_dim=self.embed_dim,
+            num_heads=config.encoder_attention_heads,
+            dropout=config.attention_dropout,
+            autocorrelation_factor=config.autocorrelation_factor,
+        )
+        self.self_attn_layer_norm = nn.LayerNorm(self.embed_dim)
+        self.dropout = config.dropout
+        self.activation_fn = ACT2FN[config.activation_function]
+        self.activation_dropout = config.activation_dropout
+        self.fc1 = nn.Linear(self.embed_dim, config.encoder_ffn_dim)
+        self.fc2 = nn.Linear(config.encoder_ffn_dim, self.embed_dim)
+        self.final_layer_norm = AutoformerLayernorm(config)
+        self.decomp1 = AutoformerSeriesDecompositionLayer(config)
+        self.decomp2 = AutoformerSeriesDecompositionLayer(config)
+
+    def forward(
+        self,
+        hidden_states: torch.FloatTensor,
+        attention_mask: torch.FloatTensor,
+        layer_head_mask: torch.FloatTensor,
+        output_attentions: Optional[bool] = False,
+    ) -> Tuple[torch.FloatTensor, Optional[torch.FloatTensor]]:
+        """
+        Args:
+            hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
+            attention_mask (`torch.FloatTensor`): attention mask of size
+                `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
+            layer_head_mask (`torch.FloatTensor`): mask for attention heads in a given layer of size
+                `(encoder_attention_heads,)`.
+            output_attentions (`bool`, *optional*):
+                Whether or not to return the attentions tensors of all attention layers. See `attentions` under
+                returned tensors for more detail.
+ """ + residual = hidden_states + hidden_states, attn_weights, _ = self.self_attn( + hidden_states=hidden_states, + attention_mask=attention_mask, + layer_head_mask=layer_head_mask, + output_attentions=output_attentions, + ) + hidden_states = nn.functional.dropout(hidden_states, p=self.dropout, training=self.training) + hidden_states = residual + hidden_states + # added layer norm here as an improvement + hidden_states = self.self_attn_layer_norm(hidden_states) + hidden_states, _ = self.decomp1(hidden_states) + + residual = hidden_states + hidden_states = self.activation_fn(self.fc1(hidden_states)) + hidden_states = nn.functional.dropout(hidden_states, p=self.activation_dropout, training=self.training) + hidden_states = self.fc2(hidden_states) + hidden_states = nn.functional.dropout(hidden_states, p=self.dropout, training=self.training) + hidden_states = residual + hidden_states + hidden_states, _ = self.decomp2(hidden_states) + hidden_states = self.final_layer_norm(hidden_states) + + if hidden_states.dtype == torch.float16 and ( + torch.isinf(hidden_states).any() or torch.isnan(hidden_states).any() + ): + clamp_value = torch.finfo(hidden_states.dtype).max - 1000 + hidden_states = torch.clamp(hidden_states, min=-clamp_value, max=clamp_value) + + outputs = (hidden_states,) + + if output_attentions: + outputs += (attn_weights,) + + return outputs + + +class AutoformerDecoderLayer(nn.Module): + def __init__(self, config: AutoformerConfig): + super().__init__() + self.embed_dim = config.d_model + + self.self_attn = AutoformerAttention( + embed_dim=self.embed_dim, + num_heads=config.decoder_attention_heads, + dropout=config.attention_dropout, + is_decoder=True, + autocorrelation_factor=config.autocorrelation_factor, + ) + self.dropout = config.dropout + self.activation_fn = ACT2FN[config.activation_function] + self.activation_dropout = config.activation_dropout + + self.self_attn_layer_norm = nn.LayerNorm(self.embed_dim) + self.encoder_attn = AutoformerAttention( + self.embed_dim, + config.decoder_attention_heads, + dropout=config.attention_dropout, + is_decoder=True, + autocorrelation_factor=config.autocorrelation_factor, + ) + self.encoder_attn_layer_norm = nn.LayerNorm(self.embed_dim) + self.fc1 = nn.Linear(self.embed_dim, config.decoder_ffn_dim) + self.fc2 = nn.Linear(config.decoder_ffn_dim, self.embed_dim) + self.final_layer_norm = AutoformerLayernorm(config) + + self.decomp1 = AutoformerSeriesDecompositionLayer(config) + self.decomp2 = AutoformerSeriesDecompositionLayer(config) + self.decomp3 = AutoformerSeriesDecompositionLayer(config) + + # source: https://github.com/thuml/Autoformer/blob/e6371e24f2ae2dd53e472edefdd5814c5176f864/layers/Autoformer_EncDec.py#L128 + self.trend_projection = nn.Conv1d( + in_channels=self.embed_dim, + out_channels=config.feature_size, + kernel_size=3, + stride=1, + padding=1, + padding_mode="circular", + bias=False, + ) + + def forward( + self, + hidden_states: torch.Tensor, + attention_mask: Optional[torch.Tensor] = None, + encoder_hidden_states: Optional[torch.Tensor] = None, + encoder_attention_mask: Optional[torch.Tensor] = None, + layer_head_mask: Optional[torch.Tensor] = None, + cross_attn_layer_head_mask: Optional[torch.Tensor] = None, + past_key_value: Optional[Tuple[torch.Tensor]] = None, + output_attentions: Optional[bool] = False, + use_cache: Optional[bool] = True, + ) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]: + """ + Args: + hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, 
seq_len, embed_dim)` + attention_mask (`torch.FloatTensor`): attention mask of size + `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values. + encoder_hidden_states (`torch.FloatTensor`): + cross attention input to the layer of shape `(batch, seq_len, embed_dim)` + encoder_attention_mask (`torch.FloatTensor`): encoder attention mask of size + `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values. + layer_head_mask (`torch.FloatTensor`): mask for attention heads in a given layer of size + `(encoder_attention_heads,)`. + cross_attn_layer_head_mask (`torch.FloatTensor`): mask for cross-attention heads in a given layer of + size `(decoder_attention_heads,)`. + past_key_value (`Tuple(torch.FloatTensor)`): cached past key and value projection states + output_attentions (`bool`, *optional*): + Whether or not to return the attentions tensors of all attention layers. See `attentions` under + returned tensors for more detail. + use_cache: (`bool`, *optional*, defaults to `True`): + Whether or not the model should return the `present_key_value` state to be used for subsequent + decoding. + """ + residual = hidden_states + + # Self Attention + # decoder uni-directional self-attention cached key/values tuple is at positions 1,2 + self_attn_past_key_value = past_key_value[:2] if past_key_value is not None else None + # add present self-attn cache to positions 1,2 of present_key_value tuple + hidden_states, self_attn_weights, present_key_value = self.self_attn( + hidden_states=hidden_states, + past_key_value=self_attn_past_key_value, + attention_mask=attention_mask, + layer_head_mask=layer_head_mask, + output_attentions=output_attentions, + ) + hidden_states = nn.functional.dropout(hidden_states, p=self.dropout, training=self.training) + hidden_states = residual + hidden_states + hidden_states, trend1 = self.decomp1(hidden_states) + # added layer norm here as an improvement + hidden_states = self.self_attn_layer_norm(hidden_states) + + # Cross-Attention Block + cross_attn_present_key_value = None + cross_attn_weights = None + if encoder_hidden_states is not None: + residual = hidden_states + + # cross_attn cached key/values tuple is at positions 3,4 of present_key_value tuple + cross_attn_past_key_value = past_key_value[-2:] if past_key_value is not None else None + hidden_states, cross_attn_weights, cross_attn_present_key_value = self.encoder_attn( + hidden_states=hidden_states, + key_value_states=encoder_hidden_states, + attention_mask=encoder_attention_mask, + layer_head_mask=cross_attn_layer_head_mask, + past_key_value=cross_attn_past_key_value, + output_attentions=output_attentions, + ) + hidden_states = nn.functional.dropout(hidden_states, p=self.dropout, training=self.training) + hidden_states = residual + hidden_states + hidden_states, trend2 = self.decomp2(hidden_states) + # added layer norm here as an improvement + hidden_states = self.encoder_attn_layer_norm(hidden_states) + + # add cross-attn to positions 3,4 of present_key_value tuple + present_key_value = present_key_value + cross_attn_present_key_value + + # Fully Connected + residual = hidden_states + hidden_states = self.activation_fn(self.fc1(hidden_states)) + hidden_states = nn.functional.dropout(hidden_states, p=self.activation_dropout, training=self.training) + hidden_states = self.fc2(hidden_states) + hidden_states = nn.functional.dropout(hidden_states, p=self.dropout, training=self.training) + hidden_states = residual + hidden_states + 
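# third series decomposition: the trend extracted here is accumulated with the earlier trends below
+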
hidden_states, trend3 = self.decomp3(hidden_states) + hidden_states = self.final_layer_norm(hidden_states) + + if encoder_hidden_states is not None: + residual_trend = trend1 + trend2 + trend3 + else: + residual_trend = trend1 + trend3 + residual_trend = self.trend_projection(residual_trend.permute(0, 2, 1)).transpose(1, 2) + outputs = ((hidden_states, residual_trend),) + + if output_attentions: + outputs += (self_attn_weights, cross_attn_weights) + + if use_cache: + outputs += (present_key_value,) + + return outputs + + +class AutoformerPreTrainedModel(PreTrainedModel): + config_class = AutoformerConfig + base_model_prefix = "model" + main_input_name = "past_values" + supports_gradient_checkpointing = True + + def _init_weights(self, module): + std = self.config.init_std + if isinstance(module, (nn.Linear, nn.Conv1d)): + module.weight.data.normal_(mean=0.0, std=std) + if module.bias is not None: + module.bias.data.zero_() + elif isinstance(module, AutoformerSinusoidalPositionalEmbedding): + pass + elif isinstance(module, nn.Embedding): + module.weight.data.normal_(mean=0.0, std=std) + if module.padding_idx is not None: + module.weight.data[module.padding_idx].zero_() + + def _set_gradient_checkpointing(self, module, value=False): + if isinstance(module, (AutoformerDecoder, AutoformerEncoder)): + module.gradient_checkpointing = value + + +AUTOFORMER_START_DOCSTRING = r""" + This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the + library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads + etc.) + + This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. + Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage + and behavior. + + Parameters: + config ([`AutoformerConfig`]): + Model configuration class with all the parameters of the model. Initializing with a config file does not + load the weights associated with the model, only the configuration. Check out the + [`~PreTrainedModel.from_pretrained`] method to load the model weights. +""" + +AUTOFORMER_INPUTS_DOCSTRING = r""" + Args: + past_values (`torch.FloatTensor` of shape `(batch_size, sequence_length)`): + Past values of the time series, that serve as context in order to predict the future. These values may + contain lags, i.e. additional values from the past which are added in order to serve as "extra context". + The `past_values` is what the Transformer encoder gets as input (with optional additional features, such as + `static_categorical_features`, `static_real_features`, `past_time_features`). + + The sequence length here is equal to `context_length` + `max(config.lags_sequence)`. + + Missing values need to be replaced with zeros. + + past_time_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, num_features)`, *optional*): + Optional time features, which the model internally will add to `past_values`. These could be things like + "month of year", "day of the month", etc. encoded as vectors (for instance as Fourier features). These + could also be so-called "age" features, which basically help the model know "at which point in life" a + time-series is. Age features have small values for distant past time steps and increase monotonically the + more we approach the current time step. + + These features serve as the "positional encodings" of the inputs. 
So contrary to a model like BERT, where + the position encodings are learned from scratch internally as parameters of the model, the Time Series + Transformer requires to provide additional time features. + + The Autoformer only learns additional embeddings for `static_categorical_features`. + + past_observed_mask (`torch.BoolTensor` of shape `(batch_size, sequence_length)`, *optional*): + Boolean mask to indicate which `past_values` were observed and which were missing. Mask values selected in + `[0, 1]`: + + - 1 for values that are **observed**, + - 0 for values that are **missing** (i.e. NaNs that were replaced by zeros). + + static_categorical_features (`torch.LongTensor` of shape `(batch_size, number of static categorical features)`, *optional*): + Optional static categorical features for which the model will learn an embedding, which it will add to the + values of the time series. + + Static categorical features are features which have the same value for all time steps (static over time). + + A typical example of a static categorical feature is a time series ID. + + static_real_features (`torch.FloatTensor` of shape `(batch_size, number of static real features)`, *optional*): + Optional static real features which the model will add to the values of the time series. + + Static real features are features which have the same value for all time steps (static over time). + + A typical example of a static real feature is promotion information. + + future_values (`torch.FloatTensor` of shape `(batch_size, prediction_length)`): + Future values of the time series, that serve as labels for the model. The `future_values` is what the + Transformer needs to learn to output, given the `past_values`. + + See the demo notebook and code snippets for details. + + Missing values need to be replaced with zeros. + + future_time_features (`torch.FloatTensor` of shape `(batch_size, prediction_length, num_features)`, *optional*): + Optional time features, which the model internally will add to `future_values`. These could be things like + "month of year", "day of the month", etc. encoded as vectors (for instance as Fourier features). These + could also be so-called "age" features, which basically help the model know "at which point in life" a + time-series is. Age features have small values for distant past time steps and increase monotonically the + more we approach the current time step. + + These features serve as the "positional encodings" of the inputs. So contrary to a model like BERT, where + the position encodings are learned from scratch internally as parameters of the model, the Time Series + Transformer requires to provide additional features. + + The Autoformer only learns additional embeddings for `static_categorical_features`. + + attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*): + Mask to avoid performing attention on certain token indices. Mask values selected in `[0, 1]`: + + - 1 for tokens that are **not masked**, + - 0 for tokens that are **masked**. + + [What are attention masks?](../glossary#attention-mask) + + decoder_attention_mask (`torch.LongTensor` of shape `(batch_size, target_sequence_length)`, *optional*): + Mask to avoid performing attention on certain token indices. By default, a causal mask will be used, to + make sure the model can only look at previous inputs in order to predict the future. 
+
+        head_mask (`torch.Tensor` of shape `(encoder_layers, encoder_attention_heads)`, *optional*):
+            Mask to nullify selected heads of the attention modules in the encoder. Mask values selected in `[0, 1]`:
+
+            - 1 indicates the head is **not masked**,
+            - 0 indicates the head is **masked**.
+
+        decoder_head_mask (`torch.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
+            Mask to nullify selected heads of the attention modules in the decoder. Mask values selected in `[0, 1]`:
+
+            - 1 indicates the head is **not masked**,
+            - 0 indicates the head is **masked**.
+
+        cross_attn_head_mask (`torch.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
+            Mask to nullify selected heads of the cross-attention modules. Mask values selected in `[0, 1]`:
+
+            - 1 indicates the head is **not masked**,
+            - 0 indicates the head is **masked**.
+
+        encoder_outputs (`tuple(tuple(torch.FloatTensor))`, *optional*):
+            Tuple consists of `last_hidden_state`, `hidden_states` (*optional*) and `attentions` (*optional*)
+            `last_hidden_state` of shape `(batch_size, sequence_length, hidden_size)` (*optional*) is a sequence of
+            hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
+        past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
+            Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape
+            `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of shape
+            `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`.
+
+            Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
+            blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
+
+            If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
+            don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
+            `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
+            is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
+            model's internal embedding lookup matrix.
+
+        use_cache (`bool`, *optional*):
+            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
+            `past_key_values`).
+        output_attentions (`bool`, *optional*):
+            Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
+            tensors for more detail.
+        output_hidden_states (`bool`, *optional*):
+            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
+            more detail.
+        return_dict (`bool`, *optional*):
+            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
+"""
+
+
+# Copied from transformers.models.time_series_transformer.modeling_time_series_transformer.TimeSeriesTransformerEncoder with TimeSeriesTransformer->Autoformer,TimeSeries->Autoformer
+class AutoformerEncoder(AutoformerPreTrainedModel):
+    """
+    Transformer encoder consisting of *config.encoder_layers* self-attention layers. Each layer is a
+    [`AutoformerEncoderLayer`].
+ + Args: + config: AutoformerConfig + """ + + def __init__(self, config: AutoformerConfig): + super().__init__(config) + + self.dropout = config.dropout + self.layerdrop = config.encoder_layerdrop + if config.prediction_length is None: + raise ValueError("The `prediction_length` config needs to be specified.") + + self.value_embedding = AutoformerValueEmbedding(feature_size=config.feature_size, d_model=config.d_model) + self.embed_positions = AutoformerSinusoidalPositionalEmbedding( + config.context_length + config.prediction_length, config.d_model + ) + self.layers = nn.ModuleList([AutoformerEncoderLayer(config) for _ in range(config.encoder_layers)]) + self.layernorm_embedding = nn.LayerNorm(config.d_model) + + self.gradient_checkpointing = False + # Initialize weights and apply final processing + self.post_init() + + def forward( + self, + attention_mask: Optional[torch.Tensor] = None, + head_mask: Optional[torch.Tensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[Tuple, BaseModelOutput]: + r""" + Args: + attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*): + Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: + + - 1 for tokens that are **not masked**, + - 0 for tokens that are **masked**. + + [What are attention masks?](../glossary#attention-mask) + head_mask (`torch.Tensor` of shape `(encoder_layers, encoder_attention_heads)`, *optional*): + Mask to nullify selected heads of the attention modules. Mask values selected in `[0, 1]`: + + - 1 indicates the head is **not masked**, + - 0 indicates the head is **masked**. + + inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): + Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. + This is useful if you want more control over how to convert `input_ids` indices into associated vectors + than the model's internal embedding lookup matrix. + output_attentions (`bool`, *optional*): + Whether or not to return the attentions tensors of all attention layers. See `attentions` under + returned tensors for more detail. + output_hidden_states (`bool`, *optional*): + Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors + for more detail. + return_dict (`bool`, *optional*): + Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple. 
+ """ + output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions + output_hidden_states = ( + output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states + ) + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + hidden_states = self.value_embedding(inputs_embeds) + embed_pos = self.embed_positions(inputs_embeds.size()) + + hidden_states = self.layernorm_embedding(hidden_states + embed_pos) + hidden_states = nn.functional.dropout(hidden_states, p=self.dropout, training=self.training) + + # expand attention_mask + if attention_mask is not None: + # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len] + attention_mask = _expand_mask(attention_mask, inputs_embeds.dtype) + + encoder_states = () if output_hidden_states else None + all_attentions = () if output_attentions else None + + # check if head_mask has a correct number of layers specified if desired + if head_mask is not None: + if head_mask.size()[0] != (len(self.layers)): + raise ValueError( + f"The head_mask should be specified for {len(self.layers)} layers, but it is for" + f" {head_mask.size()[0]}." + ) + + for idx, encoder_layer in enumerate(self.layers): + if output_hidden_states: + encoder_states = encoder_states + (hidden_states,) + # add LayerDrop (see https://arxiv.org/abs/1909.11556 for description) + dropout_probability = random.uniform(0, 1) + if self.training and (dropout_probability < self.layerdrop): # skip the layer + layer_outputs = (None, None) + else: + if self.gradient_checkpointing and self.training: + + def create_custom_forward(module): + def custom_forward(*inputs): + return module(*inputs, output_attentions) + + return custom_forward + + layer_outputs = torch.utils.checkpoint.checkpoint( + create_custom_forward(encoder_layer), + hidden_states, + attention_mask, + (head_mask[idx] if head_mask is not None else None), + ) + else: + layer_outputs = encoder_layer( + hidden_states, + attention_mask, + layer_head_mask=(head_mask[idx] if head_mask is not None else None), + output_attentions=output_attentions, + ) + + hidden_states = layer_outputs[0] + + if output_attentions: + all_attentions = all_attentions + (layer_outputs[1],) + + if output_hidden_states: + encoder_states = encoder_states + (hidden_states,) + + if not return_dict: + return tuple(v for v in [hidden_states, encoder_states, all_attentions] if v is not None) + return BaseModelOutput( + last_hidden_state=hidden_states, hidden_states=encoder_states, attentions=all_attentions + ) + + +class AutoformerDecoder(AutoformerPreTrainedModel): + """ + Transformer decoder consisting of `config.decoder_layers` layers. 
Each layer is a [`AutoformerDecoderLayer`] + + Args: + config: AutoformerConfig + """ + + def __init__(self, config: AutoformerConfig): + super().__init__(config) + self.dropout = config.dropout + self.layerdrop = config.decoder_layerdrop + if config.prediction_length is None: + raise ValueError("The `prediction_length` config needs to be specified.") + + self.value_embedding = AutoformerValueEmbedding(feature_size=config.feature_size, d_model=config.d_model) + self.embed_positions = AutoformerSinusoidalPositionalEmbedding( + config.context_length + config.prediction_length, config.d_model + ) + self.layers = nn.ModuleList([AutoformerDecoderLayer(config) for _ in range(config.decoder_layers)]) + self.layernorm_embedding = nn.LayerNorm(config.d_model) + + # https://github.com/thuml/Autoformer/blob/e6371e24f2ae2dd53e472edefdd5814c5176f864/models/Autoformer.py#L74 + self.seasonality_projection = nn.Linear(config.d_model, config.feature_size) + + self.gradient_checkpointing = False + # Initialize weights and apply final processing + self.post_init() + + def _prepare_decoder_attention_mask(self, attention_mask, input_shape, inputs_embeds, past_key_values_length): + # create causal mask + # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len] + combined_attention_mask = None + if input_shape[-1] > 1: + combined_attention_mask = _make_causal_mask( + input_shape, + inputs_embeds.dtype, + device=inputs_embeds.device, + past_key_values_length=past_key_values_length, + ).to(inputs_embeds.device) + + if attention_mask is not None: + # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len] + expanded_attn_mask = _expand_mask(attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]).to( + inputs_embeds.device + ) + combined_attention_mask = ( + expanded_attn_mask if combined_attention_mask is None else expanded_attn_mask + combined_attention_mask + ) + + return combined_attention_mask + + def forward( + self, + trend: Optional[torch.Tensor] = None, + attention_mask: Optional[torch.Tensor] = None, + encoder_hidden_states: Optional[torch.FloatTensor] = None, + encoder_attention_mask: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.Tensor] = None, + cross_attn_head_mask: Optional[torch.Tensor] = None, + past_key_values: Optional[List[torch.FloatTensor]] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + use_cache: Optional[bool] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[Tuple, AutoFormerDecoderOutput]: + r""" + Args: + trend (`torch.FloatTensor` of shape `(batch_size, prediction_length, feature_size)`, *optional*): + The trend sequence to be fed to the decoder. + attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*): + Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: + + - 1 for tokens that are **not masked**, + - 0 for tokens that are **masked**. + + [What are attention masks?](../glossary#attention-mask) + encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, encoder_sequence_length, hidden_size)`, *optional*): + Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention + of the decoder. + encoder_attention_mask (`torch.LongTensor` of shape `(batch_size, encoder_sequence_length)`, *optional*): + Mask to avoid performing cross-attention on padding tokens indices of encoder input_ids. 
Mask values + selected in `[0, 1]`: + + - 1 for tokens that are **not masked**, + - 0 for tokens that are **masked**. + + [What are attention masks?](../glossary#attention-mask) + head_mask (`torch.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*): + Mask to nullify selected heads of the attention modules. Mask values selected in `[0, 1]`: + + - 1 indicates the head is **not masked**, + - 0 indicates the head is **masked**. + + cross_attn_head_mask (`torch.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*): + Mask to nullify selected heads of the cross-attention modules in the decoder to avoid performing + cross-attention on hidden heads. Mask values selected in `[0, 1]`: + + - 1 indicates the head is **not masked**, + - 0 indicates the head is **masked**. + + past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`): + Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of + shape `(batch_size, num_heads, sequence_length, embed_size_per_head)`) and 2 additional tensors of + shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. + + Contains pre-computed hidden-states (key and values in the self-attention blocks and in the + cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding. + + If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those + that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of + all `decoder_input_ids` of shape `(batch_size, sequence_length)`. + inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): + Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. + This is useful if you want more control over how to convert `input_ids` indices into associated vectors + than the model's internal embedding lookup matrix. + use_cache (`bool`, *optional*): + If `use_cache` is True, `past_key_values` key value states are returned and can be used to speed up + decoding (see `past_key_values`). + output_attentions (`bool`, *optional*): + Whether or not to return the attentions tensors of all attention layers. See `attentions` under + returned tensors for more detail. + output_hidden_states (`bool`, *optional*): + Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors + for more detail. + return_dict (`bool`, *optional*): + Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple. 
+ """ + output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions + output_hidden_states = ( + output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states + ) + use_cache = use_cache if use_cache is not None else self.config.use_cache + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + input_shape = inputs_embeds.size()[:-1] + + # expand encoder attention mask + if encoder_hidden_states is not None and encoder_attention_mask is not None: + # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len] + encoder_attention_mask = _expand_mask(encoder_attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]) + + hidden_states = self.value_embedding(inputs_embeds) + embed_pos = self.embed_positions( + inputs_embeds.size(), past_key_values_length=self.config.context_length - self.config.label_length + ) + hidden_states = self.layernorm_embedding(hidden_states + embed_pos) + hidden_states = nn.functional.dropout(hidden_states, p=self.dropout, training=self.training) + + # decoder layers + all_hidden_states = () if output_hidden_states else None + all_self_attns = () if output_attentions else None + all_cross_attentions = () if (output_attentions and encoder_hidden_states is not None) else None + next_decoder_cache = () if use_cache else None + + # check if head_mask/cross_attn_head_mask has a correct number of layers specified if desired + for attn_mask, mask_name in zip([head_mask, cross_attn_head_mask], ["head_mask", "cross_attn_head_mask"]): + if attn_mask is not None: + if attn_mask.size()[0] != (len(self.layers)): + raise ValueError( + f"The `{mask_name}` should be specified for {len(self.layers)} layers, but it is for" + f" {head_mask.size()[0]}." + ) + + for idx, decoder_layer in enumerate(self.layers): + # add LayerDrop (see https://arxiv.org/abs/1909.11556 for description) + if output_hidden_states: + all_hidden_states += (hidden_states,) + dropout_probability = random.uniform(0, 1) + if self.training and (dropout_probability < self.layerdrop): + continue + + past_key_value = past_key_values[idx] if past_key_values is not None else None + + if self.gradient_checkpointing and self.training: + if use_cache: + logger.warning( + "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..." 
+ ) + use_cache = False + + def create_custom_forward(module): + def custom_forward(*inputs): + # None for past_key_value + return module(*inputs, output_attentions, use_cache) + + return custom_forward + + layer_outputs = torch.utils.checkpoint.checkpoint( + create_custom_forward(decoder_layer), + hidden_states, + attention_mask, + encoder_hidden_states, + encoder_attention_mask, + head_mask[idx] if head_mask is not None else None, + cross_attn_head_mask[idx] if cross_attn_head_mask is not None else None, + None, + ) + else: + layer_outputs = decoder_layer( + hidden_states, + attention_mask=attention_mask, + encoder_hidden_states=encoder_hidden_states, + encoder_attention_mask=encoder_attention_mask, + layer_head_mask=(head_mask[idx] if head_mask is not None else None), + cross_attn_layer_head_mask=( + cross_attn_head_mask[idx] if cross_attn_head_mask is not None else None + ), + past_key_value=past_key_value, + output_attentions=output_attentions, + use_cache=use_cache, + ) + (hidden_states, residual_trend) = layer_outputs[0] + trend = trend + residual_trend + + if use_cache: + next_decoder_cache += (layer_outputs[3 if output_attentions else 1],) + + if output_attentions: + all_self_attns += (layer_outputs[1],) + + if encoder_hidden_states is not None: + all_cross_attentions += (layer_outputs[2],) + + # project seasonality representation + hidden_states = self.seasonality_projection(hidden_states) + + # add hidden states from the last decoder layer + if output_hidden_states: + all_hidden_states += (hidden_states,) + + next_cache = next_decoder_cache if use_cache else None + if not return_dict: + return tuple( + v + for v in [hidden_states, trend, next_cache, all_hidden_states, all_self_attns, all_cross_attentions] + if v is not None + ) + return AutoFormerDecoderOutput( + last_hidden_state=hidden_states, + trend=trend, + past_key_values=next_cache, + hidden_states=all_hidden_states, + attentions=all_self_attns, + cross_attentions=all_cross_attentions, + ) + + +@add_start_docstrings( + "The bare Autoformer Model outputting raw hidden-states without any specific head on top.", + AUTOFORMER_START_DOCSTRING, +) +class AutoformerModel(AutoformerPreTrainedModel): + def __init__(self, config: AutoformerConfig): + super().__init__(config) + + if config.scaling == "mean" or config.scaling: + self.scaler = AutoformerMeanScaler(dim=1, keepdim=True) + elif config.scaling == "std": + self.scaler = AutoformerStdScaler(dim=1, keepdim=True) + else: + self.scaler = AutoformerNOPScaler(dim=1, keepdim=True) + + if config.num_static_categorical_features > 0: + self.embedder = AutoformerFeatureEmbedder( + cardinalities=config.cardinality, embedding_dims=config.embedding_dimension + ) + + # transformer encoder-decoder and mask initializer + self.encoder = AutoformerEncoder(config) + self.decoder = AutoformerDecoder(config) + + # used for decoder seasonal and trend initialization + self.decomposition_layer = AutoformerSeriesDecompositionLayer(config) + + # Initialize weights and apply final processing + self.post_init() + + @property + def _past_length(self) -> int: + return self.config.context_length + max(self.config.lags_sequence) + + def get_lagged_subsequences( + self, sequence: torch.Tensor, subsequences_length: int, shift: int = 0 + ) -> torch.Tensor: + """ + Returns lagged subsequences of a given sequence. Returns a tensor of shape (batch_size, subsequences_length, + feature_size, indices_length), containing lagged subsequences. 
Specifically, lagged[i, j, :, k] = sequence[i,
+        -indices[k]-subsequences_length+j, :].
+
+        Args:
+            sequence (`torch.Tensor` of shape `(batch_size, context_length,
+                feature_size)`): The sequence from which lagged subsequences should be extracted.
+            subsequences_length (`int`):
+                Length of the subsequences to be extracted.
+            shift (`int`, *optional*, defaults to 0):
+                Shift the lags by this amount back in the time index.
+        """
+
+        # calculates the indices of the lags by subtracting the shift value from the given lags_sequence
+        indices = [lag - shift for lag in self.config.lags_sequence]
+
+        # checks if the maximum lag plus the length of the subsequences exceeds the length of the input sequence
+        sequence_length = sequence.shape[1]
+        if max(indices) + subsequences_length > sequence_length:
+            raise ValueError(
+                f"lags cannot go further than history length, found lag {max(indices)} "
+                f"while history length is only {sequence_length}"
+            )
+
+        # extracts the lagged subsequences from the input sequence using the calculated indices
+        lagged_values = []
+        for lag_index in indices:
+            begin_index = -lag_index - subsequences_length
+            end_index = -lag_index if lag_index > 0 else None
+            lagged_values.append(sequence[:, begin_index:end_index, ...])
+
+        # return as stacked tensor in the feature dimension
+        return torch.stack(lagged_values, dim=-1)
+
+    def create_network_inputs(
+        self,
+        past_values: torch.Tensor,
+        past_time_features: torch.Tensor,
+        static_categorical_features: Optional[torch.Tensor] = None,
+        static_real_features: Optional[torch.Tensor] = None,
+        past_observed_mask: Optional[torch.Tensor] = None,
+        future_values: Optional[torch.Tensor] = None,
+        future_time_features: Optional[torch.Tensor] = None,
+    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
+        """
+        Creates the inputs for the network given the past and future values, time features, and static features.
+
+        Args:
+            past_values (`torch.Tensor`):
+                A tensor of shape `(batch_size, past_length, input_size)` containing the past values.
+            past_time_features (`torch.Tensor`):
+                A tensor of shape `(batch_size, past_length, num_features)` containing the past time features.
+            static_categorical_features (`Optional[torch.Tensor]`):
+                An optional tensor of shape `(batch_size, num_categorical_features)` containing the static categorical
+                features.
+            static_real_features (`Optional[torch.Tensor]`):
+                An optional tensor of shape `(batch_size, num_real_features)` containing the static real features.
+            past_observed_mask (`Optional[torch.Tensor]`):
+                An optional tensor of shape `(batch_size, past_length, input_size)` containing the mask of observed
+                values in the past.
+            future_values (`Optional[torch.Tensor]`):
+                An optional tensor of shape `(batch_size, future_length, input_size)` containing the future values.
+
+        Returns:
+            A tuple containing the following tensors:
+            - reshaped_lagged_sequence (`torch.Tensor`): A tensor of shape `(batch_size, sequence_length, num_lags *
+              input_size)` containing the lagged subsequences of the inputs.
+            - features (`torch.Tensor`): A tensor of shape `(batch_size, sequence_length, num_features)` containing the
+              concatenated static and time features.
+            - loc (`torch.Tensor`): A tensor of shape `(batch_size, input_size)` containing the mean of the input
+              values.
+            - scale (`torch.Tensor`): A tensor of shape `(batch_size, input_size)` containing the std of the input
+              values.
+ - static_feat (`torch.Tensor`): A tensor of shape `(batch_size, num_static_features)` containing the + concatenated static features. + """ + # time feature + time_feat = ( + torch.cat( + ( + past_time_features[:, self._past_length - self.config.context_length :, ...], + future_time_features, + ), + dim=1, + ) + if future_values is not None + else past_time_features[:, self._past_length - self.config.context_length :, ...] + ) + + # target + if past_observed_mask is None: + past_observed_mask = torch.ones_like(past_values) + + context = past_values[:, -self.config.context_length :] + observed_context = past_observed_mask[:, -self.config.context_length :] + _, loc, scale = self.scaler(context, observed_context) + + inputs = ( + (torch.cat((past_values, future_values), dim=1) - loc) / scale + if future_values is not None + else (past_values - loc) / scale + ) + + # static features + log_abs_loc = loc.abs().log1p() if self.config.input_size == 1 else loc.squeeze(1).abs().log1p() + log_scale = scale.log() if self.config.input_size == 1 else scale.squeeze(1).log() + static_feat = torch.cat((log_abs_loc, log_scale), dim=1) + + if static_real_features is not None: + static_feat = torch.cat((static_real_features, static_feat), dim=1) + if static_categorical_features is not None: + embedded_cat = self.embedder(static_categorical_features) + static_feat = torch.cat((embedded_cat, static_feat), dim=1) + expanded_static_feat = static_feat.unsqueeze(1).expand(-1, time_feat.shape[1], -1) + + # all features + features = torch.cat((expanded_static_feat, time_feat), dim=-1) + + # lagged features + subsequences_length = ( + self.config.context_length + self.config.prediction_length + if future_values is not None + else self.config.context_length + ) + lagged_sequence = self.get_lagged_subsequences(sequence=inputs, subsequences_length=subsequences_length) + lags_shape = lagged_sequence.shape + reshaped_lagged_sequence = lagged_sequence.reshape(lags_shape[0], lags_shape[1], -1) + + if reshaped_lagged_sequence.shape[1] != time_feat.shape[1]: + raise ValueError( + f"input length {reshaped_lagged_sequence.shape[1]} and time feature lengths {time_feat.shape[1]} does not match" + ) + return reshaped_lagged_sequence, features, loc, scale, static_feat + + def get_encoder(self): + return self.encoder + + def get_decoder(self): + return self.decoder + + @add_start_docstrings_to_model_forward(AUTOFORMER_INPUTS_DOCSTRING) + @replace_return_docstrings(output_type=AutoformerModelOutput, config_class=_CONFIG_FOR_DOC) + def forward( + self, + past_values: torch.Tensor, + past_time_features: torch.Tensor, + past_observed_mask: torch.Tensor, + static_categorical_features: Optional[torch.Tensor] = None, + static_real_features: Optional[torch.Tensor] = None, + future_values: Optional[torch.Tensor] = None, + future_time_features: Optional[torch.Tensor] = None, + decoder_attention_mask: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.Tensor] = None, + decoder_head_mask: Optional[torch.Tensor] = None, + cross_attn_head_mask: Optional[torch.Tensor] = None, + encoder_outputs: Optional[List[torch.FloatTensor]] = None, + past_key_values: Optional[List[torch.FloatTensor]] = None, + output_hidden_states: Optional[bool] = None, + output_attentions: Optional[bool] = None, + use_cache: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[AutoformerModelOutput, Tuple]: + r""" + Returns: + + Examples: + + ```python + >>> from huggingface_hub import hf_hub_download + >>> import torch + >>> from 
transformers import AutoformerModel + + >>> file = hf_hub_download( + ... repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset" + ... ) + >>> batch = torch.load(file) + + >>> model = AutoformerModel.from_pretrained("huggingface/autoformer-tourism-monthly") + + >>> # during training, one provides both past and future values + >>> # as well as possible additional features + >>> outputs = model( + ... past_values=batch["past_values"], + ... past_time_features=batch["past_time_features"], + ... past_observed_mask=batch["past_observed_mask"], + ... static_categorical_features=batch["static_categorical_features"], + ... future_values=batch["future_values"], + ... future_time_features=batch["future_time_features"], + ... ) + + >>> last_hidden_state = outputs.last_hidden_state + ```""" + output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions + output_hidden_states = ( + output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states + ) + use_cache = use_cache if use_cache is not None else self.config.use_cache + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + transformer_inputs, temporal_features, loc, scale, static_feat = self.create_network_inputs( + past_values=past_values, + past_time_features=past_time_features, + past_observed_mask=past_observed_mask, + static_categorical_features=static_categorical_features, + static_real_features=static_real_features, + future_values=future_values, + future_time_features=future_time_features, + ) + + if encoder_outputs is None: + enc_input = torch.cat( + ( + transformer_inputs[:, : self.config.context_length, ...], + temporal_features[:, : self.config.context_length, ...], + ), + dim=-1, + ) + encoder_outputs = self.encoder( + inputs_embeds=enc_input, + head_mask=head_mask, + output_attentions=output_attentions, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + # If the user passed a tuple for encoder_outputs, we wrap it in a BaseModelOutput when return_dict=True + elif return_dict and not isinstance(encoder_outputs, BaseModelOutput): + encoder_outputs = BaseModelOutput( + last_hidden_state=encoder_outputs[0], + hidden_states=encoder_outputs[1] if len(encoder_outputs) > 1 else None, + attentions=encoder_outputs[2] if len(encoder_outputs) > 2 else None, + ) + + if future_values is not None: + # Decoder inputs + # seasonality and trend from context length + seasonal_input, trend_input = self.decomposition_layer( + transformer_inputs[:, : self.config.context_length, ...] 
+ ) + mean = ( + torch.mean(transformer_inputs[:, : self.config.context_length, ...], dim=1) + .unsqueeze(1) + .repeat(1, self.config.prediction_length, 1) + ) + zeros = torch.zeros( + [transformer_inputs.shape[0], self.config.prediction_length, transformer_inputs.shape[2]], + device=enc_input.device, + ) + + decoder_input = torch.cat( + ( + torch.cat((seasonal_input[:, -self.config.label_length :, ...], zeros), dim=1), + temporal_features[:, self.config.context_length - self.config.label_length :, ...], + ), + dim=-1, + ) + trend_init = torch.cat( + ( + torch.cat((trend_input[:, -self.config.label_length :, ...], mean), dim=1), + temporal_features[:, self.config.context_length - self.config.label_length :, ...], + ), + dim=-1, + ) + + decoder_outputs = self.decoder( + trend=trend_init, + inputs_embeds=decoder_input, + attention_mask=decoder_attention_mask, + encoder_hidden_states=encoder_outputs[0], + head_mask=decoder_head_mask, + cross_attn_head_mask=cross_attn_head_mask, + past_key_values=past_key_values, + use_cache=use_cache, + output_attentions=output_attentions, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + else: + decoder_outputs = AutoFormerDecoderOutput() + + if not return_dict: + return decoder_outputs + encoder_outputs + (loc, scale, static_feat) + + return AutoformerModelOutput( + last_hidden_state=decoder_outputs.last_hidden_state, + trend=decoder_outputs.trend, + past_key_values=decoder_outputs.past_key_values, + decoder_hidden_states=decoder_outputs.hidden_states, + decoder_attentions=decoder_outputs.attentions, + cross_attentions=decoder_outputs.cross_attentions, + encoder_last_hidden_state=encoder_outputs.last_hidden_state, + encoder_hidden_states=encoder_outputs.hidden_states, + encoder_attentions=encoder_outputs.attentions, + loc=loc, + scale=scale, + static_features=static_feat, + ) + + +@add_start_docstrings( + "The Autoformer Model with a distribution head on top for time-series forecasting.", + AUTOFORMER_START_DOCSTRING, +) +class AutoformerForPrediction(AutoformerPreTrainedModel): + def __init__(self, config: AutoformerConfig): + super().__init__(config) + self.model = AutoformerModel(config) + if config.distribution_output == "student_t": + self.distribution_output = StudentTOutput(dim=config.input_size) + elif config.distribution_output == "normal": + self.distribution_output = NormalOutput(dim=config.input_size) + elif config.distribution_output == "negative_binomial": + self.distribution_output = NegativeBinomialOutput(dim=config.input_size) + else: + raise ValueError(f"Unknown distribution output {config.distribution_output}") + + self.parameter_projection = self.distribution_output.get_parameter_projection(self.model.config.feature_size) + self.target_shape = self.distribution_output.event_shape + + if config.loss == "nll": + self.loss = nll + else: + raise ValueError(f"Unknown loss function {config.loss}") + + # Initialize weights of distribution_output and apply final processing + self.post_init() + + def output_params(self, decoder_output): + return self.parameter_projection(decoder_output[:, -self.config.prediction_length :, :]) + + def get_encoder(self): + return self.model.get_encoder() + + def get_decoder(self): + return self.model.get_decoder() + + @torch.jit.ignore + def output_distribution(self, params, loc=None, scale=None, trailing_n=None) -> torch.distributions.Distribution: + sliced_params = params + if trailing_n is not None: + sliced_params = [p[:, -trailing_n:] for p in params] + return 
self.distribution_output.distribution(sliced_params, loc=loc, scale=scale) + + @add_start_docstrings_to_model_forward(AUTOFORMER_INPUTS_DOCSTRING) + @replace_return_docstrings(output_type=Seq2SeqTSPredictionOutput, config_class=_CONFIG_FOR_DOC) + def forward( + self, + past_values: torch.Tensor, + past_time_features: torch.Tensor, + past_observed_mask: torch.Tensor, + static_categorical_features: Optional[torch.Tensor] = None, + static_real_features: Optional[torch.Tensor] = None, + future_values: Optional[torch.Tensor] = None, + future_time_features: Optional[torch.Tensor] = None, + future_observed_mask: Optional[torch.Tensor] = None, + decoder_attention_mask: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.Tensor] = None, + decoder_head_mask: Optional[torch.Tensor] = None, + cross_attn_head_mask: Optional[torch.Tensor] = None, + encoder_outputs: Optional[List[torch.FloatTensor]] = None, + past_key_values: Optional[List[torch.FloatTensor]] = None, + output_hidden_states: Optional[bool] = None, + output_attentions: Optional[bool] = None, + use_cache: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[Seq2SeqTSPredictionOutput, Tuple]: + r""" + Returns: + + Examples: + + ```python + >>> from huggingface_hub import hf_hub_download + >>> import torch + >>> from transformers import AutoformerForPrediction + + >>> file = hf_hub_download( + ... repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset" + ... ) + >>> batch = torch.load(file) + + >>> model = AutoformerForPrediction.from_pretrained("huggingface/autoformer-tourism-monthly") + + >>> # during training, one provides both past and future values + >>> # as well as possible additional features + >>> outputs = model( + ... past_values=batch["past_values"], + ... past_time_features=batch["past_time_features"], + ... past_observed_mask=batch["past_observed_mask"], + ... static_categorical_features=batch["static_categorical_features"], + ... static_real_features=batch["static_real_features"], + ... future_values=batch["future_values"], + ... future_time_features=batch["future_time_features"], + ... ) + + >>> loss = outputs.loss + >>> loss.backward() + + >>> # during inference, one only provides past values + >>> # as well as possible additional features + >>> # the model autoregressively generates future values + >>> outputs = model.generate( + ... past_values=batch["past_values"], + ... past_time_features=batch["past_time_features"], + ... past_observed_mask=batch["past_observed_mask"], + ... static_categorical_features=batch["static_categorical_features"], + ... static_real_features=batch["static_real_features"], + ... future_time_features=batch["future_time_features"], + ... 
) + + >>> mean_prediction = outputs.sequences.mean(dim=1) + ```""" + + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + if future_values is not None: + use_cache = False + + outputs = self.model( + past_values=past_values, + past_time_features=past_time_features, + past_observed_mask=past_observed_mask, + static_categorical_features=static_categorical_features, + static_real_features=static_real_features, + future_values=future_values, + future_time_features=future_time_features, + decoder_attention_mask=decoder_attention_mask, + head_mask=head_mask, + decoder_head_mask=decoder_head_mask, + cross_attn_head_mask=cross_attn_head_mask, + encoder_outputs=encoder_outputs, + past_key_values=past_key_values, + output_hidden_states=output_hidden_states, + output_attentions=output_attentions, + use_cache=use_cache, + return_dict=return_dict, + ) + + prediction_loss = None + params = None + if future_values is not None: + # outputs.last_hidden_state and trend + # loc is 3rd last and scale is 2nd last output + params = self.output_params(outputs[0] + outputs[1]) + distribution = self.output_distribution(params, loc=outputs[-3], scale=outputs[-2]) + + loss = self.loss(distribution, future_values) + + if future_observed_mask is None: + future_observed_mask = torch.ones_like(future_values) + + if len(self.target_shape) == 0: + loss_weights = future_observed_mask + else: + loss_weights, _ = future_observed_mask.min(dim=-1, keepdim=False) + + prediction_loss = weighted_average(loss, weights=loss_weights) + + if not return_dict: + outputs = ((params,) + outputs[2:]) if params is not None else outputs[2:] + return ((prediction_loss,) + outputs) if prediction_loss is not None else outputs + + return Seq2SeqTSPredictionOutput( + loss=prediction_loss, + params=params, + past_key_values=outputs.past_key_values, + decoder_hidden_states=outputs.decoder_hidden_states, + decoder_attentions=outputs.decoder_attentions, + cross_attentions=outputs.cross_attentions, + encoder_last_hidden_state=outputs.encoder_last_hidden_state, + encoder_hidden_states=outputs.encoder_hidden_states, + encoder_attentions=outputs.encoder_attentions, + loc=outputs.loc, + scale=outputs.scale, + static_features=outputs.static_features, + ) + + @torch.no_grad() + def generate( + self, + past_values: torch.Tensor, + past_time_features: torch.Tensor, + future_time_features: torch.Tensor, + past_observed_mask: Optional[torch.Tensor] = None, + static_categorical_features: Optional[torch.Tensor] = None, + static_real_features: Optional[torch.Tensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + ) -> SampleTSPredictionOutput: + r""" + Greedily generate sequences of sample predictions from a model with a probability distribution head. + + Parameters: + past_values (`torch.FloatTensor` of shape `(batch_size, sequence_length)` or `(batch_size, sequence_length, input_size)`): + Past values of the time series, which serve as context in order to predict the future. The sequence size + of this tensor must be larger than the `context_length` of the model, since the model will use the + larger size to construct lag features, i.e. additional values from the past which are added in order to + serve as "extra context".
+ + The `sequence_length` here is equal to `config.context_length` + `max(config.lags_sequence)`, which, if + no `lags_sequence` is configured, is equal to `config.context_length` + 7 (as by default, the largest + look-back index in `config.lags_sequence` is 7). The property `_past_length` returns the actual length + of the past. + + The `past_values` is what the Transformer encoder gets as input (with optional additional features, + such as `static_categorical_features`, `static_real_features`, `past_time_features` and lags). + + Missing values need to be replaced with zeros and indicated via the `past_observed_mask`. + + For multivariate time series, the `input_size` > 1 dimension is required and corresponds to the number + of variates in the time series per time step. + past_time_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, num_features)`): + Required time features, which the model internally will add to `past_values`. These could be things + like "month of year", "day of the month", etc. encoded as vectors (for instance as Fourier features). + These could also be so-called "age" features, which basically help the model know "at which point in + life" a time-series is. Age features have small values for distant past time steps and increase + monotonically the more we approach the current time step. Holiday features are also a good example of + time features. + + These features serve as the "positional encodings" of the inputs. So contrary to a model like BERT, + where the position encodings are learned from scratch internally as parameters of the model, the Time + Series Transformer requires you to provide additional time features. The Time Series Transformer only + learns additional embeddings for `static_categorical_features`. + + Additional dynamic real covariates can be concatenated to this tensor, with the caveat that these + features must be known at prediction time. + + The `num_features` here is equal to `config.num_time_features` + `config.num_dynamic_real_features`. + future_time_features (`torch.FloatTensor` of shape `(batch_size, prediction_length, num_features)`): + Required time features for the prediction window, which the model internally will add to sampled + predictions. These could be things like "month of year", "day of the month", etc. encoded as vectors + (for instance as Fourier features). These could also be so-called "age" features, which basically help + the model know "at which point in life" a time-series is. Age features have small values for distant + past time steps and increase monotonically the more we approach the current time step. Holiday features + are also a good example of time features. + + These features serve as the "positional encodings" of the inputs. So contrary to a model like BERT, + where the position encodings are learned from scratch internally as parameters of the model, the Time + Series Transformer requires you to provide additional time features. The Time Series Transformer only + learns additional embeddings for `static_categorical_features`. + + Additional dynamic real covariates can be concatenated to this tensor, with the caveat that these + features must be known at prediction time. + + The `num_features` here is equal to `config.num_time_features` + `config.num_dynamic_real_features`.
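A purely illustrative sketch of how such time features might be built for a monthly series (the feature choice, scaling and shapes below are assumptions; a real checkpoint expects whatever features it was trained with):

```python
# Hypothetical month-of-year time feature for the past and future windows
import torch

batch_size, past_length, prediction_length = 4, 61, 24  # assumed sizes

def month_of_year(index: torch.Tensor) -> torch.Tensor:
    # scale to roughly [-0.5, 0.5], a common convention for time features
    return (index % 12) / 11.0 - 0.5

past_index = torch.arange(past_length, dtype=torch.float32)
future_index = torch.arange(past_length, past_length + prediction_length, dtype=torch.float32)

past_time_features = month_of_year(past_index).unsqueeze(0).expand(batch_size, -1).unsqueeze(-1)
future_time_features = month_of_year(future_index).unsqueeze(0).expand(batch_size, -1).unsqueeze(-1)
print(past_time_features.shape, future_time_features.shape)  # (4, 61, 1), (4, 24, 1)
```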
+ past_observed_mask (`torch.BoolTensor` of shape `(batch_size, sequence_length)` or `(batch_size, sequence_length, input_size)`, *optional*): + Boolean mask to indicate which `past_values` were observed and which were missing. Mask values selected + in `[0, 1]`: + + - 1 for values that are **observed**, + - 0 for values that are **missing** (i.e. NaNs that were replaced by zeros). + + static_categorical_features (`torch.LongTensor` of shape `(batch_size, number of static categorical features)`, *optional*): + Optional static categorical features for which the model will learn an embedding, which it will add to + the values of the time series. + + Static categorical features are features which have the same value for all time steps (static over + time). + + A typical example of a static categorical feature is a time series ID. + static_real_features (`torch.FloatTensor` of shape `(batch_size, number of static real features)`, *optional*): + Optional static real features which the model will add to the values of the time series. + + Static real features are features which have the same value for all time steps (static over time). + + A typical example of a static real feature is promotion information. + output_attentions (`bool`, *optional*): + Whether or not to return the attentions tensors of all attention layers. + output_hidden_states (`bool`, *optional*): + Whether or not to return the hidden states of all layers. + + Return: + [`SampleTSPredictionOutput`] where the outputs `sequences` tensor will have shape `(batch_size, number of + samples, prediction_length)` or `(batch_size, number of samples, prediction_length, input_size)` for + multivariate predictions. + """ + outputs = self( + static_categorical_features=static_categorical_features, + static_real_features=static_real_features, + past_time_features=past_time_features, + past_values=past_values, + past_observed_mask=past_observed_mask, + future_time_features=None, + future_values=None, + output_attentions=output_attentions, + output_hidden_states=output_hidden_states, + return_dict=True, + use_cache=False, + ) + + decoder = self.model.get_decoder() + enc_last_hidden = outputs.encoder_last_hidden_state + loc = outputs.loc + scale = outputs.scale + static_feat = outputs.static_features + + num_parallel_samples = self.config.num_parallel_samples + repeated_loc = loc.repeat_interleave(repeats=num_parallel_samples, dim=0) + repeated_scale = scale.repeat_interleave(repeats=num_parallel_samples, dim=0) + + repeated_past_values = ( + past_values.repeat_interleave(repeats=num_parallel_samples, dim=0) - repeated_loc + ) / repeated_scale + + time_features = torch.cat((past_time_features, future_time_features), dim=1) + + expanded_static_feat = static_feat.unsqueeze(1).expand(-1, time_features.shape[1], -1) + features = torch.cat((expanded_static_feat, time_features), dim=-1) + repeated_features = features.repeat_interleave(repeats=num_parallel_samples, dim=0) + + repeated_enc_last_hidden = enc_last_hidden.repeat_interleave(repeats=num_parallel_samples, dim=0) + + lagged_sequence = self.model.get_lagged_subsequences( + sequence=repeated_past_values, subsequences_length=self.config.context_length + ) + lags_shape = lagged_sequence.shape + reshaped_lagged_sequence = lagged_sequence.reshape(lags_shape[0], lags_shape[1], -1) + seasonal_input, trend_input = self.model.decomposition_layer(reshaped_lagged_sequence) + + mean = torch.mean(reshaped_lagged_sequence, dim=1).unsqueeze(1).repeat(1, self.config.prediction_length, 1) + zeros = torch.zeros( 
+ [reshaped_lagged_sequence.shape[0], self.config.prediction_length, reshaped_lagged_sequence.shape[2]], + device=reshaped_lagged_sequence.device, + ) + + decoder_input = torch.cat( + ( + torch.cat((seasonal_input[:, -self.config.label_length :, ...], zeros), dim=1), + repeated_features[:, -self.config.prediction_length - self.config.label_length :, ...], + ), + dim=-1, + ) + trend_init = torch.cat( + ( + torch.cat((trend_input[:, -self.config.label_length :, ...], mean), dim=1), + repeated_features[:, -self.config.prediction_length - self.config.label_length :, ...], + ), + dim=-1, + ) + decoder_outputs = decoder( + trend=trend_init, inputs_embeds=decoder_input, encoder_hidden_states=repeated_enc_last_hidden + ) + decoder_last_hidden = decoder_outputs.last_hidden_state + trend = decoder_outputs.trend + params = self.output_params(decoder_last_hidden + trend) + distr = self.output_distribution(params, loc=repeated_loc, scale=repeated_scale) + future_samples = distr.sample() + + return SampleTSPredictionOutput( + sequences=future_samples.reshape( + (-1, num_parallel_samples, self.config.prediction_length) + self.target_shape, + ) + ) diff --git a/src/transformers/models/bart/modeling_tf_bart.py b/src/transformers/models/bart/modeling_tf_bart.py index 39537b88bfce98..e2555381f4bdd3 100644 --- a/src/transformers/models/bart/modeling_tf_bart.py +++ b/src/transformers/models/bart/modeling_tf_bart.py @@ -15,6 +15,8 @@ """ TF 2.0 Bart model.""" +from __future__ import annotations + import random from typing import Optional, Tuple, Union @@ -32,7 +34,6 @@ # Public API from ...modeling_tf_utils import ( - DUMMY_INPUTS, TFCausalLanguageModelingLoss, TFModelInputType, TFPreTrainedModel, @@ -131,7 +132,7 @@ def call( self, input_shape: Optional[tf.TensorShape] = None, past_key_values_length: int = 0, - position_ids: Optional[tf.Tensor] = None, + position_ids: tf.Tensor | None = None, ): """Input is expected to be of size [bsz x seqlen].""" if position_ids is None: @@ -180,12 +181,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training: Optional[bool] = False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -314,8 +315,8 @@ def __init__(self, config: BartConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]], - layer_head_mask: Optional[tf.Tensor], + attention_mask: np.ndarray | tf.Tensor | None, + layer_head_mask: tf.Tensor | None, training: Optional[bool] = False, ) -> tf.Tensor: """ @@ -383,11 +384,11 @@ def __init__(self, config: BartConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - layer_head_mask: Optional[tf.Tensor] = None, - cross_attn_layer_head_mask: Optional[tf.Tensor] = None, + attention_mask: 
np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, + cross_attn_layer_head_mask: tf.Tensor | None = None, past_key_value: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, training: Optional[bool] = False, ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]: @@ -485,31 +486,14 @@ class TFBartPretrainedModel(TFPreTrainedModel): @property def dummy_inputs(self): - pad_token = 1 - input_ids = tf.convert_to_tensor(DUMMY_INPUTS, dtype=tf.int32) - decoder_input_ids = tf.convert_to_tensor(DUMMY_INPUTS, dtype=tf.int32) - dummy_inputs = { - "decoder_input_ids": decoder_input_ids, - "attention_mask": tf.cast(input_ids != pad_token, tf.int32), - "input_ids": input_ids, - } + dummy_inputs = super().dummy_inputs + # Dummy inputs should not contain the default val of 1 + # as this is the padding token and some assertions check it + dummy_inputs["input_ids"] = dummy_inputs["input_ids"] * 2 + if "decoder_input_ids" in dummy_inputs: + dummy_inputs["decoder_input_ids"] = dummy_inputs["decoder_input_ids"] * 2 return dummy_inputs - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), - "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - BART_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. Check the superclass documentation for the generic methods the @@ -700,10 +684,10 @@ def __init__(self, config: BartConfig, embed_tokens: Optional[tf.keras.layers.Em @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -851,14 +835,14 @@ def __init__(self, config: BartConfig, embed_tokens: Optional[tf.keras.layers.Em @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor 
| None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -1073,18 +1057,18 @@ def set_input_embeddings(self, new_embeddings): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1187,18 +1171,18 @@ def get_decoder(self): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1311,23 +1295,23 @@ def set_bias(self, value): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - 
attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, encoder_outputs: Optional[TFBaseModelOutput] = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSeq2SeqLMOutput, Tuple[tf.Tensor]]: r""" @@ -1459,16 +1443,6 @@ def prepare_decoder_input_ids_from_labels(self, labels: tf.Tensor): BART_START_DOCSTRING, ) class TFBartForSequenceClassification(TFBartPretrainedModel, TFSequenceClassificationLoss): - @property - def dummy_inputs(self): - pad_token = self.config.pad_token_id - input_ids = tf.constant([[0, 6, 10, 4, 2], [0, 8, 12, 2, pad_token]]) - dummy_inputs = { - "attention_mask": tf.cast(tf.math.not_equal(input_ids, (pad_token)), dtype=tf.int32), - "input_ids": input_ids, - } - return dummy_inputs - def __init__(self, config: BartConfig, load_weight_prefix=None, *inputs, **kwargs): super().__init__(config, *inputs, **kwargs) self.model = TFBartMainLayer(config, load_weight_prefix=load_weight_prefix, name="model") @@ -1481,23 +1455,23 @@ def __init__(self, config: BartConfig, load_weight_prefix=None, *inputs, **kwarg @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, encoder_outputs: Optional[TFBaseModelOutput] = None, 
past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSeq2SeqSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" diff --git a/src/transformers/models/bert/modeling_tf_bert.py b/src/transformers/models/bert/modeling_tf_bert.py index 50ff7f2dddaa11..fd0a07b415f4f2 100644 --- a/src/transformers/models/bert/modeling_tf_bert.py +++ b/src/transformers/models/bert/modeling_tf_bert.py @@ -15,6 +15,9 @@ # limitations under the License. """ TF 2.0 BERT model.""" + +from __future__ import annotations + import math import warnings from dataclasses import dataclass @@ -51,8 +54,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - DUMMY_INPUTS, - MULTIPLE_CHOICE_DUMMY_INPUTS, ModelOutput, add_code_sample_docstrings, add_start_docstrings, @@ -452,9 +453,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_value: Optional[Tuple[tf.Tensor]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_value: Tuple[tf.Tensor] | None, output_attentions: bool, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -530,9 +531,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None, use_cache: Optional[bool], output_attentions: bool, output_hidden_states: bool, @@ -735,14 +736,14 @@ class PreTrainedModel @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -900,24 +901,6 @@ class TFBertPreTrainedModel(TFPreTrainedModel): config_class = BertConfig 
base_model_prefix = "bert" - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - dummy = {"input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32)} - # Add `encoder_hidden_states` to make the cross-attention layers' weights initialized - if self.config.add_cross_attention: - batch_size, seq_len = tf.constant(DUMMY_INPUTS).shape - shape = (batch_size, seq_len) + (self.config.hidden_size,) - h = tf.random.uniform(shape=shape) - dummy["encoder_hidden_states"] = h - - return dummy - @dataclass class TFBertForPreTrainingOutput(ModelOutput): @@ -943,7 +926,7 @@ class TFBertForPreTrainingOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None prediction_logits: tf.Tensor = None seq_relationship_logits: tf.Tensor = None hidden_states: Optional[Union[Tuple[tf.Tensor], tf.Tensor]] = None @@ -1067,14 +1050,14 @@ def __init__(self, config: BertConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -1120,26 +1103,6 @@ def call( ) return outputs - def serving_output( - self, output: TFBaseModelOutputWithPoolingAndCrossAttentions - ) -> TFBaseModelOutputWithPoolingAndCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFBaseModelOutputWithPoolingAndCrossAttentions( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) - @add_start_docstrings( """ @@ -1175,17 +1138,17 @@ def get_prefix_bias_name(self) -> str: @replace_return_docstrings(output_type=TFBertForPreTrainingOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - 
head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, - next_sentence_label: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, + next_sentence_label: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFBertForPreTrainingOutput, Tuple[tf.Tensor]]: r""" @@ -1252,17 +1215,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFBertForPreTrainingOutput) -> TFBertForPreTrainingOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBertForPreTrainingOutput( - prediction_logits=output.prediction_logits, - seq_relationship_logits=output.seq_relationship_logits, - hidden_states=hs, - attentions=attns, - ) - @add_start_docstrings("""Bert Model with a `language modeling` head on top.""", BERT_START_DOCSTRING) class TFBertForMaskedLM(TFBertPreTrainedModel, TFMaskedLanguageModelingLoss): @@ -1304,16 +1256,16 @@ def get_prefix_bias_name(self) -> str: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1349,12 +1301,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - class TFBertLMHeadModel(TFBertPreTrainedModel, TFCausalLanguageModelingLoss): # names with a '.' 
represents the authorized unexpected/missing layers when a TF model is loaded from a PT model @@ -1401,20 +1347,20 @@ def prepare_inputs_for_generation(self, input_ids, past_key_values=None, attenti ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, **kwargs, ) -> Union[TFCausalLMOutputWithCrossAttentions, Tuple[tf.Tensor]]: @@ -1480,19 +1426,6 @@ def call( cross_attentions=outputs.cross_attentions, ) - def serving_output(self, output: TFCausalLMOutputWithCrossAttentions) -> TFCausalLMOutputWithCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFCausalLMOutputWithCrossAttentions( - logits=output.logits, past_key_values=pkv, hidden_states=hs, attentions=attns, cross_attentions=cross_attns - ) - @add_start_docstrings( """Bert Model with a `next sentence prediction (classification)` head on top.""", @@ -1513,16 +1446,16 @@ def __init__(self, config: BertConfig, *inputs, **kwargs): @replace_return_docstrings(output_type=TFNextSentencePredictorOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, 
output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - next_sentence_label: Optional[Union[np.ndarray, tf.Tensor]] = None, + next_sentence_label: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFNextSentencePredictorOutput, Tuple[tf.Tensor]]: r""" @@ -1575,12 +1508,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFNextSentencePredictorOutput) -> TFNextSentencePredictorOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFNextSentencePredictorOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1621,16 +1548,16 @@ def __init__(self, config: BertConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1667,12 +1594,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1695,16 +1616,6 @@ def __init__(self, config: BertConfig, *inputs, **kwargs): units=1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. 
- - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length")) @add_code_sample_docstrings( @@ -1714,16 +1625,16 @@ def dummy_inputs(self) -> Dict[str, tf.Tensor]: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -1782,26 +1693,6 @@ def call( attentions=outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFMultipleChoiceModelOutput: - output = self.call(input_ids=inputs) - - return self.serving_output(output) - - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1848,16 +1739,16 @@ def __init__(self, config: BertConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1892,12 +1783,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: 
TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1941,17 +1826,17 @@ def __init__(self, config: BertConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1999,11 +1884,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/blenderbot/modeling_tf_blenderbot.py b/src/transformers/models/blenderbot/modeling_tf_blenderbot.py index ee5755c2035cab..d0e74550370505 100644 --- a/src/transformers/models/blenderbot/modeling_tf_blenderbot.py +++ b/src/transformers/models/blenderbot/modeling_tf_blenderbot.py @@ -15,6 +15,8 @@ """ TF 2.0 Blenderbot model.""" +from __future__ import annotations + import os import random import warnings @@ -32,7 +34,6 @@ # Public API from ...modeling_tf_utils import ( - DUMMY_INPUTS, TFCausalLanguageModelingLoss, TFPreTrainedModel, keras_serializable, @@ -126,7 +127,7 @@ def __init__(self, num_embeddings: int, embedding_dim: int, **kwargs): super().__init__(num_embeddings, embedding_dim, **kwargs) def call( - self, input_shape: tf.TensorShape, past_key_values_length: int = 0, position_ids: Optional[tf.Tensor] = None + self, input_shape: tf.TensorShape, past_key_values_length: int = 0, position_ids: tf.Tensor | None = None ): """Input is expected to be of size [bsz x seqlen].""" if position_ids is None: @@ -175,12 +176,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + 
key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training: Optional[bool] = False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -380,12 +381,12 @@ def __init__(self, config: BlenderbotConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, - cross_attn_layer_head_mask: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[tf.Tensor]] = None, + attention_mask: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, + cross_attn_layer_head_mask: tf.Tensor | None = None, + past_key_value: Tuple[tf.Tensor] | None = None, training: Optional[bool] = False, ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]: """ @@ -462,34 +463,6 @@ class TFBlenderbotPreTrainedModel(TFPreTrainedModel): config_class = BlenderbotConfig base_model_prefix = "model" - @property - def dummy_inputs(self): - pad_token = 1 - input_ids = tf.convert_to_tensor(DUMMY_INPUTS, dtype=tf.int32) - decoder_input_ids = tf.convert_to_tensor(DUMMY_INPUTS, dtype=tf.int32) - dummy_inputs = { - "decoder_input_ids": decoder_input_ids, - "attention_mask": tf.cast(input_ids != pad_token, tf.int32), - "input_ids": input_ids, - } - return dummy_inputs - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), - "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), - } - ] - ) - # Copied from transformers.models.bart.modeling_tf_bart.TFBartPretrainedModel.serving - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - BLENDERBOT_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. 
Check the superclass documentation for the generic methods the @@ -1183,18 +1156,18 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P ) def call( self, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - decoder_input_ids: Optional[tf.Tensor] = None, - decoder_attention_mask: Optional[tf.Tensor] = None, - decoder_position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - decoder_head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + decoder_input_ids: tf.Tensor | None = None, + decoder_attention_mask: tf.Tensor | None = None, + decoder_position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + decoder_head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None, - past_key_values: Optional[List[tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, - decoder_inputs_embeds: Optional[tf.Tensor] = None, + past_key_values: List[tf.Tensor] | None = None, + inputs_embeds: tf.Tensor | None = None, + decoder_inputs_embeds: tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1327,23 +1300,23 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P @add_end_docstrings(BLENDERBOT_GENERATION_EXAMPLE) def call( self, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - decoder_input_ids: Optional[tf.Tensor] = None, - decoder_attention_mask: Optional[tf.Tensor] = None, - decoder_position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - decoder_head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + decoder_input_ids: tf.Tensor | None = None, + decoder_attention_mask: tf.Tensor | None = None, + decoder_position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + decoder_head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None, - past_key_values: Optional[List[tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, - decoder_inputs_embeds: Optional[tf.Tensor] = None, + past_key_values: List[tf.Tensor] | None = None, + inputs_embeds: tf.Tensor | None = None, + decoder_inputs_embeds: tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple[tf.Tensor], TFSeq2SeqLMOutput]: r""" diff --git a/src/transformers/models/blenderbot_small/modeling_tf_blenderbot_small.py b/src/transformers/models/blenderbot_small/modeling_tf_blenderbot_small.py index e170085e91c57c..2e8d2e11cae798 100644 --- a/src/transformers/models/blenderbot_small/modeling_tf_blenderbot_small.py +++ b/src/transformers/models/blenderbot_small/modeling_tf_blenderbot_small.py @@ -15,6 +15,8 @@ """ TF 2.0 BlenderbotSmall model.""" +from __future__ import annotations + import random from typing import List, Optional, Tuple, Union @@ -31,7 +33,6 @@ # Public API 
from ...modeling_tf_utils import ( - DUMMY_INPUTS, TFCausalLanguageModelingLoss, TFPreTrainedModel, keras_serializable, @@ -126,7 +127,7 @@ def __init__(self, num_embeddings: int, embedding_dim: int, **kwargs): super().__init__(num_embeddings, embedding_dim, **kwargs) def call( - self, input_shape: tf.TensorShape, past_key_values_length: int = 0, position_ids: Optional[tf.Tensor] = None + self, input_shape: tf.TensorShape, past_key_values_length: int = 0, position_ids: tf.Tensor | None = None ): """Input is expected to be of size [bsz x seqlen].""" if position_ids is None: @@ -175,12 +176,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training: Optional[bool] = False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -310,8 +311,8 @@ def __init__(self, config: BlenderbotSmallConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]], - layer_head_mask: Optional[tf.Tensor], + attention_mask: np.ndarray | tf.Tensor | None, + layer_head_mask: tf.Tensor | None, training: Optional[bool] = False, ) -> tf.Tensor: """ @@ -380,11 +381,11 @@ def __init__(self, config: BlenderbotSmallConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - layer_head_mask: Optional[tf.Tensor] = None, - cross_attn_layer_head_mask: Optional[tf.Tensor] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, + cross_attn_layer_head_mask: tf.Tensor | None = None, past_key_value: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, training: Optional[bool] = False, ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]: @@ -462,34 +463,6 @@ class TFBlenderbotSmallPreTrainedModel(TFPreTrainedModel): config_class = BlenderbotSmallConfig base_model_prefix = "model" - @property - def dummy_inputs(self): - pad_token = 1 - input_ids = tf.convert_to_tensor(DUMMY_INPUTS, dtype=tf.int32) - decoder_input_ids = tf.convert_to_tensor(DUMMY_INPUTS, dtype=tf.int32) - dummy_inputs = { - "decoder_input_ids": decoder_input_ids, - "attention_mask": tf.cast(input_ids != pad_token, tf.int32), - "input_ids": input_ids, - } - return dummy_inputs - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), - "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), - } - ] - ) - # Copied from 
transformers.models.bart.modeling_tf_bart.TFBartPretrainedModel.serving - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - BLENDERBOT_SMALL_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. Check the superclass documentation for the generic methods the @@ -1175,18 +1148,18 @@ def get_decoder(self): ) def call( self, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - decoder_input_ids: Optional[tf.Tensor] = None, - decoder_attention_mask: Optional[tf.Tensor] = None, - decoder_position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - decoder_head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + decoder_input_ids: tf.Tensor | None = None, + decoder_attention_mask: tf.Tensor | None = None, + decoder_position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + decoder_head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None, - past_key_values: Optional[List[tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, - decoder_inputs_embeds: Optional[tf.Tensor] = None, + past_key_values: List[tf.Tensor] | None = None, + inputs_embeds: tf.Tensor | None = None, + decoder_inputs_embeds: tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1303,23 +1276,23 @@ def set_bias(self, value): @add_end_docstrings(BLENDERBOT_SMALL_GENERATION_EXAMPLE) def call( self, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - decoder_input_ids: Optional[tf.Tensor] = None, - decoder_attention_mask: Optional[tf.Tensor] = None, - decoder_position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - decoder_head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + decoder_input_ids: tf.Tensor | None = None, + decoder_attention_mask: tf.Tensor | None = None, + decoder_position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + decoder_head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, encoder_outputs: Optional[TFBaseModelOutput] = None, - past_key_values: Optional[List[tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, - decoder_inputs_embeds: Optional[tf.Tensor] = None, + past_key_values: List[tf.Tensor] | None = None, + inputs_embeds: tf.Tensor | None = None, + decoder_inputs_embeds: tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple[tf.Tensor], TFSeq2SeqLMOutput]: r""" diff --git a/src/transformers/models/blip/modeling_blip.py b/src/transformers/models/blip/modeling_blip.py index 9e0fc7419d46e0..e41bb55fa19b0e 100644 --- a/src/transformers/models/blip/modeling_blip.py +++ b/src/transformers/models/blip/modeling_blip.py @@ -1200,6 +1200,10 @@ def forward( return_dict=return_dict, ) + if labels is not None and decoder_input_ids is None: + # labels are already shifted right, 
see: https://github.com/huggingface/transformers/pull/23153 + decoder_input_ids = labels + question_embeds = question_embeds[0] if not return_dict else question_embeds.last_hidden_state answer_output = self.text_decoder( diff --git a/src/transformers/models/blip/modeling_blip_text.py b/src/transformers/models/blip/modeling_blip_text.py index 85585f3d5dd345..1f269cf852ee0d 100644 --- a/src/transformers/models/blip/modeling_blip_text.py +++ b/src/transformers/models/blip/modeling_blip_text.py @@ -613,7 +613,7 @@ def get_extended_attention_mask( Mask with ones indicating tokens to attend to, zeros for tokens to ignore. input_shape (`Tuple[int]`): The shape of the input to the model. - device: (`torch.device`): + device (`torch.device`): The device of the input to the model. Returns: diff --git a/src/transformers/models/blip/modeling_tf_blip.py b/src/transformers/models/blip/modeling_tf_blip.py index 4ea9c7e7e5f5dc..428151ea9a3c0a 100644 --- a/src/transformers/models/blip/modeling_tf_blip.py +++ b/src/transformers/models/blip/modeling_tf_blip.py @@ -14,14 +14,15 @@ # limitations under the License. """ TensorFlow BLIP model.""" +from __future__ import annotations + from dataclasses import dataclass -from typing import Any, Dict, Optional, Tuple, Union +from typing import Any, Optional, Tuple, Union import tensorflow as tf from ...modeling_tf_outputs import TFBaseModelOutput, TFBaseModelOutputWithPooling from ...modeling_tf_utils import ( - DUMMY_INPUTS, TFPreTrainedModel, get_initializer, get_tf_activation, @@ -102,12 +103,12 @@ class TFBlipForConditionalGenerationModelOutput(ModelOutput): heads.` """ - loss: Optional[Tuple[tf.Tensor]] = None - decoder_logits: Optional[Tuple[tf.Tensor]] = None - image_embeds: Optional[tf.Tensor] = None + loss: Tuple[tf.Tensor] | None = None + decoder_logits: Tuple[tf.Tensor] | None = None + image_embeds: tf.Tensor | None = None last_hidden_state: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -136,11 +137,11 @@ class TFBlipTextVisionModelOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None - image_embeds: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None + image_embeds: tf.Tensor | None = None last_hidden_state: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -176,14 +177,14 @@ class TFBlipImageTextMatchingModelOutput(ModelOutput): The question embeddings obtained by the text projection layer. """ - itm_score: Optional[tf.Tensor] = None - loss: Optional[tf.Tensor] = None - image_embeds: Optional[tf.Tensor] = None + itm_score: tf.Tensor | None = None + loss: tf.Tensor | None = None + image_embeds: tf.Tensor | None = None last_hidden_state: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - vision_pooler_output: Optional[tf.Tensor] = None - attentions: Optional[Tuple[tf.Tensor]] = None - question_embeds: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + vision_pooler_output: tf.Tensor | None = None + attentions: Tuple[tf.Tensor] | None = None + question_embeds: Tuple[tf.Tensor] | None = None @dataclass @@ -208,7 +209,7 @@ class TFBlipOutput(ModelOutput): The output of the [`BlipVisionModel`]. 
""" - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits_per_image: tf.Tensor = None logits_per_text: tf.Tensor = None text_embeds: tf.Tensor = None @@ -359,10 +360,10 @@ def __init__(self, config, **kwargs): def call( self, hidden_states: tf.Tensor, - head_mask: Optional[tf.Tensor] = None, + head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = False, training: Optional[bool] = None, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor], Optional[Tuple[tf.Tensor]]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None, Tuple[tf.Tensor] | None]: """Input shape: Batch x Time x Channel""" bsz, tgt_len, embed_dim = shape_list(hidden_states) @@ -573,7 +574,7 @@ def __init__(self, config: BlipConfig, **kwargs): def call( self, inputs_embeds, - attention_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -646,38 +647,6 @@ def __init__(self, config: BlipVisionConfig, *args, **kwargs): self.encoder = TFBlipEncoder(config, name="encoder") self.post_layernorm = tf.keras.layers.LayerNormalization(epsilon=config.layer_norm_eps, name="post_layernorm") - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(len(DUMMY_INPUTS), 3, self.config.image_size, self.config.image_size), dtype=tf.float32 - ) - return {"pixel_values": VISION_DUMMY_INPUTS} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFBaseModelOutputWithPooling: - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - - return self.serving_output(output) - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None @@ -694,7 +663,7 @@ def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOut @replace_return_docstrings(output_type=TFBaseModelOutputWithPooling, config_class=BlipVisionConfig) def call( self, - pixel_values: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -799,10 +768,10 @@ def build(self, input_shape): @unpack_inputs def call( self, - input_ids: Optional[tf.Tensor] = None, - pixel_values: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + pixel_values: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, return_loss: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -879,44 +848,6 @@ def __init__(self, config: BlipConfig, *inputs, **kwargs): self.blip = TFBlipMainLayer(config, name="blip") - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. 
- - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(len(DUMMY_INPUTS), 3, self.config.vision_config.image_size, self.config.vision_config.image_size), - dtype=tf.float32, - ) - return { - "input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32), - "pixel_values": VISION_DUMMY_INPUTS, - } - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFBlipOutput: - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - - return self.serving_output(output) - def serving_output(self, output: TFBlipOutput) -> TFBlipOutput: return TFBlipOutput( logits_per_image=output.logits_per_image, @@ -930,10 +861,10 @@ def serving_output(self, output: TFBlipOutput) -> TFBlipOutput: @replace_return_docstrings(output_type=TFBlipOutput, config_class=BlipConfig) def call( self, - input_ids: Optional[tf.Tensor] = None, - pixel_values: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + pixel_values: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, return_loss: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -980,9 +911,9 @@ def call( @add_start_docstrings_to_model_forward(BLIP_TEXT_INPUTS_DOCSTRING) def get_text_features( self, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, return_dict: Optional[bool] = None, ) -> tf.Tensor: r""" @@ -1018,7 +949,7 @@ def get_text_features( @add_start_docstrings_to_model_forward(BLIP_VISION_INPUTS_DOCSTRING) def get_image_features( self, - pixel_values: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, return_dict: Optional[bool] = None, ) -> tf.Tensor: r""" @@ -1080,59 +1011,17 @@ def __init__(self, config: BlipConfig, *args, **kwargs): def get_input_embeddings(self) -> tf.keras.layers.Layer: return self.vision_model.embeddings.patch_embedding - @property - def dummy_inputs(self): - input_ids = tf.constant(DUMMY_INPUTS, dtype=tf.int32) - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(len(DUMMY_INPUTS), 3, self.config.vision_config.image_size, self.config.vision_config.image_size), - dtype=tf.float32, - ) - return {"input_ids": input_ids, "pixel_values": VISION_DUMMY_INPUTS} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFBaseModelOutputWithPooling: - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. 
- """ - output = self.call(inputs) - - return self.serving_output(output) - - def serving_output( - self, output: TFBlipForConditionalGenerationModelOutput - ) -> TFBlipForConditionalGenerationModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBlipForConditionalGenerationModelOutput( - last_hidden_state=output.last_hidden_state, - image_embeds=output.image_embeds, - hidden_states=hs, - attentions=attns, - ) - @unpack_inputs @add_start_docstrings_to_model_forward(BLIP_VISION_INPUTS_DOCSTRING) @replace_return_docstrings(output_type=TFBlipForConditionalGenerationModelOutput, config_class=BlipConfig) def call( self, pixel_values: tf.Tensor, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, return_dict: Optional[bool] = None, training: Optional[bool] = None, ) -> Union[Tuple, TFBlipForConditionalGenerationModelOutput]: @@ -1197,8 +1086,8 @@ def call( def generate( self, pixel_values: tf.Tensor, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, **generate_kwargs, ) -> tf.Tensor: r""" @@ -1295,46 +1184,30 @@ def __init__(self, config: BlipConfig, *args, **kwargs): def get_input_embeddings(self) -> tf.keras.layers.Layer: return self.vision_model.embeddings.patch_embedding - @property - def dummy_inputs(self): - input_ids = tf.constant(DUMMY_INPUTS, dtype=tf.int32) - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(len(DUMMY_INPUTS), 3, self.config.vision_config.image_size, self.config.vision_config.image_size), - dtype=tf.float32, - ) - return {"input_ids": input_ids, "pixel_values": VISION_DUMMY_INPUTS, "decoder_input_ids": input_ids} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFBaseModelOutputWithPooling: - """ - Method used for serving the model. + # Adapted from transformers.models.t5.modeling_tf_t5.TFT5PreTrainedModel._shift_right + def _shift_right(self, input_ids): + decoder_start_token_id = self.decoder_start_token_id + pad_token_id = self.decoder_pad_token_id - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. 
- """ - output = self.call(inputs) + if decoder_start_token_id is None or pad_token_id is None: + raise ValueError("decoder_start_token_id and pad_token_id must be defined!") - return self.serving_output(output) + start_tokens = tf.fill((shape_list(input_ids)[0], 1), decoder_start_token_id) + start_tokens = tf.cast(start_tokens, input_ids.dtype) # Ensure compatible dtypes for concatenation + shifted_input_ids = tf.concat([start_tokens, input_ids[:, :-1]], -1) - def serving_output(self, output: TFBlipTextVisionModelOutput) -> TFBlipTextVisionModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBlipTextVisionModelOutput( - image_embeds=output.image_embeds, - last_hidden_state=output.last_hidden_state, - hidden_states=hs, - attentions=attns, + # replace possible -100 values in labels by `pad_token_id` + shifted_input_ids = tf.where( + shifted_input_ids == -100, + tf.cast(tf.fill(shape_list(shifted_input_ids), pad_token_id), shifted_input_ids.dtype), + shifted_input_ids, ) + # "Verify that `labels` has only positive values and -100" + tf.debugging.assert_greater_equal(shifted_input_ids, tf.constant(0, dtype=shifted_input_ids.dtype)) + + return shifted_input_ids + @unpack_inputs @add_start_docstrings_to_model_forward(BLIP_VISION_INPUTS_DOCSTRING) @replace_return_docstrings(output_type=TFBlipTextVisionModelOutput, config_class=BlipVisionConfig) @@ -1342,13 +1215,13 @@ def call( self, input_ids: tf.Tensor, pixel_values: tf.Tensor, - decoder_input_ids: Optional[tf.Tensor] = None, - decoder_attention_mask: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, + decoder_input_ids: tf.Tensor | None = None, + decoder_attention_mask: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, foutput_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, return_dict: Optional[bool] = None, training: Optional[bool] = None, ) -> Union[Tuple, TFBlipTextVisionModelOutput]: @@ -1387,7 +1260,7 @@ def call( ```""" if labels is None and decoder_input_ids is None: raise ValueError( - "Either `decoder_input_ids` or `labels` should be passed when calling `forward` with" + "Either `decoder_input_ids` or `labels` should be passed when calling" " `TFBlipForQuestionAnswering`. 
if you are training the model make sure that `labels` is passed, if you" " are using the model for inference make sure that `decoder_input_ids` is passed or call `generate`" ) @@ -1416,6 +1289,10 @@ def call( question_embeds = question_embeds[0] if not return_dict else question_embeds.last_hidden_state + if labels is not None and decoder_input_ids is None: + # labels are already shifted right, see: https://github.com/huggingface/transformers/pull/23153 + decoder_input_ids = labels + answer_output = self.text_decoder( input_ids=decoder_input_ids, attention_mask=decoder_attention_mask, @@ -1447,7 +1324,7 @@ def generate( self, input_ids: tf.Tensor, pixel_values: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, **generate_kwargs, ) -> tf.Tensor: r""" @@ -1573,56 +1450,15 @@ def __init__(self, config: BlipConfig, *args, **kwargs): def get_input_embeddings(self) -> tf.keras.layers.Layer: return self.vision_model.embeddings.patch_embedding - @property - def dummy_inputs(self): - input_ids = tf.constant(DUMMY_INPUTS, dtype=tf.int32) - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(len(DUMMY_INPUTS), 3, self.config.vision_config.image_size, self.config.vision_config.image_size), - dtype=tf.float32, - ) - return {"input_ids": input_ids, "pixel_values": VISION_DUMMY_INPUTS} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFBaseModelOutputWithPooling: - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - - return self.serving_output(output) - - def serving_output(self, output: TFBlipImageTextMatchingModelOutput) -> TFBlipImageTextMatchingModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBlipImageTextMatchingModelOutput( - itm_score=output.itm_score, - last_hidden_state=hs, - hidden_states=output.hidden_states, - attentions=attns, - question_embeds=output.question_embeds, - ) - @unpack_inputs @add_start_docstrings_to_model_forward(BLIP_VISION_INPUTS_DOCSTRING) @replace_return_docstrings(output_type=TFBlipImageTextMatchingModelOutput, config_class=BlipVisionConfig) def call( self, input_ids: tf.Tensor, - pixel_values: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, use_itm_head: Optional[bool] = True, - attention_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, diff --git a/src/transformers/models/blip/modeling_tf_blip_text.py b/src/transformers/models/blip/modeling_tf_blip_text.py index 6e8ed8a891c04e..19ebdac62e22fa 100644 --- a/src/transformers/models/blip/modeling_tf_blip_text.py +++ b/src/transformers/models/blip/modeling_tf_blip_text.py @@ -13,8 +13,11 @@ # See the License for the specific language governing permissions and # limitations under the License. 
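As a quick illustration of the `labels` fallback added to `TFBlipForQuestionAnswering` above: this is a minimal sketch, not part of the patch; it assumes the public `Salesforce/blip-vqa-base` checkpoint and `BlipProcessor`, and uses a blank placeholder image.

import numpy as np
from PIL import Image
from transformers import BlipProcessor, TFBlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = TFBlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.fromarray(np.zeros((384, 384, 3), dtype=np.uint8))  # placeholder image for the sketch
inputs = processor(images=image, text="how many cats are there?", return_tensors="tf")
labels = processor(text="two", return_tensors="tf").input_ids

# With the change above, passing only `labels` is enough for training: when
# `decoder_input_ids` are not supplied, the labels are reused directly as decoder inputs.
outputs = model(**inputs, labels=labels)
loss = outputs.loss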
+ +from __future__ import annotations + import math -from typing import Dict, Optional, Tuple +from typing import Optional, Tuple import tensorflow as tf @@ -24,7 +27,6 @@ TFCausalLMOutputWithCrossAttentions, ) from ...modeling_tf_utils import ( - DUMMY_INPUTS, TFPreTrainedModel, get_initializer, get_tf_activation, @@ -277,11 +279,11 @@ def __init__(self, config, is_cross_attention=False, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, + attention_mask: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, output_attentions: Optional[bool] = False, training: Optional[bool] = None, ): @@ -590,31 +592,6 @@ def __init__(self, config, add_pooling_layer=True, name=None, **kwargs): self.encoder = TFBlipTextEncoder(config, name="encoder") self.pooler = TFBlipTextPooler(config, name="pooler") if add_pooling_layer else None - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFBaseModelOutputWithPoolingAndCrossAttentions: - output = self.call(inputs) - return self.serving_output(output) - - def serving_output( - self, output: TFBaseModelOutputWithPoolingAndCrossAttentions - ) -> TFBaseModelOutputWithPoolingAndCrossAttentions: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutputWithPoolingAndCrossAttentions( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=hs, - attentions=attns, - ) - def get_input_embeddings(self): return self.embeddings.word_embeddings @@ -633,7 +610,7 @@ def get_extended_attention_mask( Mask with ones indicating tokens to attend to, zeros for tokens to ignore. input_shape (`Tuple[int]`): The shape of the input to the model. - is_decoder: (`bool`): + is_decoder (`bool`): Whether the model is used as a decoder. Returns: @@ -841,46 +818,6 @@ def get_output_embeddings(self): def set_output_embeddings(self, new_embeddings): self.cls.predictions.decoder = new_embeddings - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - return {"input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32)} - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFCausalLMOutputWithCrossAttentions: - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. 
- """ - output = self.call(inputs) - - return self.serving_output(output) - - def serving_output(self, output: TFCausalLMOutputWithCrossAttentions) -> TFCausalLMOutputWithCrossAttentions: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFCausalLMOutputWithCrossAttentions( - logits=output.logits, - cross_attentions=output.cross_attentions, - hidden_states=hs, - attentions=attns, - ) - @add_start_docstrings_to_model_forward(BLIP_TEXT_INPUTS_DOCSTRING) @unpack_inputs def call( diff --git a/src/transformers/models/blip_2/modeling_blip_2.py b/src/transformers/models/blip_2/modeling_blip_2.py index be499dc1d516ad..381a3fe0ed7ada 100644 --- a/src/transformers/models/blip_2/modeling_blip_2.py +++ b/src/transformers/models/blip_2/modeling_blip_2.py @@ -1059,7 +1059,7 @@ def get_extended_attention_mask( Mask with ones indicating tokens to attend to, zeros for tokens to ignore. input_shape (`Tuple[int]`): The shape of the input to the model. - device: (`torch.device`): + device (`torch.device`): The device of the input to the model. Returns: diff --git a/src/transformers/models/bloom/modeling_bloom.py b/src/transformers/models/bloom/modeling_bloom.py index e954cfb4473068..5c0d570cbe9c21 100644 --- a/src/transformers/models/bloom/modeling_bloom.py +++ b/src/transformers/models/bloom/modeling_bloom.py @@ -256,7 +256,7 @@ def _merge_heads(self, x: torch.Tensor) -> torch.Tensor: Merge heads together over the last dimenstion Args: - x: (`torch.tensor`, *required*): [batch_size * num_heads, seq_length, head_dim] + x (`torch.tensor`, *required*): [batch_size * num_heads, seq_length, head_dim] Returns: torch.tensor: [batch_size, seq_length, num_heads * head_dim] diff --git a/src/transformers/models/camembert/modeling_tf_camembert.py b/src/transformers/models/camembert/modeling_tf_camembert.py index c9e4c98c1467d5..8def74a5b3045e 100644 --- a/src/transformers/models/camembert/modeling_tf_camembert.py +++ b/src/transformers/models/camembert/modeling_tf_camembert.py @@ -15,6 +15,9 @@ # limitations under the License. 
""" TF 2.0 CamemBERT model.""" + +from __future__ import annotations + import math import warnings from typing import Optional, Tuple, Union @@ -48,8 +51,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - DUMMY_INPUTS, - MULTIPLE_CHOICE_DUMMY_INPUTS, add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -526,9 +527,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_value: Optional[Tuple[tf.Tensor]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_value: Tuple[tf.Tensor] | None, output_attentions: bool, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -605,9 +606,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None, use_cache: Optional[bool], output_attentions: bool, output_hidden_states: bool, @@ -705,14 +706,14 @@ class PreTrainedModel # Copied from transformers.models.bert.modeling_tf_bert.TFBertMainLayer.call def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -870,38 +871,6 @@ class TFCamembertPreTrainedModel(TFPreTrainedModel): config_class = CamembertConfig base_model_prefix = "roberta" - @property - # Copied from transformers.models.bert.modeling_tf_bert.TFBertPreTrainedModel.dummy_inputs - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
- """ - dummy = {"input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32)} - # Add `encoder_hidden_states` to make the cross-attention layers' weights initialized - if self.config.add_cross_attention: - batch_size, seq_len = tf.constant(DUMMY_INPUTS).shape - shape = (batch_size, seq_len) + (self.config.hidden_size,) - h = tf.random.uniform(shape=shape) - dummy["encoder_hidden_states"] = h - - return dummy - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - @add_start_docstrings( "The bare CamemBERT Model transformer outputting raw hidden-states without any specific head on top.", @@ -922,14 +891,14 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -976,27 +945,6 @@ def call( return outputs - # Copied from transformers.models.bert.modeling_tf_bert.TFBertModel.serving_output - def serving_output( - self, output: TFBaseModelOutputWithPoolingAndCrossAttentions - ) -> TFBaseModelOutputWithPoolingAndCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFBaseModelOutputWithPoolingAndCrossAttentions( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) - # Copied from transformers.models.roberta.modeling_tf_roberta.TFRobertaLMHead with Roberta->Camembert class TFCamembertLMHead(tf.keras.layers.Layer): @@ -1085,16 +1033,16 @@ def get_prefix_bias_name(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = 
None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1132,13 +1080,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM.serving_output - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - # Copied from transformers.models.roberta.modeling_tf_roberta.TFRobertaClassificationHead class TFCamembertClassificationHead(tf.keras.layers.Layer): @@ -1199,16 +1140,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1245,13 +1186,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1290,16 +1224,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + 
input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1336,13 +1270,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1366,16 +1293,6 @@ def __init__(self, config, *inputs, **kwargs): 1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward( CAMEMBERT_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length") @@ -1387,16 +1304,16 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -1446,26 +1363,6 @@ def call( attentions=outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving_output - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - 
return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1499,17 +1396,17 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1559,15 +1456,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) - @add_start_docstrings( """CamemBERT Model with a `language modeling` head on top for CLM fine-tuning.""", CAMEMBERT_START_DOCSTRING @@ -1615,20 +1503,20 @@ def prepare_inputs_for_generation(self, input_ids, past_key_values=None, attenti ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> 
Union[TFCausalLMOutputWithCrossAttentions, Tuple[tf.Tensor]]: r""" @@ -1693,17 +1581,3 @@ def call( attentions=outputs.attentions, cross_attentions=outputs.cross_attentions, ) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertLMHeadModel.serving_output - def serving_output(self, output: TFCausalLMOutputWithCrossAttentions) -> TFCausalLMOutputWithCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFCausalLMOutputWithCrossAttentions( - logits=output.logits, past_key_values=pkv, hidden_states=hs, attentions=attns, cross_attentions=cross_attns - ) diff --git a/src/transformers/models/clap/feature_extraction_clap.py b/src/transformers/models/clap/feature_extraction_clap.py index 6edd739fa16d60..d33307ffbd22fc 100644 --- a/src/transformers/models/clap/feature_extraction_clap.py +++ b/src/transformers/models/clap/feature_extraction_clap.py @@ -272,7 +272,8 @@ def __call__( Args: raw_speech (`np.ndarray`, `List[float]`, `List[np.ndarray]`, `List[List[float]]`): The sequence or batch of sequences to be padded. Each sequence can be a numpy array, a list of float - values, a list of numpy arrays or a list of list of float values. + values, a list of numpy arrays or a list of list of float values. Must be mono channel audio, not + stereo, i.e. single float per timestep. truncation (`str`, *optional*): Truncation pattern for long audio inputs. Two patterns are available: - `fusion` will use `_random_mel_fusion`, which stacks 3 random crops from the mel spectrogram and @@ -312,9 +313,11 @@ def __call__( "Failing to do so can result in silent errors that might be hard to debug." ) - is_batched = bool( - isinstance(raw_speech, (list, tuple)) - and (isinstance(raw_speech[0], np.ndarray) or isinstance(raw_speech[0], (tuple, list))) + is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1 + if is_batched_numpy and len(raw_speech.shape) > 2: + raise ValueError(f"Only mono-channel audio is supported for input to {self}") + is_batched = is_batched_numpy or ( + isinstance(raw_speech, (list, tuple)) and (isinstance(raw_speech[0], (np.ndarray, tuple, list))) ) if is_batched: diff --git a/src/transformers/models/clip/modeling_tf_clip.py b/src/transformers/models/clip/modeling_tf_clip.py index 7cf52500aed63a..778f1ed2c92e4a 100644 --- a/src/transformers/models/clip/modeling_tf_clip.py +++ b/src/transformers/models/clip/modeling_tf_clip.py @@ -15,9 +15,11 @@ """ TF 2.0 CLIP model.""" +from __future__ import annotations + import math from dataclasses import dataclass -from typing import Any, Dict, Optional, Tuple, Union +from typing import Any, Optional, Tuple, Union import numpy as np import tensorflow as tf @@ -27,7 +29,6 @@ # Public API from ...modeling_tf_utils import ( - DUMMY_INPUTS, TFModelInputType, TFPreTrainedModel, get_initializer, @@ -111,7 +112,7 @@ class TFCLIPOutput(ModelOutput): The output of the [`TFCLIPVisionModel`]. 
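For reference, the batching rule introduced in the CLAP feature-extraction hunk above reduces to the standalone sketch below: a 2-D numpy array is now treated as a batch of mono waveforms, anything deeper (e.g. stereo channels) is rejected, and lists/tuples of sequences keep working as before. The helper name `is_batched_audio` is illustrative only, not part of the library.

    import numpy as np

    def is_batched_audio(raw_speech):
        # A 2-D numpy array is a batch of mono clips; 3-D (e.g. stereo) is rejected.
        is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1
        if is_batched_numpy and len(raw_speech.shape) > 2:
            raise ValueError("Only mono-channel audio is supported")
        # Lists or tuples of arrays / lists / tuples also count as batches.
        return is_batched_numpy or (
            isinstance(raw_speech, (list, tuple))
            and isinstance(raw_speech[0], (np.ndarray, tuple, list))
        )

    print(is_batched_audio(np.zeros((2, 16000))))       # True  - batch of two mono clips
    print(is_batched_audio(np.zeros(16000)))            # False - single unbatched clip
    print(is_batched_audio([[0.0, 0.1], [0.2, 0.3]]))   # True  - list of lists of floats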
""" - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits_per_image: tf.Tensor = None logits_per_text: tf.Tensor = None text_embeds: tf.Tensor = None @@ -586,9 +587,9 @@ def set_input_embeddings(self, value: tf.Variable): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -675,7 +676,7 @@ def get_input_embeddings(self) -> tf.keras.layers.Layer: @unpack_inputs def call( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -751,9 +752,9 @@ def build(self, input_shape: tf.TensorShape): @unpack_inputs def get_text_features( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -785,7 +786,7 @@ def get_text_features( @unpack_inputs def get_image_features( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -810,10 +811,10 @@ def get_image_features( @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - pixel_values: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + pixel_values: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, return_loss: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1050,9 +1051,9 @@ def __init__(self, config: CLIPTextConfig, *inputs, **kwargs): @replace_return_docstrings(output_type=TFBaseModelOutputWithPooling, config_class=CLIPTextConfig) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1088,29 +1089,6 @@ def call( return outputs - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFBaseModelOutputWithPooling: - 
output = self.call(inputs) - return self.serving_output(output) - - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=hs, - attentions=attns, - ) - class TFCLIPVisionModel(TFCLIPPreTrainedModel): config_class = CLIPVisionConfig @@ -1121,44 +1099,12 @@ def __init__(self, config: CLIPVisionConfig, *inputs, **kwargs): self.clip = TFCLIPVisionMainLayer(config, name="clip") - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(len(DUMMY_INPUTS), 3, self.config.image_size, self.config.image_size), dtype=tf.float32 - ) - return {"pixel_values": VISION_DUMMY_INPUTS} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFBaseModelOutputWithPooling: - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - - return self.serving_output(output) - @unpack_inputs @add_start_docstrings_to_model_forward(CLIP_VISION_INPUTS_DOCSTRING) @replace_return_docstrings(output_type=TFBaseModelOutputWithPooling, config_class=CLIPVisionConfig) def call( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1197,17 +1143,6 @@ def call( return outputs - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=hs, - attentions=attns, - ) - @add_start_docstrings(CLIP_START_DOCSTRING) class TFCLIPModel(TFCLIPPreTrainedModel): @@ -1218,51 +1153,13 @@ def __init__(self, config: CLIPConfig, *inputs, **kwargs): self.clip = TFCLIPMainLayer(config, name="clip") - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
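The annotation rewrites running through these TF modules (`np.ndarray | tf.Tensor | None` replacing `Optional[Union[np.ndarray, tf.Tensor]]`) depend on the `from __future__ import annotations` line added at the top of each file. A minimal sketch of why that import matters on Python versions before 3.10 (illustrative example, not taken from the patch):

    # With postponed evaluation of annotations (PEP 563), the `X | Y | None` union
    # syntax from PEP 604 is never evaluated at runtime, so it parses on Python
    # 3.7-3.9; without the import it raises TypeError at definition time there.
    from __future__ import annotations

    import numpy as np
    import tensorflow as tf

    def call(attention_mask: np.ndarray | tf.Tensor | None = None) -> tf.Tensor | None:
        # The annotation stays a plain string; only the default-value logic runs.
        return None if attention_mask is None else tf.convert_to_tensor(attention_mask)

    print(call.__annotations__["attention_mask"])  # 'np.ndarray | tf.Tensor | None'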
- """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(len(DUMMY_INPUTS), 3, self.config.vision_config.image_size, self.config.vision_config.image_size), - dtype=tf.float32, - ) - return { - "input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32), - "pixel_values": VISION_DUMMY_INPUTS, - } - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFCLIPOutput: - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - - return self.serving_output(output) - @unpack_inputs @add_start_docstrings_to_model_forward(CLIP_TEXT_INPUTS_DOCSTRING.format("batch_size, sequence_length")) def get_text_features( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1300,7 +1197,7 @@ def get_text_features( @add_start_docstrings_to_model_forward(CLIP_VISION_INPUTS_DOCSTRING) def get_image_features( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1343,10 +1240,10 @@ def get_image_features( @replace_return_docstrings(output_type=TFCLIPOutput, config_class=CLIPConfig) def call( self, - input_ids: Optional[TFModelInputType] = None, - pixel_values: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + pixel_values: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, return_loss: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, diff --git a/src/transformers/models/convbert/modeling_tf_convbert.py b/src/transformers/models/convbert/modeling_tf_convbert.py index e853da76277139..9b2bf2383bb740 100644 --- a/src/transformers/models/convbert/modeling_tf_convbert.py +++ b/src/transformers/models/convbert/modeling_tf_convbert.py @@ -15,6 +15,8 @@ """ TF 2.0 ConvBERT model.""" +from __future__ import annotations + from typing import Optional, Tuple, Union import numpy as np @@ -44,7 +46,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - MULTIPLE_CHOICE_DUMMY_INPUTS, add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -742,12 +743,12 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, attention_mask: Optional[Union[np.array, tf.Tensor]] = None, token_type_ids: Optional[Union[np.array, tf.Tensor]] = None, 
position_ids: Optional[Union[np.array, tf.Tensor]] = None, head_mask: Optional[Union[np.array, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -768,12 +769,6 @@ def call( return outputs - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutput(last_hidden_state=output.last_hidden_state, hidden_states=hs, attentions=attns) - class TFConvBertMaskedLMHead(tf.keras.layers.Layer): def __init__(self, config, input_embeddings, **kwargs): @@ -858,16 +853,16 @@ def get_prefix_bias_name(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFMaskedLMOutput]: r""" @@ -905,13 +900,6 @@ def call( attentions=generator_hidden_states.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM.serving_output - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - class TFConvBertClassificationHead(tf.keras.layers.Layer): """Head for sentence-level classification tasks.""" @@ -965,16 +953,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFSequenceClassifierOutput]: r""" @@ -1010,12 +998,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output): - hs = 
tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1036,16 +1018,6 @@ def __init__(self, config, *inputs, **kwargs): 1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.convert_to_tensor(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward( CONVBERT_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length") @@ -1057,16 +1029,16 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFMultipleChoiceModelOutput]: r""" @@ -1119,26 +1091,6 @@ def call( attentions=outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1170,16 +1122,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: 
tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFTokenClassifierOutput]: r""" @@ -1214,12 +1166,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1247,17 +1193,17 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[tf.Tensor] = None, - end_positions: Optional[tf.Tensor] = None, + start_positions: tf.Tensor | None = None, + end_positions: tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFQuestionAnsweringModelOutput]: r""" @@ -1305,11 +1251,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/convnext/modeling_tf_convnext.py b/src/transformers/models/convnext/modeling_tf_convnext.py index 00db1f0b78842c..23a77a928ecc5a 100644 --- a/src/transformers/models/convnext/modeling_tf_convnext.py +++ b/src/transformers/models/convnext/modeling_tf_convnext.py @@ -15,7 +15,9 @@ """ TF 2.0 ConvNext model.""" -from typing import Dict, Optional, Tuple, Union +from __future__ import annotations + +from typing import Optional, Tuple, Union import numpy as np import tensorflow as tf @@ -297,7 +299,7 @@ def __init__(self, config: ConvNextConfig, add_pooling_layer: bool = True, **kwa @unpack_inputs def call( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, @@ -349,43 +351,6 @@ class TFConvNextPreTrainedModel(TFPreTrainedModel): base_model_prefix = "convnext" main_input_name = "pixel_values" - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
- """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=( - 3, - self.config.num_channels, - self.config.image_size, - self.config.image_size, - ), - dtype=tf.float32, - ) - return {"pixel_values": tf.constant(VISION_DUMMY_INPUTS)} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs): - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - return self.serving_output(output) - CONVNEXT_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. Check the superclass documentation for the generic methods the @@ -458,7 +423,7 @@ def __init__(self, config, *inputs, add_pooling_layer=True, **kwargs): @replace_return_docstrings(output_type=TFBaseModelOutputWithPooling, config_class=_CONFIG_FOR_DOC) def call( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, @@ -507,14 +472,6 @@ def call( hidden_states=outputs.hidden_states, ) - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: - # hidden_states not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=output.hidden_states, - ) - @add_start_docstrings( """ @@ -543,10 +500,10 @@ def __init__(self, config: ConvNextConfig, *inputs, **kwargs): @replace_return_docstrings(output_type=TFSequenceClassifierOutput, config_class=_CONFIG_FOR_DOC) def call( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -607,7 +564,3 @@ def call( logits=logits, hidden_states=outputs.hidden_states, ) - - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - # hidden_states not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=output.hidden_states) diff --git a/src/transformers/models/ctrl/modeling_tf_ctrl.py b/src/transformers/models/ctrl/modeling_tf_ctrl.py index f4742b4e33d79c..4dd9e73925070e 100644 --- a/src/transformers/models/ctrl/modeling_tf_ctrl.py +++ b/src/transformers/models/ctrl/modeling_tf_ctrl.py @@ -15,6 +15,9 @@ # limitations under the License. 
""" TF 2.0 CTRL model.""" + +from __future__ import annotations + import warnings from typing import Optional, Tuple, Union @@ -256,13 +259,13 @@ def _prune_heads(self, heads_to_prune): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -532,13 +535,13 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -561,15 +564,6 @@ def call( ) return outputs - def serving_output(self, output): - pkv = tf.convert_to_tensor(output.past_key_values) if self.config.use_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutputWithPast( - last_hidden_state=output.last_hidden_state, past_key_values=pkv, hidden_states=hs, attentions=attns - ) - class TFCTRLLMHead(tf.keras.layers.Layer): def __init__(self, config, input_embeddings, **kwargs): @@ -645,18 +639,18 @@ def prepare_inputs_for_generation(self, input_ids, past_key_values=None, use_cac ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, 
use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFCausalLMOutputWithPast]: r""" @@ -702,13 +696,6 @@ def call( attentions=transformer_outputs.attentions, ) - def serving_output(self, output): - pkv = tf.convert_to_tensor(output.past_key_values) if self.config.use_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFCausalLMOutputWithPast(logits=output.logits, past_key_values=pkv, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -749,18 +736,18 @@ def get_output_embeddings(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFSequenceClassifierOutput]: r""" @@ -836,10 +823,3 @@ def call( hidden_states=transformer_outputs.hidden_states, attentions=transformer_outputs.attentions, ) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) diff --git a/src/transformers/models/cvt/modeling_tf_cvt.py b/src/transformers/models/cvt/modeling_tf_cvt.py index 6ad86071e47d5c..80e15a196f8590 100644 --- a/src/transformers/models/cvt/modeling_tf_cvt.py +++ b/src/transformers/models/cvt/modeling_tf_cvt.py @@ -15,9 +15,11 @@ """ TF 2.0 Cvt model.""" +from __future__ import annotations + import collections.abc from dataclasses import dataclass -from typing import Dict, Optional, Tuple, Union +from typing import Optional, Tuple, Union import tensorflow as tf @@ -75,7 +77,7 @@ class TFBaseModelOutputWithCLSToken(ModelOutput): last_hidden_state: tf.Tensor = None cls_token_value: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None class TFCvtDropPath(tf.keras.layers.Layer): @@ -668,7 +670,7 @@ def __init__(self, config: CvtConfig, **kwargs): @unpack_inputs def 
call( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, @@ -705,35 +707,6 @@ class TFCvtPreTrainedModel(TFPreTrainedModel): base_model_prefix = "cvt" main_input_name = "pixel_values" - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform(shape=(3, self.config.num_channels, 224, 224), dtype=tf.float32) - return {"pixel_values": tf.constant(VISION_DUMMY_INPUTS)} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs): - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - return self.serving_output(output) - TFCVT_START_DOCSTRING = r""" @@ -797,7 +770,7 @@ def __init__(self, config: CvtConfig, *inputs, **kwargs): @replace_return_docstrings(output_type=TFBaseModelOutputWithCLSToken, config_class=_CONFIG_FOR_DOC) def call( self, - pixel_values: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, @@ -842,13 +815,6 @@ def call( hidden_states=outputs.hidden_states, ) - def serving_output(self, output: TFBaseModelOutputWithCLSToken) -> TFBaseModelOutputWithCLSToken: - return TFBaseModelOutputWithCLSToken( - last_hidden_state=output.last_hidden_state, - cls_token_value=output.cls_token_value, - hidden_states=output.hidden_states, - ) - @add_start_docstrings( """ @@ -880,8 +846,8 @@ def __init__(self, config: CvtConfig, *inputs, **kwargs): @replace_return_docstrings(output_type=TFImageClassifierOutputWithNoAttention, config_class=_CONFIG_FOR_DOC) def call( self, - pixel_values: Optional[tf.Tensor] = None, - labels: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + labels: tf.Tensor | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, @@ -943,6 +909,3 @@ def call( return ((loss,) + output) if loss is not None else output return TFImageClassifierOutputWithNoAttention(loss=loss, logits=logits, hidden_states=outputs.hidden_states) - - def serving_output(self, output: TFImageClassifierOutputWithNoAttention) -> TFImageClassifierOutputWithNoAttention: - return TFImageClassifierOutputWithNoAttention(logits=output.logits, hidden_states=output.hidden_states) diff --git a/src/transformers/models/data2vec/configuration_data2vec_audio.py b/src/transformers/models/data2vec/configuration_data2vec_audio.py index 2ec526924f36eb..066d81a5daed35 100644 --- a/src/transformers/models/data2vec/configuration_data2vec_audio.py +++ b/src/transformers/models/data2vec/configuration_data2vec_audio.py @@ -62,6 +62,9 @@ class Data2VecAudioConfig(PretrainedConfig): The dropout ratio for the attention probabilities. final_dropout (`float`, *optional*, defaults to 0.1): The dropout probability for the final projection layer of [`Data2VecAudioForCTC`]. + layerdrop (`float`, *optional*, defaults to 0.1): + The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more + details. 
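The `layerdrop` entry documented just above follows the LayerDrop scheme from the linked paper: during training each encoder layer is skipped independently with probability `layerdrop`, while at inference every layer runs. A minimal sketch of that behaviour (illustrative helper only, not the model's actual forward pass):

    import numpy as np

    def encode_with_layerdrop(hidden_states, layers, layerdrop=0.1, training=True):
        # Each layer is dropped with probability `layerdrop` during training only;
        # the expected depth shrinks, but the checkpoint still contains all layers.
        for layer in layers:
            if training and np.random.uniform() < layerdrop:
                continue  # skip this layer for the current batch
            hidden_states = layer(hidden_states)
        return hidden_states

    # Toy usage with identity-style "layers" just to show the control flow.
    layers = [lambda x: x + 1 for _ in range(12)]
    print(encode_with_layerdrop(0, layers, layerdrop=0.5))   # roughly 6 on average
    print(encode_with_layerdrop(0, layers, training=False))  # always 12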
initializer_range (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. layer_norm_eps (`float`, *optional*, defaults to 1e-12): diff --git a/src/transformers/models/data2vec/modeling_tf_data2vec_vision.py b/src/transformers/models/data2vec/modeling_tf_data2vec_vision.py index 06a6f010dd7463..8ebb8c68ff8d99 100644 --- a/src/transformers/models/data2vec/modeling_tf_data2vec_vision.py +++ b/src/transformers/models/data2vec/modeling_tf_data2vec_vision.py @@ -14,10 +14,13 @@ # limitations under the License. """ TF 2.0 Data2Vec Vision model.""" + +from __future__ import annotations + import collections.abc import math from dataclasses import dataclass -from typing import Dict, List, Optional, Tuple, Union +from typing import List, Optional, Tuple, Union import numpy as np import tensorflow as tf @@ -94,8 +97,8 @@ class TFData2VecVisionModelOutputWithPooling(TFBaseModelOutputWithPooling): last_hidden_state: tf.Tensor = None pooler_output: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None class TFData2VecVisionDropPath(tf.keras.layers.Layer): @@ -163,7 +166,7 @@ def build(self, input_shape: tf.TensorShape): super().build(input_shape) - def call(self, pixel_values: tf.Tensor, bool_masked_pos: Optional[tf.Tensor] = None) -> tf.Tensor: + def call(self, pixel_values: tf.Tensor, bool_masked_pos: tf.Tensor | None = None) -> tf.Tensor: embeddings = self.patch_embeddings(pixel_values) batch_size, seq_len, projection_dim = shape_list(embeddings) @@ -609,7 +612,7 @@ def __init__(self, config: Data2VecVisionConfig, window_size: Optional[tuple] = def call( self, hidden_states: tf.Tensor, - head_mask: Optional[tf.Tensor] = None, + head_mask: tf.Tensor | None = None, output_attentions: bool = False, output_hidden_states: bool = False, return_dict: bool = True, @@ -685,9 +688,9 @@ class PreTrainedModel @unpack_inputs def call( self, - pixel_values: Optional[tf.Tensor] = None, - bool_masked_pos: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + bool_masked_pos: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -771,36 +774,6 @@ class TFData2VecVisionPreTrainedModel(TFPreTrainedModel): main_input_name = "pixel_values" _keys_to_ignore_on_load_unexpected = [r"relative_position_index"] - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(3, self.config.num_channels, self.config.image_size, self.config.image_size), - dtype=tf.float32, - ) - return {"pixel_values": tf.constant(VISION_DUMMY_INPUTS)} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs): - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - return self.serving_output(output) - DATA2VEC_VISION_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. 
Check the superclass documentation for the generic methods the @@ -899,9 +872,9 @@ def get_input_embeddings(self): ) def call( self, - pixel_values: Optional[TFModelInputType] = None, - bool_masked_pos: Optional[tf.Tensor] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + pixel_values: TFModelInputType | None = None, + bool_masked_pos: tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -923,17 +896,6 @@ def call( return outputs - def serving_output(self, output: TFData2VecVisionModelOutputWithPooling) -> TFData2VecVisionModelOutputWithPooling: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFData2VecVisionModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=hidden_states, - attentions=attentions, - ) - @add_start_docstrings( """ @@ -966,12 +928,12 @@ def __init__(self, config: Data2VecVisionConfig, *inputs, **kwargs): ) def call( self, - pixel_values: Optional[TFModelInputType] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + pixel_values: TFModelInputType | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, tuple]: r""" @@ -1006,12 +968,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hidden_states, attentions=attentions) - class TFData2VecVisionConvModule(tf.keras.layers.Layer): """ @@ -1378,9 +1334,9 @@ def masked_loss(real, pred): @replace_return_docstrings(output_type=TFSemanticSegmenterOutput, config_class=_CONFIG_FOR_DOC) def call( self, - pixel_values: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - labels: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + labels: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1472,9 +1428,3 @@ def reshape_features(x): hidden_states=outputs.hidden_states if output_hidden_states else None, attentions=outputs.attentions, ) - - def serving_output(self, output: TFSemanticSegmenterOutput) -> TFSemanticSegmenterOutput: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSemanticSegmenterOutput(logits=output.logits, hidden_states=hidden_states, attentions=attentions) diff --git a/src/transformers/models/deberta/modeling_deberta.py b/src/transformers/models/deberta/modeling_deberta.py index 
7c98a2d0d49dd5..62e89c824e6466 100644 --- a/src/transformers/models/deberta/modeling_deberta.py +++ b/src/transformers/models/deberta/modeling_deberta.py @@ -139,7 +139,7 @@ def symbolic(g, self, mask, dim): r_mask = g.op( "Cast", g.op("Sub", g.op("Constant", value_t=torch.tensor(1, dtype=torch.int64)), mask_cast_value), - to_i=sym_help.cast_pytorch_to_onnx["Byte"], + to_i=sym_help.cast_pytorch_to_onnx["Bool"], ) output = masked_fill( g, self, r_mask, g.op("Constant", value_t=torch.tensor(torch.finfo(self.type().dtype()).min)) @@ -420,7 +420,6 @@ def get_attention_mask(self, attention_mask): if attention_mask.dim() <= 2: extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2) attention_mask = extended_attention_mask * extended_attention_mask.squeeze(-2).unsqueeze(-1) - attention_mask = attention_mask.byte() elif attention_mask.dim() == 3: attention_mask = attention_mask.unsqueeze(1) @@ -614,7 +613,7 @@ def forward( Input states to the module usually the output from previous layer, it will be the Q,K and V in *Attention(Q,K,V)* - attention_mask (`torch.ByteTensor`): + attention_mask (`torch.BoolTensor`): An attention mask matrix of shape [*B*, *N*, *N*] where *B* is the batch size, *N* is the maximum sequence length in which element [i,j] = *1* means the *i* th token in the input can attend to the *j* th token. diff --git a/src/transformers/models/deberta/modeling_tf_deberta.py b/src/transformers/models/deberta/modeling_tf_deberta.py index dcd0582777eb42..57e6ea8b1e9b07 100644 --- a/src/transformers/models/deberta/modeling_tf_deberta.py +++ b/src/transformers/models/deberta/modeling_tf_deberta.py @@ -15,6 +15,8 @@ """ TF 2.0 DeBERTa model.""" +from __future__ import annotations + import math from typing import Dict, Optional, Sequence, Tuple, Union @@ -922,11 +924,11 @@ class PreTrainedModel @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1092,11 +1094,11 @@ def __init__(self, config: DebertaConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1116,12 +1118,6 @@ def call( return outputs - def serving_output(self, output: TFBaseModelOutput) -> TFBaseModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if 
self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutput(last_hidden_state=output.last_hidden_state, hidden_states=hs, attentions=attns) - @add_start_docstrings("""DeBERTa Model with a `language modeling` head on top.""", DEBERTA_START_DOCSTRING) class TFDebertaForMaskedLM(TFDebertaPreTrainedModel, TFMaskedLanguageModelingLoss): @@ -1149,15 +1145,15 @@ def get_lm_head(self) -> tf.keras.layers.Layer: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1192,12 +1188,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1233,15 +1223,15 @@ def __init__(self, config: DebertaConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1279,12 +1269,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1314,15 +1298,15 @@ def __init__(self, config: DebertaConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - 
token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1356,12 +1340,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1390,16 +1368,16 @@ def __init__(self, config: DebertaConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1446,11 +1424,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/deberta_v2/modeling_deberta_v2.py b/src/transformers/models/deberta_v2/modeling_deberta_v2.py index cd6e6318e3d426..95a288657e4838 100644 --- a/src/transformers/models/deberta_v2/modeling_deberta_v2.py +++ b/src/transformers/models/deberta_v2/modeling_deberta_v2.py @@ -130,7 +130,7 @@ def symbolic(g, self, mask, dim): r_mask = g.op( "Cast", g.op("Sub", g.op("Constant", value_t=torch.tensor(1, dtype=torch.int64)), mask_cast_value), - to_i=sym_help.cast_pytorch_to_onnx["Byte"], + to_i=sym_help.cast_pytorch_to_onnx["Bool"], ) output = masked_fill( g, self, r_mask, g.op("Constant", 
value_t=torch.tensor(torch.finfo(self.type().dtype()).min)) @@ -453,7 +453,6 @@ def get_attention_mask(self, attention_mask): if attention_mask.dim() <= 2: extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2) attention_mask = extended_attention_mask * extended_attention_mask.squeeze(-2).unsqueeze(-1) - attention_mask = attention_mask.byte() elif attention_mask.dim() == 3: attention_mask = attention_mask.unsqueeze(1) @@ -484,7 +483,7 @@ def forward( if attention_mask.dim() <= 2: input_mask = attention_mask else: - input_mask = (attention_mask.sum(-2) > 0).byte() + input_mask = attention_mask.sum(-2) > 0 attention_mask = self.get_attention_mask(attention_mask) relative_pos = self.get_rel_pos(hidden_states, query_states, relative_pos) @@ -687,7 +686,7 @@ def forward( Input states to the module usually the output from previous layer, it will be the Q,K and V in *Attention(Q,K,V)* - attention_mask (`torch.ByteTensor`): + attention_mask (`torch.BoolTensor`): An attention mask matrix of shape [*B*, *N*, *N*] where *B* is the batch size, *N* is the maximum sequence length in which element [i,j] = *1* means the *i* th token in the input can attend to the *j* th token. diff --git a/src/transformers/models/deberta_v2/modeling_tf_deberta_v2.py b/src/transformers/models/deberta_v2/modeling_tf_deberta_v2.py index b3c210352a32b5..1075cc855a020b 100644 --- a/src/transformers/models/deberta_v2/modeling_tf_deberta_v2.py +++ b/src/transformers/models/deberta_v2/modeling_tf_deberta_v2.py @@ -15,6 +15,8 @@ """ TF 2.0 DeBERTa-v2 model.""" +from __future__ import annotations + from typing import Dict, Optional, Tuple, Union import numpy as np @@ -1014,11 +1016,11 @@ class PreTrainedModel @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1186,11 +1188,11 @@ def __init__(self, config: DebertaV2Config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1210,12 +1212,6 @@ def call( return outputs - def serving_output(self, output: TFBaseModelOutput) -> TFBaseModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - 
return TFBaseModelOutput(last_hidden_state=output.last_hidden_state, hidden_states=hs, attentions=attns) - @add_start_docstrings("""DeBERTa Model with a `language modeling` head on top.""", DEBERTA_START_DOCSTRING) # Copied from transformers.models.deberta.modeling_tf_deberta.TFDebertaForMaskedLM with Deberta->DebertaV2 @@ -1244,15 +1240,15 @@ def get_lm_head(self) -> tf.keras.layers.Layer: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1287,12 +1283,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1329,15 +1319,15 @@ def __init__(self, config: DebertaV2Config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1375,12 +1365,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1411,15 +1395,15 @@ def __init__(self, config: DebertaV2Config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, 
tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1453,12 +1437,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1488,16 +1466,16 @@ def __init__(self, config: DebertaV2Config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1544,11 +1522,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/deformable_detr/configuration_deformable_detr.py b/src/transformers/models/deformable_detr/configuration_deformable_detr.py index 54614eef96f9fd..dbe5fd7f0a7803 100644 --- a/src/transformers/models/deformable_detr/configuration_deformable_detr.py +++ b/src/transformers/models/deformable_detr/configuration_deformable_detr.py @@ -77,7 +77,7 @@ class DeformableDetrConfig(PretrainedConfig): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. init_xavier_std (`float`, *optional*, defaults to 1): The scaling factor used for the Xavier initialization gain in the HM Attention map module. 
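# A minimal illustrative sketch (not taken from this patch) of the byte -> bool attention-mask
# switch made in the DeBERTa-v2 hunks above; the tensor names below are hypothetical. With a
# bool mask the removed `.byte()` casts are unnecessary: comparison results can be fed to
# `masked_fill` directly, which is what recent PyTorch versions expect.
import torch

scores = torch.randn(1, 1, 4, 4)                  # attention scores [batch, heads, N, N]
attention_mask = torch.tensor([[1, 1, 1, 0]])     # 1 = real token, 0 = padding
extended = attention_mask.unsqueeze(1).unsqueeze(2)                  # [batch, 1, 1, N]
keep = (extended * extended.squeeze(-2).unsqueeze(-1)) > 0           # bool mask [batch, 1, N, N]
masked_scores = scores.masked_fill(~keep, torch.finfo(scores.dtype).min)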
- encoder_layerdrop: (`float`, *optional*, defaults to 0.0): + encoder_layerdrop (`float`, *optional*, defaults to 0.0): The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more details. auxiliary_loss (`bool`, *optional*, defaults to `False`): diff --git a/src/transformers/models/deit/modeling_tf_deit.py b/src/transformers/models/deit/modeling_tf_deit.py index a3d487021d4b39..efd25788b0330b 100644 --- a/src/transformers/models/deit/modeling_tf_deit.py +++ b/src/transformers/models/deit/modeling_tf_deit.py @@ -15,10 +15,12 @@ """ TensorFlow DeiT model.""" +from __future__ import annotations + import collections.abc import math from dataclasses import dataclass -from typing import Dict, Optional, Tuple, Union +from typing import Optional, Tuple, Union import tensorflow as tf @@ -95,8 +97,8 @@ class token). logits: tf.Tensor = None cls_logits: tf.Tensor = None distillation_logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None class TFDeiTEmbeddings(tf.keras.layers.Layer): @@ -142,7 +144,7 @@ def build(self, input_shape: tf.TensorShape): super().build(input_shape) def call( - self, pixel_values: tf.Tensor, bool_masked_pos: Optional[tf.Tensor] = None, training: bool = False + self, pixel_values: tf.Tensor, bool_masked_pos: tf.Tensor | None = None, training: bool = False ) -> tf.Tensor: embeddings = self.patch_embeddings(pixel_values) batch_size, seq_length, _ = shape_list(embeddings) @@ -501,9 +503,9 @@ def get_head_mask(self, head_mask): @unpack_inputs def call( self, - pixel_values: Optional[tf.Tensor] = None, - bool_masked_pos: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + bool_masked_pos: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -566,38 +568,6 @@ class TFDeiTPreTrainedModel(TFPreTrainedModel): base_model_prefix = "deit" main_input_name = "pixel_values" - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(3, self.config.num_channels, self.config.image_size, self.config.image_size), dtype=tf.float32 - ) - return {"pixel_values": tf.constant(VISION_DUMMY_INPUTS)} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs): - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. 
- """ - output = self.call(inputs) - - return self.serving_output(output) - DEIT_START_DOCSTRING = r""" This model is a TensorFlow @@ -658,9 +628,9 @@ def __init__( ) def call( self, - pixel_values: Optional[tf.Tensor] = None, - bool_masked_pos: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + bool_masked_pos: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -677,17 +647,6 @@ def call( ) return outputs - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=hidden_states, - attentions=attentions, - ) - # Copied from transformers.models.vit.modeling_tf_vit.TFViTPooler with ViT->DeiT class TFDeiTPooler(tf.keras.layers.Layer): @@ -768,9 +727,9 @@ def __init__(self, config: DeiTConfig) -> None: @replace_return_docstrings(output_type=TFMaskedImageModelingOutput, config_class=_CONFIG_FOR_DOC) def call( self, - pixel_values: Optional[tf.Tensor] = None, - bool_masked_pos: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + bool_masked_pos: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -863,14 +822,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFMaskedImageModelingOutput) -> TFMaskedImageModelingOutput: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedImageModelingOutput( - reconstruction=output.reconstruction, hidden_states=hidden_states, attentions=attentions - ) - @add_start_docstrings( """ @@ -898,9 +849,9 @@ def __init__(self, config: DeiTConfig): @replace_return_docstrings(output_type=TFImageClassifierOutput, config_class=_CONFIG_FOR_DOC) def call( self, - pixel_values: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - labels: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + labels: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -968,12 +919,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFImageClassifierOutput) -> TFImageClassifierOutput: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFImageClassifierOutput(logits=output.logits, hidden_states=hidden_states, attentions=attentions) - @add_start_docstrings( """ @@ -1016,8 +961,8 @@ def __init__(self, config: DeiTConfig) -> None: ) def call( self, - pixel_values: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + 
head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1053,17 +998,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - def serving_output( - self, output: TFDeiTForImageClassificationWithTeacherOutput - ) -> TFDeiTForImageClassificationWithTeacherOutput: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFDeiTForImageClassificationWithTeacherOutput( - logits=output.logits, - cls_logits=output.cls_logits, - distillation_logits=output.distillation_logits, - hidden_states=hidden_states, - attentions=attentions, - ) diff --git a/src/transformers/models/deta/configuration_deta.py b/src/transformers/models/deta/configuration_deta.py index 836e9732e68ed8..dc85ea91df361c 100644 --- a/src/transformers/models/deta/configuration_deta.py +++ b/src/transformers/models/deta/configuration_deta.py @@ -71,7 +71,7 @@ class DetaConfig(PretrainedConfig): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. init_xavier_std (`float`, *optional*, defaults to 1): The scaling factor used for the Xavier initialization gain in the HM Attention map module. - encoder_layerdrop: (`float`, *optional*, defaults to 0.0): + encoder_layerdrop (`float`, *optional*, defaults to 0.0): The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more details. auxiliary_loss (`bool`, *optional*, defaults to `False`): diff --git a/src/transformers/models/distilbert/modeling_tf_distilbert.py b/src/transformers/models/distilbert/modeling_tf_distilbert.py index 3013f4ca30d7fe..6b0e1b0f3febcf 100644 --- a/src/transformers/models/distilbert/modeling_tf_distilbert.py +++ b/src/transformers/models/distilbert/modeling_tf_distilbert.py @@ -16,6 +16,9 @@ TF 2.0 DistilBERT model """ + +from __future__ import annotations + import warnings from typing import Optional, Tuple, Union @@ -45,7 +48,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - MULTIPLE_CHOICE_DUMMY_INPUTS, add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -421,19 +423,6 @@ class TFDistilBertPreTrainedModel(TFPreTrainedModel): config_class = DistilBertConfig base_model_prefix = "distilbert" - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - DISTILBERT_START_DOCSTRING = r""" @@ -538,10 +527,10 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -559,12 
+548,6 @@ def call( ) return outputs - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutput(last_hidden_state=output.last_hidden_state, hidden_states=hs, attentions=attns) - class TFDistilBertLMHead(tf.keras.layers.Layer): def __init__(self, config, input_embeddings, **kwargs): @@ -639,14 +622,14 @@ def get_prefix_bias_name(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -684,13 +667,6 @@ def call( attentions=distilbert_output.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM.serving_output - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -725,14 +701,14 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -770,13 +746,6 @@ def call( attentions=distilbert_output.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -805,14 +774,14 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, 
- inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -845,13 +814,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -876,16 +838,6 @@ def __init__(self, config, *inputs, **kwargs): 1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward( DISTILBERT_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length") @@ -897,14 +849,14 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -956,26 +908,6 @@ def call( attentions=distilbert_output.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving_output - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1004,15 +936,15 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: 
Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1059,12 +991,3 @@ def call( hidden_states=distilbert_output.hidden_states, attentions=distilbert_output.attentions, ) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/dpr/modeling_tf_dpr.py b/src/transformers/models/dpr/modeling_tf_dpr.py index 565ad37b2117e8..759e22c8c71cf8 100644 --- a/src/transformers/models/dpr/modeling_tf_dpr.py +++ b/src/transformers/models/dpr/modeling_tf_dpr.py @@ -15,8 +15,10 @@ """ TensorFlow DPR model for Open Domain Question Answering.""" +from __future__ import annotations + from dataclasses import dataclass -from typing import Optional, Tuple, Union +from typing import Tuple, Union import tensorflow as tf @@ -80,8 +82,8 @@ class TFDPRContextEncoderOutput(ModelOutput): """ pooler_output: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -108,8 +110,8 @@ class TFDPRQuestionEncoderOutput(ModelOutput): """ pooler_output: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -141,8 +143,8 @@ class TFDPRReaderOutput(ModelOutput): start_logits: tf.Tensor = None end_logits: tf.Tensor = None relevance_logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None class TFDPREncoderLayer(tf.keras.layers.Layer): @@ -167,9 +169,9 @@ def __init__(self, config: DPRConfig, **kwargs): def call( self, input_ids: tf.Tensor = None, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: bool = None, output_hidden_states: bool = None, return_dict: bool = None, @@ -227,8 +229,8 @@ def __init__(self, config: DPRConfig, 
**kwargs): def call( self, input_ids: tf.Tensor = None, - attention_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: bool = False, output_hidden_states: bool = False, return_dict: bool = False, @@ -283,9 +285,9 @@ def __init__(self, config: DPRConfig, **kwargs): def call( self, input_ids: tf.Tensor = None, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: bool = False, output_hidden_states: bool = False, return_dict: bool = False, @@ -316,9 +318,9 @@ def __init__(self, config: DPRConfig, **kwargs): def call( self, input_ids: tf.Tensor = None, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: bool = False, output_hidden_states: bool = False, return_dict: bool = False, @@ -370,19 +372,6 @@ class TFDPRPretrainedReader(TFPreTrainedModel): config_class = DPRConfig base_model_prefix = "reader" - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - ############### # Actual Models @@ -552,9 +541,9 @@ def get_input_embeddings(self): def call( self, input_ids=None, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions=None, output_hidden_states=None, return_dict=None, @@ -610,12 +599,6 @@ def call( pooler_output=outputs.pooler_output, hidden_states=outputs.hidden_states, attentions=outputs.attentions ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFDPRContextEncoderOutput(pooler_output=output.pooler_output, hidden_states=hs, attentions=attns) - @add_start_docstrings( "The bare DPRQuestionEncoder transformer outputting pooler outputs as question representations.", @@ -639,9 +622,9 @@ def get_input_embeddings(self): def call( self, input_ids=None, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions=None, output_hidden_states=None, return_dict=None, @@ -696,12 +679,6 @@ def call( pooler_output=outputs.pooler_output, hidden_states=outputs.hidden_states, attentions=outputs.attentions ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return 
TFDPRQuestionEncoderOutput(pooler_output=output.pooler_output, hidden_states=hs, attentions=attns) - @add_start_docstrings( "The bare DPRReader transformer outputting span predictions.", @@ -725,8 +702,8 @@ def get_input_embeddings(self): def call( self, input_ids=None, - attention_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: bool = None, output_hidden_states: bool = None, return_dict=None, @@ -775,15 +752,3 @@ def call( return_dict=return_dict, training=training, ) - - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFDPRReaderOutput( - start_logits=output.start_logits, - end_logits=output.end_logits, - relevance_logits=output.relevance_logits, - hidden_states=hs, - attentions=attns, - ) diff --git a/src/transformers/models/efficientnet/configuration_efficientnet.py b/src/transformers/models/efficientnet/configuration_efficientnet.py index a6e4d172f44289..e6b6a1c261ca5f 100644 --- a/src/transformers/models/efficientnet/configuration_efficientnet.py +++ b/src/transformers/models/efficientnet/configuration_efficientnet.py @@ -60,7 +60,7 @@ class EfficientNetConfig(PretrainedConfig): List of output channel sizes to be used in each block for convolutional layers. depthwise_padding (`List[int]`, *optional*, defaults to `[]`): List of block indices with square padding. - strides: (`List[int]`, *optional*, defaults to `[1, 2, 2, 2, 1, 2, 1]`): + strides (`List[int]`, *optional*, defaults to `[1, 2, 2, 2, 1, 2, 1]`): List of stride sizes to be used in each block for convolutional layers. num_block_repeats (`List[int]`, *optional*, defaults to `[1, 2, 2, 3, 3, 4, 1]`): List of the number of times each block is to repeated. diff --git a/src/transformers/models/electra/modeling_tf_electra.py b/src/transformers/models/electra/modeling_tf_electra.py index 82c3381724dcea..41c64eed369d6a 100644 --- a/src/transformers/models/electra/modeling_tf_electra.py +++ b/src/transformers/models/electra/modeling_tf_electra.py @@ -14,10 +14,13 @@ # limitations under the License. 
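# A minimal illustrative sketch (not taken from this patch) of why the `np.ndarray | tf.Tensor | None`
# annotations adopted throughout these TF signatures are safe on older Python versions:
# `from __future__ import annotations` (PEP 563) keeps every annotation as an unevaluated string,
# so the PEP 604 `|` union never runs at import time, even on Python 3.7/3.8 where evaluating
# `np.ndarray | tf.Tensor` would raise a TypeError.
from __future__ import annotations

import numpy as np
import tensorflow as tf


def call(attention_mask: np.ndarray | tf.Tensor | None = None) -> tf.Tensor | None:
    # The annotation above is stored only as the string "np.ndarray | tf.Tensor | None".
    if attention_mask is None:
        return None
    return tf.convert_to_tensor(attention_mask)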
""" TF Electra model.""" + +from __future__ import annotations + import math import warnings from dataclasses import dataclass -from typing import Dict, Optional, Tuple, Union +from typing import Optional, Tuple, Union import numpy as np import tensorflow as tf @@ -46,8 +49,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - DUMMY_INPUTS, - MULTIPLE_CHOICE_DUMMY_INPUTS, ModelOutput, add_code_sample_docstrings, add_start_docstrings, @@ -312,9 +313,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_value: Optional[Tuple[tf.Tensor]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_value: Tuple[tf.Tensor] | None, output_attentions: bool, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -391,9 +392,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None, use_cache: Optional[bool], output_attentions: bool, output_hidden_states: bool, @@ -593,25 +594,6 @@ class TFElectraPreTrainedModel(TFPreTrainedModel): _keys_to_ignore_on_load_unexpected = [r"generator_lm_head.weight"] _keys_to_ignore_on_load_missing = [r"dropout"] - @property - # Copied from transformers.models.bert.modeling_tf_bert.TFBertPreTrainedModel.dummy_inputs - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
- """ - dummy = {"input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32)} - # Add `encoder_hidden_states` to make the cross-attention layers' weights initialized - if self.config.add_cross_attention: - batch_size, seq_len = tf.constant(DUMMY_INPUTS).shape - shape = (batch_size, seq_len) + (self.config.hidden_size,) - h = tf.random.uniform(shape=shape) - dummy["encoder_hidden_states"] = h - - return dummy - @keras_serializable class TFElectraMainLayer(tf.keras.layers.Layer): @@ -704,14 +686,14 @@ def get_head_mask(self, head_mask): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -824,8 +806,8 @@ class TFElectraForPreTrainingOutput(ModelOutput): """ logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None ELECTRA_START_DOCSTRING = r""" @@ -941,14 +923,14 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -995,23 +977,6 @@ def call( return outputs - def serving_output(self, output): - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = 
tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFBaseModelOutputWithPastAndCrossAttentions( - last_hidden_state=output.last_hidden_state, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) - @add_start_docstrings( """ @@ -1034,12 +999,12 @@ def __init__(self, config, **kwargs): @replace_return_docstrings(output_type=TFElectraForPreTrainingOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1084,12 +1049,6 @@ def call( attentions=discriminator_hidden_states.attentions, ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFElectraForPreTrainingOutput(logits=output.logits, hidden_states=hs, attentions=attns) - class TFElectraMaskedLMHead(tf.keras.layers.Layer): def __init__(self, config, input_embeddings, **kwargs): @@ -1171,16 +1130,16 @@ def get_prefix_bias_name(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1218,13 +1177,6 @@ def call( attentions=generator_hidden_states.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM.serving_output - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return 
TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - class TFElectraClassificationHead(tf.keras.layers.Layer): """Head for sentence-level classification tasks.""" @@ -1281,16 +1233,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1326,13 +1278,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1353,16 +1298,6 @@ def __init__(self, config, *inputs, **kwargs): 1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. 
- - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward(ELECTRA_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length")) @add_code_sample_docstrings( @@ -1372,16 +1307,16 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -1435,28 +1370,6 @@ def call( attentions=outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None, None), tf.int32, name="token_type_ids"), - } - ] - ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving - def serving(self, inputs: Dict[str, tf.Tensor]): - output = self.call(input_ids=inputs) - - return self.serving_output(output) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving_output - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1490,16 +1403,16 @@ def __init__(self, config, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ 
-1535,13 +1448,6 @@ def call( attentions=discriminator_hidden_states.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1573,17 +1479,17 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1635,12 +1541,3 @@ def call( hidden_states=discriminator_hidden_states.hidden_states, attentions=discriminator_hidden_states.attentions, ) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py b/src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py index 5ec7f2932f5952..19fc47546b0f75 100644 --- a/src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py +++ b/src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py @@ -14,6 +14,9 @@ # limitations under the License. """ Classes to support TF Encoder-Decoder architectures""" + +from __future__ import annotations + import inspect import re import warnings @@ -33,7 +36,6 @@ ) from ...tf_utils import shape_list from ...utils import ( - DUMMY_INPUTS, ModelOutput, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -273,19 +275,6 @@ def __init__( "following discussion on GitHub: https://github.com/huggingface/transformers/issues/23350" ) - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - # Add `decoder_input_ids` because `self.decoder` requires it. 
- input_ids = tf.constant(DUMMY_INPUTS, dtype=tf.int32) - dummy = {"input_ids": input_ids, "decoder_input_ids": input_ids} - return dummy - def get_encoder(self): return self.encoder @@ -482,15 +471,15 @@ def from_encoder_decoder_pretrained( @replace_return_docstrings(output_type=TFSeq2SeqLMOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_outputs: Optional[Union[np.ndarray, tf.Tensor]] = None, - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + encoder_outputs: np.ndarray | tf.Tensor | None = None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None, + labels: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -639,33 +628,6 @@ def call( encoder_attentions=encoder_outputs.attentions, ) - def serving_output(self, output): - pkv = tf.tuple(output.past_key_values)[1] if self.config.decoder.use_cache else None - dec_hs = ( - tf.convert_to_tensor(output.decoder_hidden_states) if self.config.decoder.output_hidden_states else None - ) - dec_attns = tf.convert_to_tensor(output.decoder_attentions) if self.config.decoder.output_attentions else None - enc_hs = ( - tf.convert_to_tensor(output.encoder_hidden_states) if self.config.encoder.output_hidden_states else None - ) - enc_attns = tf.convert_to_tensor(output.encoder_attentions) if self.config.encoder.output_attentions else None - cross_attns = ( - tf.convert_to_tensor(output.cross_attentions) - if self.config.decoder.output_attentions and output.cross_attentions is not None - else None - ) - - return TFSeq2SeqLMOutput( - logits=output.logits, - past_key_values=pkv, - decoder_hidden_states=dec_hs, - decoder_attentions=dec_attns, - encoder_last_hidden_state=output.encoder_last_hidden_state, - encoder_hidden_states=enc_hs, - encoder_attentions=enc_attns, - cross_attentions=cross_attns, - ) - def prepare_inputs_for_generation( self, input_ids, past_key_values=None, attention_mask=None, use_cache=None, encoder_outputs=None, **kwargs ): diff --git a/src/transformers/models/esm/modeling_tf_esm.py b/src/transformers/models/esm/modeling_tf_esm.py index 135c16a14b36dd..126473ee529ae9 100644 --- a/src/transformers/models/esm/modeling_tf_esm.py +++ b/src/transformers/models/esm/modeling_tf_esm.py @@ -14,6 +14,9 @@ # limitations under the License. 
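# Illustrative sketch only: the `dummy_inputs`, `serving` and `serving_output` definitions deleted
# across these TF model files were near-identical copies of one template. The patch shows only the
# deletions, so the generic equivalent is assumed to live in the shared TF base class rather than
# in each model. What each removed `serving_output` did, with hypothetical `output` / `config`
# stand-ins:
import tensorflow as tf

def stack_optional_outputs(output, config):
    # hidden_states / attentions arrive as tuples of per-layer tensors; converting each tuple to a
    # single stacked tensor gives a SavedModel signature a concrete tensor to serialize, and both
    # are dropped entirely when the config does not request them.
    hidden_states = tf.convert_to_tensor(output.hidden_states) if config.output_hidden_states else None
    attentions = tf.convert_to_tensor(output.attentions) if config.output_attentions else None
    return hidden_states, attentions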
""" PyTorch ESM model.""" + +from __future__ import annotations + import os from typing import Optional, Tuple, Union @@ -312,11 +315,11 @@ def transpose_for_scores(self, x: tf.Tensor) -> tf.Tensor: def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, + attention_mask: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, output_attentions: Optional[bool] = False, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -801,13 +804,13 @@ def _prune_heads(self, heads_to_prune): def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -984,13 +987,13 @@ def __init__(self, config: EsmConfig, add_pooling_layer=True, *inputs, **kwargs) ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -1035,39 +1038,6 @@ def call( ) return outputs - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - def serving_output( - self, output: TFBaseModelOutputWithPoolingAndCrossAttentions - ) -> TFBaseModelOutputWithPoolingAndCrossAttentions: - output_cache = self.config.use_cache and 
self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFBaseModelOutputWithPoolingAndCrossAttentions( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) - def predict_contacts(self, tokens, attention_mask): return self.esm.predict_contacts(tokens, attention_mask) @@ -1113,14 +1083,14 @@ def get_lm_head(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, + labels: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1167,26 +1137,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM.serving_output - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - def predict_contacts(self, tokens, attention_mask): return self.esm.predict_contacts(tokens, attention_mask) @@ -1261,12 +1211,12 @@ def __init__(self, config): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: 
np.ndarray | tf.Tensor | None = None, + labels: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1307,26 +1257,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - @add_start_docstrings( """ @@ -1356,12 +1286,12 @@ def __init__(self, config): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + labels: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1403,26 +1333,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - class TFEsmClassificationHead(Layer): """Head for sentence-level classification tasks.""" diff --git a/src/transformers/models/flaubert/modeling_tf_flaubert.py b/src/transformers/models/flaubert/modeling_tf_flaubert.py index b1dd523dedaf85..068119d35f1709 100644 --- a/src/transformers/models/flaubert/modeling_tf_flaubert.py +++ b/src/transformers/models/flaubert/modeling_tf_flaubert.py @@ -16,6 +16,9 @@ TF 2.0 Flaubert model. 
""" + +from __future__ import annotations + import itertools import random import warnings @@ -255,15 +258,15 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - langs: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - lengths: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + langs: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + lengths: np.ndarray | tf.Tensor | None = None, cache: Optional[Dict[str, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -287,13 +290,6 @@ def call( return outputs - # Copied from transformers.models.distilbert.modeling_tf_distilbert.TFDistilBertModel.serving_output - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutput(last_hidden_state=output.last_hidden_state, hidden_states=hs, attentions=attns) - # Copied from transformers.models.xlm.modeling_tf_xlm.TFXLMMultiHeadAttention with XLM->Flaubert class TFFlaubertMultiHeadAttention(tf.keras.layers.Layer): @@ -486,15 +482,15 @@ def set_input_embeddings(self, value): @unpack_inputs def call( self, - input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - langs: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - lengths: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + langs: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + lengths: np.ndarray | tf.Tensor | None = None, cache: Optional[Dict[str, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -754,8 +750,8 @@ class TFFlaubertWithLMHeadModelOutput(ModelOutput): """ logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @add_start_docstrings( @@ -803,15 +799,15 @@ def prepare_inputs_for_generation(self, inputs, **kwargs): ) def call( self, - input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - langs: 
Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - lengths: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + langs: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + lengths: np.ndarray | tf.Tensor | None = None, cache: Optional[Dict[str, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -842,12 +838,6 @@ def call( logits=outputs, hidden_states=transformer_outputs.hidden_states, attentions=transformer_outputs.attentions ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFFlaubertWithLMHeadModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -874,19 +864,19 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - langs: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - lengths: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + langs: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + lengths: np.ndarray | tf.Tensor | None = None, cache: Optional[Dict[str, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -927,13 +917,6 @@ def call( attentions=transformer_outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -960,20 +943,20 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - langs: 
Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - lengths: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + langs: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + lengths: np.ndarray | tf.Tensor | None = None, cache: Optional[Dict[str, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1026,15 +1009,6 @@ def call( attentions=transformer_outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) - @add_start_docstrings( """ @@ -1064,19 +1038,19 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - langs: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - lengths: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + langs: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + lengths: np.ndarray | tf.Tensor | None = None, cache: Optional[Dict[str, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1116,13 +1090,6 @@ def call( attentions=transformer_outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = 
tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1172,19 +1139,19 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - langs: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - lengths: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + langs: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + lengths: np.ndarray | tf.Tensor | None = None, cache: Optional[Dict[str, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: if input_ids is not None: @@ -1244,25 +1211,3 @@ def call( hidden_states=transformer_outputs.hidden_states, attentions=transformer_outputs.attentions, ) - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None, None), tf.int32, name="token_type_ids"), - } - ] - ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving - def serving(self, inputs: Dict[str, tf.Tensor]): - output = self.call(input_ids=inputs) - - return self.serving_output(output) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving_output - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) diff --git a/src/transformers/models/funnel/modeling_tf_funnel.py b/src/transformers/models/funnel/modeling_tf_funnel.py index 84254f2b288c58..9c472674cf6505 100644 --- a/src/transformers/models/funnel/modeling_tf_funnel.py +++ b/src/transformers/models/funnel/modeling_tf_funnel.py @@ -14,9 +14,12 @@ # limitations under the License. 
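Every `serving_output` deleted in the ESM and Flaubert hunks above (and in the Funnel, GPT-2, GPT-J and GroupViT hunks below) follows the same recipe: tuple-valued output fields are densified with `tf.convert_to_tensor` only when the corresponding config flag is set, and the remaining fields pass through unchanged. Restated once as a standalone helper so the repeated deletions are easier to skim; the helper name is ours, and the shared replacement that presumably takes over lives outside this diff:

```python
# The per-model `serving_output` boilerplate removed throughout this patch,
# restated once. Tuple fields become dense tensors only when the matching
# config flag is on; logits are passed through untouched.
import tensorflow as tf
from transformers.modeling_tf_outputs import TFMaskedLMOutput


def densify_output(output: TFMaskedLMOutput, config) -> TFMaskedLMOutput:
    hs = tf.convert_to_tensor(output.hidden_states) if config.output_hidden_states else None
    attns = tf.convert_to_tensor(output.attentions) if config.output_attentions else None
    return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns)
```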
""" TF 2.0 Funnel model.""" + +from __future__ import annotations + import warnings from dataclasses import dataclass -from typing import Dict, Optional, Tuple, Union +from typing import Optional, Tuple, Union import numpy as np import tensorflow as tf @@ -44,7 +47,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - MULTIPLE_CHOICE_DUMMY_INPUTS, ModelOutput, add_code_sample_docstrings, add_start_docstrings, @@ -995,8 +997,8 @@ class TFFunnelForPreTrainingOutput(ModelOutput): """ logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None FUNNEL_START_DOCSTRING = r""" @@ -1110,10 +1112,10 @@ def __init__(self, config: FunnelConfig, *inputs, **kwargs) -> None: @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1158,10 +1160,10 @@ def __init__(self, config: FunnelConfig, *inputs, **kwargs) -> None: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1206,10 +1208,10 @@ def __init__(self, config: FunnelConfig, **kwargs) -> None: @replace_return_docstrings(output_type=TFFunnelForPreTrainingOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1285,14 +1287,14 @@ def get_prefix_bias_name(self) -> str: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, 
output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[Tuple[tf.Tensor], TFMaskedLMOutput]: r""" @@ -1357,14 +1359,14 @@ def __init__(self, config: FunnelConfig, *inputs, **kwargs) -> None: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[Tuple[tf.Tensor], TFSequenceClassifierOutput]: r""" @@ -1422,16 +1424,6 @@ def __init__(self, config: FunnelConfig, *inputs, **kwargs) -> None: self.funnel = TFFunnelBaseLayer(config, name="funnel") self.classifier = TFFunnelClassificationHead(config, 1, name="classifier") - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward(FUNNEL_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length")) @add_code_sample_docstrings( @@ -1441,14 +1433,14 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[Tuple[tf.Tensor], TFMultipleChoiceModelOutput]: r""" @@ -1501,20 +1493,6 @@ def call( attentions=outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.float32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFMultipleChoiceModelOutput: - output = self.call(input_ids=inputs) - - return self.serving_output(output=output) - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: # hidden_states and attentions not converted to Tensor with tf.convert_to_tensor as they are all of # different dimensions @@ -1550,14 +1528,14 @@ def __init__(self, config: FunnelConfig, *inputs, **kwargs) -> None: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - 
token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[Tuple[tf.Tensor], TFTokenClassifierOutput]: r""" @@ -1626,15 +1604,15 @@ def __init__(self, config: FunnelConfig, *inputs, **kwargs) -> None: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[Tuple[tf.Tensor], TFQuestionAnsweringModelOutput]: r""" diff --git a/src/transformers/models/gpt2/modeling_tf_gpt2.py b/src/transformers/models/gpt2/modeling_tf_gpt2.py index b7cb1b6df2a20a..ab6bc07947cce7 100644 --- a/src/transformers/models/gpt2/modeling_tf_gpt2.py +++ b/src/transformers/models/gpt2/modeling_tf_gpt2.py @@ -15,6 +15,8 @@ # limitations under the License. 
""" TF 2.0 OpenAI GPT-2 model.""" +from __future__ import annotations + from dataclasses import dataclass from typing import List, Optional, Tuple, Union @@ -40,7 +42,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - DUMMY_INPUTS, ModelOutput, add_code_sample_docstrings, add_start_docstrings, @@ -345,15 +346,15 @@ def _prune_heads(self, heads_to_prune): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -520,37 +521,6 @@ class TFGPT2PreTrainedModel(TFPreTrainedModel): # names with a '.' represents the authorized unexpected/missing layers when a TF model is loaded from a PT model _keys_to_ignore_on_load_unexpected = [r"h.\d+.attn.bias", r"h.\d+.crossattention.bias"] - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
- """ - dummy = {"input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32)} - # Add `encoder_hidden_states` to make the cross-attention layers' weights initialized - if self.config.add_cross_attention: - batch_size, seq_len = tf.constant(DUMMY_INPUTS).shape - shape = (batch_size, seq_len) + (self.config.hidden_size,) - h = tf.random.uniform(shape=shape) - dummy["encoder_hidden_states"] = h - - return dummy - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - @dataclass class TFGPT2DoubleHeadsModelOutput(ModelOutput): @@ -583,9 +553,9 @@ class TFGPT2DoubleHeadsModelOutput(ModelOutput): logits: tf.Tensor = None mc_logits: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None GPT2_START_DOCSTRING = r""" @@ -716,15 +686,15 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -771,26 +741,6 @@ def call( return outputs - def serving_output(self, output): - pkv = tf.convert_to_tensor(output.past_key_values) if self.config.use_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = ( - tf.convert_to_tensor(output.cross_attentions) - if self.config.output_attentions - and self.config.add_cross_attention - and output.cross_attentions is not None - else None - ) - - return TFBaseModelOutputWithPastAndCrossAttentions( - last_hidden_state=output.last_hidden_state, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) - @add_start_docstrings( """ @@ -844,20 +794,20 @@ def prepare_inputs_for_generation(self, inputs, past_key_values=None, use_cache= ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: 
Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFCausalLMOutputWithCrossAttentions, Tuple[tf.Tensor]]: r""" @@ -923,22 +873,6 @@ def call( cross_attentions=transformer_outputs.cross_attentions, ) - def serving_output(self, output): - pkv = tf.convert_to_tensor(output.past_key_values) if self.config.use_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = ( - tf.convert_to_tensor(output.cross_attentions) - if self.config.output_attentions - and self.config.add_cross_attention - and output.cross_attentions is not None - else None - ) - - return TFCausalLMOutputWithCrossAttentions( - logits=output.logits, past_key_values=pkv, hidden_states=hs, attentions=attns, cross_attentions=cross_attns - ) - @add_start_docstrings( """ @@ -963,14 +897,14 @@ def __init__(self, config, *inputs, **kwargs): @replace_return_docstrings(output_type=TFGPT2DoubleHeadsModelOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - mc_token_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + mc_token_ids: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1060,32 +994,13 @@ def call( attentions=transformer_outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "mc_token_ids": tf.TensorSpec((None, None), tf.int32, name="mc_token_ids"), - } - ] - 
) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - def serving_output(self, output): - pkv = tf.convert_to_tensor(output.past_key_values) if self.config.use_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFGPT2DoubleHeadsModelOutput( - logits=output.logits, - mc_logits=output.mc_logits, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - ) + @property + def input_signature(self): + return { + "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), + "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), + "mc_token_ids": tf.TensorSpec((None, None), tf.int32, name="mc_token_ids"), + } @add_start_docstrings( @@ -1124,18 +1039,18 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutputWithPast, Tuple[tf.Tensor]]: r""" @@ -1208,12 +1123,3 @@ def call( hidden_states=transformer_outputs.hidden_states, attentions=transformer_outputs.attentions, ) - - def serving_output(self, output): - pkv = tf.convert_to_tensor(output.past_key_values) if self.config.use_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutputWithPast( - logits=output.logits, past_key_values=pkv, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/gptj/modeling_gptj.py b/src/transformers/models/gptj/modeling_gptj.py index 3a1f99dd713a7d..82cb280caba9c4 100644 --- a/src/transformers/models/gptj/modeling_gptj.py +++ b/src/transformers/models/gptj/modeling_gptj.py @@ -212,7 +212,7 @@ def forward( key = self._split_heads(key, self.num_attention_heads, self.head_dim, True) value = self._split_heads(value, self.num_attention_heads, self.head_dim, False) - if is_torch_fx_proxy(position_ids): + if is_torch_fx_proxy(position_ids) or torch.jit.is_tracing(): # The logic to conditionally copy to GPU could not be traced, so we do this # every time in the torch.fx case embed_positions = get_embed_positions(self.embed_positions, position_ids) diff --git a/src/transformers/models/gptj/modeling_tf_gptj.py b/src/transformers/models/gptj/modeling_tf_gptj.py index 
fbef4f0effc733..bbcdf3bd240ada 100644 --- a/src/transformers/models/gptj/modeling_tf_gptj.py +++ b/src/transformers/models/gptj/modeling_tf_gptj.py @@ -14,6 +14,8 @@ # limitations under the License. """ TF 2.0 GPT-J model.""" +from __future__ import annotations + from typing import Optional, Tuple, Union import numpy as np @@ -21,7 +23,6 @@ from ...activations_tf import get_tf_activation from ...file_utils import ( - DUMMY_INPUTS, add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -171,8 +172,8 @@ def _attn( query: tf.Tensor, key: tf.Tensor, value: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, ) -> Tuple[tf.Tensor, tf.Tensor]: # compute causal mask from causal mask buffer query_length, key_length = shape_list(query)[-2], shape_list(key)[-2] @@ -207,9 +208,9 @@ def call( self, hidden_states: tf.Tensor, layer_past: Optional[Tuple[tf.Tensor, tf.Tensor]] = None, - attention_mask: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, use_cache: bool = False, output_attentions: bool = False, ): @@ -301,10 +302,10 @@ def __init__(self, config: GPTJConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - layer_past: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + layer_past: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, use_cache: bool = False, output_attentions: bool = False, ): @@ -511,30 +512,6 @@ class TFGPTJPreTrainedModel(TFPreTrainedModel): # names with a '.' represents the authorized unexpected/missing layers when a TF model is loaded from a PT model _keys_to_ignore_on_load_unexpected = [r"h.\d+.attn.bias"] - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
- """ - dummy = {"input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32)} - return dummy - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - GPTJ_START_DOCSTRING = r""" @@ -659,13 +636,13 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -695,18 +672,6 @@ def call( return outputs - def serving_output(self, output): - pkv = tf.convert_to_tensor(output.past_key_values) if self.config.use_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutputWithPast( - last_hidden_state=output.last_hidden_state, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - ) - @add_start_docstrings( """ @@ -762,14 +727,14 @@ def prepare_inputs_for_generation(self, inputs, past_key_values=None, use_cache= ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + labels: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -819,13 +784,6 @@ def call( attentions=transformer_outputs.attentions, ) - def serving_output(self, output): - pkv = tf.convert_to_tensor(output.past_key_values) if self.config.use_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFCausalLMOutputWithPast(logits=output.logits, past_key_values=pkv, hidden_states=hs, attentions=attns) - 
@add_start_docstrings( """ @@ -865,14 +823,14 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + labels: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -950,15 +908,6 @@ def call( attentions=transformer_outputs.attentions, ) - def serving_output(self, output): - pkv = tf.convert_to_tensor(output.past_key_values) if self.config.use_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutputWithPast( - logits=output.logits, past_key_values=pkv, hidden_states=hs, attentions=attns - ) - @add_start_docstrings( """ @@ -987,15 +936,15 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1049,11 +998,3 @@ def call( hidden_states=transformer_outputs.hidden_states, attentions=transformer_outputs.attentions, ) - - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/graphormer/collating_graphormer.py 
b/src/transformers/models/graphormer/collating_graphormer.py index e2cccc6668a417..58ce602ea28de1 100644 --- a/src/transformers/models/graphormer/collating_graphormer.py +++ b/src/transformers/models/graphormer/collating_graphormer.py @@ -129,6 +129,6 @@ def __call__(self, features: List[dict]) -> Dict[str, Any]: else: # binary classification batch["labels"] = torch.from_numpy(np.concatenate([i["labels"] for i in features])) else: # multi task classification, left to float to keep the NaNs - batch["labels"] = torch.from_numpy(np.stack([i["labels"] for i in features], dim=0)) + batch["labels"] = torch.from_numpy(np.stack([i["labels"] for i in features], axis=0)) return batch diff --git a/src/transformers/models/groupvit/modeling_tf_groupvit.py b/src/transformers/models/groupvit/modeling_tf_groupvit.py index 4891931c20abe5..5c989356a5de61 100644 --- a/src/transformers/models/groupvit/modeling_tf_groupvit.py +++ b/src/transformers/models/groupvit/modeling_tf_groupvit.py @@ -15,10 +15,12 @@ """ TF 2.0 GroupViT model.""" +from __future__ import annotations + import collections.abc import math from dataclasses import dataclass -from typing import Any, Dict, Optional, Tuple, Union +from typing import Any, Optional, Tuple, Union import numpy as np import tensorflow as tf @@ -26,7 +28,6 @@ from ...activations_tf import get_tf_activation from ...modeling_tf_outputs import TFBaseModelOutput, TFBaseModelOutputWithPooling from ...modeling_tf_utils import ( - DUMMY_INPUTS, TFModelInputType, TFPreTrainedModel, get_initializer, @@ -247,7 +248,7 @@ class TFGroupViTModelOutput(ModelOutput): The output of the [`TFGroupViTVisionModel`]. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits_per_image: tf.Tensor = None logits_per_text: tf.Tensor = None segmentation_logits: tf.Tensor = None @@ -647,7 +648,7 @@ def split_x(self, x: tf.Tensor) -> tf.Tensor: else: return x, None - def concat_x(self, x: tf.Tensor, group_token: Optional[tf.Tensor] = None) -> tf.Tensor: + def concat_x(self, x: tf.Tensor, group_token: tf.Tensor | None = None) -> tf.Tensor: if group_token is None: return x return tf.concat([x, group_token], axis=1) @@ -655,7 +656,7 @@ def concat_x(self, x: tf.Tensor, group_token: Optional[tf.Tensor] = None) -> tf. 
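The one-keyword collator fix above is easy to miss: `np.stack` takes `axis=`, not `dim=` (that keyword belongs to `torch.stack`), so the old multi-task-classification branch raised a `TypeError` as soon as it was hit. A quick illustration of the two APIs, with made-up label values:

```python
import numpy as np
import torch

labels = [np.array([0.5, np.nan]), np.array([1.0, 0.0])]

stacked = np.stack(labels, axis=0)        # correct NumPy keyword
# np.stack(labels, dim=0)                 # TypeError: unexpected keyword argument 'dim'
batch_labels = torch.from_numpy(stacked)  # torch.stack would use dim=0 instead
print(batch_labels.shape)                 # torch.Size([2, 2]), NaNs preserved as floats
```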
def call( self, hidden_states: tf.Tensor, - prev_group_token: Optional[tf.Tensor] = None, + prev_group_token: tf.Tensor | None = None, output_attentions: bool = False, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -1138,9 +1139,9 @@ def set_input_embeddings(self, value: tf.Variable): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1183,7 +1184,7 @@ def get_input_embeddings(self) -> tf.keras.layers.Layer: @unpack_inputs def call( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1262,9 +1263,9 @@ def build(self, input_shape: tf.TensorShape): @unpack_inputs def get_text_features( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1298,7 +1299,7 @@ def get_text_features( @unpack_inputs def get_image_features( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1325,10 +1326,10 @@ def get_image_features( @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - pixel_values: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + pixel_values: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, return_loss: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1606,38 +1607,14 @@ def __init__(self, config: GroupViTTextConfig, *inputs, **kwargs): self.groupvit = TFGroupViTTextMainLayer(config, name="groupvit") - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
- """ - return { - "input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32), - } - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFBaseModelOutputWithPooling: - output = self.call(inputs) - return self.serving_output(output) - @unpack_inputs @add_start_docstrings_to_model_forward(GROUPVIT_TEXT_INPUTS_DOCSTRING.format("batch_size, sequence_length")) @replace_return_docstrings(output_type=TFBaseModelOutputWithPooling, config_class=GroupViTTextConfig) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1673,17 +1650,6 @@ def call( return outputs - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=hs, - attentions=attns, - ) - class TFGroupViTVisionModel(TFGroupViTPreTrainedModel): config_class = GroupViTVisionConfig @@ -1694,44 +1660,12 @@ def __init__(self, config: GroupViTVisionConfig, *inputs, **kwargs): self.groupvit = TFGroupViTVisionMainLayer(config, name="groupvit") - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(len(DUMMY_INPUTS), 3, self.config.image_size, self.config.image_size), dtype=tf.float32 - ) - return {"pixel_values": VISION_DUMMY_INPUTS} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFBaseModelOutputWithPooling: - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. 
- """ - output = self.call(inputs) - - return self.serving_output(output) - @unpack_inputs @add_start_docstrings_to_model_forward(GROUPVIT_VISION_INPUTS_DOCSTRING) @replace_return_docstrings(output_type=TFBaseModelOutputWithPooling, config_class=GroupViTVisionConfig) def call( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1770,15 +1704,6 @@ def call( return outputs - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: - # hidden_states and attentions not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=output.hidden_states, - attentions=output.attentions, - ) - @add_start_docstrings(GROUPVIT_START_DOCSTRING) class TFGroupViTModel(TFGroupViTPreTrainedModel): @@ -1789,51 +1714,13 @@ def __init__(self, config: GroupViTConfig, *inputs, **kwargs): self.groupvit = TFGroupViTMainLayer(config, name="groupvit") - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(len(DUMMY_INPUTS), 3, self.config.vision_config.image_size, self.config.vision_config.image_size), - dtype=tf.float32, - ) - return { - "input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32), - "pixel_values": VISION_DUMMY_INPUTS, - } - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float64, name="pixel_values"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFGroupViTModelOutput: - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. 
- """ - output = self.call(inputs) - - return self.serving_output(output) - @unpack_inputs @add_start_docstrings_to_model_forward(GROUPVIT_TEXT_INPUTS_DOCSTRING.format("batch_size, sequence_length")) def get_text_features( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1872,7 +1759,7 @@ def get_text_features( @add_start_docstrings_to_model_forward(GROUPVIT_VISION_INPUTS_DOCSTRING) def get_image_features( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1916,10 +1803,10 @@ def get_image_features( @replace_return_docstrings(output_type=TFGroupViTModelOutput, config_class=GroupViTConfig) def call( self, - input_ids: Optional[TFModelInputType] = None, - pixel_values: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + pixel_values: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, return_loss: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, diff --git a/src/transformers/models/hubert/configuration_hubert.py b/src/transformers/models/hubert/configuration_hubert.py index 139df45bbb791d..1f326871c3c917 100644 --- a/src/transformers/models/hubert/configuration_hubert.py +++ b/src/transformers/models/hubert/configuration_hubert.py @@ -62,6 +62,9 @@ class HubertConfig(PretrainedConfig): The dropout ratio for the attention probabilities. final_dropout (`float`, *optional*, defaults to 0.1): The dropout probabilitiy for the final projection layer of [`Wav2Vec2ForCTC`]. + layerdrop (`float`, *optional*, defaults to 0.1): + The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more + details. initializer_range (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. layer_norm_eps (`float`, *optional*, defaults to 1e-12): diff --git a/src/transformers/models/hubert/modeling_tf_hubert.py b/src/transformers/models/hubert/modeling_tf_hubert.py index 24cbde9af7c356..c237616bf2a42c 100644 --- a/src/transformers/models/hubert/modeling_tf_hubert.py +++ b/src/transformers/models/hubert/modeling_tf_hubert.py @@ -13,8 +13,11 @@ # See the License for the specific language governing permissions and # limitations under the License. 
""" TensorFlow Hubert model.""" + +from __future__ import annotations + import warnings -from typing import Any, Dict, Optional, Tuple, Union +from typing import Any, Optional, Tuple, Union import numpy as np import tensorflow as tf @@ -642,12 +645,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training: Optional[bool] = False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -812,7 +815,7 @@ def __init__(self, config: HubertConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = False, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -856,7 +859,7 @@ def __init__(self, config: HubertConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = False, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -890,7 +893,7 @@ def __init__(self, config: HubertConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = False, output_hidden_states: Optional[bool] = False, return_dict: Optional[bool] = True, @@ -958,7 +961,7 @@ def __init__(self, config: HubertConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = False, output_hidden_states: Optional[bool] = False, return_dict: Optional[bool] = True, @@ -1048,7 +1051,7 @@ def _conv_out_length(input_length, kernel_size, stride): return input_lengths - def _mask_hidden_states(self, hidden_states: tf.Tensor, mask_time_indices: Optional[tf.Tensor] = None): + def _mask_hidden_states(self, hidden_states: tf.Tensor, mask_time_indices: tf.Tensor | None = None): """ Masks extracted features along time axis and/or along feature axis according to [SpecAugment](https://arxiv.org/abs/1904.08779). 
@@ -1096,13 +1099,13 @@ def _mask_hidden_states(self, hidden_states: tf.Tensor, mask_time_indices: Optio def call( self, input_values: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, - output_attentions: Optional[tf.Tensor] = None, - output_hidden_states: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, + output_attentions: tf.Tensor | None = None, + output_hidden_states: tf.Tensor | None = None, return_dict: Optional[bool] = None, training: bool = False, **kwargs: Any, @@ -1154,14 +1157,12 @@ class TFHubertPreTrainedModel(TFPreTrainedModel): main_input_name = "input_values" @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - pad_token = 0.0 - input_values = tf.convert_to_tensor(np.random.rand(1, 16000), tf.float32) - dummy_inputs = { - "input_values": input_values, - "attention_mask": tf.cast(tf.not_equal(input_values, pad_token), tf.float32), + def input_signature(self): + return { + "input_values": tf.TensorSpec((None, 16000), tf.float32, name="input_values"), + "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), + "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"), } - return dummy_inputs def __init__(self, config, *inputs, **kwargs): super().__init__(config, *inputs, **kwargs) @@ -1170,20 +1171,6 @@ def __init__(self, config, *inputs, **kwargs): "to train/fine-tine this model, you need a GPU or a TPU" ) - @tf.function( - input_signature=[ - { - "input_values": tf.TensorSpec((None, None), tf.float32, name="input_values"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs): - output = self.call(input_values=inputs, training=False) - - return self.serving_output(output) - HUBERT_START_DOCSTRING = r""" @@ -1299,11 +1286,11 @@ def __init__(self, config: HubertConfig, *inputs, **kwargs): def call( self, input_values: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1356,13 +1343,6 @@ def call( return outputs - def serving_output(self, output): - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - return TFBaseModelOutput( - last_hidden_state=output.last_hidden_state, hidden_states=hidden_states, attentions=attentions - ) - @add_start_docstrings( """TFHubert Model with a `language modeling` head on top for Connectionist Temporal Classification (CTC).""", @@ -1401,13 +1381,13 @@ def freeze_feature_encoder(self): def call( self, input_values: tf.Tensor, - 
attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, @@ -1515,8 +1495,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - def serving_output(self, output: TFCausalLMOutput) -> TFCausalLMOutput: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - return TFCausalLMOutput(logits=output.logits, hidden_states=hidden_states, attentions=attentions) diff --git a/src/transformers/models/informer/configuration_informer.py b/src/transformers/models/informer/configuration_informer.py index d5950275b988aa..d8af8c793cdb28 100644 --- a/src/transformers/models/informer/configuration_informer.py +++ b/src/transformers/models/informer/configuration_informer.py @@ -232,9 +232,6 @@ def __init__( self.activation_function = activation_function self.init_std = init_std - self.output_attentions = False - self.output_hidden_states = False - self.use_cache = use_cache # Informer diff --git a/src/transformers/models/informer/modeling_informer.py b/src/transformers/models/informer/modeling_informer.py index 1d0451add50d83..4c8edcbc1564c8 100644 --- a/src/transformers/models/informer/modeling_informer.py +++ b/src/transformers/models/informer/modeling_informer.py @@ -142,7 +142,9 @@ def __init__( self.default_scale = default_scale @torch.no_grad() - def forward(self, data: torch.Tensor, observed_indicator: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]: + def forward( + self, data: torch.Tensor, observed_indicator: torch.Tensor + ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: # shape: (N, [C], T=1) ts_sum = (data * observed_indicator).abs().sum(self.dim, keepdim=True) num_observed = observed_indicator.sum(self.dim, keepdim=True) @@ -1669,7 +1671,7 @@ def forward( >>> from transformers import InformerModel >>> file = hf_hub_download( - ... repo_id="kashif/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset" + ... repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset" ... ) >>> batch = torch.load(file) @@ -1834,7 +1836,7 @@ def forward( >>> from transformers import InformerForPrediction >>> file = hf_hub_download( - ... repo_id="kashif/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset" + ... repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset" ... ) >>> batch = torch.load(file) diff --git a/src/transformers/models/layoutlm/modeling_tf_layoutlm.py b/src/transformers/models/layoutlm/modeling_tf_layoutlm.py index 2755e055370b0d..c756609468598c 100644 --- a/src/transformers/models/layoutlm/modeling_tf_layoutlm.py +++ b/src/transformers/models/layoutlm/modeling_tf_layoutlm.py @@ -14,6 +14,9 @@ # limitations under the License. 
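In the Hubert hunks above, the hand-written `dummy_inputs` property and the `@tf.function`-decorated `serving` method are collapsed into a single `input_signature` property, and the per-model `serving_output` overrides are dropped; presumably the shared `TFPreTrainedModel` base class now derives the serving function, dummy inputs and output conversion from that signature. A rough sketch of how such a signature can drive a SavedModel export (the checkpoint name, output path and export call are illustrative, not part of this patch):

    import tensorflow as tf
    from transformers import TFHubertModel

    model = TFHubertModel.from_pretrained("facebook/hubert-base-ls960")  # example checkpoint

    # Trace `call` with the specs declared by `input_signature` so the export
    # exposes named inputs (input_values, attention_mask, token_type_ids).
    serving_fn = tf.function(model.call).get_concrete_function(model.input_signature)
    tf.saved_model.save(model, "hubert_saved_model", signatures=serving_fn)
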
""" TF 2.0 LayoutLM model.""" + +from __future__ import annotations + import math import warnings from typing import Dict, Optional, Tuple, Union @@ -423,9 +426,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_value: Optional[Tuple[tf.Tensor]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_value: Tuple[tf.Tensor] | None, output_attentions: bool, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -502,9 +505,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None, use_cache: Optional[bool], output_attentions: bool, output_hidden_states: bool, @@ -694,15 +697,15 @@ class PreTrainedModel @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - bbox: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + bbox: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -919,15 +922,15 @@ def __init__(self, config: LayoutLMConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - bbox: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + bbox: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = 
None, return_dict: Optional[bool] = None, @@ -983,27 +986,6 @@ def call( return outputs - # Copied from transformers.models.bert.modeling_tf_bert.TFBertModel.serving_output - def serving_output( - self, output: TFBaseModelOutputWithPoolingAndCrossAttentions - ) -> TFBaseModelOutputWithPoolingAndCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFBaseModelOutputWithPoolingAndCrossAttentions( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) - @add_start_docstrings("""LayoutLM Model with a `language modeling` head on top.""", LAYOUTLM_START_DOCSTRING) class TFLayoutLMForMaskedLM(TFLayoutLMPreTrainedModel, TFMaskedLanguageModelingLoss): @@ -1039,17 +1021,17 @@ def get_prefix_bias_name(self) -> str: @replace_return_docstrings(output_type=TFMaskedLMOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - bbox: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + bbox: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1125,12 +1107,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1162,17 +1138,17 @@ def __init__(self, config: LayoutLMConfig, *inputs, **kwargs): @replace_return_docstrings(output_type=TFSequenceClassifierOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - bbox: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = 
None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + bbox: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1249,12 +1225,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1292,17 +1262,17 @@ def __init__(self, config: LayoutLMConfig, *inputs, **kwargs): @replace_return_docstrings(output_type=TFTokenClassifierOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - bbox: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + bbox: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1377,12 +1347,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1418,18 +1382,18 @@ def __init__(self, config: LayoutLMConfig, *inputs, **kwargs): @replace_return_docstrings(output_type=TFQuestionAnsweringModelOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - bbox: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = 
None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + bbox: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1521,11 +1485,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/layoutlmv3/modeling_tf_layoutlmv3.py b/src/transformers/models/layoutlmv3/modeling_tf_layoutlmv3.py index 491ef186e52275..feba69eafc2a71 100644 --- a/src/transformers/models/layoutlmv3/modeling_tf_layoutlmv3.py +++ b/src/transformers/models/layoutlmv3/modeling_tf_layoutlmv3.py @@ -14,9 +14,12 @@ # limitations under the License. 
"""TF 2.0 LayoutLMv3 model.""" + +from __future__ import annotations + import collections import math -from typing import Dict, List, Optional, Tuple, Union +from typing import List, Optional, Tuple, Union import tensorflow as tf @@ -222,11 +225,11 @@ def create_position_ids(self, input_ids: tf.Tensor, inputs_embeds: tf.Tensor) -> def call( self, - input_ids: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, bbox: tf.Tensor = None, - token_type_ids: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + token_type_ids: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, training: bool = False, ) -> tf.Tensor: if position_ids is None: @@ -319,11 +322,11 @@ def cogview_attention(self, attention_scores: tf.Tensor, alpha: Union[float, int def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor], - head_mask: Optional[tf.Tensor], + attention_mask: tf.Tensor | None, + head_mask: tf.Tensor | None, output_attentions: bool, - rel_pos: Optional[tf.Tensor] = None, - rel_2d_pos: Optional[tf.Tensor] = None, + rel_pos: tf.Tensor | None = None, + rel_2d_pos: tf.Tensor | None = None, training: bool = False, ) -> Union[Tuple[tf.Tensor], Tuple[tf.Tensor, tf.Tensor]]: key_layer = self.transpose_for_scores(self.key(hidden_states)) @@ -398,11 +401,11 @@ def __init__(self, config: LayoutLMv3Config, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor], - head_mask: Optional[tf.Tensor], + attention_mask: tf.Tensor | None, + head_mask: tf.Tensor | None, output_attentions: bool, - rel_pos: Optional[tf.Tensor] = None, - rel_2d_pos: Optional[tf.Tensor] = None, + rel_pos: tf.Tensor | None = None, + rel_2d_pos: tf.Tensor | None = None, training: bool = False, ) -> Union[Tuple[tf.Tensor], Tuple[tf.Tensor, tf.Tensor]]: self_outputs = self.self_attention( @@ -469,11 +472,11 @@ def __init__(self, config: LayoutLMv3Config, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor], - head_mask: Optional[tf.Tensor], + attention_mask: tf.Tensor | None, + head_mask: tf.Tensor | None, output_attentions: bool, - rel_pos: Optional[tf.Tensor] = None, - rel_2d_pos: Optional[tf.Tensor] = None, + rel_pos: tf.Tensor | None = None, + rel_2d_pos: tf.Tensor | None = None, training: bool = False, ) -> Union[Tuple[tf.Tensor], Tuple[tf.Tensor, tf.Tensor]]: self_attention_outputs = self.attention( @@ -593,13 +596,13 @@ def _cal_2d_pos_emb(self, bbox: tf.Tensor): def call( self, hidden_states: tf.Tensor, - bbox: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + bbox: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: bool = False, output_hidden_states: bool = False, return_dict: bool = True, - position_ids: Optional[tf.Tensor] = None, + position_ids: tf.Tensor | None = None, training: bool = False, ) -> Union[ TFBaseModelOutput, @@ -778,7 +781,7 @@ def get_extended_attention_mask(self, attention_mask: tf.Tensor) -> tf.Tensor: return extended_attention_mask - def get_head_mask(self, head_mask: Optional[tf.Tensor]) -> Union[tf.Tensor, List[Optional[tf.Tensor]]]: + def get_head_mask(self, head_mask: tf.Tensor | None) -> Union[tf.Tensor, List[tf.Tensor | None]]: if head_mask is None: return [None] * self.config.num_hidden_layers @@ -806,14 +809,14 @@ def get_head_mask(self, 
head_mask: Optional[tf.Tensor]) -> Union[tf.Tensor, List @unpack_inputs def call( self, - input_ids: Optional[tf.Tensor] = None, - bbox: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, - pixel_values: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + bbox: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, + pixel_values: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -977,37 +980,10 @@ class TFLayoutLMv3PreTrainedModel(TFPreTrainedModel): base_model_prefix = "layoutlmv3" @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - size = self.config.input_size - image_shape = (2, self.config.num_channels, size, size) - pixel_values = tf.random.uniform(shape=image_shape, minval=-1, maxval=1) - return { - "input_ids": tf.constant(_DUMMY_INPUT_IDS, dtype=tf.int32), - "bbox": tf.constant(_DUMMY_BBOX, dtype=tf.int32), - "pixel_values": pixel_values, - } - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "bbox": tf.TensorSpec((None, None, 4), tf.int32, name="bbox"), - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. 
- """ - output = self.call(inputs) - - return self.serving_output(output) + def input_signature(self): + sig = super().input_signature + sig["bbox"] = tf.TensorSpec((None, None, 4), tf.int32, name="bbox") + return sig LAYOUTLMV3_START_DOCSTRING = r""" @@ -1145,14 +1121,14 @@ def __init__(self, config, *inputs, **kwargs): @replace_return_docstrings(output_type=TFBaseModelOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[tf.Tensor] = None, - bbox: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, - pixel_values: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + bbox: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, + pixel_values: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1204,16 +1180,6 @@ def call( return outputs - def serving_output(self, output: TFBaseModelOutput) -> TFBaseModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutput( - last_hidden_state=output.last_hidden_state, - hidden_states=hs, - attentions=attns, - ) - class TFLayoutLMv3ClassificationHead(tf.keras.layers.Layer): """ @@ -1272,18 +1238,18 @@ def __init__(self, config: LayoutLMv3Config, **kwargs): @replace_return_docstrings(output_type=TFSequenceClassifierOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, - labels: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, + labels: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - bbox: Optional[tf.Tensor] = None, - pixel_values: Optional[tf.Tensor] = None, + bbox: tf.Tensor | None = None, + pixel_values: tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[ TFSequenceClassifierOutput, @@ -1351,13 +1317,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1392,18 +1351,18 @@ def __init__(self, config: LayoutLMv3Config, **kwargs): 
@replace_return_docstrings(output_type=TFTokenClassifierOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[tf.Tensor] = None, - bbox: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, - labels: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + bbox: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, + labels: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - pixel_values: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[ TFTokenClassifierOutput, @@ -1481,13 +1440,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1514,18 +1466,18 @@ def __init__(self, config: LayoutLMv3Config, **kwargs): @replace_return_docstrings(output_type=TFQuestionAnsweringModelOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, - start_positions: Optional[tf.Tensor] = None, - end_positions: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, + start_positions: tf.Tensor | None = None, + end_positions: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, - bbox: Optional[tf.Tensor] = None, - pixel_values: Optional[tf.Tensor] = None, + bbox: tf.Tensor | None = None, + pixel_values: tf.Tensor | None = None, return_dict: Optional[bool] = None, training: bool = False, ) -> Union[ @@ -1615,12 +1567,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/led/modeling_tf_led.py b/src/transformers/models/led/modeling_tf_led.py 
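The TFLayoutLMv3PreTrainedModel hunk above keeps only what is model-specific: instead of re-declaring dummy tensors and a full serving signature, it extends the inherited `input_signature` with a `bbox` spec. Assuming the base class contributes the usual `input_ids`, `attention_mask` and `pixel_values` specs (an assumption about the parent implementation, which is not shown in this patch), the composed signature would look roughly like:

    import tensorflow as tf

    # Hypothetical composed signature for TFLayoutLMv3 after the override
    signature = {
        "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"),
        "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"),
        "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"),
        "bbox": tf.TensorSpec((None, None, 4), tf.int32, name="bbox"),
    }
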
index 324482b4d2456e..6e962ea4934e91 100644 --- a/src/transformers/models/led/modeling_tf_led.py +++ b/src/transformers/models/led/modeling_tf_led.py @@ -15,6 +15,8 @@ """ TF 2.0 LED model.""" +from __future__ import annotations + import random from dataclasses import dataclass from typing import List, Optional, Tuple, Union @@ -1030,12 +1032,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training=False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -1238,12 +1240,12 @@ def __init__(self, config: LEDConfig, **kwargs): def call( self, hidden_states, - attention_mask: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, - encoder_layer_head_mask: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[tf.Tensor]] = None, + attention_mask: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, + encoder_layer_head_mask: tf.Tensor | None = None, + past_key_value: Tuple[tf.Tensor] | None = None, training=False, ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]: """ @@ -1321,33 +1323,10 @@ class TFLEDPreTrainedModel(TFPreTrainedModel): base_model_prefix = "led" @property - def dummy_inputs(self): - input_ids = tf.convert_to_tensor([[7, 6, 0, 0, 1], [1, 2, 3, 0, 0]], dtype=tf.int32) - # make sure global layers are initialized - attention_mask = tf.convert_to_tensor([[1, 1, 0, 0, 1], [1, 1, 1, 0, 0]], dtype=tf.int32) - global_attention_mask = tf.convert_to_tensor([[0, 0, 0, 0, 1], [0, 0, 1, 0, 0]], dtype=tf.int32) - dummy_inputs = { - "input_ids": input_ids, - "attention_mask": attention_mask, - "global_attention_mask": global_attention_mask, - "decoder_input_ids": input_ids, - } - return dummy_inputs - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), - "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) + def input_signature(self): + sig = super().input_signature + sig["global_attention_mask"] = tf.TensorSpec((None, None), tf.int32, name="global_attention_mask") + return sig @dataclass @@ -1389,9 +1368,9 @@ class TFLEDEncoderBaseModelOutput(ModelOutput): """ last_hidden_state: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - global_attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + 
global_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -1452,14 +1431,14 @@ class TFLEDSeq2SeqModelOutput(ModelOutput): """ last_hidden_state: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - decoder_hidden_states: Optional[Tuple[tf.Tensor]] = None - decoder_attentions: Optional[Tuple[tf.Tensor]] = None - cross_attentions: Optional[Tuple[tf.Tensor]] = None - encoder_last_hidden_state: Optional[tf.Tensor] = None - encoder_hidden_states: Optional[Tuple[tf.Tensor]] = None - encoder_attentions: Optional[Tuple[tf.Tensor]] = None - encoder_global_attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + decoder_hidden_states: Tuple[tf.Tensor] | None = None + decoder_attentions: Tuple[tf.Tensor] | None = None + cross_attentions: Tuple[tf.Tensor] | None = None + encoder_last_hidden_state: tf.Tensor | None = None + encoder_hidden_states: Tuple[tf.Tensor] | None = None + encoder_attentions: Tuple[tf.Tensor] | None = None + encoder_global_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -1517,16 +1496,16 @@ class TFLEDSeq2SeqLMOutput(ModelOutput): in the sequence. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - decoder_hidden_states: Optional[Tuple[tf.Tensor]] = None - decoder_attentions: Optional[Tuple[tf.Tensor]] = None - cross_attentions: Optional[Tuple[tf.Tensor]] = None - encoder_last_hidden_state: Optional[tf.Tensor] = None - encoder_hidden_states: Optional[Tuple[tf.Tensor]] = None - encoder_attentions: Optional[Tuple[tf.Tensor]] = None - encoder_global_attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + decoder_hidden_states: Tuple[tf.Tensor] | None = None + decoder_attentions: Tuple[tf.Tensor] | None = None + cross_attentions: Tuple[tf.Tensor] | None = None + encoder_last_hidden_state: tf.Tensor | None = None + encoder_hidden_states: Tuple[tf.Tensor] | None = None + encoder_attentions: Tuple[tf.Tensor] | None = None + encoder_global_attentions: Tuple[tf.Tensor] | None = None LED_START_DOCSTRING = r""" @@ -2383,22 +2362,22 @@ def set_output_embeddings(self, value): @replace_return_docstrings(output_type=TFLEDSeq2SeqLMOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, encoder_outputs: Optional[TFLEDEncoderBaseModelOutput] = None, - global_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + global_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + decoder_inputs_embeds: np.ndarray | 
tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: bool = False, ): """ diff --git a/src/transformers/models/llama/modeling_llama.py b/src/transformers/models/llama/modeling_llama.py index e7ce8a661fc877..80cfdfa5f06645 100755 --- a/src/transformers/models/llama/modeling_llama.py +++ b/src/transformers/models/llama/modeling_llama.py @@ -81,14 +81,11 @@ def __init__(self, hidden_size, eps=1e-6): self.variance_epsilon = eps def forward(self, hidden_states): + input_dtype = hidden_states.dtype variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True) hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon) - # convert into half-precision if necessary - if self.weight.dtype in [torch.float16, torch.bfloat16]: - hidden_states = hidden_states.to(self.weight.dtype) - - return self.weight * hidden_states + return (self.weight * hidden_states).to(input_dtype) class LlamaRotaryEmbedding(torch.nn.Module): @@ -226,7 +223,9 @@ def forward( f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}" ) attn_weights = attn_weights + attention_mask - attn_weights = torch.max(attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min)) + attn_weights = torch.max( + attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min, device=attn_weights.device) + ) # upcast attention to fp32 attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype) diff --git a/src/transformers/models/llama/tokenization_llama_fast.py b/src/transformers/models/llama/tokenization_llama_fast.py index bb2737075ea2ad..c3946d83b0e0b8 100644 --- a/src/transformers/models/llama/tokenization_llama_fast.py +++ b/src/transformers/models/llama/tokenization_llama_fast.py @@ -16,6 +16,8 @@ from shutil import copyfile from typing import Optional, Tuple +from tokenizers import processors + from ...tokenization_utils_fast import PreTrainedTokenizerFast from ...utils import is_sentencepiece_available, logging from ...utils.versions import require_version @@ -84,6 +86,8 @@ def __init__( unk_token="", bos_token="", eos_token="", + add_bos_token=True, + add_eos_token=False, **kwargs, ): super().__init__( @@ -95,10 +99,50 @@ def __init__( eos_token=eos_token, **kwargs, ) + self._add_bos_token = add_bos_token + self._add_eos_token = add_eos_token + self.update_post_processor() self.vocab_file = vocab_file self.can_save_slow_tokenizer = False if not self.vocab_file else True + def update_post_processor(self): + bos = self.bos_token + bos_token_id = self.bos_token_id + + eos = self.eos_token + eos_token_id = self.eos_token_id + + single = f"{(bos+':0 ') * self.add_bos_token}$A:0{(' '+eos+':0') * self.add_eos_token}" + pair = f"{single}{(' '+bos+':1') * self.add_bos_token} $B:1{(' '+eos+':1') * self.add_eos_token}" + + special_tokens = [] + if self.add_bos_token: + special_tokens.append((bos, bos_token_id)) + if self.add_eos_token: + special_tokens.append((eos, eos_token_id)) + self._tokenizer.post_processor = processors.TemplateProcessing( + single=single, pair=pair, special_tokens=special_tokens + ) + + @property + def add_eos_token(self): + return self._add_eos_token + + @property + def add_bos_token(self): + return self._add_bos_token + + @add_eos_token.setter + def add_eos_token(self, value): + self._add_eos_token = 
value + self.update_post_processor() + + @add_bos_token.setter + def add_bos_token(self, value): + self._add_bos_token = value + self.update_post_processor() + def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]: if not self.can_save_slow_tokenizer: raise ValueError( diff --git a/src/transformers/models/longformer/configuration_longformer.py b/src/transformers/models/longformer/configuration_longformer.py index 3f3e2da7e830e8..1542c497989ff0 100644 --- a/src/transformers/models/longformer/configuration_longformer.py +++ b/src/transformers/models/longformer/configuration_longformer.py @@ -86,12 +86,6 @@ class LongformerConfig(PretrainedConfig): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. layer_norm_eps (`float`, *optional*, defaults to 1e-12): The epsilon used by the layer normalization layers. - position_embedding_type (`str`, *optional*, defaults to `"absolute"`): - Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. For - positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to - [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155). - For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models - with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658). attention_window (`int` or `List[int]`, *optional*, defaults to 512): Size of an attention window around each token. If an `int`, use the same size for all layers. To specify a different window size for each layer, use a `List[int]` where `len(attention_window) == num_hidden_layers`. @@ -131,7 +125,6 @@ def __init__( type_vocab_size: int = 2, initializer_range: float = 0.02, layer_norm_eps: float = 1e-12, - position_embedding_type: str = "absolute", onnx_export: bool = False, **kwargs, ): @@ -154,7 +147,6 @@ def __init__( self.type_vocab_size = type_vocab_size self.initializer_range = initializer_range self.layer_norm_eps = layer_norm_eps - self.position_embedding_type = position_embedding_type self.onnx_export = onnx_export diff --git a/src/transformers/models/longformer/modeling_longformer.py b/src/transformers/models/longformer/modeling_longformer.py index 9768641afe451c..cd975380be553b 100755 --- a/src/transformers/models/longformer/modeling_longformer.py +++ b/src/transformers/models/longformer/modeling_longformer.py @@ -445,8 +445,6 @@ def __init__(self, config): self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps) self.dropout = nn.Dropout(config.hidden_dropout_prob) - self.position_embedding_type = getattr(config, "position_embedding_type", "absolute") - self.padding_idx = config.pad_token_id self.position_embeddings = nn.Embedding( config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx diff --git a/src/transformers/models/longformer/modeling_tf_longformer.py b/src/transformers/models/longformer/modeling_tf_longformer.py index c47df169655cdf..60cee2a83e89b3 100644 --- a/src/transformers/models/longformer/modeling_tf_longformer.py +++ b/src/transformers/models/longformer/modeling_tf_longformer.py @@ -14,6 +14,9 @@ # limitations under the License. 
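The tokenization_llama_fast.py hunk above exposes `add_bos_token` / `add_eos_token` on the fast tokenizer and rebuilds the `TemplateProcessing` post-processor whenever either flag changes, so special tokens are applied consistently for single sequences and pairs. A short usage sketch (the checkpoint name is only a placeholder):

    from transformers import LlamaTokenizerFast

    tok = LlamaTokenizerFast.from_pretrained("huggyllama/llama-7b")  # placeholder checkpoint

    print(tok.encode("hello"))  # BOS is prepended by default (add_bos_token=True)

    tok.add_eos_token = True    # setter triggers update_post_processor() and rebuilds the template
    print(tok.encode("hello"))  # now wrapped with BOS ... EOS
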
"""Tensorflow Longformer model.""" + +from __future__ import annotations + import warnings from dataclasses import dataclass from typing import Optional, Tuple, Union @@ -36,7 +39,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - MULTIPLE_CHOICE_DUMMY_INPUTS, ModelOutput, add_code_sample_docstrings, add_start_docstrings, @@ -101,9 +103,9 @@ class TFLongformerBaseModelOutput(ModelOutput): """ last_hidden_state: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - global_attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + global_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -149,9 +151,9 @@ class TFLongformerBaseModelOutputWithPooling(ModelOutput): last_hidden_state: tf.Tensor = None pooler_output: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - global_attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + global_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -193,11 +195,11 @@ class TFLongformerMaskedLMOutput(ModelOutput): in the sequence. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - global_attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + global_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -241,12 +243,12 @@ class TFLongformerQuestionAnsweringModelOutput(ModelOutput): in the sequence. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None start_logits: tf.Tensor = None end_logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - global_attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + global_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -288,11 +290,11 @@ class TFLongformerSequenceClassifierOutput(ModelOutput): in the sequence. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - global_attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + global_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -336,11 +338,11 @@ class TFLongformerMultipleChoiceModelOutput(ModelOutput): in the sequence. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - global_attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + global_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -382,11 +384,11 @@ class TFLongformerTokenClassifierOutput(ModelOutput): in the sequence. 
""" - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - global_attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + global_attentions: Tuple[tf.Tensor] | None = None def _compute_global_attention_mask(input_ids_shape, sep_token_indices, before_sep_token=True): @@ -1871,31 +1873,10 @@ class TFLongformerPreTrainedModel(TFPreTrainedModel): base_model_prefix = "longformer" @property - def dummy_inputs(self): - input_ids = tf.convert_to_tensor([[7, 6, 0, 0, 1], [1, 2, 3, 0, 0], [0, 0, 0, 4, 5]], dtype=tf.int32) - # make sure global layers are initialized - attention_mask = tf.convert_to_tensor([[1, 1, 0, 0, 1], [1, 1, 1, 0, 0], [1, 0, 0, 1, 1]], dtype=tf.int32) - global_attention_mask = tf.convert_to_tensor( - [[0, 0, 0, 0, 1], [0, 0, 1, 0, 0], [0, 0, 0, 0, 1]], dtype=tf.int32 - ) - return { - "input_ids": input_ids, - "attention_mask": attention_mask, - "global_attention_mask": global_attention_mask, - } - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) + def input_signature(self): + sig = super().input_signature + sig["global_attention_mask"] = tf.TensorSpec((None, None), tf.int32, name="global_attention_mask") + return sig LONGFORMER_START_DOCSTRING = r""" @@ -2038,13 +2019,13 @@ def __init__(self, config, *inputs, **kwargs): @add_start_docstrings_to_model_forward(LONGFORMER_INPUTS_DOCSTRING.format("batch_size, sequence_length")) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - global_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + global_attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -2066,19 +2047,6 @@ def call( return outputs - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - g_attns = tf.convert_to_tensor(output.global_attentions) if self.config.output_attentions else None - - return TFLongformerBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=hs, - attentions=attns, - global_attentions=g_attns, - ) - @add_start_docstrings( """Longformer Model with a `language modeling` head on top.""", @@ -2113,17 +2081,17 @@ def get_prefix_bias_name(self): ) def call( self, - input_ids: 
Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - global_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + global_attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFLongformerMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -2163,15 +2131,6 @@ def call( global_attentions=outputs.global_attentions, ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - g_attns = tf.convert_to_tensor(output.global_attentions) if self.config.output_attentions else None - - return TFLongformerMaskedLMOutput( - logits=output.logits, hidden_states=hs, attentions=attns, global_attentions=g_attns - ) - @add_start_docstrings( """ @@ -2206,18 +2165,18 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - global_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + global_attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFLongformerQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -2302,19 +2261,6 @@ def call( global_attentions=outputs.global_attentions, ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - g_attns = tf.convert_to_tensor(output.global_attentions) if self.config.output_attentions else None - - return TFLongformerQuestionAnsweringModelOutput( - 
start_logits=output.start_logits, - end_logits=output.end_logits, - hidden_states=hs, - attentions=attns, - global_attentions=g_attns, - ) - class TFLongformerClassificationHead(tf.keras.layers.Layer): """Head for sentence-level classification tasks.""" @@ -2369,17 +2315,17 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - global_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + global_attention_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFLongformerSequenceClassifierOutput, Tuple[tf.Tensor]]: if input_ids is not None and not isinstance(input_ids, tf.Tensor): @@ -2443,15 +2389,6 @@ def call( global_attentions=outputs.global_attentions, ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - g_attns = tf.convert_to_tensor(output.global_attentions) if self.config.output_attentions else None - - return TFLongformerSequenceClassifierOutput( - logits=output.logits, hidden_states=hs, attentions=attns, global_attentions=g_attns - ) - @add_start_docstrings( """ @@ -2474,11 +2411,12 @@ def __init__(self, config, *inputs, **kwargs): ) @property - def dummy_inputs(self): - input_ids = tf.convert_to_tensor(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32) - # make sure global layers are initialized - global_attention_mask = tf.convert_to_tensor([[[0, 0, 0, 1], [0, 0, 0, 1]]] * 2, dtype=tf.int32) - return {"input_ids": input_ids, "global_attention_mask": global_attention_mask} + def input_signature(self): + return { + "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), + "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), + "global_attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="global_attention_mask"), + } @unpack_inputs @add_start_docstrings_to_model_forward( @@ -2491,17 +2429,17 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - global_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor 
| None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + global_attention_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFLongformerMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -2565,28 +2503,6 @@ def call( global_attentions=outputs.global_attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - g_attns = tf.convert_to_tensor(output.global_attentions) if self.config.output_attentions else None - - return TFLongformerMultipleChoiceModelOutput( - logits=output.logits, hidden_states=hs, attentions=attns, global_attentions=g_attns - ) - @add_start_docstrings( """ @@ -2619,13 +2535,13 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - global_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + global_attention_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -2666,12 +2582,3 @@ def call( attentions=outputs.attentions, global_attentions=outputs.global_attentions, ) - - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - g_attns = tf.convert_to_tensor(output.global_attentions) if self.config.output_attentions else None - - return TFLongformerTokenClassifierOutput( - logits=output.logits, hidden_states=hs, attentions=attns, global_attentions=g_attns - ) diff --git a/src/transformers/models/lxmert/modeling_lxmert.py b/src/transformers/models/lxmert/modeling_lxmert.py index 9fe7ecb730fdac..73586872e8a148 100644 --- a/src/transformers/models/lxmert/modeling_lxmert.py +++ b/src/transformers/models/lxmert/modeling_lxmert.py @@ -111,7 +111,7 @@ class LxmertForQuestionAnsweringOutput(ModelOutput): loss (*optional*, returned when `labels` is provided, `torch.FloatTensor` of shape `(1,)`): Total loss as the sum of the masked 
language modeling loss and the next sequence prediction (classification) loss.k. - question_answering_score: (`torch.FloatTensor` of shape `(batch_size, n_qa_answers)`, *optional*): + question_answering_score (`torch.FloatTensor` of shape `(batch_size, n_qa_answers)`, *optional*): Prediction scores of question answering objective (classification). language_hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for input features + one for the output of each cross-modality layer) of @@ -153,10 +153,10 @@ class LxmertForPreTrainingOutput(ModelOutput): (classification) loss. prediction_logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`): Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). - cross_relationship_score: (`torch.FloatTensor` of shape `(batch_size, 2)`): + cross_relationship_score (`torch.FloatTensor` of shape `(batch_size, 2)`): Prediction scores of the textual matching objective (classification) head (scores of True/False continuation before SoftMax). - question_answering_score: (`torch.FloatTensor` of shape `(batch_size, n_qa_answers)`): + question_answering_score (`torch.FloatTensor` of shape `(batch_size, n_qa_answers)`): Prediction scores of question answering objective (classification). language_hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for input features + one for the output of each cross-modality layer) of @@ -828,12 +828,12 @@ def _init_weights(self, module): [`PreTrainedTokenizer.__call__`] for details. [What are input IDs?](../glossary#input-ids) - visual_feats: (`torch.FloatTensor` of shape `(batch_size, num_visual_features, visual_feat_dim)`): + visual_feats (`torch.FloatTensor` of shape `(batch_size, num_visual_features, visual_feat_dim)`): This input represents visual features. They ROI pooled object features from bounding boxes using a faster-RCNN model) These are currently not provided by the transformers library. - visual_pos: (`torch.FloatTensor` of shape `(batch_size, num_visual_features, visual_pos_dim)`): + visual_pos (`torch.FloatTensor` of shape `(batch_size, num_visual_features, visual_pos_dim)`): This input represents spacial features corresponding to their relative (via index) visual features. The pre-trained LXMERT model expects these spacial features to be normalized bounding boxes on a scale of 0 to 1. @@ -1171,7 +1171,7 @@ def forward( Labels for computing the masked language modeling loss. 
Indices should be in `[-100, 0, ..., config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]` - obj_labels: (`Dict[Str: Tuple[Torch.FloatTensor, Torch.FloatTensor]]`, *optional*): + obj_labels (`Dict[Str: Tuple[Torch.FloatTensor, Torch.FloatTensor]]`, *optional*): each key is named after each one of the visual losses and each element of the tuple is of the shape `(batch_size, num_features)` and `(batch_size, num_features, visual_feature_dim)` for each the label id and the label score respectively @@ -1398,7 +1398,7 @@ def forward( return_dict: Optional[bool] = None, ) -> Union[LxmertForQuestionAnsweringOutput, Tuple[torch.FloatTensor]]: r""" - labels: (`Torch.Tensor` of shape `(batch_size)`, *optional*): + labels (`Torch.Tensor` of shape `(batch_size)`, *optional*): A one-hot representation of the correct answer """ return_dict = return_dict if return_dict is not None else self.config.use_return_dict diff --git a/src/transformers/models/lxmert/modeling_tf_lxmert.py b/src/transformers/models/lxmert/modeling_tf_lxmert.py index 948053c93e287a..0b54702d761d59 100644 --- a/src/transformers/models/lxmert/modeling_tf_lxmert.py +++ b/src/transformers/models/lxmert/modeling_tf_lxmert.py @@ -16,6 +16,9 @@ # limitations under the License. """ TF 2.0 LXMERT model.""" + +from __future__ import annotations + import warnings from dataclasses import dataclass from typing import Dict, Optional, Tuple, Union @@ -90,14 +93,14 @@ class TFLxmertModelOutput(ModelOutput): the self-attention heads. """ - language_output: Optional[tf.Tensor] = None - vision_output: Optional[tf.Tensor] = None - pooled_output: Optional[tf.Tensor] = None - language_hidden_states: Optional[Tuple[tf.Tensor]] = None - vision_hidden_states: Optional[Tuple[tf.Tensor]] = None - language_attentions: Optional[Tuple[tf.Tensor]] = None - vision_attentions: Optional[Tuple[tf.Tensor]] = None - cross_encoder_attentions: Optional[Tuple[tf.Tensor]] = None + language_output: tf.Tensor | None = None + vision_output: tf.Tensor | None = None + pooled_output: tf.Tensor | None = None + language_hidden_states: Tuple[tf.Tensor] | None = None + vision_hidden_states: Tuple[tf.Tensor] | None = None + language_attentions: Tuple[tf.Tensor] | None = None + vision_attentions: Tuple[tf.Tensor] | None = None + cross_encoder_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -111,10 +114,10 @@ class TFLxmertForPreTrainingOutput(ModelOutput): (classification) loss. prediction_logits (`tf.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`): Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). - cross_relationship_score: (`tf.Tensor` of shape `(batch_size, 2)`): + cross_relationship_score (`tf.Tensor` of shape `(batch_size, 2)`): Prediction scores of the textual matching objective (classification) head (scores of True/False continuation before SoftMax). - question_answering_score: (`tf.Tensor` of shape `(batch_size, n_qa_answers)`): + question_answering_score (`tf.Tensor` of shape `(batch_size, n_qa_answers)`): Prediction scores of question answering objective (classification). 
language_hidden_states (`tuple(tf.Tensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `tf.Tensor` (one for input features + one for the output of each cross-modality layer) of shape @@ -137,15 +140,15 @@ class TFLxmertForPreTrainingOutput(ModelOutput): """ - loss: Optional[tf.Tensor] = None - prediction_logits: Optional[tf.Tensor] = None - cross_relationship_score: Optional[tf.Tensor] = None - question_answering_score: Optional[tf.Tensor] = None - language_hidden_states: Optional[Tuple[tf.Tensor]] = None - vision_hidden_states: Optional[Tuple[tf.Tensor]] = None - language_attentions: Optional[Tuple[tf.Tensor]] = None - vision_attentions: Optional[Tuple[tf.Tensor]] = None - cross_encoder_attentions: Optional[Tuple[tf.Tensor]] = None + loss: tf.Tensor | None = None + prediction_logits: tf.Tensor | None = None + cross_relationship_score: tf.Tensor | None = None + question_answering_score: tf.Tensor | None = None + language_hidden_states: Tuple[tf.Tensor] | None = None + vision_hidden_states: Tuple[tf.Tensor] | None = None + language_attentions: Tuple[tf.Tensor] | None = None + vision_attentions: Tuple[tf.Tensor] | None = None + cross_encoder_attentions: Tuple[tf.Tensor] | None = None class TFLxmertVisualFeatureEncoder(tf.keras.layers.Layer): @@ -633,26 +636,6 @@ def call( class TFLxmertMainLayer(tf.keras.layers.Layer): config_class = LxmertConfig - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - tf.Tensor with dummy inputs - """ - batch_size = 2 - num_visual_features = 10 - input_ids = tf.constant([[3, 5, 6], [2, 3, 4]], dtype=tf.int32) - visual_feats = tf.random.uniform((batch_size, num_visual_features, self.config.visual_feat_dim)) - visual_pos = tf.random.uniform((batch_size, num_visual_features, 4)) - - return { - "input_ids": input_ids, - "visual_feats": visual_feats, - "visual_pos": visual_pos, - } - def __init__(self, config, **kwargs): super().__init__(**kwargs) @@ -799,25 +782,35 @@ class TFLxmertPreTrainedModel(TFPreTrainedModel): base_model_prefix = "lxmert" @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - return getattr(self, self.base_model_prefix).dummy_inputs - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "visual_feats": tf.TensorSpec((None, None, None), tf.float32, name="visual_feats"), - "visual_pos": tf.TensorSpec((None, None, None), tf.float32, name="visual_pos"), - "visual_attention_mask": tf.TensorSpec((None, None), tf.int32, name="visual_attention_mask"), - "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) + def dummy_inputs(self): + """ + Dummy inputs to build the network. 
- return self.serving_output(output) + Returns: + tf.Tensor with dummy inputs + """ + batch_size = 2 + num_visual_features = 10 + input_ids = tf.constant([[3, 5, 6], [2, 3, 4]], dtype=tf.int32) + visual_feats = tf.random.uniform((batch_size, num_visual_features, self.config.visual_feat_dim)) + visual_pos = tf.random.uniform((batch_size, num_visual_features, 4)) + + return { + "input_ids": input_ids, + "visual_feats": visual_feats, + "visual_pos": visual_pos, + } + + @property + def input_signature(self): + return { + "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), + "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), + "visual_feats": tf.TensorSpec((None, None, self.config.visual_feat_dim), tf.float32, name="visual_feats"), + "visual_pos": tf.TensorSpec((None, None, 4), tf.float32, name="visual_pos"), + "visual_attention_mask": tf.TensorSpec((None, None), tf.int32, name="visual_attention_mask"), + "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"), + } LXMERT_START_DOCSTRING = r""" @@ -873,12 +866,12 @@ def serving(self, inputs): [`PreTrainedTokenizer.encode`] for details. [What are input IDs?](../glossary#input-ids) - visual_feats: (`tf.Tensor` of shape `(batch_size, num_visual_features, visual_feat_dim)`): + visual_feats (`tf.Tensor` of shape `(batch_size, num_visual_features, visual_feat_dim)`): This input represents visual features. They ROI pooled object features from bounding boxes using a faster-RCNN model) These are currently not provided by the transformers library. - visual_pos: (`tf.Tensor` of shape `(batch_size, num_visual_features, visual_feat_dim)`): + visual_pos (`tf.Tensor` of shape `(batch_size, num_visual_features, visual_feat_dim)`): This input represents spacial features corresponding to their relative (via index) visual features. The pre-trained LXMERT model expects these spacial features to be normalized bounding boxes on a scale of 0 to 1. 
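For reference, a minimal sketch of inputs matching the `input_signature` declared above for the TF LXMERT model; the batch size, sequence length, feature count, and `visual_feat_dim` value are illustrative assumptions, not values fixed by this patch:

    import tensorflow as tf

    # Shapes follow the TensorSpecs above: (batch, seq_len) int32 text inputs,
    # (batch, num_visual_features, visual_feat_dim) float32 region features, and
    # (batch, num_visual_features, 4) boxes normalized to [0, 1].
    batch_size, seq_len, num_visual_features, visual_feat_dim = 2, 3, 10, 2048  # visual_feat_dim assumed
    inputs = {
        "input_ids": tf.zeros((batch_size, seq_len), dtype=tf.int32),
        "attention_mask": tf.ones((batch_size, seq_len), dtype=tf.int32),
        "visual_feats": tf.random.uniform((batch_size, num_visual_features, visual_feat_dim)),
        "visual_pos": tf.random.uniform((batch_size, num_visual_features, 4)),
        "visual_attention_mask": tf.ones((batch_size, num_visual_features), dtype=tf.int32),
        "token_type_ids": tf.zeros((batch_size, seq_len), dtype=tf.int32),
    }
    # outputs = model(**inputs)  # where `model` is a TFLxmertModel instance
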
@@ -945,13 +938,13 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - visual_feats: Optional[tf.Tensor] = None, - visual_pos: Optional[tf.Tensor] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - visual_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + visual_feats: tf.Tensor | None = None, + visual_pos: tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + visual_attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -973,24 +966,6 @@ def call( return outputs - def serving_output(self, output): - l_hs = tf.convert_to_tensor(output.language_hidden_states) if self.config.output_hidden_states else None - v_hs = tf.convert_to_tensor(output.vision_hidden_states) if self.config.output_hidden_states else None - l_attns = tf.convert_to_tensor(output.language_attentions) if self.config.output_attentions else None - v_attns = tf.convert_to_tensor(output.vision_attentions) if self.config.output_attentions else None - c_enc_attns = tf.convert_to_tensor(output.cross_encoder_attentions) if self.config.output_attentions else None - - return TFLxmertModelOutput( - pooled_output=output.pooled_output, - language_output=output.language_output, - vision_output=output.vision_output, - language_hidden_states=l_hs, - vision_hidden_states=v_hs, - language_attentions=l_attns, - vision_attentions=v_attns, - cross_encoder_attentions=c_enc_attns, - ) - class TFLxmertPooler(tf.keras.layers.Layer): def __init__(self, config, **kwargs): @@ -1297,7 +1272,7 @@ def call( Labels for computing the masked language modeling loss. 
Indices should be in `[-100, 0, ..., config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]` - obj_labels: (`Dict[Str: Tuple[tf.Tensor, tf.Tensor]]`, *optional*, defaults to `None`): + obj_labels (`Dict[Str: Tuple[tf.Tensor, tf.Tensor]]`, *optional*, defaults to `None`): each key is named after each one of the visual losses and each element of the tuple is of the shape `(batch_size, num_features)` and `(batch_size, num_features, visual_feature_dim)` for each the label id and the label score respectively @@ -1412,21 +1387,3 @@ def call( vision_attentions=lxmert_output.vision_attentions, cross_encoder_attentions=lxmert_output.cross_encoder_attentions, ) - - def serving_output(self, output): - l_hs = tf.convert_to_tensor(output.language_hidden_states) if self.config.output_hidden_states else None - v_hs = tf.convert_to_tensor(output.vision_hidden_states) if self.config.output_hidden_states else None - l_attns = tf.convert_to_tensor(output.language_attentions) if self.config.output_attentions else None - v_attns = tf.convert_to_tensor(output.vision_attentions) if self.config.output_attentions else None - c_enc_attns = tf.convert_to_tensor(output.cross_encoder_attentions) if self.config.output_attentions else None - - return TFLxmertForPreTrainingOutput( - prediction_logits=output.prediction_logits, - cross_relationship_score=output.cross_relationship_score, - question_answering_score=output.question_answering_score, - language_hidden_states=l_hs, - vision_hidden_states=v_hs, - language_attentions=l_attns, - vision_attentions=v_attns, - cross_encoder_attentions=c_enc_attns, - ) diff --git a/src/transformers/models/marian/modeling_tf_marian.py b/src/transformers/models/marian/modeling_tf_marian.py index 17511588320193..9632ddeaac8f43 100644 --- a/src/transformers/models/marian/modeling_tf_marian.py +++ b/src/transformers/models/marian/modeling_tf_marian.py @@ -15,6 +15,8 @@ """ TF 2.0 Marian model.""" +from __future__ import annotations + import random from typing import Optional, Tuple, Union @@ -31,7 +33,6 @@ # Public API from ...modeling_tf_utils import ( - DUMMY_INPUTS, TFCausalLanguageModelingLoss, TFPreTrainedModel, keras_serializable, @@ -165,7 +166,7 @@ def _init_weight(n_pos: int, dim: int): return table def call( - self, input_shape: tf.TensorShape, past_key_values_length: int = 0, position_ids: Optional[tf.Tensor] = None + self, input_shape: tf.TensorShape, past_key_values_length: int = 0, position_ids: tf.Tensor | None = None ): """Input is expected to be of size [bsz x seqlen].""" if position_ids is None: @@ -212,12 +213,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training: Optional[bool] = False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -347,8 +348,8 @@ def __init__(self, config: MarianConfig, **kwargs): def call( self, 
hidden_states: tf.Tensor, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]], - layer_head_mask: Optional[tf.Tensor], + attention_mask: np.ndarray | tf.Tensor | None, + layer_head_mask: tf.Tensor | None, training: Optional[bool] = False, ) -> tf.Tensor: """ @@ -417,11 +418,11 @@ def __init__(self, config: MarianConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - layer_head_mask: Optional[tf.Tensor] = None, - cross_attn_layer_head_mask: Optional[tf.Tensor] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, + cross_attn_layer_head_mask: tf.Tensor | None = None, past_key_value: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, training: Optional[bool] = False, ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]: @@ -499,34 +500,6 @@ class TFMarianPreTrainedModel(TFPreTrainedModel): config_class = MarianConfig base_model_prefix = "model" - @property - def dummy_inputs(self): - pad_token = 1 - input_ids = tf.cast(tf.convert_to_tensor(DUMMY_INPUTS), tf.int32) - decoder_input_ids = tf.cast(tf.convert_to_tensor(DUMMY_INPUTS), tf.int32) - dummy_inputs = { - "decoder_input_ids": decoder_input_ids, - "attention_mask": tf.cast(input_ids != pad_token, tf.int32), - "input_ids": input_ids, - } - return dummy_inputs - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), - "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), - } - ] - ) - # Copied from transformers.models.bart.modeling_tf_bart.TFBartPretrainedModel.serving - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - MARIAN_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. 
Check the superclass documentation for the generic methods the @@ -708,10 +681,10 @@ def set_embed_tokens(self, embed_tokens): @unpack_inputs def call( self, - input_ids: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -870,15 +843,15 @@ def set_embed_tokens(self, embed_tokens): @unpack_inputs def call( self, - input_ids: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]] = None, + input_ids: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1097,18 +1070,18 @@ def set_input_embeddings(self, new_embeddings): @unpack_inputs def call( self, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - decoder_input_ids: Optional[tf.Tensor] = None, - decoder_attention_mask: Optional[tf.Tensor] = None, - decoder_position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - decoder_head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + decoder_input_ids: tf.Tensor | None = None, + decoder_attention_mask: tf.Tensor | None = None, + decoder_position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + decoder_head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None, past_key_values: Tuple[Tuple[tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, - decoder_inputs_embeds: Optional[tf.Tensor] = None, + inputs_embeds: tf.Tensor | None = None, + decoder_inputs_embeds: tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1202,18 +1175,18 @@ def get_decoder(self): ) def call( self, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - decoder_input_ids: Optional[tf.Tensor] = None, - decoder_attention_mask: Optional[tf.Tensor] = None, - decoder_position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - decoder_head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, - encoder_outputs: Optional[tf.Tensor] = None, - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]] = None, - inputs_embeds: Optional[tf.Tensor] = None, 
- decoder_inputs_embeds: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + decoder_input_ids: tf.Tensor | None = None, + decoder_attention_mask: tf.Tensor | None = None, + decoder_position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + decoder_head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, + encoder_outputs: tf.Tensor | None = None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None = None, + inputs_embeds: tf.Tensor | None = None, + decoder_inputs_embeds: tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1330,23 +1303,23 @@ def set_bias(self, value): @add_end_docstrings(MARIAN_GENERATION_EXAMPLE) def call( self, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - decoder_input_ids: Optional[tf.Tensor] = None, - decoder_attention_mask: Optional[tf.Tensor] = None, - decoder_position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - decoder_head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + decoder_input_ids: tf.Tensor | None = None, + decoder_attention_mask: tf.Tensor | None = None, + decoder_position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + decoder_head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, encoder_outputs: Optional[TFBaseModelOutput] = None, - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]] = None, - inputs_embeds: Optional[tf.Tensor] = None, - decoder_inputs_embeds: Optional[tf.Tensor] = None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None = None, + inputs_embeds: tf.Tensor | None = None, + decoder_inputs_embeds: tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: bool = False, ): r""" diff --git a/src/transformers/models/mask2former/modeling_mask2former.py b/src/transformers/models/mask2former/modeling_mask2former.py index 289b4919e36989..4cb2493e58c8bb 100644 --- a/src/transformers/models/mask2former/modeling_mask2former.py +++ b/src/transformers/models/mask2former/modeling_mask2former.py @@ -1767,7 +1767,7 @@ class Mask2FormerMaskedAttentionDecoder(nn.Module): of the predicted mask for each query, instead of attending to the full feature map. Args: - config: (`Mask2FormerConfig`): + config (`Mask2FormerConfig`): Configuration used to instantiate Mask2FormerMaskedAttentionDecoder. 
""" @@ -2003,7 +2003,7 @@ def __init__(self, hidden_size: int, num_heads: int, mask_feature_size: torch.Te The feature dimension of the Mask2FormerMaskedAttentionDecoder num_heads (`int`): The number of heads used in the Mask2FormerMaskedAttentionDecoder - mask_feature_size: (`torch.Tensor`): + mask_feature_size (`torch.Tensor`): one of the output dimensions of the predicted masks for each query """ super().__init__() diff --git a/src/transformers/models/mbart/modeling_tf_mbart.py b/src/transformers/models/mbart/modeling_tf_mbart.py index 13453bd22dbaaa..b0e2d141f4fa3b 100644 --- a/src/transformers/models/mbart/modeling_tf_mbart.py +++ b/src/transformers/models/mbart/modeling_tf_mbart.py @@ -15,6 +15,8 @@ """ TF 2.0 MBart model.""" +from __future__ import annotations + import random from typing import Optional, Tuple, Union @@ -30,7 +32,6 @@ # Public API from ...modeling_tf_utils import ( - DUMMY_INPUTS, TFCausalLanguageModelingLoss, TFModelInputType, TFPreTrainedModel, @@ -131,7 +132,7 @@ def call( self, input_shape: Optional[tf.TensorShape] = None, past_key_values_length: int = 0, - position_ids: Optional[tf.Tensor] = None, + position_ids: tf.Tensor | None = None, ): """Input is expected to be of size [bsz x seqlen].""" if position_ids is None: @@ -181,12 +182,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training: Optional[bool] = False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -384,12 +385,12 @@ def __init__(self, config: MBartConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, - cross_attn_layer_head_mask: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[tf.Tensor]] = None, + attention_mask: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, + cross_attn_layer_head_mask: tf.Tensor | None = None, + past_key_value: Tuple[tf.Tensor] | None = None, training: Optional[bool] = False, ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]: """ @@ -466,34 +467,6 @@ class TFMBartPreTrainedModel(TFPreTrainedModel): config_class = MBartConfig base_model_prefix = "model" - @property - def dummy_inputs(self): - pad_token = 1 - input_ids = tf.cast(tf.convert_to_tensor(DUMMY_INPUTS), tf.int32) - decoder_input_ids = tf.cast(tf.convert_to_tensor(DUMMY_INPUTS), tf.int32) - dummy_inputs = { - "decoder_input_ids": decoder_input_ids, - "attention_mask": tf.cast(input_ids != pad_token, tf.int32), - "input_ids": input_ids, - } - return dummy_inputs - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - 
"decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), - "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), - } - ] - ) - # Copied from transformers.models.bart.modeling_tf_bart.TFBartPretrainedModel.serving - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - MBART_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. Check the superclass documentation for the generic methods the @@ -700,10 +673,10 @@ def set_embed_tokens(self, embed_tokens): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - inputs_embeds: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + input_ids: TFModelInputType | None = None, + inputs_embeds: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -868,14 +841,14 @@ def set_embed_tokens(self, embed_tokens): def call( self, input_ids: TFModelInputType = None, - inputs_embeds: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]] = None, + inputs_embeds: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1100,17 +1073,17 @@ def set_input_embeddings(self, new_embeddings): def call( self, input_ids: TFModelInputType = None, - attention_mask: Optional[tf.Tensor] = None, - decoder_input_ids: Optional[tf.Tensor] = None, - decoder_attention_mask: Optional[tf.Tensor] = None, - decoder_position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - decoder_head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + decoder_input_ids: tf.Tensor | None = None, + decoder_attention_mask: tf.Tensor | None = None, + decoder_position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + decoder_head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None, - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]] = None, - inputs_embeds: Optional[tf.Tensor] = None, - decoder_inputs_embeds: Optional[tf.Tensor] = None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None = None, + inputs_embeds: tf.Tensor | None = None, + decoder_inputs_embeds: tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1208,17 +1181,17 @@ def get_decoder(self): def call( self, input_ids: TFModelInputType = None, - attention_mask: Optional[tf.Tensor] = None, - decoder_input_ids: 
Optional[tf.Tensor] = None, - decoder_attention_mask: Optional[tf.Tensor] = None, - decoder_position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - decoder_head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + decoder_input_ids: tf.Tensor | None = None, + decoder_attention_mask: tf.Tensor | None = None, + decoder_position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + decoder_head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None, - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]] = None, - inputs_embeds: Optional[tf.Tensor] = None, - decoder_inputs_embeds: Optional[tf.Tensor] = None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None = None, + inputs_embeds: tf.Tensor | None = None, + decoder_inputs_embeds: tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1336,22 +1309,22 @@ def set_bias(self, value): def call( self, input_ids: TFModelInputType = None, - attention_mask: Optional[tf.Tensor] = None, - decoder_input_ids: Optional[tf.Tensor] = None, - decoder_attention_mask: Optional[tf.Tensor] = None, - decoder_position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - decoder_head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + decoder_input_ids: tf.Tensor | None = None, + decoder_attention_mask: tf.Tensor | None = None, + decoder_position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + decoder_head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, encoder_outputs: Optional[TFBaseModelOutput] = None, past_key_values: Tuple[Tuple[tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, - decoder_inputs_embeds: Optional[tf.Tensor] = None, + inputs_embeds: tf.Tensor | None = None, + decoder_inputs_embeds: tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSeq2SeqLMOutput, Tuple[tf.Tensor]]: """ diff --git a/src/transformers/models/mctct/feature_extraction_mctct.py b/src/transformers/models/mctct/feature_extraction_mctct.py index 467e654244b993..9e9e276c168ca1 100644 --- a/src/transformers/models/mctct/feature_extraction_mctct.py +++ b/src/transformers/models/mctct/feature_extraction_mctct.py @@ -180,7 +180,8 @@ def __call__( Args: raw_speech (`torch.Tensor`, `np.ndarray`, `List[float]`, `List[torch.Tensor]`, `List[np.ndarray]`, `List[List[float]]`): The sequence or batch of sequences to be padded. Each sequence can be a tensor, a numpy array, a list - of float values, a list of tensors, a list of numpy arrays or a list of list of float values. + of float values, a list of tensors, a list of numpy arrays or a list of list of float values. Must be + mono channel audio, not stereo, i.e. single float per timestep. 
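The mono-channel requirement noted above pairs with the batching check changed further down in this file: a 2-D numpy array is now accepted as a batch of mono sequences, while higher-dimensional arrays are rejected. A standalone sketch of that logic, using a hypothetical helper name:

    import numpy as np

    def _is_batched(raw_speech):
        # A 2-D numpy array is treated as a batch of mono-channel sequences;
        # anything with more than two dimensions is rejected.
        is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1
        if is_batched_numpy and len(raw_speech.shape) > 2:
            raise ValueError("Only mono-channel audio is supported")
        return is_batched_numpy or (
            isinstance(raw_speech, (list, tuple))
            and isinstance(raw_speech[0], (np.ndarray, tuple, list))
        )

    assert _is_batched(np.zeros((2, 16000), dtype=np.float32))   # (batch, time) array
    assert not _is_batched(np.zeros(16000, dtype=np.float32))    # single mono sequence
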
padding (`bool`, `str` or [`~file_utils.PaddingStrategy`], *optional*, defaults to `False`): Select a strategy to pad the returned sequences (according to the model's padding side and padding index) among: @@ -231,9 +232,11 @@ def __call__( "Failing to do so can result in silent errors that might be hard to debug." ) - is_batched = bool( - isinstance(raw_speech, (list, tuple)) - and (isinstance(raw_speech[0], np.ndarray) or isinstance(raw_speech[0], (tuple, list))) + is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1 + if is_batched_numpy and len(raw_speech.shape) > 2: + raise ValueError(f"Only mono-channel audio is supported for input to {self}") + is_batched = is_batched_numpy or ( + isinstance(raw_speech, (list, tuple)) and (isinstance(raw_speech[0], (np.ndarray, tuple, list))) ) if is_batched: diff --git a/src/transformers/models/mega/modeling_mega.py b/src/transformers/models/mega/modeling_mega.py index 98914b439c18d5..9d1890788f4fe2 100644 --- a/src/transformers/models/mega/modeling_mega.py +++ b/src/transformers/models/mega/modeling_mega.py @@ -1743,7 +1743,9 @@ def forward( >>> config = AutoConfig.from_pretrained("mnaylor/mega-base-wikitext") >>> config.is_decoder = True >>> config.bidirectional = False - >>> model = MegaForCausalLM.from_pretrained("mnaylor/mega-base-wikitext", config=config) + >>> model = MegaForCausalLM.from_pretrained( + ... "mnaylor/mega-base-wikitext", config=config, ignore_mismatched_sizes=True + ... ) >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") >>> outputs = model(**inputs) diff --git a/src/transformers/models/mobilebert/modeling_tf_mobilebert.py b/src/transformers/models/mobilebert/modeling_tf_mobilebert.py index c47cde847de424..c454a8b35db13d 100644 --- a/src/transformers/models/mobilebert/modeling_tf_mobilebert.py +++ b/src/transformers/models/mobilebert/modeling_tf_mobilebert.py @@ -15,9 +15,12 @@ # limitations under the License. """ TF 2.0 MobileBERT model.""" + +from __future__ import annotations + import warnings from dataclasses import dataclass -from typing import Dict, Optional, Tuple, Union +from typing import Optional, Tuple, Union import numpy as np import tensorflow as tf @@ -48,7 +51,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - MULTIPLE_CHOICE_DUMMY_INPUTS, ModelOutput, add_code_sample_docstrings, add_start_docstrings, @@ -846,11 +848,11 @@ class TFMobileBertForPreTrainingOutput(ModelOutput): heads. 
""" - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None prediction_logits: tf.Tensor = None seq_relationship_logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None MOBILEBERT_START_DOCSTRING = r""" @@ -969,12 +971,12 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -995,17 +997,6 @@ def call( return outputs - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=hs, - attentions=attns, - ) - @add_start_docstrings( """ @@ -1033,17 +1024,17 @@ def get_prefix_bias_name(self): @replace_return_docstrings(output_type=TFMobileBertForPreTrainingOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, - next_sentence_label: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, + next_sentence_label: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFMobileBertForPreTrainingOutput]: r""" @@ -1096,17 +1087,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMobileBertForPreTrainingOutput( - prediction_logits=output.prediction_logits, - seq_relationship_logits=output.seq_relationship_logits, - 
hidden_states=hs, - attentions=attns, - ) - @add_start_docstrings("""MobileBert Model with a `language modeling` head on top.""", MOBILEBERT_START_DOCSTRING) class TFMobileBertForMaskedLM(TFMobileBertPreTrainedModel, TFMaskedLanguageModelingLoss): @@ -1141,16 +1121,16 @@ def get_prefix_bias_name(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFMaskedLMOutput]: r""" @@ -1187,13 +1167,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM.serving_output - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - class TFMobileBertOnlyNSPHead(tf.keras.layers.Layer): def __init__(self, config, **kwargs): @@ -1224,16 +1197,16 @@ def __init__(self, config, *inputs, **kwargs): @replace_return_docstrings(output_type=TFNextSentencePredictorOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - next_sentence_label: Optional[Union[np.ndarray, tf.Tensor]] = None, + next_sentence_label: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFNextSentencePredictorOutput]: r""" @@ -1286,13 +1259,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForNextSentencePrediction.serving_output - def serving_output(self, output: TFNextSentencePredictorOutput) -> TFNextSentencePredictorOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if 
self.config.output_attentions else None - - return TFNextSentencePredictorOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1335,16 +1301,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFSequenceClassifierOutput]: r""" @@ -1383,13 +1349,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1430,17 +1389,17 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFQuestionAnsweringModelOutput]: r""" @@ -1489,15 +1448,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return 
TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) - @add_start_docstrings( """ @@ -1525,16 +1475,6 @@ def __init__(self, config, *inputs, **kwargs): 1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward( MOBILEBERT_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length") @@ -1546,16 +1486,16 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFMultipleChoiceModelOutput]: r""" @@ -1609,28 +1549,6 @@ def call( attentions=outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None, None), tf.int32, name="token_type_ids"), - } - ] - ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving - def serving(self, inputs: Dict[str, tf.Tensor]): - output = self.call(input_ids=inputs) - - return self.serving_output(output) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving_output - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1674,16 +1592,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + 
inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFTokenClassifierOutput]: r""" @@ -1719,10 +1637,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) diff --git a/src/transformers/models/mobilevit/modeling_tf_mobilevit.py b/src/transformers/models/mobilevit/modeling_tf_mobilevit.py index 1b06f36536d6b9..4d48ce72725c1e 100644 --- a/src/transformers/models/mobilevit/modeling_tf_mobilevit.py +++ b/src/transformers/models/mobilevit/modeling_tf_mobilevit.py @@ -16,6 +16,8 @@ # Original license: https://github.com/apple/ml-cvnets/blob/main/LICENSE """ TensorFlow 2.0 MobileViT model.""" +from __future__ import annotations + from typing import Dict, Optional, Tuple, Union import tensorflow as tf @@ -663,7 +665,7 @@ class PreTrainedModel @unpack_inputs def call( self, - pixel_values: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, @@ -733,38 +735,6 @@ class TFMobileViTPreTrainedModel(TFPreTrainedModel): base_model_prefix = "mobilevit" main_input_name = "pixel_values" - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(3, self.config.num_channels, self.config.image_size, self.config.image_size), - dtype=tf.float32, - ) - return {"pixel_values": tf.constant(VISION_DUMMY_INPUTS)} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs): - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - return self.serving_output(output) - MOBILEVIT_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. 
Check the superclass documentation for the generic methods the @@ -846,7 +816,7 @@ def __init__(self, config: MobileViTConfig, expand_output: bool = True, *inputs, ) def call( self, - pixel_values: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, @@ -854,14 +824,6 @@ def call( output = self.mobilevit(pixel_values, output_hidden_states, return_dict, training=training) return output - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: - # hidden_states not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=output.hidden_states, - ) - @add_start_docstrings( """ @@ -893,9 +855,9 @@ def __init__(self, config: MobileViTConfig, *inputs, **kwargs) -> None: ) def call( self, - pixel_values: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, output_hidden_states: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, ) -> Union[tuple, TFImageClassifierOutputWithNoAttention]: @@ -922,10 +884,6 @@ def call( return TFImageClassifierOutputWithNoAttention(loss=loss, logits=logits, hidden_states=outputs.hidden_states) - def serving_output(self, output: TFImageClassifierOutputWithNoAttention) -> TFImageClassifierOutputWithNoAttention: - # hidden_states and attention not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFImageClassifierOutputWithNoAttention(logits=output.logits, hidden_states=output.hidden_states) - class TFMobileViTASPPPooling(tf.keras.layers.Layer): def __init__(self, config: MobileViTConfig, out_channels: int, **kwargs) -> None: @@ -1083,8 +1041,8 @@ def masked_loss(real, pred): @replace_return_docstrings(output_type=TFSemanticSegmenterOutputWithNoAttention, config_class=_CONFIG_FOR_DOC) def call( self, - pixel_values: Optional[tf.Tensor] = None, - labels: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + labels: tf.Tensor | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, @@ -1155,8 +1113,3 @@ def call( logits=logits, hidden_states=outputs.hidden_states if output_hidden_states else None, ) - - def serving_output( - self, output: TFSemanticSegmenterOutputWithNoAttention - ) -> TFSemanticSegmenterOutputWithNoAttention: - return TFSemanticSegmenterOutputWithNoAttention(logits=output.logits, hidden_states=output.hidden_states) diff --git a/src/transformers/models/mpnet/modeling_tf_mpnet.py b/src/transformers/models/mpnet/modeling_tf_mpnet.py index 08db3101730854..2982899340d203 100644 --- a/src/transformers/models/mpnet/modeling_tf_mpnet.py +++ b/src/transformers/models/mpnet/modeling_tf_mpnet.py @@ -16,6 +16,8 @@ """ TF 2.0 MPNet model.""" +from __future__ import annotations + import math import warnings from typing import Optional, Tuple, Union @@ -47,7 +49,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - MULTIPLE_CHOICE_DUMMY_INPUTS, add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -75,19 +76,6 @@ class TFMPNetPreTrainedModel(TFPreTrainedModel): config_class = MPNetConfig 
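Note on the deletions in this hunk and the similar ones above and below: the per-model `serving`, `serving_output` and `dummy_inputs` hooks used for SavedModel export are being dropped wholesale. Models that still need a non-default export signature appear to override an `input_signature` property instead (the GPT double-heads model further down in this same patch adds exactly that). A minimal sketch of that override, assuming a generic Keras-style class rather than the real `TFPreTrainedModel`:

import tensorflow as tf

class SketchModel(tf.keras.Model):  # hypothetical stand-in, not the library class
    @property
    def input_signature(self):
        # three None dims because multiple-choice inputs are (batch, num_choices, seq_len)
        return {
            "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"),
            "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"),
        }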
base_model_prefix = "mpnet" - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - class TFMPNetEmbeddings(tf.keras.layers.Layer): """Construct the embeddings from word, position embeddings.""" @@ -682,11 +670,11 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, attention_mask: Optional[Union[np.array, tf.Tensor]] = None, position_ids: Optional[Union[np.array, tf.Tensor]] = None, head_mask: Optional[Union[np.array, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -705,17 +693,6 @@ def call( ) return outputs - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=hs, - attentions=attns, - ) - class TFMPNetLMHead(tf.keras.layers.Layer): """MPNet head for masked and permuted language modeling""" @@ -795,15 +772,15 @@ def get_prefix_bias_name(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: bool = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -839,13 +816,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM.serving_output - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - class TFMPNetClassificationHead(tf.keras.layers.Layer): """Head for sentence-level classification tasks.""" @@ -898,15 +868,15 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, attention_mask: Optional[Union[np.array, tf.Tensor]] = None, position_ids: Optional[Union[np.array, tf.Tensor]] = None, head_mask: Optional[Union[np.array, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + inputs_embeds: tf.Tensor | None = None, 
output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: bool = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -943,13 +913,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -968,16 +931,6 @@ def __init__(self, config, *inputs, **kwargs): 1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward(MPNET_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length")) @add_code_sample_docstrings( @@ -987,15 +940,15 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: bool = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -1046,26 +999,6 @@ def call( attentions=outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving_output - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1096,15 +1029,15 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + input_ids: TFModelInputType | 
None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: bool = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1140,13 +1073,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1176,16 +1102,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, attention_mask: Optional[Union[np.array, tf.Tensor]] = None, position_ids: Optional[Union[np.array, tf.Tensor]] = None, head_mask: Optional[Union[np.array, tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[tf.Tensor] = None, - end_positions: Optional[tf.Tensor] = None, + start_positions: tf.Tensor | None = None, + end_positions: tf.Tensor | None = None, training: bool = False, **kwargs, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: @@ -1233,12 +1159,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/mpnet/tokenization_mpnet.py b/src/transformers/models/mpnet/tokenization_mpnet.py index 1f5ad2f41aae41..57a06beeead077 100644 --- a/src/transformers/models/mpnet/tokenization_mpnet.py +++ b/src/transformers/models/mpnet/tokenization_mpnet.py @@ -119,7 +119,7 @@ class MPNetTokenizer(PreTrainedTokenizer): This should likely be deactivated for Japanese (see this [issue](https://github.com/huggingface/transformers/issues/328)). - strip_accents: (`bool`, *optional*): + strip_accents (`bool`, *optional*): Whether or not to strip all accents. If this option is not specified, then it will be determined by the value for `lowercase` (as in the original BERT). 
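The `strip_accents` change just above (and the matching one-character edits in the fast tokenizer, OPT and Pegasus-X configs later in this patch) only drops a stray colon so the argument line follows the usual "name (`type`, *optional*):" layout. For reference, the target form looks like this (illustrative docstring, not copied from the library):

def _args_docstring_example():
    """
    Args:
        strip_accents (`bool`, *optional*):
            Whether or not to strip all accents.
    """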
""" diff --git a/src/transformers/models/mpnet/tokenization_mpnet_fast.py b/src/transformers/models/mpnet/tokenization_mpnet_fast.py index 288c69c62b3cd5..82d8ffec08d910 100644 --- a/src/transformers/models/mpnet/tokenization_mpnet_fast.py +++ b/src/transformers/models/mpnet/tokenization_mpnet_fast.py @@ -98,7 +98,7 @@ class MPNetTokenizerFast(PreTrainedTokenizerFast): tokenize_chinese_chars (`bool`, *optional*, defaults to `True`): Whether or not to tokenize Chinese characters. This should likely be deactivated for Japanese (see [this issue](https://github.com/huggingface/transformers/issues/328)). - strip_accents: (`bool`, *optional*): + strip_accents (`bool`, *optional*): Whether or not to strip all accents. If this option is not specified, then it will be determined by the value for `lowercase` (as in the original BERT). """ diff --git a/src/transformers/models/nllb_moe/modeling_nllb_moe.py b/src/transformers/models/nllb_moe/modeling_nllb_moe.py index 567b03c1c4bbe0..b6ea574469ee80 100644 --- a/src/transformers/models/nllb_moe/modeling_nllb_moe.py +++ b/src/transformers/models/nllb_moe/modeling_nllb_moe.py @@ -856,7 +856,7 @@ class NllbMoePreTrainedModel(PreTrainedModel): config_class = NllbMoeConfig base_model_prefix = "model" supports_gradient_checkpointing = True - _no_split_modules = ["NllbMoeAttention"] + _no_split_modules = ["NllbMoeEncoderLayer", "NllbMoeDecoderLayer"] def _init_weights(self, module): """Initialize the weights""" diff --git a/src/transformers/models/open_llama/configuration_open_llama.py b/src/transformers/models/open_llama/configuration_open_llama.py index c202082b553631..cbde4d67d498a7 100644 --- a/src/transformers/models/open_llama/configuration_open_llama.py +++ b/src/transformers/models/open_llama/configuration_open_llama.py @@ -99,7 +99,7 @@ def __init__( bos_token_id=1, eos_token_id=2, tie_word_embeddings=False, - use_memorry_efficient_attention=True, + use_memory_efficient_attention=True, hidden_dropout_prob=0.1, attention_dropout_prob=0.1, use_stable_embedding=True, @@ -116,7 +116,9 @@ def __init__( self.initializer_range = initializer_range self.rms_norm_eps = rms_norm_eps self.use_cache = use_cache - self.use_memorry_efficient_attention = use_memorry_efficient_attention + self.use_memory_efficient_attention = kwargs.pop( + "use_memorry_efficient_attention", use_memory_efficient_attention + ) self.hidden_dropout_prob = hidden_dropout_prob self.attention_dropout_prob = attention_dropout_prob self.use_stable_embedding = use_stable_embedding diff --git a/src/transformers/models/open_llama/modeling_open_llama.py b/src/transformers/models/open_llama/modeling_open_llama.py index a88ba62056d454..9a49f238068253 100644 --- a/src/transformers/models/open_llama/modeling_open_llama.py +++ b/src/transformers/models/open_llama/modeling_open_llama.py @@ -40,7 +40,7 @@ except ImportError: xops = None logger.warn( - "Xformers is not installed correctly. If you want to use memorry_efficient_attention to accelerate training use the following command to install Xformers\npip install xformers." + "Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers\npip install xformers." 
) @@ -91,14 +91,11 @@ def __init__(self, hidden_size, eps=1e-6): self.variance_epsilon = eps def forward(self, hidden_states): + input_dtype = hidden_states.dtype variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True) hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon) - # convert into half-precision if necessary - if self.weight.dtype in [torch.float16, torch.bfloat16]: - hidden_states = hidden_states.to(self.weight.dtype) - - return self.weight * hidden_states + return (self.weight * hidden_states).to(input_dtype) # Copied from transformers.models.llama.modeling_llama.LlamaRotaryEmbedding with Llama->OpenLlama @@ -226,7 +223,7 @@ def forward( past_key_value = (key_states, value_states) if use_cache else None - if self.config.use_memorry_efficient_attention and xops is not None and self.training: + if self.config.use_memory_efficient_attention and xops is not None and self.training: attn_weights = None query_states = query_states.transpose(1, 2) key_states = key_states.transpose(1, 2) @@ -249,7 +246,9 @@ def forward( f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}" ) attn_weights = attn_weights + attention_mask - attn_weights = torch.max(attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min)) + attn_weights = torch.max( + attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min, device=attn_weights.device) + ) # upcast attention to fp32 attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype) @@ -564,7 +563,7 @@ def forward( if self.embed_layer_norm: inputs_embeds = self.embed_layer_norm(inputs_embeds) # embed positions - if self.config.use_memorry_efficient_attention and self.training: + if self.config.use_memory_efficient_attention and self.training: attention_mask = None elif attention_mask is None: attention_mask = torch.ones( diff --git a/src/transformers/models/openai/modeling_tf_openai.py b/src/transformers/models/openai/modeling_tf_openai.py index 7c04520c9c1fe2..70b7f6c05efb3d 100644 --- a/src/transformers/models/openai/modeling_tf_openai.py +++ b/src/transformers/models/openai/modeling_tf_openai.py @@ -15,6 +15,8 @@ # limitations under the License. 
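Two behavior changes are worth spelling out in the `modeling_open_llama.py` hunks above: the norm now casts its result back to the input dtype instead of following the weight dtype, so fp16 activations are no longer silently upcast, and the `torch.finfo(...).min` clamp constant is created on `attn_weights.device` to avoid a CPU/GPU mismatch. A small self-contained sketch of the dtype behavior (simplified, not the library module):

import torch
from torch import nn

class SketchRMSNorm(nn.Module):  # illustrative only
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        # variance in fp32 for stability, result cast back to the caller's dtype
        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return (self.weight * hidden_states).to(input_dtype)

x = torch.randn(2, 4, 8, dtype=torch.float16)
assert SketchRMSNorm(8)(x).dtype == torch.float16  # fp16 in, fp16 out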
""" TF 2.0 OpenAI GPT model.""" +from __future__ import annotations + from dataclasses import dataclass from typing import Optional, Tuple, Union @@ -237,12 +239,12 @@ def _prune_heads(self, heads_to_prune): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -355,19 +357,6 @@ class TFOpenAIGPTPreTrainedModel(TFPreTrainedModel): config_class = OpenAIGPTConfig base_model_prefix = "transformer" - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - @dataclass class TFOpenAIGPTDoubleHeadsModelOutput(ModelOutput): @@ -394,8 +383,8 @@ class TFOpenAIGPTDoubleHeadsModelOutput(ModelOutput): logits: tf.Tensor = None mc_logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None OPENAI_GPT_START_DOCSTRING = r""" @@ -514,12 +503,12 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -539,13 +528,6 @@ def call( ) return outputs - # Copied from transformers.models.distilbert.modeling_tf_distilbert.TFDistilBertModel.serving_output - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutput(last_hidden_state=output.last_hidden_state, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -576,16 +558,16 @@ def set_output_embeddings(self, value): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, 
tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFCausalLMOutput]: r""" @@ -628,12 +610,6 @@ def call( attentions=transformer_outputs.attentions, ) - def serving_output(self, output: TFCausalLMOutput) -> TFCausalLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFCausalLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - def prepare_inputs_for_generation(self, inputs, **kwargs): return {"input_ids": inputs} @@ -661,13 +637,13 @@ def __init__(self, config, *inputs, **kwargs): @replace_return_docstrings(output_type=TFOpenAIGPTDoubleHeadsModelOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - mc_token_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + mc_token_ids: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -750,27 +726,13 @@ def call( attentions=transformer_outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "mc_token_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFOpenAIGPTDoubleHeadsModelOutput( - logits=output.logits, mc_logits=output.mc_logits, hidden_states=hs, attentions=attns - ) + @property + def input_signature(self): + return { + "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), + "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), + "mc_token_ids": tf.TensorSpec((None, None), 
tf.int32, name="token_type_ids"), + } @add_start_docstrings( @@ -809,16 +771,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, TFSequenceClassifierOutput]: r""" @@ -892,10 +854,3 @@ def call( hidden_states=transformer_outputs.hidden_states, attentions=transformer_outputs.attentions, ) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) diff --git a/src/transformers/models/opt/configuration_opt.py b/src/transformers/models/opt/configuration_opt.py index df13b32019984e..d2b7a4347ea4e3 100644 --- a/src/transformers/models/opt/configuration_opt.py +++ b/src/transformers/models/opt/configuration_opt.py @@ -67,7 +67,7 @@ class OPTConfig(PretrainedConfig): The dropout probability for all fully connected layers in the embeddings, encoder, and pooler. attention_dropout (`float`, *optional*, defaults to 0.0): The dropout ratio for the attention probabilities. - layerdrop: (`float`, *optional*, defaults to 0.0): + layerdrop (`float`, *optional*, defaults to 0.0): The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more details. init_std (`float`, *optional*, defaults to 0.02): diff --git a/src/transformers/models/opt/modeling_opt.py b/src/transformers/models/opt/modeling_opt.py index b6c84777cc1f69..15fc3b033a228e 100644 --- a/src/transformers/models/opt/modeling_opt.py +++ b/src/transformers/models/opt/modeling_opt.py @@ -222,7 +222,9 @@ def forward( f"Attention mask should be of size {(bsz, 1, tgt_len, src_len)}, but is {attention_mask.size()}" ) attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + attention_mask - attn_weights = torch.max(attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min)) + attn_weights = torch.max( + attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min, device=attn_weights.device) + ) attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len) # upcast to fp32 if the weights are in fp16. 
Please see https://github.com/huggingface/transformers/pull/17437 diff --git a/src/transformers/models/opt/modeling_tf_opt.py b/src/transformers/models/opt/modeling_tf_opt.py index 1855fcb1bc034a..5f7dd22369b87d 100644 --- a/src/transformers/models/opt/modeling_tf_opt.py +++ b/src/transformers/models/opt/modeling_tf_opt.py @@ -15,6 +15,8 @@ """ TF 2.0 OPT model.""" +from __future__ import annotations + from typing import Optional, Tuple, Union import numpy as np @@ -25,7 +27,6 @@ # Public API from ...modeling_tf_utils import ( - DUMMY_INPUTS, TFCausalLanguageModelingLoss, TFModelInputType, TFPreTrainedModel, @@ -152,12 +153,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training: Optional[bool] = False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -291,8 +292,8 @@ def __init__(self, config: OPTConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - layer_head_mask: Optional[tf.Tensor] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, past_key_value: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, training: Optional[bool] = False, output_attentions: Optional[bool] = False, @@ -411,29 +412,6 @@ class TFOPTPreTrainedModel(TFPreTrainedModel): config_class = OPTConfig base_model_prefix = "model" - @property - def dummy_inputs(self): - pad_token = 1 - input_ids = tf.convert_to_tensor(DUMMY_INPUTS, dtype=tf.int32) - dummy_inputs = { - "attention_mask": tf.cast(input_ids != pad_token, tf.int32), - "input_ids": input_ids, - } - return dummy_inputs - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - OPT_INPUTS_DOCSTRING = r""" Args: @@ -552,10 +530,10 @@ def _prepare_decoder_attention_mask(self, attention_mask, input_shape, past_key_ @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -732,11 +710,11 @@ def set_input_embeddings(self, new_embeddings): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, 
tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -804,11 +782,11 @@ def set_input_embeddings(self, new_embeddings): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -901,13 +879,13 @@ def prepare_inputs_for_generation(self, inputs, past_key_values=None, use_cache= ) def call( self, - input_ids: Optional[TFModelInputType] = None, + input_ids: TFModelInputType | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + labels: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, diff --git a/src/transformers/models/pegasus/modeling_tf_pegasus.py b/src/transformers/models/pegasus/modeling_tf_pegasus.py index 1ccccc2dc5cec0..15c87b938bfafe 100644 --- a/src/transformers/models/pegasus/modeling_tf_pegasus.py +++ b/src/transformers/models/pegasus/modeling_tf_pegasus.py @@ -15,6 +15,8 @@ """ TF 2.0 Pegasus model.""" +from __future__ import annotations + import random from typing import Optional, Tuple, Union @@ -31,7 +33,6 @@ # Public API from ...modeling_tf_utils import ( - DUMMY_INPUTS, TFCausalLanguageModelingLoss, TFModelInputType, TFPreTrainedModel, @@ -167,7 +168,7 @@ def _init_weight(n_pos: int, dim: int): return table def call( - self, input_shape: tf.TensorShape, past_key_values_length: int = 0, position_ids: Optional[tf.Tensor] = None + self, input_shape: tf.TensorShape, past_key_values_length: int = 0, position_ids: tf.Tensor | None = None ): """Input is expected to be of size [bsz x seqlen].""" if position_ids is None: @@ -214,12 +215,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + 
past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training: Optional[bool] = False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -419,12 +420,12 @@ def __init__(self, config: PegasusConfig, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, - cross_attn_layer_head_mask: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[tf.Tensor]] = None, + attention_mask: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, + cross_attn_layer_head_mask: tf.Tensor | None = None, + past_key_value: Tuple[tf.Tensor] | None = None, training: Optional[bool] = False, ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]: """ @@ -501,34 +502,6 @@ class TFPegasusPreTrainedModel(TFPreTrainedModel): config_class = PegasusConfig base_model_prefix = "model" - @property - def dummy_inputs(self): - pad_token = 1 - input_ids = tf.convert_to_tensor(DUMMY_INPUTS, dtype=tf.int32) - decoder_input_ids = tf.convert_to_tensor(DUMMY_INPUTS, dtype=tf.int32) - dummy_inputs = { - "decoder_input_ids": decoder_input_ids, - "attention_mask": tf.cast(input_ids != pad_token, tf.int32), - "input_ids": input_ids, - } - return dummy_inputs - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), - "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), - } - ] - ) - # Copied from transformers.models.bart.modeling_tf_bart.TFBartPretrainedModel.serving - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - PEGASUS_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. 
Check the superclass documentation for the generic methods the @@ -712,10 +685,10 @@ def set_embed_tokens(self, embed_tokens): @unpack_inputs def call( self, - input_ids: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -877,14 +850,14 @@ def set_embed_tokens(self, embed_tokens): @unpack_inputs def call( self, - input_ids: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, past_key_values: Tuple[Tuple[tf.Tensor]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -1106,18 +1079,18 @@ def set_input_embeddings(self, new_embeddings): @unpack_inputs def call( self, - input_ids: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - decoder_input_ids: Optional[tf.Tensor] = None, - decoder_attention_mask: Optional[tf.Tensor] = None, - decoder_position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - decoder_head_mask: Optional[tf.Tensor] = None, - cross_attn_head_mask: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + decoder_input_ids: tf.Tensor | None = None, + decoder_attention_mask: tf.Tensor | None = None, + decoder_position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + decoder_head_mask: tf.Tensor | None = None, + cross_attn_head_mask: tf.Tensor | None = None, encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None, past_key_values: Tuple[Tuple[tf.Tensor]] = None, - inputs_embeds: Optional[tf.Tensor] = None, - decoder_inputs_embeds: Optional[tf.Tensor] = None, + inputs_embeds: tf.Tensor | None = None, + decoder_inputs_embeds: tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1211,18 +1184,18 @@ def get_decoder(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: 
np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1339,23 +1312,23 @@ def set_bias(self, value): @add_end_docstrings(PEGASUS_GENERATION_EXAMPLE) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, encoder_outputs: Optional[TFBaseModelOutput] = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFSeq2SeqLMOutput, Tuple[tf.Tensor]]: """ diff --git a/src/transformers/models/pegasus_x/configuration_pegasus_x.py b/src/transformers/models/pegasus_x/configuration_pegasus_x.py index c393a6b8a91044..f48e19bdcbca7c 100644 --- a/src/transformers/models/pegasus_x/configuration_pegasus_x.py +++ b/src/transformers/models/pegasus_x/configuration_pegasus_x.py @@ -70,10 +70,10 @@ class PegasusXConfig(PretrainedConfig): just in case (e.g., 512 or 1024 or 2048). init_std (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. - encoder_layerdrop: (`float`, *optional*, defaults to 0.0): + encoder_layerdrop (`float`, *optional*, defaults to 0.0): The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more details. 
- decoder_layerdrop: (`float`, *optional*, defaults to 0.0): + decoder_layerdrop (`float`, *optional*, defaults to 0.0): The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more details. use_cache (`bool`, *optional*, defaults to `True`): diff --git a/src/transformers/models/rag/modeling_rag.py b/src/transformers/models/rag/modeling_rag.py index 6941bca09c7474..019b26ef08e948 100644 --- a/src/transformers/models/rag/modeling_rag.py +++ b/src/transformers/models/rag/modeling_rag.py @@ -1430,7 +1430,7 @@ def generate( priority: 1) from the `generation_config.json` model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit [`~generation.GenerationConfig`]'s default values, whose documentation should be checked to parameterize generation. - prefix_allowed_tokens_fn: (`Callable[[int, torch.Tensor], List[int]]`, *optional*): + prefix_allowed_tokens_fn (`Callable[[int, torch.Tensor], List[int]]`, *optional*): If provided, this function constraints the beam search to allowed tokens only at each step. If not provided no constraint is applied. This function takes 2 arguments `inputs_ids` and the batch ID `batch_id`. It has to return a list with the allowed tokens for the next generation step conditioned on diff --git a/src/transformers/models/rag/modeling_tf_rag.py b/src/transformers/models/rag/modeling_tf_rag.py index 0ea2e554489b6f..d91fa71df8a622 100644 --- a/src/transformers/models/rag/modeling_tf_rag.py +++ b/src/transformers/models/rag/modeling_tf_rag.py @@ -15,6 +15,9 @@ """TFRAG model implementation.""" + +from __future__ import annotations + import copy from dataclasses import dataclass from typing import List, Optional, Tuple, Union @@ -111,22 +114,22 @@ class TFRetrievAugLMMarginOutput(ModelOutput): average in the self-attention heads. 
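A note on the annotation churn that dominates this patch (here in `modeling_tf_rag.py` and in every other TF module touched): each file also gains `from __future__ import annotations`, which stores annotations as strings (PEP 563) instead of evaluating them. That is presumably what makes the PEP 604 `X | None` spelling usable in annotations on Python versions older than 3.10. A quick self-contained check of that reading:

from __future__ import annotations

import numpy as np

def f(mask: np.ndarray | None = None) -> int:
    # with the future import the annotation is never evaluated at runtime
    return 0 if mask is None else int(mask.sum())

print(f.__annotations__["mask"])  # prints the string "np.ndarray | None"

The same reasoning covers the dataclass fields just below, since `dataclasses` reads `__annotations__` without evaluating the stored strings.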
""" - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - doc_scores: Optional[tf.Tensor] = None - retrieved_doc_embeds: Optional[tf.Tensor] = None - retrieved_doc_ids: Optional[tf.Tensor] = None - context_input_ids: Optional[tf.Tensor] = None - context_attention_mask: Optional[tf.Tensor] = None - question_encoder_last_hidden_state: Optional[tf.Tensor] = None - question_enc_hidden_states: Optional[Tuple[tf.Tensor]] = None - question_enc_attentions: Optional[Tuple[tf.Tensor]] = None - generator_enc_last_hidden_state: Optional[tf.Tensor] = None - generator_enc_hidden_states: Optional[Tuple[tf.Tensor]] = None - generator_enc_attentions: Optional[Tuple[tf.Tensor]] = None - generator_dec_hidden_states: Optional[Tuple[tf.Tensor]] = None - generator_dec_attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + doc_scores: tf.Tensor | None = None + retrieved_doc_embeds: tf.Tensor | None = None + retrieved_doc_ids: tf.Tensor | None = None + context_input_ids: tf.Tensor | None = None + context_attention_mask: tf.Tensor | None = None + question_encoder_last_hidden_state: tf.Tensor | None = None + question_enc_hidden_states: Tuple[tf.Tensor] | None = None + question_enc_attentions: Tuple[tf.Tensor] | None = None + generator_enc_last_hidden_state: tf.Tensor | None = None + generator_enc_hidden_states: Tuple[tf.Tensor] | None = None + generator_enc_attentions: Tuple[tf.Tensor] | None = None + generator_dec_hidden_states: Tuple[tf.Tensor] | None = None + generator_dec_attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -196,20 +199,20 @@ class TFRetrievAugLMOutput(ModelOutput): """ logits: tf.Tensor = None - past_key_values: Optional[List[tf.Tensor]] = None - doc_scores: Optional[tf.Tensor] = None - retrieved_doc_embeds: Optional[tf.Tensor] = None - retrieved_doc_ids: Optional[tf.Tensor] = None - context_input_ids: Optional[tf.Tensor] = None - context_attention_mask: Optional[tf.Tensor] = None - question_encoder_last_hidden_state: Optional[tf.Tensor] = None - question_enc_hidden_states: Optional[Tuple[tf.Tensor]] = None - question_enc_attentions: Optional[Tuple[tf.Tensor]] = None - generator_enc_last_hidden_state: Optional[tf.Tensor] = None - generator_enc_hidden_states: Optional[Tuple[tf.Tensor]] = None - generator_enc_attentions: Optional[Tuple[tf.Tensor]] = None - generator_dec_hidden_states: Optional[Tuple[tf.Tensor]] = None - generator_dec_attentions: Optional[Tuple[tf.Tensor]] = None + past_key_values: List[tf.Tensor] | None = None + doc_scores: tf.Tensor | None = None + retrieved_doc_embeds: tf.Tensor | None = None + retrieved_doc_ids: tf.Tensor | None = None + context_input_ids: tf.Tensor | None = None + context_attention_mask: tf.Tensor | None = None + question_encoder_last_hidden_state: tf.Tensor | None = None + question_enc_hidden_states: Tuple[tf.Tensor] | None = None + question_enc_attentions: Tuple[tf.Tensor] | None = None + generator_enc_last_hidden_state: tf.Tensor | None = None + generator_enc_hidden_states: Tuple[tf.Tensor] | None = None + generator_enc_attentions: Tuple[tf.Tensor] | None = None + generator_dec_hidden_states: Tuple[tf.Tensor] | None = None + generator_dec_attentions: Tuple[tf.Tensor] | None = None class TFRagPreTrainedModel(TFPreTrainedModel): @@ -545,15 +548,15 @@ def set_retriever(self, retriever: RagRetriever): @replace_return_docstrings(output_type=TFRetrievAugLMOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: 
Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_outputs: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + encoder_outputs: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - doc_scores: Optional[Union[np.ndarray, tf.Tensor]] = None, - context_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - context_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + doc_scores: np.ndarray | tf.Tensor | None = None, + context_input_ids: np.ndarray | tf.Tensor | None = None, + context_attention_mask: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -841,22 +844,22 @@ def marginalize(self, seq_logits, doc_scores, n_docs=None): @replace_return_docstrings(output_type=TFRetrievAugLMMarginOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_outputs: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + encoder_outputs: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - doc_scores: Optional[Union[np.ndarray, tf.Tensor]] = None, - context_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - context_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + doc_scores: np.ndarray | tf.Tensor | None = None, + context_input_ids: np.ndarray | tf.Tensor | None = None, + context_attention_mask: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, output_retrieved: Optional[bool] = None, n_docs: Optional[int] = None, do_marginalize: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, reduce_loss: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, @@ -993,8 +996,8 @@ def call( def generate( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[tf.Tensor] = None, + input_ids: TFModelInputType | None = None, + attention_mask: tf.Tensor | None = None, context_input_ids=None, context_attention_mask=None, doc_scores=None, @@ -1347,22 +1350,22 @@ def question_encoder(self): @replace_return_docstrings(output_type=TFRetrievAugLMMarginOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, 
tf.Tensor]] = None, - encoder_outputs: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + encoder_outputs: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - doc_scores: Optional[Union[np.ndarray, tf.Tensor]] = None, - context_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - context_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + doc_scores: np.ndarray | tf.Tensor | None = None, + context_input_ids: np.ndarray | tf.Tensor | None = None, + context_attention_mask: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, output_retrieved: Optional[bool] = None, n_docs: Optional[int] = None, exclude_bos_score: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, reduce_loss: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, @@ -1579,8 +1582,8 @@ def gather2d(target, id_tensor): def generate( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[tf.Tensor] = None, + input_ids: TFModelInputType | None = None, + attention_mask: tf.Tensor | None = None, context_input_ids=None, context_attention_mask=None, doc_scores=None, diff --git a/src/transformers/models/rag/retrieval_rag.py b/src/transformers/models/rag/retrieval_rag.py index 261255b9f62f59..88cb54115bf548 100644 --- a/src/transformers/models/rag/retrieval_rag.py +++ b/src/transformers/models/rag/retrieval_rag.py @@ -573,10 +573,10 @@ def __call__( Retrieves documents for specified `question_hidden_states`. Args: - question_input_ids: (`List[List[int]]`) batch of input ids + question_input_ids (`List[List[int]]`) batch of input ids question_hidden_states (`np.ndarray` of shape `(batch_size, vector_size)`: A batch of query vectors to retrieve with. - prefix: (`str`, *optional*): + prefix (`str`, *optional*): The prefix used by the generator's tokenizer. n_docs (`int`, *optional*): The number of docs retrieved per query. diff --git a/src/transformers/models/realm/modeling_realm.py b/src/transformers/models/realm/modeling_realm.py index 261b2b4a0b8c38..339cc27e9289c8 100644 --- a/src/transformers/models/realm/modeling_realm.py +++ b/src/transformers/models/realm/modeling_realm.py @@ -726,7 +726,7 @@ class RealmReaderOutput(ModelOutput): The index of the retrieved span candidates in which the predicted answer is most likely. start_pos (`torch.IntTensor` of shape `()`): Predicted answer starting position in *RealmReader*'s inputs. - end_pos: (`torch.IntTensor` of shape `()`): + end_pos (`torch.IntTensor` of shape `()`): Predicted answer ending position in *RealmReader*'s inputs. 
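The signature hunks above all follow one pattern: `Optional[Union[np.ndarray, tf.Tensor]]` becomes `np.ndarray | tf.Tensor | None`, which is only legal on the Python versions transformers supports because `from __future__ import annotations` is added at the top of each touched module. A minimal sketch of the mechanism, with numpy standing in for the real argument types:

from __future__ import annotations  # PEP 563: annotations are stored as strings

import numpy as np


def call(attention_mask: np.ndarray | list | None = None):
    # The annotation is never evaluated at runtime, so the `|` union syntax
    # works even on Python 3.7/3.8, where `X | Y` type expressions are not
    # otherwise supported.
    return attention_mask


print(call.__annotations__["attention_mask"])  # prints: np.ndarray | list | None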
hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer) of diff --git a/src/transformers/models/regnet/modeling_tf_regnet.py b/src/transformers/models/regnet/modeling_tf_regnet.py index 2c3a1ac42e5063..254d49a9f1efd5 100644 --- a/src/transformers/models/regnet/modeling_tf_regnet.py +++ b/src/transformers/models/regnet/modeling_tf_regnet.py @@ -14,7 +14,7 @@ # limitations under the License. """ TensorFlow RegNet model.""" -from typing import Dict, Optional, Tuple, Union +from typing import Optional, Tuple, Union import tensorflow as tf @@ -345,33 +345,8 @@ class TFRegNetPreTrainedModel(TFPreTrainedModel): main_input_name = "pixel_values" @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform(shape=(3, self.config.num_channels, 224, 224), dtype=tf.float32) - return {"pixel_values": tf.constant(VISION_DUMMY_INPUTS)} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs): - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - return self.serving_output(output) + def input_signature(self): + return {"pixel_values": tf.TensorSpec(shape=(None, self.config.num_channels, 224, 224), dtype=tf.float32)} REGNET_START_DOCSTRING = r""" @@ -443,16 +418,6 @@ def call( hidden_states=outputs.hidden_states, ) - def serving_output( - self, output: TFBaseModelOutputWithPoolingAndNoAttention - ) -> TFBaseModelOutputWithPoolingAndNoAttention: - # hidden_states not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFBaseModelOutputWithPoolingAndNoAttention( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=output.hidden_states, - ) - @add_start_docstrings( """ @@ -514,7 +479,3 @@ def call( return ((loss,) + output) if loss is not None else output return TFSequenceClassifierOutput(loss=loss, logits=logits, hidden_states=outputs.hidden_states) - - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - # hidden_states not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=output.hidden_states) diff --git a/src/transformers/models/rembert/modeling_tf_rembert.py b/src/transformers/models/rembert/modeling_tf_rembert.py index c4dc8c5a148b0c..1595fd8118debd 100644 --- a/src/transformers/models/rembert/modeling_tf_rembert.py +++ b/src/transformers/models/rembert/modeling_tf_rembert.py @@ -15,6 +15,8 @@ """ TF 2.0 RemBERT model.""" +from __future__ import annotations + import math from typing import Dict, Optional, Tuple, Union @@ -47,8 +49,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - DUMMY_INPUTS, - MULTIPLE_CHOICE_DUMMY_INPUTS, add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -382,9 +382,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - 
encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_value: Optional[Tuple[tf.Tensor]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_value: Tuple[tf.Tensor] | None, output_attentions: bool, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -645,14 +645,14 @@ class PreTrainedModel # Copied from transformers.models.bert.modeling_tf_bert.TFBertMainLayer.call def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -810,24 +810,6 @@ class TFRemBertPreTrainedModel(TFPreTrainedModel): config_class = RemBertConfig base_model_prefix = "rembert" - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
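The TFRegNet hunk above (and the matching TFResNet hunk later in this patch) drops the hand-written `dummy_inputs`/`serving` pair in favour of a single `input_signature` property that the TF base class can use to build and trace the model. A rough sketch of the new hook, using a stand-in config object rather than the real classes:

import tensorflow as tf


class StandInConfig:
    num_channels = 3  # stand-in for the real model config


class StandInVisionModel:
    config = StandInConfig()

    @property
    def input_signature(self):
        # Only the batch dimension is left dynamic; the channel count and the
        # 224x224 resolution are fixed, mirroring the spec added in the diff.
        return {
            "pixel_values": tf.TensorSpec(
                shape=(None, self.config.num_channels, 224, 224), dtype=tf.float32
            )
        }


print(StandInVisionModel().input_signature["pixel_values"].shape)  # (None, 3, 224, 224)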
- """ - dummy = {"input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32)} - # Add `encoder_hidden_states` to make the cross-attention layers' weights initialized - if self.config.add_cross_attention: - batch_size, seq_len = tf.constant(DUMMY_INPUTS).shape - shape = (batch_size, seq_len) + (self.config.hidden_size,) - h = tf.random.uniform(shape=shape) - dummy["encoder_hidden_states"] = h - - return dummy - REMBERT_START_DOCSTRING = r""" @@ -946,14 +928,14 @@ def __init__(self, config: RemBertConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -1000,27 +982,6 @@ def call( return outputs - # Copied from transformers.models.bert.modeling_tf_bert.TFBertModel.serving_output - def serving_output( - self, output: TFBaseModelOutputWithPoolingAndCrossAttentions - ) -> TFBaseModelOutputWithPoolingAndCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFBaseModelOutputWithPoolingAndCrossAttentions( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) - @add_start_docstrings("""RemBERT Model with a `language modeling` head on top.""", REMBERT_START_DOCSTRING) class TFRemBertForMaskedLM(TFRemBertPreTrainedModel, TFMaskedLanguageModelingLoss): @@ -1048,16 +1009,16 @@ def get_lm_head(self) -> tf.keras.layers.Layer: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | 
None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1093,12 +1054,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """RemBERT Model with a `language modeling` head on top for CLM fine-tuning.""", REMBERT_START_DOCSTRING @@ -1137,20 +1092,20 @@ def prepare_inputs_for_generation(self, input_ids, past_key_values=None, attenti ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFCausalLMOutputWithCrossAttentions, Tuple[tf.Tensor]]: r""" @@ -1215,20 +1170,6 @@ def call( cross_attentions=outputs.cross_attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertLMHeadModel.serving_output - def serving_output(self, output: TFCausalLMOutputWithCrossAttentions) -> TFCausalLMOutputWithCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFCausalLMOutputWithCrossAttentions( - logits=output.logits, past_key_values=pkv, hidden_states=hs, attentions=attns, cross_attentions=cross_attns - ) - @add_start_docstrings( """ @@ -1259,16 +1200,16 @@ def 
__init__(self, config: RemBertConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1305,12 +1246,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1329,16 +1264,6 @@ def __init__(self, config: RemBertConfig, *inputs, **kwargs): units=1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. 
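The many `serving_output` overrides deleted throughout this patch repeated the same few lines: convert a tuple-valued output field to a tensor only when the corresponding config flag is set, otherwise drop it. A generic sketch of that shared logic (not the base-class replacement, just an illustration of what the removed copies did):

import tensorflow as tf


def maybe_to_tensor(field, enabled: bool):
    # Mirrors the removed boilerplate, e.g.
    #   hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None
    if enabled and field is not None:
        return tf.convert_to_tensor(field)
    return None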
- - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward(REMBERT_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length")) @add_code_sample_docstrings( @@ -1348,16 +1273,16 @@ def dummy_inputs(self) -> Dict[str, tf.Tensor]: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -1417,26 +1342,6 @@ def call( attentions=outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFMultipleChoiceModelOutput: - output = self.call(input_ids=inputs) - - return self.serving_output(output) - - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1466,16 +1371,16 @@ def __init__(self, config: RemBertConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1510,12 +1415,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: 
TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1544,17 +1443,17 @@ def __init__(self, config: RemBertConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1602,11 +1501,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/resnet/modeling_tf_resnet.py b/src/transformers/models/resnet/modeling_tf_resnet.py index bb6035adf2df64..4ff1b119d42820 100644 --- a/src/transformers/models/resnet/modeling_tf_resnet.py +++ b/src/transformers/models/resnet/modeling_tf_resnet.py @@ -14,7 +14,7 @@ # limitations under the License. """ TensorFlow ResNet model.""" -from typing import Dict, Optional, Tuple, Union +from typing import Optional, Tuple, Union import tensorflow as tf @@ -276,24 +276,8 @@ class TFResNetPreTrainedModel(TFPreTrainedModel): main_input_name = "pixel_values" @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
- """ - VISION_DUMMY_INPUTS = tf.random.uniform(shape=(3, self.config.num_channels, 224, 224), dtype=tf.float32) - return {"pixel_values": tf.constant(VISION_DUMMY_INPUTS)} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - return self.serving_output(output) + def input_signature(self): + return {"pixel_values": tf.TensorSpec(shape=(None, self.config.num_channels, 224, 224), dtype=tf.float32)} RESNET_START_DOCSTRING = r""" @@ -419,16 +403,6 @@ def call( ) return resnet_outputs - def serving_output( - self, output: TFBaseModelOutputWithPoolingAndNoAttention - ) -> TFBaseModelOutputWithPoolingAndNoAttention: - # hidden_states not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFBaseModelOutputWithPoolingAndNoAttention( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=output.hidden_states, - ) - @add_start_docstrings( """ @@ -492,7 +466,3 @@ def call( return (loss,) + output if loss is not None else output return TFImageClassifierOutputWithNoAttention(loss=loss, logits=logits, hidden_states=outputs.hidden_states) - - def serving_output(self, output: TFImageClassifierOutputWithNoAttention) -> TFImageClassifierOutputWithNoAttention: - # hidden_states not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFImageClassifierOutputWithNoAttention(logits=output.logits, hidden_states=output.hidden_states) diff --git a/src/transformers/models/roberta/configuration_roberta.py b/src/transformers/models/roberta/configuration_roberta.py index 3025fe2833d904..f82033f4588fde 100644 --- a/src/transformers/models/roberta/configuration_roberta.py +++ b/src/transformers/models/roberta/configuration_roberta.py @@ -46,7 +46,7 @@ class RobertaConfig(PretrainedConfig): Args: - vocab_size (`int`, *optional*, defaults to 30522): + vocab_size (`int`, *optional*, defaults to 50265): Vocabulary size of the RoBERTa model. Defines the number of different tokens that can be represented by the `inputs_ids` passed when calling [`RobertaModel`] or [`TFRobertaModel`]. hidden_size (`int`, *optional*, defaults to 768): @@ -105,7 +105,7 @@ class RobertaConfig(PretrainedConfig): def __init__( self, - vocab_size=30522, + vocab_size=50265, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, diff --git a/src/transformers/models/roberta/modeling_tf_roberta.py b/src/transformers/models/roberta/modeling_tf_roberta.py index 7aa2c9e07a3dff..9b6c491d2761e6 100644 --- a/src/transformers/models/roberta/modeling_tf_roberta.py +++ b/src/transformers/models/roberta/modeling_tf_roberta.py @@ -15,6 +15,9 @@ # limitations under the License. 
""" TF 2.0 RoBERTa model.""" + +from __future__ import annotations + import math import warnings from typing import Optional, Tuple, Union @@ -48,8 +51,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - DUMMY_INPUTS, - MULTIPLE_CHOICE_DUMMY_INPUTS, add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -431,9 +432,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_value: Optional[Tuple[tf.Tensor]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_value: Tuple[tf.Tensor] | None, output_attentions: bool, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -510,9 +511,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None, use_cache: Optional[bool], output_attentions: bool, output_hidden_states: bool, @@ -609,14 +610,14 @@ class PreTrainedModel # Copied from transformers.models.bert.modeling_tf_bert.TFBertMainLayer.call def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -774,38 +775,6 @@ class TFRobertaPreTrainedModel(TFPreTrainedModel): config_class = RobertaConfig base_model_prefix = "roberta" - @property - # Copied from transformers.models.bert.modeling_tf_bert.TFBertPreTrainedModel.dummy_inputs - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
- """ - dummy = {"input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32)} - # Add `encoder_hidden_states` to make the cross-attention layers' weights initialized - if self.config.add_cross_attention: - batch_size, seq_len = tf.constant(DUMMY_INPUTS).shape - shape = (batch_size, seq_len) + (self.config.hidden_size,) - h = tf.random.uniform(shape=shape) - dummy["encoder_hidden_states"] = h - - return dummy - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - ROBERTA_START_DOCSTRING = r""" @@ -923,14 +892,14 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -977,27 +946,6 @@ def call( return outputs - # Copied from transformers.models.bert.modeling_tf_bert.TFBertModel.serving_output - def serving_output( - self, output: TFBaseModelOutputWithPoolingAndCrossAttentions - ) -> TFBaseModelOutputWithPoolingAndCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFBaseModelOutputWithPoolingAndCrossAttentions( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) - class TFRobertaLMHead(tf.keras.layers.Layer): """Roberta Head for masked language modeling.""" @@ -1081,16 +1029,16 @@ def get_prefix_bias_name(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: 
TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1128,13 +1076,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM.serving_output - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - class TFRobertaForCausalLM(TFRobertaPreTrainedModel, TFCausalLanguageModelingLoss): # names with a '.' represents the authorized unexpected/missing layers when a TF model is loaded from a PT model @@ -1178,20 +1119,20 @@ def prepare_inputs_for_generation(self, input_ids, past_key_values=None, attenti ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFCausalLMOutputWithCrossAttentions, Tuple[tf.Tensor]]: r""" @@ -1257,20 +1198,6 @@ def call( cross_attentions=outputs.cross_attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertLMHeadModel.serving_output - def serving_output(self, output: TFCausalLMOutputWithCrossAttentions) -> TFCausalLMOutputWithCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is 
not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFCausalLMOutputWithCrossAttentions( - logits=output.logits, past_key_values=pkv, hidden_states=hs, attentions=attns, cross_attentions=cross_attns - ) - class TFRobertaClassificationHead(tf.keras.layers.Layer): """Head for sentence-level classification tasks.""" @@ -1329,16 +1256,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1375,13 +1302,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1404,16 +1324,6 @@ def __init__(self, config, *inputs, **kwargs): 1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. 
- - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward(ROBERTA_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length")) @add_code_sample_docstrings( @@ -1423,16 +1333,16 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -1482,26 +1392,6 @@ def call( attentions=outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving_output - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1539,16 +1429,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1585,13 +1475,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: 
TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1624,17 +1507,17 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1683,12 +1566,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/roberta_prelayernorm/configuration_roberta_prelayernorm.py b/src/transformers/models/roberta_prelayernorm/configuration_roberta_prelayernorm.py index 49f92586c1b732..fca6763f274eab 100644 --- a/src/transformers/models/roberta_prelayernorm/configuration_roberta_prelayernorm.py +++ b/src/transformers/models/roberta_prelayernorm/configuration_roberta_prelayernorm.py @@ -45,7 +45,7 @@ class RobertaPreLayerNormConfig(PretrainedConfig): Args: - vocab_size (`int`, *optional*, defaults to 30522): + vocab_size (`int`, *optional*, defaults to 50265): Vocabulary size of the RoBERTa-PreLayerNorm model. Defines the number of different tokens that can be represented by the `inputs_ids` passed when calling [`RobertaPreLayerNormModel`] or [`TFRobertaPreLayerNormModel`]. 
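The two configuration hunks (RobertaConfig above, RobertaPreLayerNormConfig here) change the documented and actual `vocab_size` default from 30522 to 50265, matching the RoBERTa BPE tokenizer. A quick check of the new default, assuming a transformers install that includes this patch:

from transformers import RobertaConfig

config = RobertaConfig()  # no arguments: all defaults
assert config.vocab_size == 50265, config.vocab_size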
@@ -106,7 +106,7 @@ class RobertaPreLayerNormConfig(PretrainedConfig): def __init__( self, - vocab_size=30522, + vocab_size=50265, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, diff --git a/src/transformers/models/roberta_prelayernorm/modeling_tf_roberta_prelayernorm.py b/src/transformers/models/roberta_prelayernorm/modeling_tf_roberta_prelayernorm.py index fedfea56a7a9b2..2f98a5f5d0cff4 100644 --- a/src/transformers/models/roberta_prelayernorm/modeling_tf_roberta_prelayernorm.py +++ b/src/transformers/models/roberta_prelayernorm/modeling_tf_roberta_prelayernorm.py @@ -15,6 +15,9 @@ # limitations under the License. """ TF 2.0 RoBERTa-PreLayerNorm model.""" + +from __future__ import annotations + import math import warnings from typing import Optional, Tuple, Union @@ -48,8 +51,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - DUMMY_INPUTS, - MULTIPLE_CHOICE_DUMMY_INPUTS, add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -435,9 +436,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_value: Optional[Tuple[tf.Tensor]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_value: Tuple[tf.Tensor] | None, output_attentions: bool, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -514,9 +515,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None, use_cache: Optional[bool], output_attentions: bool, output_hidden_states: bool, @@ -610,14 +611,14 @@ class PreTrainedModel @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -775,38 +776,6 @@ class TFRobertaPreLayerNormPreTrainedModel(TFPreTrainedModel): config_class = RobertaPreLayerNormConfig base_model_prefix = "roberta_prelayernorm" - @property - # Copied from transformers.models.bert.modeling_tf_bert.TFBertPreTrainedModel.dummy_inputs - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
- """ - dummy = {"input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32)} - # Add `encoder_hidden_states` to make the cross-attention layers' weights initialized - if self.config.add_cross_attention: - batch_size, seq_len = tf.constant(DUMMY_INPUTS).shape - shape = (batch_size, seq_len) + (self.config.hidden_size,) - h = tf.random.uniform(shape=shape) - dummy["encoder_hidden_states"] = h - - return dummy - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - ROBERTA_PRELAYERNORM_START_DOCSTRING = r""" @@ -925,14 +894,14 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -979,27 +948,6 @@ def call( return outputs - # Copied from transformers.models.bert.modeling_tf_bert.TFBertModel.serving_output - def serving_output( - self, output: TFBaseModelOutputWithPoolingAndCrossAttentions - ) -> TFBaseModelOutputWithPoolingAndCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFBaseModelOutputWithPoolingAndCrossAttentions( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) - # Copied from transformers.models.roberta.modeling_tf_roberta.TFRobertaLMHead with Roberta->RobertaPreLayerNorm class TFRobertaPreLayerNormLMHead(tf.keras.layers.Layer): @@ -1090,16 +1038,16 @@ def get_prefix_bias_name(self): # Copied from transformers.models.roberta.modeling_tf_roberta.TFRobertaForMaskedLM.call with ROBERTA->ROBERTA_PRELAYERNORM,Roberta->RobertaPreLayerNorm,roberta->roberta_prelayernorm def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - 
token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1137,13 +1085,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM.serving_output - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - # Copied from transformers.models.roberta.modeling_tf_roberta.TFRobertaForCausalLM with ROBERTA->ROBERTA_PRELAYERNORM,Roberta->RobertaPreLayerNorm,roberta->roberta_prelayernorm class TFRobertaPreLayerNormForCausalLM(TFRobertaPreLayerNormPreTrainedModel, TFCausalLanguageModelingLoss): @@ -1194,20 +1135,20 @@ def prepare_inputs_for_generation(self, input_ids, past_key_values=None, attenti ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFCausalLMOutputWithCrossAttentions, Tuple[tf.Tensor]]: r""" @@ -1273,20 +1214,6 @@ def call( cross_attentions=outputs.cross_attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertLMHeadModel.serving_output - def serving_output(self, output: TFCausalLMOutputWithCrossAttentions) -> TFCausalLMOutputWithCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = 
tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFCausalLMOutputWithCrossAttentions( - logits=output.logits, past_key_values=pkv, hidden_states=hs, attentions=attns, cross_attentions=cross_attns - ) - # Copied from transformers.models.roberta.modeling_tf_roberta.TFRobertaClassificationHead with Roberta->RobertaPreLayerNorm class TFRobertaPreLayerNormClassificationHead(tf.keras.layers.Layer): @@ -1349,16 +1276,16 @@ def __init__(self, config, *inputs, **kwargs): # Copied from transformers.models.roberta.modeling_tf_roberta.TFRobertaForSequenceClassification.call with roberta->roberta_prelayernorm def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1395,13 +1322,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1425,16 +1345,6 @@ def __init__(self, config, *inputs, **kwargs): 1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. 
- - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward( ROBERTA_PRELAYERNORM_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length") @@ -1446,16 +1356,16 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -1505,26 +1415,6 @@ def call( attentions=outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving_output - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1563,16 +1453,16 @@ def __init__(self, config, *inputs, **kwargs): # Copied from transformers.models.roberta.modeling_tf_roberta.TFRobertaForTokenClassification.call with roberta->roberta_prelayernorm def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1609,13 +1499,6 @@ def call( attentions=outputs.attentions, ) - # Copied from 
transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1649,17 +1532,17 @@ def __init__(self, config, *inputs, **kwargs): # Copied from transformers.models.roberta.modeling_tf_roberta.TFRobertaForQuestionAnswering.call with roberta->roberta_prelayernorm def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1708,12 +1591,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/roformer/modeling_tf_roformer.py b/src/transformers/models/roformer/modeling_tf_roformer.py index 2d1387d2d8d84d..f6067f9237f45e 100644 --- a/src/transformers/models/roformer/modeling_tf_roformer.py +++ b/src/transformers/models/roformer/modeling_tf_roformer.py @@ -15,6 +15,8 @@ """ TF 2.0 RoFormer model.""" +from __future__ import annotations + import math from typing import Dict, Optional, Tuple, Union @@ -48,7 +50,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - MULTIPLE_CHOICE_DUMMY_INPUTS, add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -604,11 +605,11 @@ class PreTrainedModel @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, 
tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -809,11 +810,11 @@ def __init__(self, config: RoFormerConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -833,12 +834,6 @@ def call( return outputs - def serving_output(self, output: TFBaseModelOutput) -> TFBaseModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutput(last_hidden_state=output.last_hidden_state, hidden_states=hs, attentions=attns) - @add_start_docstrings("""RoFormer Model with a `language modeling` head on top.""", ROFORMER_START_DOCSTRING) class TFRoFormerForMaskedLM(TFRoFormerPreTrainedModel, TFMaskedLanguageModelingLoss): @@ -866,15 +861,15 @@ def get_lm_head(self) -> tf.keras.layers.Layer: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -909,12 +904,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """RoFormer Model with a `language modeling` head on top for CLM fine-tuning.""", ROFORMER_START_DOCSTRING @@ -940,15 +929,15 @@ def get_lm_head(self) -> tf.keras.layers.Layer: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: 
Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFCausalLMOutput, Tuple[tf.Tensor]]: r""" @@ -988,12 +977,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFCausalLMOutput) -> TFCausalLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFCausalLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - class TFRoFormerClassificationHead(tf.keras.layers.Layer): """Head for sentence-level classification tasks.""" @@ -1049,15 +1032,15 @@ def __init__(self, config: RoFormerConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1092,12 +1075,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1116,17 +1093,6 @@ def __init__(self, config: RoFormerConfig, *inputs, **kwargs): units=1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. 
- - - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward( ROFORMER_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length") @@ -1138,15 +1104,15 @@ def dummy_inputs(self) -> Dict[str, tf.Tensor]: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -1201,26 +1167,6 @@ def call( attentions=outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFMultipleChoiceModelOutput: - output = self.call(input_ids=inputs) - - return self.serving_output(output) - - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1250,15 +1196,15 @@ def __init__(self, config: RoFormerConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1292,12 +1238,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return 
TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1326,16 +1266,16 @@ def __init__(self, config: RoFormerConfig, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1381,11 +1321,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/rwkv/modeling_rwkv.py b/src/transformers/models/rwkv/modeling_rwkv.py index 7a78ec082f45ae..cd577e9c7431af 100644 --- a/src/transformers/models/rwkv/modeling_rwkv.py +++ b/src/transformers/models/rwkv/modeling_rwkv.py @@ -159,7 +159,7 @@ def forward(ctx, time_decay, time_first, key, value, state=None, return_state=Fa @staticmethod # g stands for grad - def backward(ctx, g_output): + def backward(ctx, g_output, g_state=None): input_dtype = ctx.input_dtype time_decay, time_first, key, value, output = ctx.saved_tensors @@ -188,17 +188,14 @@ def backward(ctx, g_output): g_key, g_value, ) - g_time_decay = torch.sum(g_time_decay, dim=0) - g_time_first = torch.sum(g_time_first, dim=0) return ( - None, - None, - None, g_time_decay.to(input_dtype), g_time_first.to(input_dtype), g_key.to(input_dtype), g_value.to(input_dtype), + None, + None, ) diff --git a/src/transformers/models/sam/image_processing_sam.py b/src/transformers/models/sam/image_processing_sam.py index 64f3bae22218f5..821b43624d0769 100644 --- a/src/transformers/models/sam/image_processing_sam.py +++ b/src/transformers/models/sam/image_processing_sam.py @@ -934,7 +934,7 @@ def _generate_crop_boxes( cropped_images, point_grid_per_crop = _generate_crop_images( crop_boxes, image, points_grid, layer_idxs, target_size, original_size ) - + crop_boxes = np.array(crop_boxes) crop_boxes = crop_boxes.astype(np.float32) points_per_crop = np.array([point_grid_per_crop]) points_per_crop = np.transpose(points_per_crop, axes=(0, 2, 1, 3)) diff --git a/src/transformers/models/sam/modeling_sam.py b/src/transformers/models/sam/modeling_sam.py index 7df46117509700..29111c14436216 100644 --- a/src/transformers/models/sam/modeling_sam.py +++ 
b/src/transformers/models/sam/modeling_sam.py @@ -224,7 +224,7 @@ def _recombine_heads(self, hidden_states: Tensor, point_batch_size: int) -> Tens hidden_states = hidden_states.transpose(1, 2) return hidden_states.reshape(batch // point_batch_size, point_batch_size, n_tokens, n_heads * c_per_head) - def forward(self, query: Tensor, key: Tensor, value: Tensor) -> Tensor: + def forward(self, query: Tensor, key: Tensor, value: Tensor, attention_similarity: Tensor = None) -> Tensor: # Input projections query = self.q_proj(query) key = self.k_proj(key) @@ -242,6 +242,10 @@ def forward(self, query: Tensor, key: Tensor, value: Tensor) -> Tensor: attn = attn / math.sqrt(c_per_head) attn = torch.softmax(attn, dim=-1) + if attention_similarity is not None: + attn = attn + attention_similarity + attn = torch.softmax(attn, dim=-1) + # Get output out = attn @ value out = self._recombine_heads(out, point_batch_size) @@ -290,6 +294,7 @@ def forward( keys: Tensor, query_point_embedding: Tensor, key_point_embedding: Tensor, + attention_similarity: Tensor, output_attentions: bool = False, ): # Self attention block @@ -305,7 +310,9 @@ def forward( query = queries + query_point_embedding key = keys + key_point_embedding - attn_out = self.cross_attn_token_to_image(query=query, key=key, value=keys) + attn_out = self.cross_attn_token_to_image( + query=query, key=key, value=keys, attention_similarity=attention_similarity + ) queries = queries + attn_out queries = self.layer_norm2(queries) @@ -353,6 +360,8 @@ def forward( point_embeddings: Tensor, image_embeddings: Tensor, image_positional_embeddings: Tensor, + attention_similarity: Tensor, + target_embedding=None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -377,11 +386,15 @@ def forward( # Apply transformer blocks and final layernorm for layer in self.layers: + if target_embedding is not None: + queries += target_embedding + queries, keys, attention_outputs = layer( queries=queries, keys=keys, query_point_embedding=point_embeddings, key_point_embedding=image_positional_embeddings, + attention_similarity=attention_similarity, output_attentions=output_attentions, ) @@ -460,6 +473,8 @@ def forward( dense_prompt_embeddings: torch.Tensor, multimask_output: bool, output_attentions: Optional[bool] = None, + attention_similarity: torch.Tensor = None, + target_embedding: torch.Tensor = None, ) -> Tuple[torch.Tensor, torch.Tensor]: """ Predict masks given image and prompt embeddings. 
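The attention change just above implements PerSAM-style target-guided attention: when an `attention_similarity` map is supplied, it is added to the already-normalised attention weights and the result is renormalised. A self-contained sketch of that reweighting, using placeholder shapes rather than SAM's internal layout:

```python
import math

import torch

# Placeholder sizes; real SAM uses its own head count and token layout.
batch, heads, n_tokens, n_keys, c_per_head = 2, 8, 6, 64, 32
query = torch.randn(batch, heads, n_tokens, c_per_head)
key = torch.randn(batch, heads, n_keys, c_per_head)
value = torch.randn(batch, heads, n_keys, c_per_head)
attention_similarity = torch.randn(batch, heads, n_tokens, n_keys)

attn = (query @ key.transpose(-2, -1)) / math.sqrt(c_per_head)
attn = torch.softmax(attn, dim=-1)

# PerSAM bias: shift probability mass toward keys that resemble the target
# concept, then renormalise so each row is still a distribution.
attn = torch.softmax(attn + attention_similarity, dim=-1)

out = attn @ value
print(out.shape, attn.sum(-1).allclose(torch.ones(batch, heads, n_tokens)))
```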
@@ -500,6 +515,8 @@ def forward( point_embeddings=point_embeddings, image_embeddings=image_embeddings, image_positional_embeddings=image_positional_embeddings, + attention_similarity=attention_similarity, + target_embedding=target_embedding, output_attentions=output_attentions, ) iou_token_out = point_embedding[:, :, 0, :] @@ -576,8 +593,12 @@ def __init__(self, config: SamPromptEncoderConfig): self.conv1 = nn.Conv2d(1, self.mask_input_channels, kernel_size=2, stride=2) self.conv2 = nn.Conv2d(self.mask_input_channels, config.mask_input_channels, kernel_size=2, stride=2) self.conv3 = nn.Conv2d(config.mask_input_channels, config.hidden_size, kernel_size=1) - self.layer_norm1 = SamLayerNorm(self.mask_input_channels, config.layer_norm_eps) - self.layer_norm2 = SamLayerNorm(self.mask_input_channels * 4, config.layer_norm_eps) + self.layer_norm1 = SamLayerNorm( + self.mask_input_channels, eps=config.layer_norm_eps, data_format="channels_first" + ) + self.layer_norm2 = SamLayerNorm( + self.mask_input_channels * 4, eps=config.layer_norm_eps, data_format="channels_first" + ) def forward(self, masks): hidden_states = self.conv1(masks) @@ -1146,6 +1167,12 @@ def _init_weights(self, module): In the original implementation and paper, the model always outputs 3 masks per image (or per point / per bounding box if relevant). However, it is possible to just output a single mask, that corresponds to the "best" mask, by specifying `multimask_output=False`. + attention_similarity (`torch.FloatTensor`, *optional*): + Attention similarity tensor, to be provided to the mask decoder for target-guided attention in case the + model is used for personalization as introduced in [PerSAM](https://arxiv.org/abs/2305.03048). + target_embedding (`torch.FloatTensor`, *optional*): + Embedding of the target concept, to be provided to the mask decoder for target-semantic prompting in case + the model is used for personalization as introduced in [PerSAM](https://arxiv.org/abs/2305.03048). output_attentions (`bool`, *optional*): Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail. @@ -1265,6 +1292,8 @@ def forward( input_masks: Optional[torch.LongTensor] = None, image_embeddings: Optional[torch.FloatTensor] = None, multimask_output: bool = True, + attention_similarity: Optional[torch.FloatTensor] = None, + target_embedding: Optional[torch.FloatTensor] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict=None, @@ -1374,6 +1403,8 @@ def forward( sparse_prompt_embeddings=sparse_embeddings, dense_prompt_embeddings=dense_embeddings, multimask_output=multimask_output, + attention_similarity=attention_similarity, + target_embedding=target_embedding, output_attentions=output_attentions, ) diff --git a/src/transformers/models/sam/modeling_tf_sam.py b/src/transformers/models/sam/modeling_tf_sam.py index ddd8e526a79a02..46710b3298474a 100644 --- a/src/transformers/models/sam/modeling_tf_sam.py +++ b/src/transformers/models/sam/modeling_tf_sam.py @@ -17,6 +17,9 @@ discrepancy, the original file should be regarded as the 'reference' version. 
""" + +from __future__ import annotations + import collections from dataclasses import dataclass from typing import Dict, List, Optional, Tuple, Union @@ -26,7 +29,7 @@ from ...activations_tf import ACT2FN from ...modeling_tf_outputs import TFBaseModelOutput -from ...modeling_tf_utils import TFPreTrainedModel, shape_list, unpack_inputs +from ...modeling_tf_utils import TFModelInputType, TFPreTrainedModel, shape_list, unpack_inputs from ...tf_utils import flatten, functional_layernorm from ...utils import ModelOutput, add_start_docstrings, add_start_docstrings_to_model_forward, logging from .configuration_sam import SamConfig, SamMaskDecoderConfig, SamPromptEncoderConfig, SamVisionConfig @@ -69,10 +72,10 @@ class TFSamVisionEncoderOutput(ModelOutput): heads. """ - image_embeds: Optional[tf.Tensor] = None + image_embeds: tf.Tensor | None = None last_hidden_state: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -106,9 +109,9 @@ class TFSamImageSegmentationOutput(ModelOutput): iou_scores: tf.Tensor = None pred_masks: tf.Tensor = None - vision_hidden_states: Optional[Tuple[tf.Tensor]] = None - vision_attentions: Optional[Tuple[tf.Tensor]] = None - mask_decoder_attentions: Optional[Tuple[tf.Tensor]] = None + vision_hidden_states: Tuple[tf.Tensor] | None = None + vision_attentions: Tuple[tf.Tensor] | None = None + mask_decoder_attentions: Tuple[tf.Tensor] | None = None class TFSamPatchEmbeddings(tf.keras.layers.Layer): @@ -223,7 +226,8 @@ def _recombine_heads(self, hidden_states: tf.Tensor, point_batch_size: int) -> t batch, n_heads, n_tokens, c_per_head = shape_list(hidden_states) hidden_states = tf.transpose(hidden_states, perm=[0, 2, 1, 3]) return tf.reshape( - hidden_states, (batch // max(1, point_batch_size), point_batch_size, n_tokens, n_heads * c_per_head) + hidden_states, + (batch // tf.reduce_max([1, point_batch_size]), point_batch_size, n_tokens, n_heads * c_per_head), ) def call(self, query: tf.Tensor, key: tf.Tensor, value: tf.Tensor) -> tf.Tensor: @@ -506,7 +510,7 @@ def call( # Matt: The original Torch code checked that the sum of sparse_prompt_embeddings equalled 0. However, this only # happens when the sparse prompt embeddings are an empty tensor with shape[1] == 0. I replaced # it with an explicit shape check to avoid data-dependent control flow which breaks XLA. 
- if sparse_prompt_embeddings.shape[1] != 0: + if shape_list(sparse_prompt_embeddings)[1] != 0: tokens = tf.concat((output_tokens, sparse_prompt_embeddings), axis=2) else: tokens = output_tokens @@ -692,8 +696,8 @@ def _embed_points(self, points: tf.Tensor, labels: tf.Tensor, pad: bool) -> tf.T """Embeds point prompts.""" points = points + 0.5 # Shift to center of pixel if pad: - target_point_shape = (points.shape[0], points.shape[1], 1, points.shape[-1]) - target_labels_shape = (points.shape[0], points.shape[1], 1) + target_point_shape = (shape_list(points)[0], shape_list(points)[1], 1, shape_list(points)[-1]) + target_labels_shape = (shape_list(points)[0], shape_list(points)[1], 1) padding_point = tf.zeros(target_point_shape, dtype=points.dtype) padding_label = -tf.ones(target_labels_shape, dtype=labels.dtype) points = tf.concat([points, padding_point], axis=2) @@ -719,12 +723,12 @@ def _embed_points(self, points: tf.Tensor, labels: tf.Tensor, pad: bool) -> tf.T def _embed_boxes(self, boxes: tf.Tensor) -> tf.Tensor: """Embeds box prompts.""" boxes = boxes + 0.5 # Shift to center of pixel - batch_size, nb_boxes = boxes.shape[:2] + batch_size, nb_boxes = shape_list(boxes)[:2] coords = tf.reshape(boxes, (batch_size, nb_boxes, 2, 2)) input_shape = (self.input_image_size, self.input_image_size) corner_embedding = self.shared_embedding(coords, input_shape) corner_embedding += tf.where( - tf.range(corner_embedding.shape[2])[None, None, :, None] == 0, + tf.range(shape_list(corner_embedding)[2])[None, None, :, None] == 0, self.point_embed[2][0], self.point_embed[3][0], ) @@ -734,9 +738,9 @@ def call( self, batch_size: Optional[int], input_points: Optional[Tuple[tf.Tensor, tf.Tensor]], - input_labels: Optional[tf.Tensor], - input_boxes: Optional[tf.Tensor], - input_masks: Optional[tf.Tensor], + input_labels: tf.Tensor | None, + input_boxes: tf.Tensor | None, + input_masks: tf.Tensor | None, ) -> Tuple[tf.Tensor, tf.Tensor]: """ Embeds different types of prompts, returning both sparse and dense embeddings. @@ -751,7 +755,7 @@ def call( """ sparse_embeddings = None if input_points is not None: - batch_size, point_batch_size = input_points.shape[:2] + batch_size, point_batch_size = shape_list(input_points)[:2] if input_labels is None: raise ValueError("If points are provided, labels must also be provided.") point_embeddings = self._embed_points(input_points, input_labels, pad=(input_boxes is None)) @@ -760,7 +764,7 @@ def call( ) sparse_embeddings = tf.concat([sparse_embeddings, point_embeddings], axis=2) if input_boxes is not None: - batch_size = input_boxes.shape[0] + batch_size = shape_list(input_boxes)[0] box_embeddings = self._embed_boxes(input_boxes) if sparse_embeddings is None: sparse_embeddings = box_embeddings @@ -1084,7 +1088,7 @@ def get_input_embeddings(self): def call( self, - pixel_values: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1144,15 +1148,11 @@ class TFSamPreTrainedModel(TFPreTrainedModel): @property def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
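The `shape_list` substitutions in the hunks above follow from the same XLA concern noted in the comment: for symbolic tensors the static `tensor.shape` entries can be `None`, whereas `shape_list` falls back to the dynamic `tf.shape`. A small sketch of the difference, assuming a `transformers` install with its TF extras:

```python
import tensorflow as tf
from transformers.tf_utils import shape_list


@tf.function(input_signature=[tf.TensorSpec((None, None, 2), tf.float32)])
def count_points(points):
    static_dim = points.shape[1]         # None while tracing this input spec
    dynamic_dim = shape_list(points)[1]  # falls back to tf.shape(points)[1]
    tf.print("static:", str(static_dim), "dynamic:", dynamic_dim)
    return dynamic_dim


count_points(tf.zeros((4, 3, 2)))  # expected to print: static: None dynamic: 3
```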
- """ + # We override the default dummy inputs here because SAM has some really explosive memory usage in the + # attention layers, so we want to pass the smallest possible batches VISION_DUMMY_INPUTS = tf.random.uniform( shape=( - 3, + 1, self.config.vision_config.num_channels, self.config.vision_config.image_size, self.config.vision_config.image_size, @@ -1161,25 +1161,6 @@ def dummy_inputs(self) -> Dict[str, tf.Tensor]: ) return {"pixel_values": tf.constant(VISION_DUMMY_INPUTS)} - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs): - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - - return self.serving_output(output) - SAM_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. Check the superclass documentation for the generic methods the @@ -1326,10 +1307,10 @@ def get_image_embeddings( def get_prompt_embeddings( self, - input_points: Optional[tf.Tensor] = None, - input_labels: Optional[tf.Tensor] = None, - input_boxes: Optional[tf.Tensor] = None, - input_masks: Optional[tf.Tensor] = None, + input_points: tf.Tensor | None = None, + input_labels: tf.Tensor | None = None, + input_boxes: tf.Tensor | None = None, + input_masks: tf.Tensor | None = None, ): r""" Returns the prompt embeddings by passing the input points, labels, boxes and masks through the prompt encoder. @@ -1360,12 +1341,12 @@ def get_prompt_embeddings( @add_start_docstrings_to_model_forward(SAM_INPUTS_DOCSTRING) def call( self, - pixel_values: Optional[tf.Tensor] = None, - input_points: Optional[tf.Tensor] = None, - input_labels: Optional[tf.Tensor] = None, - input_boxes: Optional[tf.Tensor] = None, - input_masks: Optional[tf.Tensor] = None, - image_embeddings: Optional[tf.Tensor] = None, + pixel_values: TFModelInputType | None = None, + input_points: tf.Tensor | None = None, + input_labels: tf.Tensor | None = None, + input_boxes: tf.Tensor | None = None, + input_masks: tf.Tensor | None = None, + image_embeddings: tf.Tensor | None = None, multimask_output: bool = True, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1396,8 +1377,8 @@ def call( " got {}.".format(input_boxes.shape), ) if input_points is not None and input_boxes is not None: - point_batch_size = input_points.shape[1] - box_batch_size = input_boxes.shape[1] + point_batch_size = shape_list(input_points)[1] + box_batch_size = shape_list(input_boxes)[1] if point_batch_size != box_batch_size: raise ValueError( "You should provide as many bounding boxes as input points per box. Got {} and {}.".format( diff --git a/src/transformers/models/segformer/modeling_tf_segformer.py b/src/transformers/models/segformer/modeling_tf_segformer.py index c877e86acfab4c..b3090135afc290 100644 --- a/src/transformers/models/segformer/modeling_tf_segformer.py +++ b/src/transformers/models/segformer/modeling_tf_segformer.py @@ -14,8 +14,11 @@ # limitations under the License. """ TensorFlow SegFormer model.""" + +from __future__ import annotations + import math -from typing import Dict, Optional, Tuple, Union +from typing import Optional, Tuple, Union import tensorflow as tf @@ -518,34 +521,8 @@ class TFSegformerPreTrainedModel(TFPreTrainedModel): main_input_name = "pixel_values" @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. 
- - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform(shape=(3, self.config.num_channels, 512, 512), dtype=tf.float32) - return {"pixel_values": tf.constant(VISION_DUMMY_INPUTS)} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs): - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - - return self.serving_output(output) + def input_signature(self): + return {"pixel_values": tf.TensorSpec(shape=(None, self.config.num_channels, 512, 512), dtype=tf.float32)} SEGFORMER_START_DOCSTRING = r""" @@ -628,14 +605,6 @@ def call( ) return outputs - def serving_output(self, output: TFBaseModelOutput) -> TFBaseModelOutput: - # hidden_states and attention not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFBaseModelOutput( - last_hidden_state=output.last_hidden_state, - hidden_states=output.hidden_states, - attentions=output.attentions, - ) - @add_start_docstrings( """ @@ -664,8 +633,8 @@ def __init__(self, config: SegformerConfig, *inputs, **kwargs): ) def call( self, - pixel_values: Optional[tf.Tensor] = None, - labels: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + labels: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -699,12 +668,6 @@ def call( loss=loss, logits=logits, hidden_states=outputs.hidden_states, attentions=outputs.attentions ) - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - # hidden_states and attention not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFSequenceClassifierOutput( - logits=output.logits, hidden_states=output.hidden_states, attentions=output.attentions - ) - class TFSegformerMLP(tf.keras.layers.Layer): """ @@ -816,7 +779,7 @@ def masked_loss(real, pred): def call( self, pixel_values: tf.Tensor, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -889,9 +852,3 @@ def call( hidden_states=outputs.hidden_states if output_hidden_states else None, attentions=outputs.attentions, ) - - def serving_output(self, output: TFSemanticSegmenterOutput) -> TFSemanticSegmenterOutput: - # hidden_states and attention not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFSemanticSegmenterOutput( - logits=output.logits, hidden_states=output.hidden_states, attentions=output.attentions - ) diff --git a/src/transformers/models/sew/configuration_sew.py b/src/transformers/models/sew/configuration_sew.py index af7041843de3a9..07e3a7df26d84c 100644 --- a/src/transformers/models/sew/configuration_sew.py +++ b/src/transformers/models/sew/configuration_sew.py @@ -63,6 +63,9 @@ class SEWConfig(PretrainedConfig): The dropout ratio for the attention probabilities. final_dropout (`float`, *optional*, defaults to 0.1): The dropout probability for the final projection layer of [`SEWForCTC`]. + layerdrop (`float`, *optional*, defaults to 0.1): + The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more + details. 
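The SegFormer change earlier in this hunk is part of the same clean-up: the hand-written `dummy_inputs`/`serving` overrides are replaced by an `input_signature` property, which the base `TFPreTrainedModel` can use to build the SavedModel serving signature. A usage sketch under that assumption; the checkpoint name and export path are examples only:

```python
from transformers import TFSegformerModel

model = TFSegformerModel.from_pretrained("nvidia/mit-b0")

# The property advertises the expected serving inputs for this model ...
print(model.input_signature)

# ... and exporting a SavedModel picks it up without a hand-written `serving` method.
model.save_pretrained("./segformer-tf-export", saved_model=True)
```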
initializer_range (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. layer_norm_eps (`float`, *optional*, defaults to 1e-12): diff --git a/src/transformers/models/sew_d/modeling_sew_d.py b/src/transformers/models/sew_d/modeling_sew_d.py index 7cdce062eee335..417fd81c6e9d42 100644 --- a/src/transformers/models/sew_d/modeling_sew_d.py +++ b/src/transformers/models/sew_d/modeling_sew_d.py @@ -559,7 +559,7 @@ def symbolic(g, self, mask, dim): r_mask = g.op( "Cast", g.op("Sub", g.op("Constant", value_t=torch.tensor(1, dtype=torch.int64)), mask_cast_value), - to_i=sym_help.cast_pytorch_to_onnx["Byte"], + to_i=sym_help.cast_pytorch_to_onnx["Bool"], ) output = masked_fill( g, self, r_mask, g.op("Constant", value_t=torch.tensor(torch.finfo(self.type().dtype()).min)) @@ -754,7 +754,7 @@ def forward( Input states to the module usually the output from previous layer, it will be the Q,K and V in *Attention(Q,K,V)* - attention_mask (`torch.ByteTensor`): + attention_mask (`torch.BoolTensor`): An attention mask matrix of shape [*B*, *N*, *N*] where *B* is the batch size, *N* is the maximum sequence length in which element [i,j] = *1* means the *i* th token in the input can attend to the *j* th token. @@ -1086,7 +1086,6 @@ def get_attention_mask(self, attention_mask): if attention_mask.dim() <= 2: extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2) attention_mask = extended_attention_mask * extended_attention_mask.squeeze(-2).unsqueeze(-1) - attention_mask = attention_mask.byte() elif attention_mask.dim() == 3: attention_mask = attention_mask.unsqueeze(1) @@ -1117,7 +1116,7 @@ def forward( if attention_mask.dim() <= 2: input_mask = attention_mask else: - input_mask = (attention_mask.sum(-2) > 0).byte() + input_mask = attention_mask.sum(-2) > 0 attention_mask = self.get_attention_mask(attention_mask) relative_pos = self.get_rel_pos(hidden_states, query_states, relative_pos) diff --git a/src/transformers/models/speech_to_text/feature_extraction_speech_to_text.py b/src/transformers/models/speech_to_text/feature_extraction_speech_to_text.py index a5e6b0d4004264..81f2ea4e99be22 100644 --- a/src/transformers/models/speech_to_text/feature_extraction_speech_to_text.py +++ b/src/transformers/models/speech_to_text/feature_extraction_speech_to_text.py @@ -141,7 +141,8 @@ def __call__( Args: raw_speech (`np.ndarray`, `List[float]`, `List[np.ndarray]`, `List[List[float]]`): The sequence or batch of sequences to be padded. Each sequence can be a numpy array, a list of float - values, a list of numpy arrays or a list of list of float values. + values, a list of numpy arrays or a list of list of float values. Must be mono channel audio, not + stereo, i.e. single float per timestep. padding (`bool`, `str` or [`~utils.PaddingStrategy`], *optional*, defaults to `True`): Select a strategy to pad the returned sequences (according to the model's padding side and padding index) among: @@ -200,9 +201,11 @@ def __call__( "Failing to do so can result in silent errors that might be hard to debug." 
) - is_batched = bool( - isinstance(raw_speech, (list, tuple)) - and (isinstance(raw_speech[0], np.ndarray) or isinstance(raw_speech[0], (tuple, list))) + is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1 + if is_batched_numpy and len(raw_speech.shape) > 2: + raise ValueError(f"Only mono-channel audio is supported for input to {self}") + is_batched = is_batched_numpy or ( + isinstance(raw_speech, (list, tuple)) and (isinstance(raw_speech[0], (np.ndarray, tuple, list))) ) if is_batched: diff --git a/src/transformers/models/speech_to_text/modeling_tf_speech_to_text.py b/src/transformers/models/speech_to_text/modeling_tf_speech_to_text.py index e5c38afa83cbab..59caabffab9c16 100755 --- a/src/transformers/models/speech_to_text/modeling_tf_speech_to_text.py +++ b/src/transformers/models/speech_to_text/modeling_tf_speech_to_text.py @@ -15,8 +15,10 @@ """ TensorFlow Speech2Text model.""" +from __future__ import annotations + import random -from typing import Dict, Optional, Tuple, Union +from typing import Optional, Tuple, Union import numpy as np import tensorflow as tf @@ -273,12 +275,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training: Optional[bool] = False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -476,12 +478,12 @@ def __init__(self, config: Speech2TextConfig, **kwargs): def call( self, hidden_states, - attention_mask: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, - cross_attn_layer_head_mask: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[tf.Tensor]] = None, + attention_mask: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, + cross_attn_layer_head_mask: tf.Tensor | None = None, + past_key_value: Tuple[tf.Tensor] | None = None, training=False, ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]: """ @@ -561,26 +563,6 @@ class TFSpeech2TextPreTrainedModel(TFPreTrainedModel): base_model_prefix = "model" main_input_name = "input_features" - # Overwritten property due to different expected input shape and type - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
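The feature-extractor change above (mirrored for SpeechT5 later in the patch) lets the batching check treat a 2-D NumPy array as a batch of mono waveforms and reject arrays with a channel axis. A condensed, standalone sketch of the new logic (the helper name is made up for illustration):

```python
import numpy as np


def is_batched_audio(raw_speech):
    # A 2-D array is a batch of mono clips; a third axis would mean multi-channel audio.
    is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1
    if is_batched_numpy and len(raw_speech.shape) > 2:
        raise ValueError("Only mono-channel audio is supported")
    return is_batched_numpy or (
        isinstance(raw_speech, (list, tuple))
        and isinstance(raw_speech[0], (np.ndarray, tuple, list))
    )


print(is_batched_audio(np.zeros(16000)))        # False: a single mono clip
print(is_batched_audio(np.zeros((4, 16000))))   # True: 2-D array = batch of clips
print(is_batched_audio([np.zeros(16000)] * 4))  # True: list of clips, as before
```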
- """ - return { - self.main_input_name: tf.random.uniform( - [ - 1, - random.randint(1, self.config.max_source_positions), # time - self.config.input_feat_per_channel * self.config.input_channels, # input channels - ] - ), - "decoder_input_ids": tf.constant([[2, 3]], dtype=tf.int32), - } - def _get_feat_extract_output_lengths(self, input_lengths: tf.Tensor): """ Computes the output length of the convolutional layers @@ -590,20 +572,18 @@ def _get_feat_extract_output_lengths(self, input_lengths: tf.Tensor): return input_lengths - @tf.function( - input_signature=[ - { - "input_features": tf.TensorSpec((None, None, None), tf.float32, name="input_features"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), - "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) + @property + def input_signature(self): + return { + "input_features": tf.TensorSpec( + (None, None, self.config.input_feat_per_channel * self.config.input_channels), + tf.float32, + name="input_features", + ), + "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), + "decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), + "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), + } SPEECH_TO_TEXT_START_DOCSTRING = r""" @@ -1253,16 +1233,16 @@ def get_decoder(self): ) def call( self, - input_features: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_outputs: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_features: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, + encoder_outputs: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1343,17 +1323,17 @@ def set_output_embeddings(self, new_embeddings): @replace_return_docstrings(output_type=TFSeq2SeqLMOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_features: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: 
Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_outputs: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_features: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, + encoder_outputs: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None, + labels: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, diff --git a/src/transformers/models/speecht5/feature_extraction_speecht5.py b/src/transformers/models/speecht5/feature_extraction_speecht5.py index 5fe6ca39765c1f..dd5ff4c8a1afae 100644 --- a/src/transformers/models/speecht5/feature_extraction_speecht5.py +++ b/src/transformers/models/speecht5/feature_extraction_speecht5.py @@ -201,7 +201,8 @@ def __call__( Args: audio (`np.ndarray`, `List[float]`, `List[np.ndarray]`, `List[List[float]]`, *optional*): The sequence or batch of sequences to be processed. Each sequence can be a numpy array, a list of float - values, a list of numpy arrays or a list of list of float values. This outputs waveform features. + values, a list of numpy arrays or a list of list of float values. This outputs waveform features. Must + be mono channel audio, not stereo, i.e. single float per timestep. audio_target (`np.ndarray`, `List[float]`, `List[np.ndarray]`, `List[List[float]]`, *optional*): The sequence or batch of sequences to be processed as targets. Each sequence can be a numpy array, a list of float values, a list of numpy arrays or a list of list of float values. 
This outputs log-mel @@ -307,9 +308,11 @@ def _process_audio( return_tensors: Optional[Union[str, TensorType]] = None, **kwargs, ) -> BatchFeature: - is_batched = bool( - isinstance(speech, (list, tuple)) - and (isinstance(speech[0], np.ndarray) or isinstance(speech[0], (tuple, list))) + is_batched_numpy = isinstance(speech, np.ndarray) and len(speech.shape) > 1 + if is_batched_numpy and len(speech.shape) > 2: + raise ValueError(f"Only mono-channel audio is supported for input to {self}") + is_batched = is_batched_numpy or ( + isinstance(speech, (list, tuple)) and (isinstance(speech[0], (np.ndarray, tuple, list))) ) if is_batched: diff --git a/src/transformers/models/swin/modeling_tf_swin.py b/src/transformers/models/swin/modeling_tf_swin.py index 61352843c2f248..02ec39edb0fe14 100644 --- a/src/transformers/models/swin/modeling_tf_swin.py +++ b/src/transformers/models/swin/modeling_tf_swin.py @@ -15,6 +15,8 @@ """ TF 2.0 Swin Transformer model.""" +from __future__ import annotations + import collections.abc import math import warnings @@ -95,9 +97,9 @@ class TFSwinEncoderOutput(ModelOutput): """ last_hidden_state: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - reshaped_hidden_states: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + reshaped_hidden_states: Tuple[tf.Tensor] | None = None @dataclass @@ -130,10 +132,10 @@ class TFSwinModelOutput(ModelOutput): """ last_hidden_state: tf.Tensor = None - pooler_output: Optional[tf.Tensor] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - reshaped_hidden_states: Optional[Tuple[tf.Tensor]] = None + pooler_output: tf.Tensor | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + reshaped_hidden_states: Tuple[tf.Tensor] | None = None @dataclass @@ -165,11 +167,11 @@ class TFSwinMaskedImageModelingOutput(ModelOutput): include the spatial dimensions. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None reconstruction: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - reshaped_hidden_states: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + reshaped_hidden_states: Tuple[tf.Tensor] | None = None @property def logits(self): @@ -210,11 +212,11 @@ class TFSwinImageClassifierOutput(ModelOutput): include the spatial dimensions. 
""" - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None - reshaped_hidden_states: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None + reshaped_hidden_states: Tuple[tf.Tensor] | None = None def window_partition(input_feature: tf.Tensor, window_size: int) -> tf.Tensor: @@ -529,8 +531,8 @@ def transpose_for_scores(self, x: tf.Tensor) -> tf.Tensor: def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: bool = False, training: bool = False, ) -> Tuple[tf.Tensor, ...]: @@ -619,8 +621,8 @@ def prune_heads(self, heads): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: bool = False, training: bool = False, ) -> tf.Tensor: @@ -683,7 +685,7 @@ def __init__( self.intermediate = TFSwinIntermediate(config, dim, name="intermediate") self.swin_output = TFSwinOutput(config, dim, name="output") - def get_attn_mask(self, height: int, width: int, window_size: int, shift_size: int) -> Optional[tf.Tensor]: + def get_attn_mask(self, height: int, width: int, window_size: int, shift_size: int) -> tf.Tensor | None: img_mask = tf.zeros((height, width)) height_slices = ((0, -window_size), (-window_size, -shift_size), (-shift_size, -1)) width_slices = ((0, -window_size), (-window_size, -shift_size), (-shift_size, -1)) @@ -725,7 +727,7 @@ def call( self, hidden_states: tf.Tensor, input_dimensions: Tuple[int, int], - head_mask: Optional[tf.Tensor] = None, + head_mask: tf.Tensor | None = None, output_attentions: bool = False, training: bool = False, ) -> tf.Tensor: @@ -832,7 +834,7 @@ def call( self, hidden_states: tf.Tensor, input_dimensions: Tuple[int, int], - head_mask: Optional[tf.Tensor] = None, + head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = False, training: bool = False, ) -> Tuple[tf.Tensor, ...]: @@ -886,7 +888,7 @@ def call( self, hidden_states: tf.Tensor, input_dimensions: Tuple[int, int], - head_mask: Optional[tf.Tensor] = None, + head_mask: tf.Tensor | None = None, output_attentions: bool = False, output_hidden_states: bool = False, return_dict: bool = True, @@ -955,29 +957,6 @@ def _set_gradient_checkpointing(self, module, value=False) -> None: if isinstance(module, TFSwinEncoder): module.gradient_checkpointing = value - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. 
- """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(3, self.config.num_channels, self.config.image_size, self.config.image_size), - dtype=tf.float32, - ) - return {"pixel_values": tf.constant(VISION_DUMMY_INPUTS)} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - return self.serving_output(output) - SWIN_START_DOCSTRING = r""" This model is a Tensorflow @@ -1128,9 +1107,9 @@ def get_head_mask(self, head_mask: Optional[Any]) -> List: @unpack_inputs def call( self, - pixel_values: Optional[tf.Tensor] = None, - bool_masked_pos: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + bool_masked_pos: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1210,9 +1189,9 @@ def __init__( @unpack_inputs def call( self, - pixel_values: Optional[tf.Tensor] = None, - bool_masked_pos: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + bool_masked_pos: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1243,16 +1222,6 @@ def call( return swin_outputs - def serving_output(self, output: TFSwinModelOutput) -> TFSwinModelOutput: - # hidden_states and attentions not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFSwinModelOutput( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=output.hidden_states, - attentions=output.attentions, - reshaped_hidden_states=output.reshaped_hidden_states, - ) - class TFSwinPixelShuffle(tf.keras.layers.Layer): """TF layer implementation of torch.nn.PixelShuffle""" @@ -1317,9 +1286,9 @@ def __init__(self, config: SwinConfig): @unpack_inputs def call( self, - pixel_values: Optional[tf.Tensor] = None, - bool_masked_pos: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + bool_masked_pos: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1408,15 +1377,6 @@ def call( reshaped_hidden_states=outputs.reshaped_hidden_states, ) - def serving_output(self, output: TFSwinMaskedImageModelingOutput) -> TFSwinMaskedImageModelingOutput: - # hidden_states and attentions not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFSwinMaskedImageModelingOutput( - reconstruction=output.reconstruction, - hidden_states=output.hidden_states, - attentions=output.attentions, - reshaped_hidden_states=output.reshaped_hidden_states, - ) - @add_start_docstrings( """ @@ -1449,9 +1409,9 @@ def __init__(self, config: SwinConfig): @unpack_inputs def call( self, - pixel_values: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - labels: Optional[tf.Tensor] = None, + pixel_values: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + labels: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1491,12 +1451,3 
@@ def call( attentions=outputs.attentions, reshaped_hidden_states=outputs.reshaped_hidden_states, ) - - def serving_output(self, output: TFSwinImageClassifierOutput) -> TFSwinImageClassifierOutput: - # hidden_states and attentions not converted to Tensor with tf.convert_to_tensor as they are all of different dimensions - return TFSwinImageClassifierOutput( - logits=output.logits, - hidden_states=output.hidden_states, - attentions=output.attentions, - reshaped_hidden_states=output.reshaped_hidden_states, - ) diff --git a/src/transformers/models/t5/modeling_tf_t5.py b/src/transformers/models/t5/modeling_tf_t5.py index ec3e67db26d1ed..daef8bfb7fddc9 100644 --- a/src/transformers/models/t5/modeling_tf_t5.py +++ b/src/transformers/models/t5/modeling_tf_t5.py @@ -15,6 +15,9 @@ # limitations under the License. """ TF 2.0 T5 model.""" + +from __future__ import annotations + import copy import itertools import math @@ -42,8 +45,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - DUMMY_INPUTS, - DUMMY_MASK, ContextManagers, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -864,32 +865,6 @@ class TFT5PreTrainedModel(TFPreTrainedModel): # names with a '.' represents the authorized unexpected/missing layers when a TF model is loaded from a PT model _keys_to_ignore_on_load_unexpected = [r"decoder\Wblock[\W_0]+layer[\W_1]+EncDecAttention\Wrelative_attention_bias"] - @property - def dummy_inputs(self): - inputs = tf.constant(DUMMY_INPUTS, dtype=tf.int32) - input_mask = tf.constant(DUMMY_MASK, dtype=tf.int32) - dummy_inputs = { - "input_ids": inputs, - "decoder_input_ids": inputs, - "decoder_attention_mask": input_mask, - } - return dummy_inputs - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), - "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - def get_input_embeddings(self): return self.shared @@ -1148,16 +1123,16 @@ def get_decoder(self): @replace_return_docstrings(output_type=TFSeq2SeqModelOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_outputs: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, + encoder_outputs: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + 
decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1246,25 +1221,6 @@ def call( encoder_attentions=encoder_outputs.attentions, ) - def serving_output(self, output): - pkv = tf.convert_to_tensor(output.past_key_values[1:]) if self.config.use_cache else None - dec_hs = tf.convert_to_tensor(output.decoder_hidden_states) if self.config.output_hidden_states else None - dec_attns = tf.convert_to_tensor(output.decoder_attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if self.config.output_attentions else None - enc_hs = tf.convert_to_tensor(output.encoder_hidden_states) if self.config.output_hidden_states else None - enc_attns = tf.convert_to_tensor(output.encoder_attentions) if self.config.output_attentions else None - - return TFSeq2SeqModelOutput( - last_hidden_state=output.last_hidden_state, - past_key_values=pkv, - decoder_hidden_states=dec_hs, - decoder_attentions=dec_attns, - encoder_last_hidden_state=output.encoder_last_hidden_state, - cross_attentions=cross_attns, - encoder_hidden_states=enc_hs, - encoder_attentions=enc_attns, - ) - @add_start_docstrings("""T5 Model with a `language modeling` head on top.""", T5_START_DOCSTRING) class TFT5ForConditionalGeneration(TFT5PreTrainedModel, TFCausalLanguageModelingLoss): @@ -1327,17 +1283,17 @@ def get_decoder(self): @replace_return_docstrings(output_type=TFSeq2SeqLMOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_outputs: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, + encoder_outputs: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None, + labels: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1536,10 +1492,6 @@ def __init__(self, config, *inputs, **kwargs): encoder_config.use_cache = False self.encoder = TFT5MainLayer(encoder_config, self.shared, name="encoder") - @property - def dummy_inputs(self): - return {"input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32)} - def get_encoder(self): return self.encoder @@ -1548,10 +1500,10 @@ def get_encoder(self): @replace_return_docstrings(output_type=TFBaseModelOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: 
Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1597,23 +1549,3 @@ def call( hidden_states=encoder_outputs.hidden_states, attentions=encoder_outputs.attentions, ) - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - # Copied from transformers.models.distilbert.modeling_tf_distilbert.TFDistilBertModel.serving_output - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutput(last_hidden_state=output.last_hidden_state, hidden_states=hs, attentions=attns) diff --git a/src/transformers/models/tapas/modeling_tf_tapas.py b/src/transformers/models/tapas/modeling_tf_tapas.py index f876730b095d50..62e77a6678deec 100644 --- a/src/transformers/models/tapas/modeling_tf_tapas.py +++ b/src/transformers/models/tapas/modeling_tf_tapas.py @@ -14,6 +14,9 @@ # limitations under the License. """TF 2.0 TAPAS model.""" + +from __future__ import annotations + import enum import math from dataclasses import dataclass @@ -132,11 +135,11 @@ class TFTableQuestionAnsweringOutput(ModelOutput): the self-attention heads. 
""" - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - logits_aggregation: Optional[tf.Tensor] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + logits_aggregation: tf.Tensor | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None class TFTapasEmbeddings(tf.keras.layers.Layer): @@ -486,9 +489,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_value: Optional[Tuple[tf.Tensor]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_value: Tuple[tf.Tensor] | None, output_attentions: bool, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -565,9 +568,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None, use_cache: Optional[bool], output_attentions: bool, output_hidden_states: bool, @@ -758,12 +761,12 @@ class PreTrainedModel @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -859,18 +862,13 @@ class TFTapasPreTrainedModel(TFPreTrainedModel): config_class = TapasConfig base_model_prefix = "tapas" - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.float32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - return self.serving_output(output) + @property + def input_signature(self): + return { + "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), + "attention_mask": tf.TensorSpec((None, None), tf.float32, name="attention_mask"), + "token_type_ids": tf.TensorSpec((None, None, 7), tf.int32, name="token_type_ids"), + } TAPAS_START_DOCSTRING = r""" @@ -984,12 +982,12 @@ def __init__(self, config: TapasConfig, *inputs, **kwargs): @replace_return_docstrings(output_type=TFBaseModelOutputWithPooling, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, 
tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1035,17 +1033,6 @@ def call( return outputs - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=hidden_states, - attentions=attentions, - ) - @add_start_docstrings("""Tapas Model with a `language modeling` head on top.""", TAPAS_START_DOCSTRING) class TFTapasForMaskedLM(TFTapasPreTrainedModel, TFMaskedLanguageModelingLoss): @@ -1069,16 +1056,16 @@ def get_lm_head(self) -> tf.keras.layers.Layer: @replace_return_docstrings(output_type=TFMaskedLMOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1142,12 +1129,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hidden_states, attentions=attentions) - class TFTapasComputeTokenLogits(tf.keras.layers.Layer): def __init__(self, config: TapasConfig, **kwargs): @@ -1281,21 +1262,21 @@ def __init__(self, config: TapasConfig, *inputs, **kwargs): @replace_return_docstrings(output_type=TFTableQuestionAnsweringOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - table_mask: 
Optional[Union[np.ndarray, tf.Tensor]] = None, - aggregation_labels: Optional[Union[np.ndarray, tf.Tensor]] = None, - float_answer: Optional[Union[np.ndarray, tf.Tensor]] = None, - numeric_values: Optional[Union[np.ndarray, tf.Tensor]] = None, - numeric_values_scale: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + table_mask: np.ndarray | tf.Tensor | None = None, + aggregation_labels: np.ndarray | tf.Tensor | None = None, + float_answer: np.ndarray | tf.Tensor | None = None, + numeric_values: np.ndarray | tf.Tensor | None = None, + numeric_values_scale: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTableQuestionAnsweringOutput, Tuple[tf.Tensor]]: r""" @@ -1571,17 +1552,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFTableQuestionAnsweringOutput) -> TFTableQuestionAnsweringOutput: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTableQuestionAnsweringOutput( - logits=output.logits, - logits_aggregation=output.logits_aggregation, - hidden_states=hidden_states, - attentions=attentions, - ) - @add_start_docstrings( """ @@ -1606,16 +1576,16 @@ def __init__(self, config: TapasConfig, *inputs, **kwargs): @replace_return_docstrings(output_type=TFSequenceClassifierOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1684,12 +1654,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hidden_states, attentions=attentions) - """ TAPAS utilities.""" diff --git 
a/src/transformers/models/time_series_transformer/configuration_time_series_transformer.py b/src/transformers/models/time_series_transformer/configuration_time_series_transformer.py index b8b21045b5424f..9676b50ed0b954 100644 --- a/src/transformers/models/time_series_transformer/configuration_time_series_transformer.py +++ b/src/transformers/models/time_series_transformer/configuration_time_series_transformer.py @@ -217,9 +217,6 @@ def __init__( self.activation_function = activation_function self.init_std = init_std - self.output_attentions = False - self.output_hidden_states = False - self.use_cache = use_cache super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs) diff --git a/src/transformers/models/time_series_transformer/modeling_time_series_transformer.py b/src/transformers/models/time_series_transformer/modeling_time_series_transformer.py index 812a8d6b4d0130..d5ffa069d95a38 100644 --- a/src/transformers/models/time_series_transformer/modeling_time_series_transformer.py +++ b/src/transformers/models/time_series_transformer/modeling_time_series_transformer.py @@ -140,7 +140,9 @@ def __init__( self.default_scale = default_scale @torch.no_grad() - def forward(self, data: torch.Tensor, observed_indicator: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]: + def forward( + self, data: torch.Tensor, observed_indicator: torch.Tensor + ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: # shape: (N, [C], T=1) ts_sum = (data * observed_indicator).abs().sum(self.dim, keepdim=True) num_observed = observed_indicator.sum(self.dim, keepdim=True) @@ -1394,7 +1396,7 @@ def forward( >>> from transformers import TimeSeriesTransformerModel >>> file = hf_hub_download( - ... repo_id="kashif/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset" + ... repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset" ... ) >>> batch = torch.load(file) @@ -1558,7 +1560,7 @@ def forward( >>> from transformers import TimeSeriesTransformerForPrediction >>> file = hf_hub_download( - ... repo_id="kashif/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset" + ... repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset" ... ) >>> batch = torch.load(file) diff --git a/src/transformers/models/transfo_xl/modeling_tf_transfo_xl.py b/src/transformers/models/transfo_xl/modeling_tf_transfo_xl.py index 93af2165111288..2ef67426f87cdb 100644 --- a/src/transformers/models/transfo_xl/modeling_tf_transfo_xl.py +++ b/src/transformers/models/transfo_xl/modeling_tf_transfo_xl.py @@ -17,6 +17,8 @@ TF 2.0 Transformer XL model. 
""" +from __future__ import annotations + from dataclasses import dataclass from typing import List, Optional, Tuple, Union @@ -541,14 +543,14 @@ def _update_mems(self, hids, mems, mlen, qlen): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - mems: Optional[List[tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + mems: List[tf.Tensor] | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ): # the original code for Transformer-XL used shapes [len, bsz] but we want a unified interface in the library @@ -682,18 +684,6 @@ class TFTransfoXLPreTrainedModel(TFPreTrainedModel): config_class = TransfoXLConfig base_model_prefix = "transformer" - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - @dataclass class TFTransfoXLModelOutput(ModelOutput): @@ -722,8 +712,8 @@ class TFTransfoXLModelOutput(ModelOutput): last_hidden_state: tf.Tensor = None mems: List[tf.Tensor] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -755,8 +745,8 @@ class TFTransfoXLLMHeadModelOutput(ModelOutput): prediction_scores: tf.Tensor = None mems: List[tf.Tensor] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -786,11 +776,11 @@ class TFTransfoXLSequenceClassifierOutputWithPast(ModelOutput): heads. 
""" - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None mems: List[tf.Tensor] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None TRANSFO_XL_START_DOCSTRING = r""" @@ -892,10 +882,10 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - mems: Optional[List[tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + mems: List[tf.Tensor] | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -914,17 +904,6 @@ def call( return outputs - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTransfoXLModelOutput( - last_hidden_state=output.last_hidden_state, - mems=tf.convert_to_tensor(output.mems), - hidden_states=hs, - attentions=attns, - ) - @add_start_docstrings( """ @@ -971,14 +950,14 @@ def init_mems(self, bsz): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - mems: Optional[List[tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + mems: List[tf.Tensor] | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ): if input_ids is not None: @@ -1013,17 +992,6 @@ def call( attentions=transformer_outputs.attentions, ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTransfoXLLMHeadModelOutput( - prediction_scores=output.prediction_scores, - mems=tf.convert_to_tensor(output.mems), - hidden_states=hs, - attentions=attns, - ) - def prepare_inputs_for_generation(self, input_ids, past_key_values=None, **model_kwargs): inputs = {} @@ -1075,14 +1043,14 @@ def get_output_embeddings(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - mems: Optional[List[tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + mems: List[tf.Tensor] | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[Tuple, 
TFTransfoXLSequenceClassifierOutputWithPast]: r""" @@ -1155,11 +1123,3 @@ def call( hidden_states=transformer_outputs.hidden_states, attentions=transformer_outputs.attentions, ) - - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTransfoXLSequenceClassifierOutputWithPast( - logits=output.logits, mems=tf.convert_to_tensor(output.mems), hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/tvlt/feature_extraction_tvlt.py b/src/transformers/models/tvlt/feature_extraction_tvlt.py index 6d919550cf558d..d5beba76bd6986 100644 --- a/src/transformers/models/tvlt/feature_extraction_tvlt.py +++ b/src/transformers/models/tvlt/feature_extraction_tvlt.py @@ -129,7 +129,8 @@ def __call__( Args: raw_speech (`np.ndarray`, `List[float]`, `List[np.ndarray]`, `List[List[float]]`): The sequence or batch of sequences to be padded. Each sequence can be a numpy array, a list of float - values, a list of numpy arrays or a list of list of float values. + values, a list of numpy arrays or a list of list of float values. Must be mono channel audio, not + stereo, i.e. single float per timestep. return_tensors (`str` or [`~utils.TensorType`], *optional*): If set, will return tensors instead of list of python integers. Acceptable values are: - `'pt'`: Return PyTorch `torch.Tensor` objects. @@ -176,9 +177,11 @@ def __call__( "Failing to do so can result in silent errors that might be hard to debug." ) - is_batched = bool( - isinstance(raw_speech, (list, tuple)) - and (isinstance(raw_speech[0], np.ndarray) or isinstance(raw_speech[0], (tuple, list))) + is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1 + if is_batched_numpy and len(raw_speech.shape) > 2: + raise ValueError(f"Only mono-channel audio is supported for input to {self}") + is_batched = is_batched_numpy or ( + isinstance(raw_speech, (list, tuple)) and (isinstance(raw_speech[0], (np.ndarray, tuple, list))) ) if is_batched: raw_speech = [np.asarray([speech], dtype=np.float32).T for speech in raw_speech] diff --git a/src/transformers/models/unispeech/configuration_unispeech.py b/src/transformers/models/unispeech/configuration_unispeech.py index f4e8df659e9a01..4054c49f3e0cfb 100644 --- a/src/transformers/models/unispeech/configuration_unispeech.py +++ b/src/transformers/models/unispeech/configuration_unispeech.py @@ -65,6 +65,9 @@ class UniSpeechConfig(PretrainedConfig): The dropout ratio for the attention probabilities. final_dropout (`float`, *optional*, defaults to 0.1): The dropout probability for the final projection layer of [`UniSpeechForCTC`]. + layerdrop (`float`, *optional*, defaults to 0.1): + The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more + details. initializer_range (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. 
layer_norm_eps (`float`, *optional*, defaults to 1e-12): diff --git a/src/transformers/models/unispeech_sat/configuration_unispeech_sat.py b/src/transformers/models/unispeech_sat/configuration_unispeech_sat.py index 222f982fe769bb..8bd482f394d0e6 100644 --- a/src/transformers/models/unispeech_sat/configuration_unispeech_sat.py +++ b/src/transformers/models/unispeech_sat/configuration_unispeech_sat.py @@ -66,6 +66,9 @@ class UniSpeechSatConfig(PretrainedConfig): The dropout ratio for the attention probabilities. final_dropout (`float`, *optional*, defaults to 0.1): The dropout probability for the final projection layer of [`UniSpeechSatForCTC`]. + layerdrop (`float`, *optional*, defaults to 0.1): + The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more + details. initializer_range (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. layer_norm_eps (`float`, *optional*, defaults to 1e-12): diff --git a/src/transformers/models/vision_encoder_decoder/modeling_tf_vision_encoder_decoder.py b/src/transformers/models/vision_encoder_decoder/modeling_tf_vision_encoder_decoder.py index 439c5d668a93f2..9667c529b56445 100644 --- a/src/transformers/models/vision_encoder_decoder/modeling_tf_vision_encoder_decoder.py +++ b/src/transformers/models/vision_encoder_decoder/modeling_tf_vision_encoder_decoder.py @@ -15,6 +15,8 @@ """ Classes to support TF Vision-Encoder-Text-Decoder architectures""" +from __future__ import annotations + import re import warnings from typing import Optional, Tuple, Union @@ -27,7 +29,6 @@ from ...modeling_tf_utils import TFCausalLanguageModelingLoss, TFPreTrainedModel, get_initializer, unpack_inputs from ...tf_utils import shape_list from ...utils import ( - DUMMY_INPUTS, ModelOutput, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -252,29 +253,26 @@ def __init__( ) @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - decoder_input_ids = tf.constant(DUMMY_INPUTS, dtype=tf.int32) - batch_size, seq_len = decoder_input_ids.shape - - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=( - batch_size, - self.config.encoder.num_channels, - self.config.encoder.image_size, - self.config.encoder.image_size, + def input_signature(self): + vision_config = self.config.encoder + if hasattr(vision_config, "vision_config"): + vision_config = vision_config.vision_config + if hasattr(vision_config, "image_size"): + image_size = vision_config.image_size + else: + image_size = vision_config.input_size + return { + "pixel_values": tf.TensorSpec( + shape=( + None, + vision_config.num_channels, + image_size, + image_size, + ), + dtype=tf.float32, ), - dtype=tf.float32, - ) - pixel_values = tf.constant(VISION_DUMMY_INPUTS) - # Add `decoder_input_ids` because `self.decoder` requires it. 
- dummy = {"pixel_values": pixel_values, "decoder_input_ids": decoder_input_ids} - return dummy + "decoder_input_ids": tf.TensorSpec(shape=(None, None), dtype=tf.int32, name="decoder_input_ids"), + } def get_encoder(self): return self.encoder @@ -492,13 +490,13 @@ def from_encoder_decoder_pretrained( @replace_return_docstrings(output_type=TFSeq2SeqLMOutput, config_class=_CONFIG_FOR_DOC) def call( self, - pixel_values: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + pixel_values: np.ndarray | tf.Tensor | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - decoder_inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None, + labels: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, diff --git a/src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py b/src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py index a2211f245ec518..6e0c65a813f16a 100644 --- a/src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py +++ b/src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py @@ -15,6 +15,8 @@ """TensorFlow VisionTextDualEncoder model.""" +from __future__ import annotations + import re from typing import Optional, Tuple, Union @@ -340,12 +342,12 @@ def get_image_features( @replace_return_docstrings(output_type=TFCLIPOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[tf.Tensor] = None, - pixel_values: Optional[tf.Tensor] = None, - attention_mask: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, + input_ids: tf.Tensor | None = None, + pixel_values: tf.Tensor | None = None, + attention_mask: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, return_loss: Optional[bool] = None, - token_type_ids: Optional[tf.Tensor] = None, + token_type_ids: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, diff --git a/src/transformers/models/vit/modeling_tf_vit.py b/src/transformers/models/vit/modeling_tf_vit.py index 6d0c579a43db0a..727db8dfc6c081 100644 --- a/src/transformers/models/vit/modeling_tf_vit.py +++ b/src/transformers/models/vit/modeling_tf_vit.py @@ -15,9 +15,11 @@ """ TF 2.0 ViT model.""" +from __future__ import annotations + import collections.abc import math -from typing import Dict, Optional, Tuple, Union +from typing import Optional, Tuple, Union import numpy as np import tensorflow as tf @@ -487,8 +489,8 @@ class PreTrainedModel @unpack_inputs def call( self, - pixel_values: Optional[TFModelInputType] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + pixel_values: TFModelInputType | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, interpolate_pos_encoding: Optional[bool] = None, @@ -548,38 +550,6 
@@ class TFViTPreTrainedModel(TFPreTrainedModel): base_model_prefix = "vit" main_input_name = "pixel_values" - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(3, self.config.num_channels, self.config.image_size, self.config.image_size), dtype=tf.float32 - ) - return {"pixel_values": tf.constant(VISION_DUMMY_INPUTS)} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs): - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - - return self.serving_output(output) - VIT_START_DOCSTRING = r""" @@ -675,8 +645,8 @@ def __init__(self, config: ViTConfig, *inputs, add_pooling_layer=True, **kwargs) ) def call( self, - pixel_values: Optional[TFModelInputType] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + pixel_values: TFModelInputType | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, interpolate_pos_encoding: Optional[bool] = None, @@ -695,17 +665,6 @@ def call( return outputs - def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutputWithPooling( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - hidden_states=hs, - attentions=attns, - ) - class TFViTPooler(tf.keras.layers.Layer): def __init__(self, config: ViTConfig, **kwargs): @@ -766,13 +725,13 @@ def __init__(self, config: ViTConfig, *inputs, **kwargs): ) def call( self, - pixel_values: Optional[TFModelInputType] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + pixel_values: TFModelInputType | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, interpolate_pos_encoding: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -805,9 +764,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) diff --git a/src/transformers/models/vit_mae/modeling_tf_vit_mae.py b/src/transformers/models/vit_mae/modeling_tf_vit_mae.py index afb40478ccdae1..e7d7770bcf26d7 100644 --- a/src/transformers/models/vit_mae/modeling_tf_vit_mae.py +++ b/src/transformers/models/vit_mae/modeling_tf_vit_mae.py @@ -14,11 +14,14 @@ # limitations under the License. 
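With the per-model `dummy_inputs` and `@tf.function(...) serving` overrides removed above, a serving function can be derived generically from each model's `input_signature` property. A minimal sketch of that idea follows; the helper name `make_serving_fn` and the surrounding wiring are illustrative assumptions, not code from this patch:

import tensorflow as tf

def make_serving_fn(model):
    # `model.input_signature` is assumed to be a dict mapping input names to
    # tf.TensorSpec objects, like the properties added elsewhere in this diff.
    @tf.function(input_signature=[model.input_signature])
    def serving(inputs):
        # Run the model on the dict of symbolic inputs; the traced concrete
        # function can then be attached as a SavedModel signature.
        return model(inputs)

    return serving

Calling `make_serving_fn(model).get_concrete_function()` would trace one signature matching the spec, which is roughly what the removed hand-written `serving` methods did per model.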
""" TF 2.0 ViT MAE (masked autoencoder) model.""" + +from __future__ import annotations + import collections.abc import math from copy import deepcopy from dataclasses import dataclass -from typing import Dict, Optional, Tuple, Union +from typing import Optional, Tuple, Union import numpy as np import tensorflow as tf @@ -74,8 +77,8 @@ class TFViTMAEModelOutput(ModelOutput): last_hidden_state: tf.Tensor = None mask: tf.Tensor = None ids_restore: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -97,8 +100,8 @@ class TFViTMAEDecoderOutput(ModelOutput): """ logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -125,12 +128,12 @@ class TFViTMAEForPreTrainingOutput(ModelOutput): the self-attention heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None mask: tf.Tensor = None ids_restore: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None def get_2d_sincos_pos_embed(embed_dim, grid_size, add_cls_token=False): @@ -232,7 +235,7 @@ def build(self, input_shape: tf.TensorShape): super().build(input_shape) - def random_masking(self, sequence: tf.Tensor, noise: Optional[tf.Tensor] = None): + def random_masking(self, sequence: tf.Tensor, noise: tf.Tensor | None = None): """ Perform per-sample random masking by per-sample shuffling. Per-sample shuffling is done by argsort random noise. @@ -639,9 +642,9 @@ class PreTrainedModel @unpack_inputs def call( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, noise: tf.Tensor = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -695,36 +698,6 @@ class TFViTMAEPreTrainedModel(TFPreTrainedModel): base_model_prefix = "vit" main_input_name = "pixel_values" - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - VISION_DUMMY_INPUTS = tf.random.uniform( - shape=(3, self.config.num_channels, self.config.image_size, self.config.image_size), - dtype=tf.float32, - ) - return {"pixel_values": tf.constant(VISION_DUMMY_INPUTS)} - - @tf.function( - input_signature=[ - { - "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"), - } - ] - ) - def serving(self, inputs): - """ - Method used for serving the model. - - Args: - inputs (`Dict[str, tf.Tensor]`): - The input of the saved model as a dictionary of tensors. - """ - output = self.call(inputs) - return self.serving_output(output) - VIT_MAE_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. 
Check the superclass documentation for the generic methods the @@ -816,9 +789,9 @@ def get_input_embeddings(self): @replace_return_docstrings(output_type=TFViTMAEModelOutput, config_class=_CONFIG_FOR_DOC) def call( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, noise: tf.Tensor = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -856,18 +829,6 @@ def call( return outputs - def serving_output(self, output: TFViTMAEModelOutput) -> TFViTMAEModelOutput: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFViTMAEModelOutput( - last_hidden_state=output.last_hidden_state, - mask=output.mask, - ids_restore=output.ids_restore, - hidden_states=hidden_states, - attentions=attentions, - ) - class TFViTMAEDecoder(tf.keras.layers.Layer): def __init__(self, config, num_patches, **kwargs): @@ -1107,9 +1068,9 @@ def forward_loss(self, pixel_values, pred, mask): @replace_return_docstrings(output_type=TFViTMAEForPreTrainingOutput, config_class=_CONFIG_FOR_DOC) def call( self, - pixel_values: Optional[TFModelInputType] = None, + pixel_values: TFModelInputType | None = None, noise: tf.Tensor = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1170,15 +1131,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - def serving_output(self, output: TFViTMAEForPreTrainingOutput) -> TFViTMAEForPreTrainingOutput: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFViTMAEForPreTrainingOutput( - logits=output.logits, - mask=output.mask, - ids_restore=output.ids_restore, - hidden_states=hidden_states, - attentions=attentions, - ) diff --git a/src/transformers/models/wav2vec2/configuration_wav2vec2.py b/src/transformers/models/wav2vec2/configuration_wav2vec2.py index 7afcd3f0ee28e2..6f7709e535d12d 100644 --- a/src/transformers/models/wav2vec2/configuration_wav2vec2.py +++ b/src/transformers/models/wav2vec2/configuration_wav2vec2.py @@ -63,6 +63,9 @@ class Wav2Vec2Config(PretrainedConfig): The dropout ratio for the attention probabilities. final_dropout (`float`, *optional*, defaults to 0.1): The dropout probability for the final projection layer of [`Wav2Vec2ForCTC`]. + layerdrop (`float`, *optional*, defaults to 0.1): + The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more + details. initializer_range (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. 
layer_norm_eps (`float`, *optional*, defaults to 1e-12): diff --git a/src/transformers/models/wav2vec2/feature_extraction_wav2vec2.py b/src/transformers/models/wav2vec2/feature_extraction_wav2vec2.py index 9550b7c2a9ef90..2c2066739ddd49 100644 --- a/src/transformers/models/wav2vec2/feature_extraction_wav2vec2.py +++ b/src/transformers/models/wav2vec2/feature_extraction_wav2vec2.py @@ -117,7 +117,8 @@ def __call__( Args: raw_speech (`np.ndarray`, `List[float]`, `List[np.ndarray]`, `List[List[float]]`): The sequence or batch of sequences to be padded. Each sequence can be a numpy array, a list of float - values, a list of numpy arrays or a list of list of float values. + values, a list of numpy arrays or a list of list of float values. Must be mono channel audio, not + stereo, i.e. single float per timestep. padding (`bool`, `str` or [`~utils.PaddingStrategy`], *optional*, defaults to `False`): Select a strategy to pad the returned sequences (according to the model's padding side and padding index) among: @@ -181,9 +182,11 @@ def __call__( "Failing to do so can result in silent errors that might be hard to debug." ) - is_batched = bool( - isinstance(raw_speech, (list, tuple)) - and (isinstance(raw_speech[0], np.ndarray) or isinstance(raw_speech[0], (tuple, list))) + is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1 + if is_batched_numpy and len(raw_speech.shape) > 2: + raise ValueError(f"Only mono-channel audio is supported for input to {self}") + is_batched = is_batched_numpy or ( + isinstance(raw_speech, (list, tuple)) and (isinstance(raw_speech[0], (np.ndarray, tuple, list))) ) # always return batch diff --git a/src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py b/src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py index dcc59d7f7322aa..39e1539e70a787 100644 --- a/src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py +++ b/src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py @@ -14,9 +14,12 @@ # limitations under the License. 
""" TensorFlow Wav2Vec2 model.""" + +from __future__ import annotations + import warnings from dataclasses import dataclass -from typing import Any, Dict, Optional, Tuple, Union +from typing import Any, Optional, Tuple, Union import numpy as np import tensorflow as tf @@ -84,8 +87,8 @@ class TFWav2Vec2BaseModelOutput(ModelOutput): last_hidden_state: tf.Tensor = None extract_features: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None def _sample_without_replacement(distribution, num_samples): @@ -673,12 +676,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training: Optional[bool] = False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -841,7 +844,7 @@ def __init__(self, config: Wav2Vec2Config, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = False, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -884,7 +887,7 @@ def __init__(self, config: Wav2Vec2Config, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = False, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -917,7 +920,7 @@ def __init__(self, config: Wav2Vec2Config, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = False, output_hidden_states: Optional[bool] = False, return_dict: Optional[bool] = True, @@ -984,7 +987,7 @@ def __init__(self, config: Wav2Vec2Config, **kwargs): def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = False, output_hidden_states: Optional[bool] = False, return_dict: Optional[bool] = True, @@ -1074,7 +1077,7 @@ def _conv_out_length(input_length, kernel_size, stride): return input_lengths - def _mask_hidden_states(self, hidden_states: tf.Tensor, mask_time_indices: Optional[tf.Tensor] = None): + def _mask_hidden_states(self, hidden_states: tf.Tensor, mask_time_indices: tf.Tensor | None = None): """ Masks extracted features along time axis and/or along feature axis according to [SpecAugment](https://arxiv.org/abs/1904.08779). 
@@ -1122,11 +1125,11 @@ def _mask_hidden_states(self, hidden_states: tf.Tensor, mask_time_indices: Optio def call( self, input_values: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1182,14 +1185,18 @@ class TFWav2Vec2PreTrainedModel(TFPreTrainedModel): main_input_name = "input_values" @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - pad_token = 0.0 - input_values = tf.convert_to_tensor(np.random.rand(1, 16000), tf.float32) - dummy_inputs = { - "input_values": input_values, - "attention_mask": tf.cast(tf.not_equal(input_values, pad_token), tf.float32), + def input_signature(self): + return { + "input_values": tf.TensorSpec((None, None), tf.float32, name="input_values"), + "attention_mask": tf.TensorSpec((None, None), tf.float32, name="attention_mask"), + } + + @property + def dummy_inputs(self): + return { + "input_values": tf.random.uniform(shape=(1, 16000), dtype=tf.float32), + "attention_mask": tf.ones(shape=(1, 16000), dtype=tf.float32), } - return dummy_inputs def __init__(self, config, *inputs, **kwargs): super().__init__(config, *inputs, **kwargs) @@ -1198,20 +1205,6 @@ def __init__(self, config, *inputs, **kwargs): "to train/fine-tine this model, you need a GPU or a TPU" ) - @tf.function( - input_signature=[ - { - "input_values": tf.TensorSpec((None, None), tf.float32, name="input_values"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs): - output = self.call(input_values=inputs, training=False) - - return self.serving_output(output) - def _get_feat_extract_output_lengths(self, input_lengths, add_adapter=None): """ Computes the output length of the convolutional layers @@ -1367,11 +1360,11 @@ def __init__(self, config: Wav2Vec2Config, *inputs, **kwargs): def call( self, input_values: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -1424,17 +1417,6 @@ def call( return outputs - def serving_output(self, output): - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFWav2Vec2BaseModelOutput( - last_hidden_state=output.last_hidden_state, - extract_features=output.extract_features, - hidden_states=hidden_states, - attentions=attentions, - ) - @add_start_docstrings( """TFWav2Vec2 Model with a `language modeling` head on top for Connectionist 
Temporal Classification (CTC).""", @@ -1473,13 +1455,13 @@ def freeze_feature_encoder(self): def call( self, input_values: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - token_type_ids: Optional[tf.Tensor] = None, - position_ids: Optional[tf.Tensor] = None, - head_mask: Optional[tf.Tensor] = None, - inputs_embeds: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, + token_type_ids: tf.Tensor | None = None, + position_ids: tf.Tensor | None = None, + head_mask: tf.Tensor | None = None, + inputs_embeds: tf.Tensor | None = None, output_attentions: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, @@ -1588,11 +1570,6 @@ def call( attentions=outputs.attentions, ) - def serving_output(self, output: TFCausalLMOutput) -> TFCausalLMOutput: - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - return TFCausalLMOutput(logits=output.logits, hidden_states=hidden_states, attentions=attentions) - class TFWav2Vec2ForSequenceClassification(TFWav2Vec2PreTrainedModel): def __init__(self, config): @@ -1639,11 +1616,11 @@ def freeze_base_model(self): def call( self, input_values: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, + attention_mask: tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[tf.Tensor] = None, + labels: tf.Tensor | None = None, training: bool = False, ): return_dict = return_dict if return_dict is not None else self.config.use_return_dict @@ -1690,27 +1667,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - def serving_output(self, output): - hidden_states = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attentions = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput( - logits=output.logits, - hidden_states=hidden_states, - attentions=attentions, - ) - - @tf.function( - input_signature=[ - { - "input_values": tf.TensorSpec((None, None), tf.float32, name="input_values"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs): - output = self.call(input_values=inputs) - - return self.serving_output(output) diff --git a/src/transformers/models/wav2vec2/tokenization_wav2vec2.py b/src/transformers/models/wav2vec2/tokenization_wav2vec2.py index 1708dbf12512a4..15d3471da0d2aa 100644 --- a/src/transformers/models/wav2vec2/tokenization_wav2vec2.py +++ b/src/transformers/models/wav2vec2/tokenization_wav2vec2.py @@ -817,12 +817,15 @@ def __call__( Args: raw_speech (`np.ndarray`, `List[float]`, `List[np.ndarray]`, `List[List[float]]`): The sequence or batch of sequences to be padded. Each sequence can be a numpy array, a list of float - values, a list of numpy arrayr or a list of list of float values. + values, a list of numpy arrays or a list of list of float values. Must be mono channel audio, not + stereo, i.e. single float per timestep.
""" - is_batched = bool( - isinstance(raw_speech, (list, tuple)) - and (isinstance(raw_speech[0], np.ndarray) or isinstance(raw_speech[0], (tuple, list))) + is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1 + if is_batched_numpy and len(raw_speech.shape) > 2: + raise ValueError(f"Only mono-channel audio is supported for input to {self}") + is_batched = is_batched_numpy or ( + isinstance(raw_speech, (list, tuple)) and (isinstance(raw_speech[0], (np.ndarray, tuple, list))) ) # make sure input is in list format diff --git a/src/transformers/models/wav2vec2_conformer/configuration_wav2vec2_conformer.py b/src/transformers/models/wav2vec2_conformer/configuration_wav2vec2_conformer.py index b9f24c7e708057..24b7dca73944d1 100644 --- a/src/transformers/models/wav2vec2_conformer/configuration_wav2vec2_conformer.py +++ b/src/transformers/models/wav2vec2_conformer/configuration_wav2vec2_conformer.py @@ -65,6 +65,9 @@ class Wav2Vec2ConformerConfig(PretrainedConfig): The dropout ratio for the attention probabilities. final_dropout (`float`, *optional*, defaults to 0.1): The dropout probability for the final projection layer of [`Wav2Vec2ConformerForCTC`]. + layerdrop (`float`, *optional*, defaults to 0.1): + The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more + details. initializer_range (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. layer_norm_eps (`float`, *optional*, defaults to 1e-12): diff --git a/src/transformers/models/wavlm/configuration_wavlm.py b/src/transformers/models/wavlm/configuration_wavlm.py index becbe100240d88..c3ac6ba196a8b9 100644 --- a/src/transformers/models/wavlm/configuration_wavlm.py +++ b/src/transformers/models/wavlm/configuration_wavlm.py @@ -62,6 +62,9 @@ class WavLMConfig(PretrainedConfig): The dropout ratio for the attention probabilities. final_dropout (`float`, *optional*, defaults to 0.1): The dropout probability for the final projection layer of [`WavLMForCTC`]. + layerdrop (`float`, *optional*, defaults to 0.1): + The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more + details. initializer_range (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. layer_norm_eps (`float`, *optional*, defaults to 1e-12): diff --git a/src/transformers/models/whisper/feature_extraction_whisper.py b/src/transformers/models/whisper/feature_extraction_whisper.py index e0b772216205fe..70eb8bd94e7676 100644 --- a/src/transformers/models/whisper/feature_extraction_whisper.py +++ b/src/transformers/models/whisper/feature_extraction_whisper.py @@ -152,7 +152,8 @@ def __call__( Args: raw_speech (`np.ndarray`, `List[float]`, `List[np.ndarray]`, `List[List[float]]`): The sequence or batch of sequences to be padded. Each sequence can be a numpy array, a list of float - values, a list of numpy arrays or a list of list of float values. + values, a list of numpy arrays or a list of list of float values. Must be mono channel audio, not + stereo, i.e. single float per timestep. truncation (`bool`, *optional*, default to `True`): Activates truncation to cut input sequences longer than *max_length* to *max_length*. pad_to_multiple_of (`int`, *optional*, defaults to None): @@ -203,9 +204,11 @@ def __call__( "Failing to do so can result in silent errors that might be hard to debug." 
) - is_batched = bool( - isinstance(raw_speech, (list, tuple)) - and (isinstance(raw_speech[0], np.ndarray) or isinstance(raw_speech[0], (tuple, list))) + is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1 + if is_batched_numpy and len(raw_speech.shape) > 2: + raise ValueError(f"Only mono-channel audio is supported for input to {self}") + is_batched = is_batched_numpy or ( + isinstance(raw_speech, (list, tuple)) and (isinstance(raw_speech[0], (np.ndarray, tuple, list))) ) if is_batched: diff --git a/src/transformers/models/whisper/modeling_tf_whisper.py b/src/transformers/models/whisper/modeling_tf_whisper.py index 0d2a2682cc97d4..b8cd87f67ef03a 100644 --- a/src/transformers/models/whisper/modeling_tf_whisper.py +++ b/src/transformers/models/whisper/modeling_tf_whisper.py @@ -15,6 +15,8 @@ """ TensorFlow Whisper model.""" +from __future__ import annotations + import math import random from typing import Dict, Optional, Tuple, Union @@ -171,12 +173,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training: Optional[bool] = False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -376,12 +378,12 @@ def __init__(self, config: WhisperConfig, **kwargs): def call( self, hidden_states, - attention_mask: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, - cross_attn_layer_head_mask: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[tf.Tensor]] = None, + attention_mask: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, + cross_attn_layer_head_mask: tf.Tensor | None = None, + past_key_value: Tuple[tf.Tensor] | None = None, training=False, ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]: """ @@ -484,18 +486,13 @@ def dummy_inputs(self) -> Dict[str, tf.Tensor]: "decoder_input_ids": tf.constant([[2, 3]], dtype=tf.int32), } - @tf.function( - input_signature=[ - { - "input_features": tf.TensorSpec((None, None, None), tf.float32, name="input_features"), - "decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), - "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - return self.serving_output(output) + @property + def input_signature(self): + return { + "input_features": tf.TensorSpec((None, self.config.num_mel_bins, None), tf.float32, name="input_features"), + "decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), + "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), + } WHISPER_START_DOCSTRING = r""" @@ -1119,13 +1116,13 @@ def encoder(self): @unpack_inputs def call( self, - 
input_features: Optional[TFModelInputType] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_features: TFModelInputType | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, encoder_outputs: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, decoder_inputs_embeds: Optional[Tuple[Union[np.ndarray, tf.Tensor]]] = None, @@ -1234,17 +1231,17 @@ def resize_token_embeddings(self, new_num_tokens: int) -> tf.keras.layers.Embedd @unpack_inputs def call( self, - input_features: Optional[TFModelInputType] = None, - decoder_input_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - decoder_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_features: TFModelInputType | None = None, + decoder_input_ids: np.ndarray | tf.Tensor | None = None, + decoder_attention_mask: np.ndarray | tf.Tensor | None = None, + decoder_position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + decoder_head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, encoder_outputs: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, decoder_inputs_embeds: Optional[Tuple[Union[np.ndarray, tf.Tensor]]] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, diff --git a/src/transformers/models/whisper/modeling_whisper.py b/src/transformers/models/whisper/modeling_whisper.py index 96f91a0a43dd77..6175009e4128d2 100644 --- a/src/transformers/models/whisper/modeling_whisper.py +++ b/src/transformers/models/whisper/modeling_whisper.py @@ -1633,6 +1633,9 @@ def generate( ) prompt_ids = prompt_ids.tolist() decoder_start_token_id, *text_prompt_ids = prompt_ids + # Slicing the text prompt ids in a manner consistent with the OpenAI implementation + # to accommodate context space for the prefix (see https://github.com/openai/whisper/blob/c09a7ae299c4c34c5839a76380ae407e7d785914/whisper/decoding.py#L599) + text_prompt_ids = text_prompt_ids[-self.config.max_length // 2 - 1 :] # Set the decoder_start_token_id to <|startofprev|> kwargs.update({"decoder_start_token_id": decoder_start_token_id}) @@ -1647,9 +1650,7 @@ def generate( kwargs.pop("forced_decoder_ids", None) or generation_config.forced_decoder_ids ) forced_decoder_ids = [ - #
Slicing the text prompt ids in a manner consistent with the OpenAI implementation - # to accomodate context space for the prefix (see https://github.com/openai/whisper/blob/c09a7ae299c4c34c5839a76380ae407e7d785914/whisper/decoding.py#L599) - *text_prompt_ids[-self.config.max_length // 2 - 1 :], + *text_prompt_ids, generation_config.decoder_start_token_id, *[token for _rank, token in non_prompt_forced_decoder_ids], ] diff --git a/src/transformers/models/xglm/modeling_tf_xglm.py b/src/transformers/models/xglm/modeling_tf_xglm.py index 1a0146bf19d799..6cc9db021cf9ac 100644 --- a/src/transformers/models/xglm/modeling_tf_xglm.py +++ b/src/transformers/models/xglm/modeling_tf_xglm.py @@ -15,6 +15,8 @@ """ TF 2.0 XGLM model.""" +from __future__ import annotations + import math import random from typing import Any, Optional, Tuple, Union @@ -26,7 +28,6 @@ # Public API from ...file_utils import ( - DUMMY_INPUTS, add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -185,12 +186,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training: Optional[bool] = False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -337,12 +338,12 @@ def __init__(self, config: XGLMConfig, **kwargs: Any) -> None: def call( self, hidden_states: tf.Tensor, - attention_mask: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, - cross_attn_layer_head_mask: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[tf.Tensor]] = None, + attention_mask: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, + cross_attn_layer_head_mask: tf.Tensor | None = None, + past_key_value: Tuple[tf.Tensor] | None = None, training: Optional[bool] = False, ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]: """ @@ -456,13 +457,13 @@ def set_input_embeddings(self, value: TFSharedEmbeddings) -> None: def _prepare_decoder_attention_mask( self, - attention_mask: Optional[tf.Tensor], + attention_mask: tf.Tensor | None, input_shape: tf.TensorShape, past_key_values_length: int, ) -> tf.Tensor: # create causal mask # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len] - combined_attention_mask: Optional[tf.Tensor] = None + combined_attention_mask: tf.Tensor | None = None if input_shape[-1] > 1: combined_attention_mask = _make_causal_mask(input_shape, past_key_values_length) @@ -476,7 +477,7 @@ def _prepare_decoder_attention_mask( return combined_attention_mask - def embed_positions(self, position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None) -> tf.Tensor: + def embed_positions(self, position_ids: np.ndarray | tf.Tensor | None = None) -> tf.Tensor: position_ids += self.offset positions = tf.gather(self._embed_positions_weights, position_ids, axis=0) 
return positions @@ -484,15 +485,15 @@ def embed_positions(self, position_ids: Optional[Union[np.ndarray, tf.Tensor]] = @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -618,29 +619,6 @@ class TFXGLMPreTrainedModel(TFPreTrainedModel): config_class = XGLMConfig base_model_prefix = "model" - @property - def dummy_inputs(self): - pad_token = 1 - input_ids = tf.cast(tf.convert_to_tensor(DUMMY_INPUTS), tf.int32) - dummy_inputs = { - "input_ids": input_ids, - "attention_mask": tf.cast(input_ids != pad_token, tf.int32), - } - return dummy_inputs - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - XGLM_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. 
Check the superclass documentation for the generic methods the @@ -785,15 +763,15 @@ def __init__( ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -819,24 +797,6 @@ def call( return outputs - def serving_output(self, output): - pkv = tf.convert_to_tensor(output.past_key_values) if self.config.use_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = ( - tf.convert_to_tensor(output.cross_attentions) - if self.config.output_attentions and self.config.add_cross_attention - else None - ) - - return TFBaseModelOutputWithPastAndCrossAttentions( - last_hidden_state=output.hidden_states, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) - @add_start_docstrings( """ @@ -905,16 +865,16 @@ def prepare_inputs_for_generation(self, inputs, past_key_values=None, use_cache= ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - cross_attn_head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + cross_attn_head_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + labels: np.ndarray | tf.Tensor | None = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -969,22 +929,3 @@ def call( attentions=outputs.attentions, cross_attentions=outputs.cross_attentions, ) - - def serving_output(self, 
output): - pkv = tf.convert_to_tensor(output.past_key_values) if self.config.use_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = ( - tf.convert_to_tensor(output.cross_attentions) - if self.config.output_attentions and self.config.add_cross_attention - else None - ) - - return TFCausalLMOutputWithCrossAttentions( - loss=output.loss, - logits=output.logits, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) diff --git a/src/transformers/models/xglm/modeling_xglm.py b/src/transformers/models/xglm/modeling_xglm.py index 3cf16352a70cc1..7f1fca9f94a824 100755 --- a/src/transformers/models/xglm/modeling_xglm.py +++ b/src/transformers/models/xglm/modeling_xglm.py @@ -315,7 +315,9 @@ def forward( f"Attention mask should be of size {(bsz, 1, tgt_len, src_len)}, but is {attention_mask.size()}" ) attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + attention_mask - attn_weights = torch.max(attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min)) + attn_weights = torch.max( + attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min, device=attn_weights.device) + ) attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len) # upcast to fp32 if the weights are in fp16. Please see https://github.com/huggingface/transformers/pull/17437 diff --git a/src/transformers/models/xlm/modeling_tf_xlm.py b/src/transformers/models/xlm/modeling_tf_xlm.py index da9bd1c6034fc3..80a214280cb6f3 100644 --- a/src/transformers/models/xlm/modeling_tf_xlm.py +++ b/src/transformers/models/xlm/modeling_tf_xlm.py @@ -16,6 +16,9 @@ TF 2.0 XLM model. 
""" + +from __future__ import annotations + import itertools import warnings from dataclasses import dataclass @@ -558,8 +561,8 @@ class TFXLMWithLMHeadModelOutput(ModelOutput): """ logits: tf.Tensor = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None XLM_START_DOCSTRING = r""" @@ -729,13 +732,6 @@ def call( return outputs - # Copied from transformers.models.distilbert.modeling_tf_distilbert.TFDistilBertModel.serving_output - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFBaseModelOutput(last_hidden_state=output.last_hidden_state, hidden_states=hs, attentions=attns) - class TFXLMPredLayer(tf.keras.layers.Layer): """ @@ -833,15 +829,15 @@ def prepare_inputs_for_generation(self, inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - langs: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - lengths: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + langs: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + lengths: np.ndarray | tf.Tensor | None = None, cache: Optional[Dict[str, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, @@ -873,12 +869,6 @@ def call( logits=outputs, hidden_states=transformer_outputs.hidden_states, attentions=transformer_outputs.attentions ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFXLMWithLMHeadModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -904,19 +894,19 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - langs: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - lengths: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + langs: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + lengths: np.ndarray | tf.Tensor | None = None, cache: Optional[Dict[str, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: np.ndarray | 
tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -957,13 +947,6 @@ def call( attentions=transformer_outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1010,19 +993,19 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - langs: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - lengths: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + langs: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + lengths: np.ndarray | tf.Tensor | None = None, cache: Optional[Dict[str, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: if input_ids is not None: @@ -1083,28 +1066,6 @@ def call( attentions=transformer_outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None, None), tf.int32, name="token_type_ids"), - } - ] - ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving - def serving(self, inputs: Dict[str, tf.Tensor]): - output = self.call(input_ids=inputs) - - return self.serving_output(output) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving_output - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1133,19 +1094,19 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: 
Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - langs: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - lengths: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + langs: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + lengths: np.ndarray | tf.Tensor | None = None, cache: Optional[Dict[str, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1185,13 +1146,6 @@ def call( attentions=transformer_outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1217,20 +1171,20 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - langs: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - lengths: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + langs: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + lengths: np.ndarray | tf.Tensor | None = None, cache: Optional[Dict[str, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1282,12 +1236,3 @@ def call( hidden_states=transformer_outputs.hidden_states, attentions=transformer_outputs.attentions, ) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, 
output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/xlm_roberta/modeling_tf_xlm_roberta.py b/src/transformers/models/xlm_roberta/modeling_tf_xlm_roberta.py index 2f51c032f150db..65f3be9e2f277f 100644 --- a/src/transformers/models/xlm_roberta/modeling_tf_xlm_roberta.py +++ b/src/transformers/models/xlm_roberta/modeling_tf_xlm_roberta.py @@ -15,6 +15,9 @@ # limitations under the License. """ TF 2.0 XLM-RoBERTa model.""" + +from __future__ import annotations + import math import warnings from typing import Optional, Tuple, Union @@ -48,8 +51,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - DUMMY_INPUTS, - MULTIPLE_CHOICE_DUMMY_INPUTS, add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, @@ -520,9 +521,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_value: Optional[Tuple[tf.Tensor]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_value: Tuple[tf.Tensor] | None, output_attentions: bool, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -599,9 +600,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None, use_cache: Optional[bool], output_attentions: bool, output_hidden_states: bool, @@ -699,14 +700,14 @@ class PreTrainedModel # Copied from transformers.models.bert.modeling_tf_bert.TFBertMainLayer.call def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -865,38 +866,6 @@ class TFXLMRobertaPreTrainedModel(TFPreTrainedModel): config_class = XLMRobertaConfig base_model_prefix = "roberta" - @property - # Copied from 
transformers.models.bert.modeling_tf_bert.TFBertPreTrainedModel.dummy_inputs - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - dummy = {"input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int32)} - # Add `encoder_hidden_states` to make the cross-attention layers' weights initialized - if self.config.add_cross_attention: - batch_size, seq_len = tf.constant(DUMMY_INPUTS).shape - shape = (batch_size, seq_len) + (self.config.hidden_size,) - h = tf.random.uniform(shape=shape) - dummy["encoder_hidden_states"] = h - - return dummy - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - @add_start_docstrings( "The bare XLM RoBERTa Model transformer outputting raw hidden-states without any specific head on top.", @@ -917,14 +886,14 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -971,27 +940,6 @@ def call( return outputs - # Copied from transformers.models.bert.modeling_tf_bert.TFBertModel.serving_output - def serving_output( - self, output: TFBaseModelOutputWithPoolingAndCrossAttentions - ) -> TFBaseModelOutputWithPoolingAndCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFBaseModelOutputWithPoolingAndCrossAttentions( - last_hidden_state=output.last_hidden_state, - pooler_output=output.pooler_output, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) - # Copied from transformers.models.roberta.modeling_tf_roberta.TFRobertaLMHead with Roberta->XLMRoberta class TFXLMRobertaLMHead(tf.keras.layers.Layer): @@ -1077,16 +1025,16 @@ def get_prefix_bias_name(self): ) def call( self, - input_ids: 
Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1124,13 +1072,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM.serving_output - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( "XLM-RoBERTa Model with a `language modeling` head on top for CLM fine-tuning.", @@ -1179,20 +1120,20 @@ def prepare_inputs_for_generation(self, input_ids, past_key_values=None, attenti ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFCausalLMOutputWithCrossAttentions, Tuple[tf.Tensor]]: r""" @@ -1258,20 +1199,6 @@ def call( cross_attentions=outputs.cross_attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertLMHeadModel.serving_output - def serving_output(self, output: TFCausalLMOutputWithCrossAttentions) -> TFCausalLMOutputWithCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = 
tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFCausalLMOutputWithCrossAttentions( - logits=output.logits, past_key_values=pkv, hidden_states=hs, attentions=attns, cross_attentions=cross_attns - ) - # Copied from transformers.models.roberta.modeling_tf_roberta.TFRobertaClassificationHead with Roberta->XLMRoberta class TFXLMRobertaClassificationHead(tf.keras.layers.Layer): @@ -1332,16 +1259,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1378,13 +1305,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1408,16 +1328,6 @@ def __init__(self, config, *inputs, **kwargs): 1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. 
- - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward( XLM_ROBERTA_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length") @@ -1429,16 +1339,16 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -1488,26 +1398,6 @@ def call( attentions=outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving_output - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1546,16 +1436,16 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1592,13 +1482,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) 
-> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1632,17 +1515,17 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1691,12 +1574,3 @@ def call( hidden_states=outputs.hidden_states, attentions=outputs.attentions, ) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) diff --git a/src/transformers/models/xlnet/modeling_tf_xlnet.py b/src/transformers/models/xlnet/modeling_tf_xlnet.py index 52538ced57ed7e..c5f3805ec98747 100644 --- a/src/transformers/models/xlnet/modeling_tf_xlnet.py +++ b/src/transformers/models/xlnet/modeling_tf_xlnet.py @@ -17,6 +17,9 @@ TF 2.0 XLNet model. 
""" + +from __future__ import annotations + import warnings from dataclasses import dataclass from typing import List, Optional, Tuple, Union @@ -41,7 +44,6 @@ ) from ...tf_utils import check_embeddings_within_bounds, shape_list, stable_softmax from ...utils import ( - MULTIPLE_CHOICE_DUMMY_INPUTS, ModelOutput, add_code_sample_docstrings, add_start_docstrings, @@ -195,9 +197,9 @@ def call( attn_mask_g, r, seg_mat, - mems: Optional[Union[np.ndarray, tf.Tensor]] = None, - target_mapping: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + mems: np.ndarray | tf.Tensor | None = None, + target_mapping: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = False, training: bool = False, ): @@ -369,9 +371,9 @@ def call( attn_mask, pos_emb, seg_mat, - mems: Optional[Union[np.ndarray, tf.Tensor]] = None, - target_mapping: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + mems: np.ndarray | tf.Tensor | None = None, + target_mapping: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = False, training: bool = False, ): @@ -582,15 +584,15 @@ def relative_positional_encoding(self, qlen, klen, bsz=None): @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - mems: Optional[Union[np.ndarray, tf.Tensor]] = None, - perm_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - target_mapping: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - input_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + mems: np.ndarray | tf.Tensor | None = None, + perm_mask: np.ndarray | tf.Tensor | None = None, + target_mapping: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + input_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_mems: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -824,9 +826,9 @@ class TFXLNetModelOutput(ModelOutput): """ last_hidden_state: tf.Tensor = None - mems: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + mems: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -859,11 +861,11 @@ class TFXLNetLMHeadModelOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - mems: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + mems: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -893,11 +895,11 @@ class TFXLNetForSequenceClassificationOutput(ModelOutput): heads. 
""" - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - mems: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + mems: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -927,11 +929,11 @@ class TFXLNetForTokenClassificationOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - mems: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + mems: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -963,11 +965,11 @@ class TFXLNetForMultipleChoiceOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None logits: tf.Tensor = None - mems: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + mems: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None @dataclass @@ -999,12 +1001,12 @@ class TFXLNetForQuestionAnsweringSimpleOutput(ModelOutput): heads. """ - loss: Optional[tf.Tensor] = None + loss: tf.Tensor | None = None start_logits: tf.Tensor = None end_logits: tf.Tensor = None - mems: Optional[List[tf.Tensor]] = None - hidden_states: Optional[Tuple[tf.Tensor]] = None - attentions: Optional[Tuple[tf.Tensor]] = None + mems: List[tf.Tensor] | None = None + hidden_states: Tuple[tf.Tensor] | None = None + attentions: Tuple[tf.Tensor] | None = None XLNET_START_DOCSTRING = r""" @@ -1140,15 +1142,15 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - mems: Optional[Union[np.ndarray, tf.Tensor]] = None, - perm_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - target_mapping: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - input_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + mems: np.ndarray | tf.Tensor | None = None, + perm_mask: np.ndarray | tf.Tensor | None = None, + target_mapping: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + input_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_mems: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, @@ -1174,15 +1176,6 @@ def call( return outputs - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - mems = tf.convert_to_tensor(output.mems) if output.mems is not None else None - - return TFXLNetModelOutput( - last_hidden_state=output.last_hidden_state, mems=mems, hidden_states=hs, attentions=attns - ) - @add_start_docstrings( """ @@ 
-1249,20 +1242,20 @@ def prepare_inputs_for_generation(self, inputs, past_key_values=None, use_mems=N @replace_return_docstrings(output_type=TFXLNetLMHeadModelOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - mems: Optional[Union[np.ndarray, tf.Tensor]] = None, - perm_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - target_mapping: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - input_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + mems: np.ndarray | tf.Tensor | None = None, + perm_mask: np.ndarray | tf.Tensor | None = None, + target_mapping: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + input_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_mems: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFXLNetLMHeadModelOutput, Tuple[tf.Tensor]]: r""" @@ -1342,13 +1335,6 @@ def call( attentions=transformer_outputs.attentions, ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - mems = tf.convert_to_tensor(output.mems) if output.mems is not None else None - - return TFXLNetLMHeadModelOutput(logits=output.logits, mems=mems, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1379,20 +1365,20 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - mems: Optional[Union[np.ndarray, tf.Tensor]] = None, - perm_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - target_mapping: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - input_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + mems: np.ndarray | tf.Tensor | None = None, + perm_mask: np.ndarray | tf.Tensor | None = None, + target_mapping: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + input_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_mems: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFXLNetForSequenceClassificationOutput, Tuple[tf.Tensor]]: r""" @@ 
-1436,15 +1422,6 @@ def call( attentions=transformer_outputs.attentions, ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - mems = tf.convert_to_tensor(output.mems) if output.mems is not None else None - - return TFXLNetForSequenceClassificationOutput( - logits=output.logits, mems=mems, hidden_states=hs, attentions=attns - ) - @add_start_docstrings( """ @@ -1465,16 +1442,6 @@ def __init__(self, config, *inputs, **kwargs): 1, kernel_initializer=get_initializer(config.initializer_range), name="logits_proj" ) - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int32)} - @unpack_inputs @add_start_docstrings_to_model_forward(XLNET_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length")) @add_code_sample_docstrings( @@ -1484,20 +1451,20 @@ def dummy_inputs(self): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - input_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - mems: Optional[Union[np.ndarray, tf.Tensor]] = None, - perm_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - target_mapping: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + input_mask: np.ndarray | tf.Tensor | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + mems: np.ndarray | tf.Tensor | None = None, + perm_mask: np.ndarray | tf.Tensor | None = None, + target_mapping: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_mems: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFXLNetForMultipleChoiceOutput, Tuple[tf.Tensor]]: r""" @@ -1556,27 +1523,6 @@ def call( attentions=transformer_outputs.attentions, ) - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None, None), tf.int32, name="token_type_ids"), - } - ] - ) - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - mems = tf.convert_to_tensor(output.mems) if output.mems is not None else None - - return TFXLNetForMultipleChoiceOutput(logits=output.logits, mems=mems, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1604,20 +1550,20 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = 
None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - mems: Optional[Union[np.ndarray, tf.Tensor]] = None, - perm_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - target_mapping: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - input_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + mems: np.ndarray | tf.Tensor | None = None, + perm_mask: np.ndarray | tf.Tensor | None = None, + target_mapping: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + input_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_mems: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFXLNetForTokenClassificationOutput, Tuple[tf.Tensor]]: r""" @@ -1657,13 +1603,6 @@ def call( attentions=transformer_outputs.attentions, ) - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - mems = tf.convert_to_tensor(output.mems) if output.mems is not None else None - - return TFXLNetForTokenClassificationOutput(logits=output.logits, mems=mems, hidden_states=hs, attentions=attns) - @add_start_docstrings( """ @@ -1689,21 +1628,21 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - mems: Optional[Union[np.ndarray, tf.Tensor]] = None, - perm_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - target_mapping: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - input_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + mems: np.ndarray | tf.Tensor | None = None, + perm_mask: np.ndarray | tf.Tensor | None = None, + target_mapping: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + input_mask: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, use_mems: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: bool = False, ) -> Union[TFXLNetForQuestionAnsweringSimpleOutput, Tuple[tf.Tensor]]: r""" @@ -1757,16 +1696,3 @@ def call( hidden_states=transformer_outputs.hidden_states, 
attentions=transformer_outputs.attentions, ) - - def serving_output(self, output): - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - mems = tf.convert_to_tensor(output.mems) if output.mems is not None else None - - return TFXLNetForQuestionAnsweringSimpleOutput( - start_logits=output.start_logits, - end_logits=output.end_logits, - mems=mems, - hidden_states=hs, - attentions=attns, - ) diff --git a/src/transformers/optimization_tf.py b/src/transformers/optimization_tf.py index b42e04041b8ffa..382eae2a30e284 100644 --- a/src/transformers/optimization_tf.py +++ b/src/transformers/optimization_tf.py @@ -117,9 +117,9 @@ def create_optimizer( The beta2 to use in Adam. adam_epsilon (`float`, *optional*, defaults to 1e-8): The epsilon to use in Adam. - adam_clipnorm: (`float`, *optional*, defaults to `None`): + adam_clipnorm (`float`, *optional*, defaults to `None`): If not `None`, clip the gradient norm for each weight tensor to this value. - adam_global_clipnorm: (`float`, *optional*, defaults to `None`) + adam_global_clipnorm (`float`, *optional*, defaults to `None`) If not `None`, clip gradient norm to this value. When using this argument, the norm is computed over all weight tensors, as if they were concatenated into a single vector. weight_decay_rate (`float`, *optional*, defaults to 0): diff --git a/src/transformers/pipelines/__init__.py b/src/transformers/pipelines/__init__.py index 84d461cd1ae730..818164f3c28b6d 100755 --- a/src/transformers/pipelines/__init__.py +++ b/src/transformers/pipelines/__init__.py @@ -505,7 +505,7 @@ def clean_custom_task(task_info): def pipeline( task: str = None, - model: Optional = None, + model: Optional[Union[str, "PreTrainedModel", "TFPreTrainedModel"]] = None, config: Optional[Union[str, PretrainedConfig]] = None, tokenizer: Optional[Union[str, PreTrainedTokenizer, "PreTrainedTokenizerFast"]] = None, feature_extractor: Optional[Union[str, PreTrainedFeatureExtractor]] = None, diff --git a/src/transformers/pipelines/audio_utils.py b/src/transformers/pipelines/audio_utils.py index 62c2b00c467ab3..f17dd68d6439d9 100644 --- a/src/transformers/pipelines/audio_utils.py +++ b/src/transformers/pipelines/audio_utils.py @@ -119,7 +119,7 @@ def ffmpeg_microphone_live( The length of the striding to be used. Stride is used to provide context to a model on the (left, right) of an audio sample but without using that part to actually make the prediction. Setting this does not change the length of the chunk. - format_for_conversion: (`str`, defalts to `f32le`) + format_for_conversion (`str`, defalts to `f32le`) The name of the format of the audio samples to be returned by ffmpeg. The standard is `f32le`, `s16le` could also be used. 
Return: diff --git a/src/transformers/pipelines/base.py b/src/transformers/pipelines/base.py index de6c9a8ec4d945..4fc256266449bc 100644 --- a/src/transformers/pipelines/base.py +++ b/src/transformers/pipelines/base.py @@ -15,7 +15,6 @@ import collections import csv import importlib -import inspect import json import os import pickle @@ -36,7 +35,7 @@ from ..modelcard import ModelCard from ..models.auto.configuration_auto import AutoConfig from ..tokenization_utils import PreTrainedTokenizer -from ..utils import ModelOutput, add_end_docstrings, is_tf_available, is_torch_available, logging +from ..utils import ModelOutput, add_end_docstrings, infer_framework, is_tf_available, is_torch_available, logging GenericTensor = Union[List["GenericTensor"], "torch.Tensor", "tf.Tensor"] @@ -278,7 +277,7 @@ def infer_framework_load_model( if isinstance(model, str): raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.") - framework = "tf" if "keras.engine.training.Model" in str(inspect.getmro(model.__class__)) else "pt" + framework = infer_framework(model.__class__) return framework, model @@ -351,7 +350,7 @@ def get_framework(model, revision: Optional[str] = None): except OSError: model = TFAutoModel.from_pretrained(model, revision=revision) - framework = "tf" if "keras.engine.training.Model" in str(inspect.getmro(model.__class__)) else "pt" + framework = infer_framework(model.__class__) return framework @@ -515,7 +514,7 @@ def from_str( Creates an instance of the right subclass of [`~pipelines.PipelineDataFormat`] depending on `format`. Args: - format: (`str`): + format (`str`): The format of the desired pipeline. Acceptable values are `"json"`, `"csv"` or `"pipe"`. output_path (`str`, *optional*): Where to save the outgoing data. 
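The `infer_framework_load_model` / `get_framework` changes above swap a brittle string match on `keras.engine.training.Model` in the MRO for the new `infer_framework` helper (exported from `transformers.utils` later in this patch). As a hedged illustration only, a minimal MRO-based check in the same spirit could look like the sketch below; the shipped helper in `transformers.utils.generic` may differ in detail.

```py
# Illustrative sketch of an MRO-based framework check, in the spirit of
# `infer_framework`; not the implementation shipped in transformers.utils.generic.
import inspect


def infer_framework_sketch(model_class) -> str:
    """Guess "tf", "pt" or "flax" from the base classes of a model class."""
    for base in inspect.getmro(model_class):
        module, name = base.__module__, base.__name__
        if module.startswith(("tensorflow", "keras")) or name == "TFPreTrainedModel":
            return "tf"
        if module.startswith("torch") or name == "PreTrainedModel":
            return "pt"
        if module.startswith(("flax", "jax")) or name == "FlaxPreTrainedModel":
            return "flax"
    raise TypeError(f"Could not infer framework from class {model_class}.")
```

Keying on module names rather than one specific Keras class path keeps the check working when Keras reshuffles its internal `keras.engine` layout.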
diff --git a/src/transformers/pipelines/image_to_text.py b/src/transformers/pipelines/image_to_text.py index f34dad3cef8142..1c082c2ecb38b4 100644 --- a/src/transformers/pipelines/image_to_text.py +++ b/src/transformers/pipelines/image_to_text.py @@ -20,6 +20,8 @@ from ..models.auto.modeling_tf_auto import TF_MODEL_FOR_VISION_2_SEQ_MAPPING if is_torch_available(): + import torch + from ..models.auto.modeling_auto import MODEL_FOR_VISION_2_SEQ_MAPPING logger = logging.get_logger(__name__) @@ -56,8 +58,13 @@ def __init__(self, *args, **kwargs): TF_MODEL_FOR_VISION_2_SEQ_MAPPING if self.framework == "tf" else MODEL_FOR_VISION_2_SEQ_MAPPING ) - def _sanitize_parameters(self, max_new_tokens=None, generate_kwargs=None): + def _sanitize_parameters(self, max_new_tokens=None, generate_kwargs=None, prompt=None): forward_kwargs = {} + preprocess_params = {} + + if prompt is not None: + preprocess_params["prompt"] = prompt + if generate_kwargs is not None: forward_kwargs["generate_kwargs"] = generate_kwargs if max_new_tokens is not None: @@ -69,7 +76,7 @@ def _sanitize_parameters(self, max_new_tokens=None, generate_kwargs=None): " please use only one" ) forward_kwargs["generate_kwargs"]["max_new_tokens"] = max_new_tokens - return {}, forward_kwargs, {} + return preprocess_params, forward_kwargs, {} def __call__(self, images: Union[str, List[str], "Image.Image", List["Image.Image"]], **kwargs): """ @@ -98,9 +105,43 @@ def __call__(self, images: Union[str, List[str], "Image.Image", List["Image.Imag """ return super().__call__(images, **kwargs) - def preprocess(self, image): + def preprocess(self, image, prompt=None): image = load_image(image) - model_inputs = self.image_processor(images=image, return_tensors=self.framework) + + if prompt is not None: + if not isinstance(prompt, str): + raise ValueError( + f"Received an invalid text input, got - {type(prompt)} - but expected a single string. " + "Note also that one single text can be provided for conditional image to text generation." + ) + + model_type = self.model.config.model_type + + if model_type == "git": + model_inputs = self.image_processor(images=image, return_tensors=self.framework) + input_ids = self.tokenizer(text=prompt, add_special_tokens=False).input_ids + input_ids = [self.tokenizer.cls_token_id] + input_ids + input_ids = torch.tensor(input_ids).unsqueeze(0) + model_inputs.update({"input_ids": input_ids}) + + elif model_type == "pix2struct": + model_inputs = self.image_processor(images=image, header_text=prompt, return_tensors=self.framework) + + elif model_type != "vision-encoder-decoder": + # vision-encoder-decoder does not support conditional generation + model_inputs = self.image_processor(images=image, return_tensors=self.framework) + text_inputs = self.tokenizer(prompt, return_tensors=self.framework) + model_inputs.update(text_inputs) + + else: + raise ValueError(f"Model type {model_type} does not support conditional text generation") + + else: + model_inputs = self.image_processor(images=image, return_tensors=self.framework) + + if self.model.config.model_type == "git" and prompt is None: + model_inputs["input_ids"] = None + return model_inputs def _forward(self, model_inputs, generate_kwargs=None): diff --git a/src/transformers/tf_utils.py b/src/transformers/tf_utils.py index 306f73c0b1ba36..0900ac587c4646 100644 --- a/src/transformers/tf_utils.py +++ b/src/transformers/tf_utils.py @@ -166,3 +166,90 @@ def check_embeddings_within_bounds(tensor: tf.Tensor, embed_dim: int, tensor_nam f"layer's input dimension ({embed_dim}). 
The likely cause is some problem at tokenization time." ), ) + + +def save_attributes_to_hdf5_group(group, name, data): + """Saves attributes (data) of the specified name into the HDF5 group. + + This method deals with an inherent problem of HDF5 file which is not able to store data larger than + HDF5_OBJECT_HEADER_LIMIT bytes. + + Args: + group: A pointer to a HDF5 group. + name: A name of the attributes to save. + data: Attributes data to store. + + Raises: + RuntimeError: If any single attribute is too large to be saved. + + Copied from Keras to Transformers to avoid versioning issues. + """ + HDF5_OBJECT_HEADER_LIMIT = 64512 + # Check that no item in `data` is larger than `HDF5_OBJECT_HEADER_LIMIT` + # because in that case even chunking the array would not make the saving + # possible. + bad_attributes = [x for x in data if len(x) > HDF5_OBJECT_HEADER_LIMIT] + + # Expecting this to never be true. + if bad_attributes: + raise RuntimeError( + "The following attributes cannot be saved to HDF5 file because " + f"they are larger than {HDF5_OBJECT_HEADER_LIMIT} " + f"bytes: {bad_attributes}" + ) + + data_npy = np.asarray(data) + + num_chunks = 1 + chunked_data = np.array_split(data_npy, num_chunks) + + # This will never loop forever thanks to the test above. + while any(x.nbytes > HDF5_OBJECT_HEADER_LIMIT for x in chunked_data): + num_chunks += 1 + chunked_data = np.array_split(data_npy, num_chunks) + + if num_chunks > 1: + for chunk_id, chunk_data in enumerate(chunked_data): + group.attrs["%s%d" % (name, chunk_id)] = chunk_data + else: + group.attrs[name] = data + + +def load_attributes_from_hdf5_group(group, name): + """Loads attributes of the specified name from the HDF5 group. + + This method deals with an inherent problem of HDF5 file which is not able to store data larger than + HDF5_OBJECT_HEADER_LIMIT bytes. + + Args: + group: A pointer to a HDF5 group. + name: A name of the attributes to load. + + Returns: + data: Attributes data. + + Copied from Keras to Transformers to avoid versioning issues. + """ + if name in group.attrs: + data = [n.decode("utf8") if hasattr(n, "decode") else n for n in group.attrs[name]] + else: + data = [] + chunk_id = 0 + while "%s%d" % (name, chunk_id) in group.attrs: + data.extend( + [n.decode("utf8") if hasattr(n, "decode") else n for n in group.attrs["%s%d" % (name, chunk_id)]] + ) + chunk_id += 1 + return data + + +def expand_1d(data): + """Expands 1-dimensional `Tensor`s into 2-dimensional `Tensor`s. 
+ Copied from Keras to here to avoid versioning issues.""" + + def _expand_single_1d_tensor(t): + if isinstance(t, tf.Tensor) and t.shape.rank == 1: + return tf.expand_dims(t, axis=-1) + return t + + return tf.nest.map_structure(_expand_single_1d_tensor, data) diff --git a/src/transformers/time_series_utils.py b/src/transformers/time_series_utils.py index b07451253e87d8..02eddd72cebd35 100644 --- a/src/transformers/time_series_utils.py +++ b/src/transformers/time_series_utils.py @@ -171,7 +171,7 @@ class StudentTOutput(DistributionOutput): @classmethod def domain_map(cls, df: torch.Tensor, loc: torch.Tensor, scale: torch.Tensor): - scale = cls.squareplus(scale) + scale = cls.squareplus(scale).clamp_min(torch.finfo(scale.dtype).eps) df = 2.0 + cls.squareplus(df) return df.squeeze(-1), loc.squeeze(-1), scale.squeeze(-1) @@ -186,7 +186,7 @@ class NormalOutput(DistributionOutput): @classmethod def domain_map(cls, loc: torch.Tensor, scale: torch.Tensor): - scale = cls.squareplus(scale) + scale = cls.squareplus(scale).clamp_min(torch.finfo(scale.dtype).eps) return loc.squeeze(-1), scale.squeeze(-1) diff --git a/src/transformers/tokenization_utils_base.py b/src/transformers/tokenization_utils_base.py index a186c8338badcc..ecc9d5011d781d 100644 --- a/src/transformers/tokenization_utils_base.py +++ b/src/transformers/tokenization_utils_base.py @@ -2093,7 +2093,7 @@ def save_pretrained( If `True`, will save the tokenizer in legacy format. If the "slow" tokenizer doesn't exits, a value error is raised. - filename_prefix: (`str`, *optional*): + filename_prefix (`str`, *optional*): A prefix to add to the names of the files saved by the tokenizer. push_to_hub (`bool`, *optional*, defaults to `False`): Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the diff --git a/src/transformers/tools/agents.py b/src/transformers/tools/agents.py index b04f622e15cf99..0ecb600d9212a9 100644 --- a/src/transformers/tools/agents.py +++ b/src/transformers/tools/agents.py @@ -27,7 +27,7 @@ from ..models.auto import AutoTokenizer from ..utils import is_openai_available, is_torch_available, logging from .base import TASK_MAPPING, TOOL_CONFIG_FILE, Tool, load_tool, supports_remote -from .prompts import CHAT_MESSAGE_PROMPT, CHAT_PROMPT_TEMPLATE, RUN_PROMPT_TEMPLATE +from .prompts import CHAT_MESSAGE_PROMPT, download_prompt from .python_interpreter import evaluate @@ -193,9 +193,13 @@ class Agent: Args: chat_prompt_template (`str`, *optional*): - Pass along your own prompt if you want to override the default template for the `chat` method. + Pass along your own prompt if you want to override the default template for the `chat` method. Can be the + actual prompt template or a repo ID (on the Hugging Face Hub). The prompt should be in a file named + `chat_prompt_template.txt` in this repo in this case. run_prompt_template (`str`, *optional*): - Pass along your own prompt if you want to override the default template for the `run` method. + Pass along your own prompt if you want to override the default template for the `run` method. Can be the + actual prompt template or a repo ID (on the Hugging Face Hub). The prompt should be in a file named + `run_prompt_template.txt` in this repo in this case. additional_tools ([`Tool`], list of tools or dictionary with tool values, *optional*): Any additional tools to include on top of the default ones. If you pass along a tool with the same name as one of the default tools, that default tool will be overridden. 
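The `Agent` docstring change above lets `chat_prompt_template` / `run_prompt_template` be either the literal template text or a Hub repo ID containing `chat_prompt_template.txt` / `run_prompt_template.txt` (resolved by the `download_prompt` helper added later in this patch, which treats any string without whitespace as a dataset repo ID). A hedged usage sketch follows; the endpoint URL, repo ID, and template text are placeholders, not values defined by this patch.

```py
# Usage sketch only: the endpoint URL, dataset repo ID and template text below
# are placeholders for illustration, not values defined by this patch.
from transformers import HfAgent

# Option 1: pass the template text itself (any string containing whitespace
# is treated as the literal prompt).
agent = HfAgent(
    "https://api-inference.huggingface.co/models/bigcode/starcoder",
    run_prompt_template="I will ask you to perform a task ... <tool and task placeholders> ...",
)

# Option 2: pass a Hub dataset repo ID (no whitespace); the agent downloads
# `run_prompt_template.txt` / `chat_prompt_template.txt` from that repo.
agent = HfAgent(
    "https://api-inference.huggingface.co/models/bigcode/starcoder",
    chat_prompt_template="my-username/my-agent-prompts",
    run_prompt_template="my-username/my-agent-prompts",
)
```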
@@ -204,8 +208,9 @@ class Agent: def __init__(self, chat_prompt_template=None, run_prompt_template=None, additional_tools=None): _setup_default_tools() - self.chat_prompt_template = CHAT_PROMPT_TEMPLATE if chat_prompt_template is None else chat_prompt_template - self.run_prompt_template = RUN_PROMPT_TEMPLATE if run_prompt_template is None else run_prompt_template + agent_name = self.__class__.__name__ + self.chat_prompt_template = download_prompt(chat_prompt_template, agent_name, mode="chat") + self.run_prompt_template = download_prompt(run_prompt_template, agent_name, mode="run") self._toolbox = HUGGINGFACE_DEFAULT_TOOLS.copy() self.log = print if additional_tools is not None: @@ -367,9 +372,13 @@ class OpenAiAgent(Agent): api_key (`str`, *optional*): The API key to use. If unset, will look for the environment variable `"OPENAI_API_KEY"`. chat_prompt_template (`str`, *optional*): - Pass along your own prompt if you want to override the default template for the `chat` method. + Pass along your own prompt if you want to override the default template for the `chat` method. Can be the + actual prompt template or a repo ID (on the Hugging Face Hub). The prompt should be in a file named + `chat_prompt_template.txt` in this repo in this case. run_prompt_template (`str`, *optional*): - Pass along your own prompt if you want to override the default template for the `run` method. + Pass along your own prompt if you want to override the default template for the `run` method. Can be the + actual prompt template or a repo ID (on the Hugging Face Hub). The prompt should be in a file named + `run_prompt_template.txt` in this repo in this case. additional_tools ([`Tool`], list of tools or dictionary with tool values, *optional*): Any additional tools to include on top of the default ones. If you pass along a tool with the same name as one of the default tools, that default tool will be overridden. @@ -455,9 +464,13 @@ class HfAgent(Agent): The token to use as HTTP bearer authorization for remote files. If unset, will use the token generated when running `huggingface-cli login` (stored in `~/.huggingface`). chat_prompt_template (`str`, *optional*): - Pass along your own prompt if you want to override the default template for the `chat` method. + Pass along your own prompt if you want to override the default template for the `chat` method. Can be the + actual prompt template or a repo ID (on the Hugging Face Hub). The prompt should be in a file named + `chat_prompt_template.txt` in this repo in this case. run_prompt_template (`str`, *optional*): - Pass along your own prompt if you want to override the default template for the `run` method. + Pass along your own prompt if you want to override the default template for the `run` method. Can be the + actual prompt template or a repo ID (on the Hugging Face Hub). The prompt should be in a file named + `run_prompt_template.txt` in this repo in this case. additional_tools ([`Tool`], list of tools or dictionary with tool values, *optional*): Any additional tools to include on top of the default ones. If you pass along a tool with the same name as one of the default tools, that default tool will be overridden. @@ -521,9 +534,13 @@ class LocalAgent(Agent): tokenizer ([`PreTrainedTokenizer`]): The tokenizer to use for the agent. chat_prompt_template (`str`, *optional*): - Pass along your own prompt if you want to override the default template for the `chat` method. + Pass along your own prompt if you want to override the default template for the `chat` method. 
Can be the + actual prompt template or a repo ID (on the Hugging Face Hub). The prompt should be in a file named + `chat_prompt_template.txt` in this repo in this case. run_prompt_template (`str`, *optional*): - Pass along your own prompt if you want to override the default template for the `run` method. + Pass along your own prompt if you want to override the default template for the `run` method. Can be the + actual prompt template or a repo ID (on the Hugging Face Hub). The prompt should be in a file named + `run_prompt_template.txt` in this repo in this case. additional_tools ([`Tool`], list of tools or dictionary with tool values, *optional*): Any additional tools to include on top of the default ones. If you pass along a tool with the same name as one of the default tools, that default tool will be overridden. diff --git a/src/transformers/tools/prompts.py b/src/transformers/tools/prompts.py index 796b3c242ccfe1..2dbb799f859ffe 100644 --- a/src/transformers/tools/prompts.py +++ b/src/transformers/tools/prompts.py @@ -14,173 +14,35 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. +import re -# docstyle-ignore -RUN_PROMPT_TEMPLATE = """I will ask you to perform a task, your job is to come up with a series of simple commands in Python that will perform the task. -To help you, I will give you access to a set of tools that you can use. Each tool is a Python function and has a description explaining the task it performs, the inputs it expects and the outputs it returns. -You should first explain which tool you will use to perform the task and for what reason, then write the code in Python. -Each instruction in Python should be a simple assignment. You can print intermediate results if it makes sense to do so. - -Tools: -<> - - -Task: "Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French." - -I will use the following tools: `translator` to translate the question into English and then `image_qa` to answer the question on the input image. - -Answer: -```py -translated_question = translator(question=question, src_lang="French", tgt_lang="English") -print(f"The translated question is {translated_question}.") -answer = image_qa(image=image, question=translated_question) -print(f"The answer is {answer}") -``` - -Task: "Identify the oldest person in the `document` and create an image showcasing the result." - -I will use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer. - -Answer: -```py -answer = document_qa(document, question="What is the oldest person?") -print(f"The answer is {answer}.") -image = image_generator(answer) -``` - -Task: "Generate an image using the text given in the variable `caption`." - -I will use the following tool: `image_generator` to generate an image. - -Answer: -```py -image = image_generator(prompt=caption) -``` - -Task: "Summarize the text given in the variable `text` and read it out loud." - -I will use the following tools: `summarizer` to create a summary of the input text, then `text_reader` to read it out loud. - -Answer: -```py -summarized_text = summarizer(text) -print(f"Summary: {summarized_text}") -audio_summary = text_reader(summarized_text) -``` - -Task: "Answer the question in the variable `question` about the text in the variable `text`. 
Use the answer to generate an image." - -I will use the following tools: `text_qa` to create the answer, then `image_generator` to generate an image according to the answer. - -Answer: -```py -answer = text_qa(text=text, question=question) -print(f"The answer is {answer}.") -image = image_generator(answer) -``` - -Task: "Caption the following `image`." - -I will use the following tool: `image_captioner` to generate a caption for the image. - -Answer: -```py -caption = image_captioner(image) -``` - -Task: "<>" - -I will use the following""" +from ..utils import cached_file # docstyle-ignore -CHAT_PROMPT_TEMPLATE = """Below are a series of dialogues between various people and an AI assistant specialized in coding. The AI assistant tries to be helpful, polite, honest, and humble-but-knowledgeable. - -The job of the AI assistant is to come up with a series of simple commands in Python that will perform the task the human wants to perform. -To help with that, the AI assistant has access to a set of tools. Each tool is a Python function and has a description explaining the task it performs, the inputs it expects and the outputs it returns. -The AI assistant should first explain the tools it will use to perform the task and for what reason, then write the code in Python. -Each instruction in Python should be a simple assignment. The AI assistant can print intermediate results if it makes sense to do so. - -Tools: -<> - -===== - -Human: Answer the question in the variable `question` about the image stored in the variable `image`. - -Assistant: I will use the tool `image_qa` to answer the question on the input image. - -```py -answer = image_qa(text=question, image=image) -print(f"The answer is {answer}") -``` - -Human: I tried this code, it worked but didn't give me a good result. The question is in French - -Assistant: In this case, the question needs to be translated first. I will use the tool `translator` to do this. - -```py -translated_question = translator(question=question, src_lang="French", tgt_lang="English") -print(f"The translated question is {translated_question}.") -answer = image_qa(text=translated_question, image=image) -print(f"The answer is {answer}") -``` - -===== - -Human: Identify the oldest person in the `document`. - -Assistant: I will use the tool `document_qa` to find the oldest person in the document. - -```py -answer = document_qa(document, question="What is the oldest person?") -print(f"The answer is {answer}.") -``` - -Human: Can you generate an image with the result? - -Assistant: I will use the tool `image_generator` to do that. - -```py -image = image_generator(answer) -``` - -===== - -Human: Summarize the text given in the variable `text` and read it out loud. - -Assistant: I will use the tool `summarizer` to create a summary of the input text, then the tool `text_reader` to read it out loud. - -```py -summarized_text = summarizer(text) -print(f"Summary: {summarized_text}") -audio_summary = text_reader(text=summary) -``` - -Human: I got the following error: "The variable `summary` is not defined." - -Assistant: My bad! Let's try this code instead. - -```py -summarized_text = summarizer(text) -print(f"Summary: {summarized_text}") -audio_summary = text_reader(text=summarized_text) -``` +CHAT_MESSAGE_PROMPT = """ +Human: <> -Human: It worked! Can you translate the summary in German? +Assistant: """ -Assistant: I will use the tool `translator` to translate the text in German. 
-```py -translated_summary = translator(summarized_text, src_lang="English", tgt_lang="German") -``` +DEFAULT_PROMPTS_REPO = "huggingface-tools/default-prompts" +PROMPT_FILES = {"chat": "chat_prompt_template.txt", "run": "run_prompt_template.txt"} -==== -""" +def download_prompt(prompt_or_repo_id, agent_name, mode="run"): + """ + Downloads and caches the prompt from a repo and returns it contents (if necessary) + """ + if prompt_or_repo_id is None: + prompt_or_repo_id = DEFAULT_PROMPTS_REPO -# docstyle-ignore -CHAT_MESSAGE_PROMPT = """ -Human: <> + # prompt is considered a repo ID when it does not contain any kind of space + if re.search("\\s", prompt_or_repo_id) is not None: + return prompt_or_repo_id -Assistant: """ + prompt_file = cached_file( + prompt_or_repo_id, PROMPT_FILES[mode], repo_type="dataset", user_agent={"agent": agent_name} + ) + with open(prompt_file, "r", encoding="utf-8") as f: + return f.read() diff --git a/src/transformers/trainer.py b/src/transformers/trainer.py index ce4e1f24067187..357cfc45bddd64 100755 --- a/src/transformers/trainer.py +++ b/src/transformers/trainer.py @@ -391,8 +391,8 @@ def __init__( ) # At this stage the model is already loaded - if getattr(model, "is_loaded_in_8bit", False): - if getattr(model, "_is_int8_training_enabled", False): + if getattr(model, "is_loaded_in_kbit", False): + if getattr(model, "_is_kbit_training_enabled", False): logger.info( "The model is loaded in 8-bit precision. To train this model you need to add additional modules" " inside the model such as adapters using `peft` library and freeze the model weights. Please" @@ -1170,6 +1170,38 @@ def get_optimizer_cls_and_kwargs(args: TrainingArguments) -> Tuple[Any, Any]: optimizer_kwargs.update(adam_kwargs) except ImportError: raise ValueError("Trainer tried to instantiate apex FusedAdam but apex is not installed!") + elif args.optim in [ + OptimizerNames.ADAMW_BNB, + OptimizerNames.ADAMW_8BIT, + OptimizerNames.PAGED_ADAMW, + OptimizerNames.PAGED_ADAMW_8BIT, + OptimizerNames.LION, + OptimizerNames.LION_8BIT, + OptimizerNames.PAGED_LION, + OptimizerNames.PAGED_LION_8BIT, + ]: + try: + from bitsandbytes.optim import AdamW, Lion + + is_paged = False + optim_bits = 32 + optimizer_cls = None + additional_optim_kwargs = adam_kwargs + if "paged" in args.optim: + is_paged = True + if "8bit" in args.optim: + optim_bits = 8 + if "adam" in args.optim: + optimizer_cls = AdamW + elif "lion" in args.optim: + optimizer_cls = Lion + additional_optim_kwargs = {"betas": (args.adam_beta1, args.adam_beta2)} + + bnb_kwargs = {"is_paged": is_paged, "optim_bits": optim_bits} + optimizer_kwargs.update(additional_optim_kwargs) + optimizer_kwargs.update(bnb_kwargs) + except ImportError: + raise ValueError("Trainer tried to instantiate bnb optimizer but bnb is not installed!") elif args.optim == OptimizerNames.ADAMW_BNB: try: from bitsandbytes.optim import Adam8bit @@ -1672,6 +1704,7 @@ def _inner_training_loop( self, batch_size=None, args=None, resume_from_checkpoint=None, trial=None, ignore_keys_for_eval=None ): self._train_batch_size = batch_size + logger.debug(f"Currently training with a batch size of: {self._train_batch_size}") # Data loader and number of training steps train_dataloader = self.get_train_dataloader() @@ -1779,7 +1812,7 @@ def _inner_training_loop( logger.info("***** Running training *****") logger.info(f" Num examples = {num_examples:,}") logger.info(f" Num Epochs = {num_train_epochs:,}") - logger.info(f" Instantaneous batch size per device = {args.per_device_train_batch_size:,}") + 
logger.info(f" Instantaneous batch size per device = {self._train_batch_size:,}") logger.info(f" Total train batch size (w. parallel, distributed & accumulation) = {total_train_batch_size:,}") logger.info(f" Gradient Accumulation steps = {args.gradient_accumulation_steps}") logger.info(f" Total optimization steps = {max_steps:,}") @@ -2226,16 +2259,35 @@ def _load_best_model(self): state_dict["_smp_is_partial"] = False load_result = model.load_state_dict(state_dict, strict=True) else: - # We load the model state dict on the CPU to avoid an OOM error. - if self.args.save_safetensors and os.path.isfile(best_safe_model_path): - state_dict = safetensors.torch.load_file(best_safe_model_path, device="cpu") + if hasattr(model, "base_model") and getattr(model.base_model, "is_8bit_serializable", False): + # If train base_8_bit_models using PEFT & LoRA, assume that adapter have been saved properly. + if hasattr(model, "active_adapter") and hasattr(model, "load_adapter"): + if os.path.exists(os.path.join(self.state.best_model_checkpoint, "adapter_model.bin")): + model.load_adapter(self.state.best_model_checkpoint, model.active_adapter) + # Load_adapter has no return value present, modify it when appropriate. + from torch.nn.modules.module import _IncompatibleKeys + + load_result = _IncompatibleKeys([], []) + else: + logger.warning( + "The intermediate checkpoints of PEFT may not be saved correctly, " + "using `TrainerCallback` to save adapter_model.bin in corresponding folders, " + "here are some examples https://github.com/huggingface/peft/issues/96" + ) + else: + # We can't do pure 8bit training using transformers. + logger.warning("Could not loading a quantized checkpoint.") else: - state_dict = torch.load(best_model_path, map_location="cpu") + # We load the model state dict on the CPU to avoid an OOM error. + if self.args.save_safetensors and os.path.isfile(best_safe_model_path): + state_dict = safetensors.torch.load_file(best_safe_model_path, device="cpu") + else: + state_dict = torch.load(best_model_path, map_location="cpu") - # If the model is on the GPU, it still works! - # workaround for FSDP bug https://github.com/pytorch/pytorch/issues/82963 - # which takes *args instead of **kwargs - load_result = model.load_state_dict(state_dict, False) + # If the model is on the GPU, it still works! 
+ # workaround for FSDP bug https://github.com/pytorch/pytorch/issues/82963 + # which takes *args instead of **kwargs + load_result = model.load_state_dict(state_dict, False) if not is_sagemaker_mp_enabled(): self._issue_warnings_after_load(load_result) elif os.path.exists(os.path.join(self.state.best_model_checkpoint, WEIGHTS_INDEX_NAME)): @@ -3347,7 +3399,9 @@ def _nested_gather(self, tensors, name=None): tensors = nested_xla_mesh_reduce(tensors, name) elif is_sagemaker_mp_enabled(): tensors = smp_gather(tensors) - elif self.args.parallel_mode == ParallelMode.DISTRIBUTED: + elif (self.args.distributed_state is not None and self.args.distributed_state.distributed_type != "NO") or ( + self.args.distributed_state is None and self.local_rank != -1 + ): tensors = distributed_concat(tensors) return tensors @@ -3646,9 +3700,10 @@ def _push_from_checkpoint(self, checkpoint_folder): commit_message = f"Training in progress, step {self.state.global_step}" else: commit_message = f"Training in progress, epoch {int(self.state.epoch)}" - _, self.push_in_progress = self.repo.push_to_hub( - commit_message=commit_message, blocking=False, auto_lfs_prune=True - ) + push_work = self.repo.push_to_hub(commit_message=commit_message, blocking=False, auto_lfs_prune=True) + # Return type of `Repository.push_to_hub` is either None or a tuple. + if push_work is not None: + self.push_in_progress = push_work[1] except Exception as e: logger.error(f"Error when pushing to hub: {e}") finally: diff --git a/src/transformers/training_args.py b/src/transformers/training_args.py index b42400e57a6e25..63876e053ad44e 100644 --- a/src/transformers/training_args.py +++ b/src/transformers/training_args.py @@ -139,10 +139,17 @@ class OptimizerNames(ExplicitEnum): ADAMW_TORCH_XLA = "adamw_torch_xla" ADAMW_APEX_FUSED = "adamw_apex_fused" ADAFACTOR = "adafactor" - ADAMW_BNB = "adamw_bnb_8bit" ADAMW_ANYPRECISION = "adamw_anyprecision" SGD = "sgd" ADAGRAD = "adagrad" + ADAMW_BNB = "adamw_bnb_8bit" + ADAMW_8BIT = "adamw_8bit" # just an alias for adamw_bnb_8bit + LION_8BIT = "lion_8bit" + LION = "lion_32bit" + PAGED_ADAMW = "paged_adamw_32bit" + PAGED_ADAMW_8BIT = "paged_adamw_8bit" + PAGED_LION = "paged_lion_32bit" + PAGED_LION_8BIT = "paged_lion_8bit" @dataclass @@ -1622,6 +1629,9 @@ def _setup_devices(self) -> "torch.device": device = torch.device("cuda", local_rank) self._n_gpu = 1 torch.cuda.set_device(device) + elif is_sagemaker_dp_enabled(): + self.distributed_state = PartialState(_use_sagemaker_dp=True) + self._n_gpu = 1 elif self.deepspeed: # Need to do similar for Accelerator init os.environ["ACCELERATE_USE_DEEPSPEED"] = "true" @@ -1646,8 +1656,9 @@ def _setup_devices(self) -> "torch.device": if is_torch_tpu_available(): device = self.distributed_state.device self._n_gpu = 0 - elif is_sagemaker_dp_enabled(): - self._n_gpu = 1 + elif is_sagemaker_dp_enabled() or is_sagemaker_mp_enabled(): + # Already set _n_gpu + pass elif self.distributed_state.distributed_type == DistributedType.NO: if self.use_mps_device: if not torch.backends.mps.is_available(): @@ -1673,7 +1684,9 @@ def _setup_devices(self) -> "torch.device": ) device = torch.device("mps") self._n_gpu = 1 - + elif self.no_cuda: + device = torch.device("cpu") + self._n_gpu = 0 else: # if n_gpu is > 1 we'll use nn.DataParallel. # If you only want to use a specific subset of GPUs use `CUDA_VISIBLE_DEVICES=0` @@ -1708,7 +1721,8 @@ def n_gpu(self): """ requires_backends(self, ["torch"]) # Make sure `self._n_gpu` is properly setup. 
- _ = self._setup_devices + if not hasattr(self, "_n_gpu"): + _ = self._setup_devices return self._n_gpu @property diff --git a/src/transformers/training_args_tf.py b/src/transformers/training_args_tf.py index 847bbdb78a15b2..461c4086acc341 100644 --- a/src/transformers/training_args_tf.py +++ b/src/transformers/training_args_tf.py @@ -66,7 +66,7 @@ class TFTrainingArguments(TrainingArguments): The batch size per GPU/TPU core/CPU for training. per_device_eval_batch_size (`int`, *optional*, defaults to 8): The batch size per GPU/TPU core/CPU for evaluation. - gradient_accumulation_steps: (`int`, *optional*, defaults to 1): + gradient_accumulation_steps (`int`, *optional*, defaults to 1): Number of updates steps to accumulate the gradients for, before performing a backward/update pass. diff --git a/src/transformers/utils/__init__.py b/src/transformers/utils/__init__.py index 35d3638aecdbed..7169c7daf96923 100644 --- a/src/transformers/utils/__init__.py +++ b/src/transformers/utils/__init__.py @@ -39,6 +39,7 @@ expand_dims, find_labels, flatten_dict, + infer_framework, is_jax_tensor, is_numpy_array, is_tensor, diff --git a/src/transformers/utils/bitsandbytes.py b/src/transformers/utils/bitsandbytes.py index 3768506f41138e..5cb82c44d6999b 100644 --- a/src/transformers/utils/bitsandbytes.py +++ b/src/transformers/utils/bitsandbytes.py @@ -1,3 +1,4 @@ +import warnings from copy import deepcopy from packaging import version @@ -15,7 +16,7 @@ from accelerate.utils import find_tied_parameters -def set_module_8bit_tensor_to_device(module, tensor_name, device, value=None, fp16_statistics=None): +def set_module_quantized_tensor_to_device(module, tensor_name, device, value=None, fp16_statistics=None): """ A helper function to set a given tensor (parameter of buffer) of a module on a specific device (note that doing `param.to(device)` creates a new tensor not linked to the parameter, which is why we need this function). The @@ -52,12 +53,16 @@ class `Int8Params` from `bitsandbytes`. if old_value.device == torch.device("meta") and device not in ["meta", torch.device("meta")] and value is None: raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.") - if is_buffer: - has_fp16_weights = None + is_4bit = False + is_8bit = False + if is_buffer or not is_bitsandbytes_available(): + is_8bit = False + is_4bit = False else: - has_fp16_weights = getattr(module._parameters[tensor_name], "has_fp16_weights", None) + is_4bit = hasattr(bnb.nn, "Params4bit") and isinstance(module._parameters[tensor_name], bnb.nn.Params4bit) + is_8bit = isinstance(module._parameters[tensor_name], bnb.nn.Int8Params) - if has_fp16_weights is not None: + if is_8bit or is_4bit: param = module._parameters[tensor_name] if param.device.type != "cuda": if value is None: @@ -75,11 +80,17 @@ class `Int8Params` from `bitsandbytes`. 
) else: new_value = torch.tensor(value, device="cpu") - new_value = bnb.nn.Int8Params(new_value, requires_grad=False, has_fp16_weights=has_fp16_weights).to(device) - module._parameters[tensor_name] = new_value + kwargs = old_value.__dict__ + if is_8bit: + new_value = bnb.nn.Int8Params(new_value, requires_grad=False, **kwargs).to(device) + elif is_4bit: + new_value = bnb.nn.Params4bit(new_value, requires_grad=False, **kwargs).to(device) + + module._parameters[tensor_name] = new_value if fp16_statistics is not None: setattr(module.weight, "SCB", fp16_statistics.to(device)) + else: if value is None: new_value = old_value.to(device) @@ -95,10 +106,10 @@ class `Int8Params` from `bitsandbytes`. module._parameters[tensor_name] = new_value -def replace_8bit_linear(model, threshold=6.0, modules_to_not_convert=None, current_key_name=None): +def replace_with_bnb_linear(model, modules_to_not_convert=None, current_key_name=None, quantization_config=None): """ A helper function to replace all `torch.nn.Linear` modules by `bnb.nn.Linear8bit` modules from the `bitsandbytes` - library. This will enable running your models using mixed int8 precision as described by the paper `GPT3.int8(): + library. This will enable running your models using mixed int8 precision as described by the paper `LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale`. Make sure `bitsandbytes` compiled with the correct CUDA version of your hardware is installed before running this function. `pip install -i https://test.pypi.org/simple/ bitsandbytes` @@ -113,9 +124,6 @@ def replace_8bit_linear(model, threshold=6.0, modules_to_not_convert=None, curre Parameters: model (`torch.nn.Module`): Input model or `torch.nn.Module` as the function is run recursively. - threshold (`float`, *optional*, defaults to 6.0): - `int8_threshold` for outlier detection as described in the formentioned paper. This parameters is set to - `6.0` as described by the paper. modules_to_not_convert (`List[`str`]`, *optional*, defaults to `["lm_head"]`): Names of the modules to not convert in `Linear8bitLt`. In practice we keep the `lm_head` in full precision for numerical stability reasons. 
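For orientation, a rough sketch of how the renamed `replace_with_bnb_linear` helper is driven; in practice `from_pretrained` calls it internally, and the toy module below is only a placeholder (bitsandbytes with CUDA support and accelerate are assumed to be installed):

```python
import torch.nn as nn

from transformers import BitsAndBytesConfig
from transformers.utils.bitsandbytes import replace_with_bnb_linear

toy = nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 2))
config = BitsAndBytesConfig(load_in_8bit=True)
# Swaps every nn.Linear (except the excluded modules) for bnb.nn.Linear8bitLt;
# the new layers are created on the meta device until real weights are loaded.
toy = replace_with_bnb_linear(toy, modules_to_not_convert=["lm_head"], quantization_config=config)
print(type(toy[0]).__name__)  # Linear8bitLt
```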
@@ -128,29 +136,65 @@ def replace_8bit_linear(model, threshold=6.0, modules_to_not_convert=None, curre for name, module in model.named_children(): if current_key_name is None: current_key_name = [] - current_key_name.append(name) - - if len(list(module.children())) > 0: - replace_8bit_linear(module, threshold, modules_to_not_convert, current_key_name) if isinstance(module, nn.Linear) and name not in modules_to_not_convert: # Check if the current key is not in the `modules_to_not_convert` if not any(key in ".".join(current_key_name) for key in modules_to_not_convert): with init_empty_weights(): - model._modules[name] = bnb.nn.Linear8bitLt( - module.in_features, - module.out_features, - module.bias is not None, - has_fp16_weights=False, - threshold=threshold, - ) + if quantization_config.quantization_method() == "llm_int8": + model._modules[name] = bnb.nn.Linear8bitLt( + module.in_features, + module.out_features, + module.bias is not None, + has_fp16_weights=quantization_config.llm_int8_has_fp16_weight, + threshold=quantization_config.llm_int8_threshold, + ) + else: + if ( + quantization_config.llm_int8_skip_modules is not None + and name in quantization_config.llm_int8_skip_modules + ): + pass + else: + model._modules[name] = bnb.nn.Linear4bit( + module.in_features, + module.out_features, + module.bias is not None, + quantization_config.bnb_4bit_compute_dtype, + compress_statistics=quantization_config.bnb_4bit_use_double_quant, + quant_type=quantization_config.bnb_4bit_quant_type, + ) # Force requires grad to False to avoid unexpected errors model._modules[name].requires_grad_(False) # Remove the last key for recursion - current_key_name.pop(-1) + if len(list(module.children())) > 0: + replace_with_bnb_linear( + module, + modules_to_not_convert, + current_key_name, + quantization_config, + ) return model +# For backward compatibility +def replace_8bit_linear(*args, **kwargs): + warnings.warn( + "`replace_8bit_linear` will be deprecated in a future version, please use `replace_with_bnb_linear` instead", + FutureWarning, + ) + return replace_with_bnb_linear(*args, **kwargs) + + +# For backward compatiblity +def set_module_8bit_tensor_to_device(*args, **kwargs): + warnings.warn( + "`set_module_8bit_tensor_to_device` will be deprecated in a future version, please use `set_module_quantized_tensor_to_device` instead", + FutureWarning, + ) + return set_module_quantized_tensor_to_device(*args, **kwargs) + + def get_keys_to_not_convert(model): r""" An utility function to get the key of the module to keep in full precision if any For example for CausalLM modules diff --git a/src/transformers/utils/dummy_pt_objects.py b/src/transformers/utils/dummy_pt_objects.py index ed09fe33ad5f93..c32528c4c76430 100644 --- a/src/transformers/utils/dummy_pt_objects.py +++ b/src/transformers/utils/dummy_pt_objects.py @@ -772,6 +772,30 @@ def __init__(self, *args, **kwargs): requires_backends(self, ["torch"]) +AUTOFORMER_PRETRAINED_MODEL_ARCHIVE_LIST = None + + +class AutoformerForPrediction(metaclass=DummyObject): + _backends = ["torch"] + + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + +class AutoformerModel(metaclass=DummyObject): + _backends = ["torch"] + + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + +class AutoformerPreTrainedModel(metaclass=DummyObject): + _backends = ["torch"] + + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + BART_PRETRAINED_MODEL_ARCHIVE_LIST = None diff --git a/src/transformers/utils/generic.py 
b/src/transformers/utils/generic.py index 23214db8f8591f..afe102408378f1 100644 --- a/src/transformers/utils/generic.py +++ b/src/transformers/utils/generic.py @@ -398,11 +398,10 @@ def can_return_loss(model_class): Args: model_class (`type`): The class of the model. """ - base_classes = str(inspect.getmro(model_class)) - - if "keras.engine.training.Model" in base_classes: + framework = infer_framework(model_class) + if framework == "tf": signature = inspect.signature(model_class.call) # TensorFlow models - elif "torch.nn.modules.module.Module" in base_classes: + elif framework == "pt": signature = inspect.signature(model_class.forward) # PyTorch models else: signature = inspect.signature(model_class.__call__) # Flax models @@ -422,11 +421,10 @@ def find_labels(model_class): model_class (`type`): The class of the model. """ model_name = model_class.__name__ - base_classes = str(inspect.getmro(model_class)) - - if "keras.engine.training.Model" in base_classes: + framework = infer_framework(model_class) + if framework == "tf": signature = inspect.signature(model_class.call) # TensorFlow models - elif "torch.nn.modules.module.Module" in base_classes: + elif framework == "pt": signature = inspect.signature(model_class.forward) # PyTorch models else: signature = inspect.signature(model_class.__call__) # Flax models @@ -565,3 +563,21 @@ def add_model_info_to_auto_map(auto_map, repo_id): auto_map[key] = f"{repo_id}--{value}" return auto_map + + +def infer_framework(model_class): + """ + Infers the framework of a given model without using isinstance(), because we cannot guarantee that the relevant + classes are imported or available. + """ + for base_class in inspect.getmro(model_class): + module = base_class.__module__ + name = base_class.__name__ + if module.startswith("tensorflow") or module.startswith("keras") or name == "TFPreTrainedModel": + return "tf" + elif module.startswith("torch") or name == "PreTrainedModel": + return "pt" + elif module.startswith("flax") or module.startswith("jax") or name == "FlaxPreTrainedModel": + return "flax" + else: + raise TypeError(f"Could not infer framework from class {model_class}.") diff --git a/src/transformers/utils/import_utils.py b/src/transformers/utils/import_utils.py index 037a0d96a13083..be321c1017c353 100644 --- a/src/transformers/utils/import_utils.py +++ b/src/transformers/utils/import_utils.py @@ -19,6 +19,7 @@ import json import os import shutil +import subprocess import sys import warnings from collections import OrderedDict @@ -94,7 +95,6 @@ def _is_package_available(pkg_name: str, return_version: bool = False) -> Union[ _keras_nlp_available = _is_package_available("keras_nlp") _librosa_available = _is_package_available("librosa") _natten_available = _is_package_available("natten") -_ninja_available = _is_package_available("ninja") _onnx_available = _is_package_available("onnx") _openai_available = _is_package_available("openai") _optimum_available = _is_package_available("optimum") @@ -449,7 +449,16 @@ def is_apex_available(): def is_ninja_available(): - return _ninja_available + r""" + Code comes from *torch.utils.cpp_extension.is_ninja_available()*. Returns `True` if the + [ninja](https://ninja-build.org/) build system is available on the system, `False` otherwise. 
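Circling back to the `infer_framework` helper introduced above, a quick hedged illustration of the framework strings it resolves model classes to (TensorFlow and Flax need to be installed for the corresponding classes to import):

```python
from transformers import BertModel, FlaxBertModel, TFBertModel
from transformers.utils import infer_framework

print(infer_framework(BertModel))      # "pt"
print(infer_framework(TFBertModel))    # "tf"
print(infer_framework(FlaxBertModel))  # "flax"
```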
+ """ + try: + subprocess.check_output("ninja --version".split()) + except Exception: + return False + else: + return True def is_ipex_available(): diff --git a/src/transformers/utils/quantization_config.py b/src/transformers/utils/quantization_config.py index f123faaab32f59..2647418a13a666 100644 --- a/src/transformers/utils/quantization_config.py +++ b/src/transformers/utils/quantization_config.py @@ -20,7 +20,14 @@ from dataclasses import dataclass from typing import Any, Dict, Union -from ..utils import logging +from packaging import version + +from ..utils import is_torch_available, logging +from ..utils.import_utils import importlib_metadata + + +if is_torch_available(): + import torch logger = logging.get_logger(__name__) @@ -32,14 +39,17 @@ class BitsAndBytesConfig: This is a wrapper class about all possible attributes and features that you can play with a model that has been loaded using `bitsandbytes`. - This replaces `load_in_8bit` therefore both options are mutually exclusive. + This replaces `load_in_8bit` or `load_in_4bit`therefore both options are mutually exclusive. - For now, only arguments that are relative to `LLM.int8()` are supported, therefore the arguments are all termed as - `llm_int8_*`. If more methods are added to `bitsandbytes`, then more arguments will be added to this class. + Currently only supports `LLM.int8()`, `FP4`, and `NF4` quantization. If more methods are added to `bitsandbytes`, + then more arguments will be added to this class. Args: load_in_8bit (`bool`, *optional*, defaults to `False`): This flag is used to enable 8-bit quantization with LLM.int8(). + load_in_4bit (`bool`, *optional*, defaults to `False`): + This flag is used to enable 4-bit quantization by replacing the Linear layers with FP4/NF4 layers from + `bitsandbytes`. llm_int8_threshold (`float`, *optional*, defaults to 6): This corresponds to the outlier threshold for outlier detection as described in `LLM.int8() : 8-bit Matrix Multiplication for Transformers at Scale` paper: https://arxiv.org/abs/2208.07339 Any hidden states value @@ -58,6 +68,18 @@ class BitsAndBytesConfig: your model in different parts and run some parts in int8 on GPU and some parts in fp32 on CPU, you can use this flag. This is useful for offloading large models such as `google/flan-t5-xxl`. Note that the int8 operations will not be run on CPU. + llm_int8_has_fp16_weight (`bool`, *optional*, defaults to `False`): + This flag runs LLM.int8() with 16-bit main weights. This is useful for fine-tuning as the weights do not + have to be converted back and forth for the backward pass. + bnb_4bit_compute_dtype (`torch.dtype` or str, *optional*, defaults to `torch.float32`): + This sets the computational type which might be different than the input time. For example, inputs might be + fp32, but computation can be set to bf16 for speedups. + bnb_4bit_quant_type (`str`, {fp4, fn4}, defaults to `fp4`): + This sets the quantization data type in the bnb.nn.Linear4Bit layers. Options are FP4 and NF4 data types + which are specified by `fp4` or `fn4`. + bnb_4bit_use_double_quant (`bool`, *optional*, defaults to `False`): + This flag is used for nested quantization where the quantization constants from the first quantization are + quantized again. kwargs (`Dict[str, Any]`, *optional*): Additional parameters from which to initialize the configuration object. 
""" @@ -65,15 +87,33 @@ class BitsAndBytesConfig: def __init__( self, load_in_8bit=False, + load_in_4bit=False, llm_int8_threshold=6.0, llm_int8_skip_modules=None, llm_int8_enable_fp32_cpu_offload=False, + llm_int8_has_fp16_weight=False, + bnb_4bit_compute_dtype=None, + bnb_4bit_quant_type="fp4", + bnb_4bit_use_double_quant=False, **kwargs, ): self.load_in_8bit = load_in_8bit + self.load_in_4bit = load_in_4bit self.llm_int8_threshold = llm_int8_threshold self.llm_int8_skip_modules = llm_int8_skip_modules self.llm_int8_enable_fp32_cpu_offload = llm_int8_enable_fp32_cpu_offload + self.llm_int8_has_fp16_weight = llm_int8_has_fp16_weight + self.bnb_4bit_quant_type = bnb_4bit_quant_type + self.bnb_4bit_use_double_quant = bnb_4bit_use_double_quant + + if bnb_4bit_compute_dtype is None: + self.bnb_4bit_compute_dtype = torch.float32 + elif isinstance(bnb_4bit_compute_dtype, str): + self.bnb_4bit_compute_dtype = getattr(torch, bnb_4bit_compute_dtype) + elif isinstance(bnb_4bit_compute_dtype, torch.dtype): + self.bnb_4bit_compute_dtype = bnb_4bit_compute_dtype + else: + raise ValueError("bnb_4bit_compute_dtype must be a string or a torch.dtype") self.post_init() @@ -86,10 +126,48 @@ def post_init(self): if self.llm_int8_skip_modules is not None and not isinstance(self.llm_int8_skip_modules, list): raise ValueError("llm_int8_skip_modules must be a list of strings") - if not isinstance(self.llm_int8_enable_fp32_cpu_offload, bool): raise ValueError("llm_int8_enable_fp32_cpu_offload must be a boolean") + if not isinstance(self.llm_int8_has_fp16_weight, bool): + raise ValueError("llm_int8_has_fp16_weight must be a boolean") + + if self.bnb_4bit_compute_dtype is not None and not isinstance(self.bnb_4bit_compute_dtype, torch.dtype): + raise ValueError("bnb_4bit_compute_dtype must be torch.dtype") + + if not isinstance(self.bnb_4bit_quant_type, str): + raise ValueError("bnb_4bit_quant_type must be a string") + + if not isinstance(self.bnb_4bit_use_double_quant, bool): + raise ValueError("bnb_4bit_use_double_quant must be a boolean") + + if self.load_in_4bit and not version.parse(importlib_metadata.version("bitsandbytes")) >= version.parse( + "0.39.0" + ): + raise ValueError( + "4 bit quantization requires bitsandbytes>=0.39.0 - please upgrade your bitsandbytes version" + ) + + def is_quantizable(self): + r""" + Returns `True` if the model is quantizable, `False` otherwise. + """ + return self.load_in_8bit or self.load_in_4bit + + def quantization_method(self): + r""" + This method returns the quantization method used for the model. If the model is not quantizable, it returns + `None`. + """ + if self.load_in_8bit: + return "llm_int8" + elif self.load_in_4bit and self.bnb_4bit_quant_type == "fp4": + return "fp4" + elif self.load_in_4bit and self.bnb_4bit_quant_type == "nf4": + return "nf4" + else: + return None + @classmethod def from_dict(cls, config_dict, return_unused_kwargs, **kwargs): """ @@ -107,6 +185,7 @@ def from_dict(cls, config_dict, return_unused_kwargs, **kwargs): Returns: [`BitsAndBytesConfig`]: The configuration object instantiated from those parameters. """ + config = cls(**config_dict) to_remove = [] @@ -144,5 +223,8 @@ def to_dict(self) -> Dict[str, Any]: Serializes this instance to a Python dictionary. Returns: `Dict[str, Any]`: Dictionary of all the attributes that make up this configuration instance. 
""" + output = copy.deepcopy(self.__dict__) + output["bnb_4bit_compute_dtype"] = str(output["bnb_4bit_compute_dtype"]).split(".")[1] + return output diff --git a/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/configuration_{{cookiecutter.lowercase_modelname}}.py b/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/configuration_{{cookiecutter.lowercase_modelname}}.py index 3221696317bde2..2898b5cf6f8f69 100644 --- a/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/configuration_{{cookiecutter.lowercase_modelname}}.py +++ b/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/configuration_{{cookiecutter.lowercase_modelname}}.py @@ -107,10 +107,10 @@ class {{cookiecutter.camelcase_modelname}}Config(PretrainedConfig): just in case (e.g., 512 or 1024 or 2048). init_std (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. - encoder_layerdrop: (`float`, *optional*, defaults to 0.0): + encoder_layerdrop (`float`, *optional*, defaults to 0.0): The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more details. - decoder_layerdrop: (`float`, *optional*, defaults to 0.0): + decoder_layerdrop (`float`, *optional*, defaults to 0.0): The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more details. use_cache (`bool`, *optional*, defaults to `True`): diff --git a/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_tf_{{cookiecutter.lowercase_modelname}}.py b/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_tf_{{cookiecutter.lowercase_modelname}}.py index ffe5e7de95b9eb..982a5807d20621 100644 --- a/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_tf_{{cookiecutter.lowercase_modelname}}.py +++ b/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_tf_{{cookiecutter.lowercase_modelname}}.py @@ -386,9 +386,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_value: Optional[Tuple[tf.Tensor]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_value: Tuple[tf.Tensor] | None, output_attentions: bool, training: bool = False, ) -> Tuple[tf.Tensor]: @@ -465,9 +465,9 @@ def call( hidden_states: tf.Tensor, attention_mask: tf.Tensor, head_mask: tf.Tensor, - encoder_hidden_states: Optional[tf.Tensor], - encoder_attention_mask: Optional[tf.Tensor], - past_key_values: Optional[Tuple[Tuple[tf.Tensor]]], + encoder_hidden_states: tf.Tensor | None, + encoder_attention_mask: tf.Tensor | None, + past_key_values: Tuple[Tuple[tf.Tensor]] | None, use_cache: Optional[bool], output_attentions: bool, output_hidden_states: bool, @@ -639,14 +639,14 @@ class PreTrainedModel @unpack_inputs def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: 
Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -803,23 +803,6 @@ class TF{{cookiecutter.camelcase_modelname}}PreTrainedModel(TFPreTrainedModel): config_class = {{cookiecutter.camelcase_modelname}}Config base_model_prefix = "{{cookiecutter.lowercase_modelname}}" - @property - def dummy_inputs(self): - """ - Dummy inputs to build the network. - - Returns: - `Dict[str, tf.Tensor]`: The dummy inputs. - """ - dummy = {"input_ids": tf.constant(DUMMY_INPUTS, dtype=tf.int64)} - # Add `encoder_hidden_states` to make the cross-attention layers' weights initialized - if self.config.add_cross_attention: - batch_size, seq_len = tf.constant(DUMMY_INPUTS).shape - shape = (batch_size, seq_len) + (self.config.hidden_size,) - h = tf.random.uniform(shape=shape) - dummy["encoder_hidden_states"] = h - - return dummy {{cookiecutter.uppercase_modelname}}_START_DOCSTRING = r""" @@ -937,14 +920,14 @@ def __init__(self, config: {{cookiecutter.camelcase_modelname}}Config, *inputs, ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, @@ -991,24 +974,6 @@ def call( return outputs - def serving_output( - self, output: TFBaseModelOutputWithPastAndCrossAttentions - ) -> TFBaseModelOutputWithPastAndCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFBaseModelOutputWithPastAndCrossAttentions( - 
last_hidden_state=output.last_hidden_state, - past_key_values=pkv, - hidden_states=hs, - attentions=attns, - cross_attentions=cross_attns, - ) @add_start_docstrings("""{{cookiecutter.modelname}} Model with a `language modeling` head on top. """, {{cookiecutter.uppercase_modelname}}_START_DOCSTRING) @@ -1038,16 +1003,16 @@ def get_lm_head(self) -> tf.keras.layers.Layer: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" @@ -1084,13 +1049,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM.serving_output - def serving_output(self, output: TFMaskedLMOutput) -> TFMaskedLMOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMaskedLMOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """{{cookiecutter.modelname}} Model with a `language modeling` head on top for CLM fine-tuning. 
""", {{cookiecutter.uppercase_modelname}}_START_DOCSTRING @@ -1129,20 +1087,20 @@ def prepare_inputs_for_generation(self, inputs, past_key_values=None, attention_ ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_hidden_states: Optional[Union[np.ndarray, tf.Tensor]] = None, - encoder_attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, + encoder_hidden_states: np.ndarray | tf.Tensor | None = None, + encoder_attention_mask: np.ndarray | tf.Tensor | None = None, past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFCausalLMOutputWithCrossAttentions, Tuple[tf.Tensor]]: r""" @@ -1206,19 +1164,6 @@ def call( cross_attentions=outputs.cross_attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertLMHeadModel.serving_output - def serving_output(self, output: TFCausalLMOutputWithCrossAttentions) -> TFCausalLMOutputWithCrossAttentions: - output_cache = self.config.use_cache and self.config.is_decoder - pkv = tf.convert_to_tensor(output.past_key_values) if output_cache else None - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if output.cross_attentions is not None else None - if not (self.config.output_attentions and self.config.add_cross_attention): - cross_attns = None - - return TFCausalLMOutputWithCrossAttentions( - logits=output.logits, past_key_values=pkv, hidden_states=hs, attentions=attns, cross_attentions=cross_attns - ) class TF{{cookiecutter.camelcase_modelname}}ClassificationHead(tf.keras.layers.Layer): @@ -1274,16 +1219,16 @@ def __init__(self, config: {{cookiecutter.camelcase_modelname}}Config, *inputs, ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, 
return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1318,13 +1263,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification.serving_output - def serving_output(self, output: TFSequenceClassifierOutput) -> TFSequenceClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFSequenceClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """{{cookiecutter.modelname}} Model with a multiple choice classification head on top (a linear layer on top of @@ -1343,16 +1281,6 @@ def __init__(self, config: {{cookiecutter.camelcase_modelname}}Config, *inputs, units=1, kernel_initializer=get_initializer(config.initializer_range), name="classifier" ) - @property - def dummy_inputs(self) -> Dict[str, tf.Tensor]: - """ - Dummy inputs to build the network. - - Returns: - tf.Tensor with dummy inputs - """ - return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS, dtype=tf.int64)} - @unpack_inputs @add_start_docstrings_to_model_forward({{cookiecutter.uppercase_modelname}}_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length")) @add_code_sample_docstrings( @@ -1362,16 +1290,16 @@ def dummy_inputs(self) -> Dict[str, tf.Tensor]: ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" @@ -1441,24 +1369,6 @@ def call( attentions=outputs.attentions, ) - @tf.function(input_signature=[{ - "input_ids": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None, None), tf.int32, name="attention_mask"), - "token_type_ids": tf.TensorSpec((None, None, None), tf.int32, name="token_type_ids"), - }]) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving - def serving(self, inputs: Dict[str, tf.Tensor]) -> TFMultipleChoiceModelOutput: - output = self.call(input_ids=inputs) - - return self.serving_output(output) - - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForMultipleChoice.serving_output - def serving_output(self, output: TFMultipleChoiceModelOutput) -> TFMultipleChoiceModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = 
tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFMultipleChoiceModelOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """{{cookiecutter.modelname}} Model with a token classification head on top (a linear layer on top of @@ -1487,16 +1397,16 @@ def __init__(self, config: {{cookiecutter.camelcase_modelname}}Config, *inputs, ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + labels: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" @@ -1532,13 +1442,6 @@ def call( attentions=outputs.attentions, ) - # Copied from transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification.serving_output - def serving_output(self, output: TFTokenClassifierOutput) -> TFTokenClassifierOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFTokenClassifierOutput(logits=output.logits, hidden_states=hs, attentions=attns) - @add_start_docstrings( """{{cookiecutter.modelname}} Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear @@ -1566,17 +1469,17 @@ def __init__(self, config: {{cookiecutter.camelcase_modelname}}Config, *inputs, ) def call( self, - input_ids: Optional[TFModelInputType] = None, - attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, - head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, - inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + input_ids: TFModelInputType | None = None, + attention_mask: np.ndarray | tf.Tensor | None = None, + token_type_ids: np.ndarray | tf.Tensor | None = None, + position_ids: np.ndarray | tf.Tensor | None = None, + head_mask: np.ndarray | tf.Tensor | None = None, + inputs_embeds: np.ndarray | tf.Tensor | None = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, - start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, - end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, + start_positions: np.ndarray | tf.Tensor | None = None, + end_positions: np.ndarray | tf.Tensor | None = None, training: Optional[bool] = False, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" @@ -1625,14 +1528,6 @@ def call( attentions=outputs.attentions, ) - # Copied from 
transformers.models.bert.modeling_tf_bert.TFBertForQuestionAnswering.serving_output - def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput: - hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None - attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None - - return TFQuestionAnsweringModelOutput( - start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns - ) {% else %} import random @@ -1777,12 +1672,12 @@ def _shape(self, tensor: tf.Tensor, seq_len: int, bsz: int): def call( self, hidden_states: tf.Tensor, - key_value_states: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None, - attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, + key_value_states: tf.Tensor | None = None, + past_key_value: Tuple[Tuple[tf.Tensor]] | None = None, + attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, training=False, - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + ) -> Tuple[tf.Tensor, tf.Tensor | None]: """Input shape: Batch x Time x Channel""" # if key_value_states are provided this layer is used as a cross-attention layer @@ -1962,12 +1857,12 @@ def __init__(self, config: {{cookiecutter.camelcase_modelname}}Config, **kwargs) def call( self, hidden_states, - attention_mask: Optional[tf.Tensor] = None, - encoder_hidden_states: Optional[tf.Tensor] = None, - encoder_attention_mask: Optional[tf.Tensor] = None, - layer_head_mask: Optional[tf.Tensor] = None, - cross_attn_layer_head_mask: Optional[tf.Tensor] = None, - past_key_value: Optional[Tuple[tf.Tensor]] = None, + attention_mask: tf.Tensor | None = None, + encoder_hidden_states: tf.Tensor | None = None, + encoder_attention_mask: tf.Tensor | None = None, + layer_head_mask: tf.Tensor | None = None, + cross_attn_layer_head_mask: tf.Tensor | None = None, + past_key_value: Tuple[tf.Tensor] | None = None, training=False, ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]: """ @@ -2043,34 +1938,6 @@ class TF{{cookiecutter.camelcase_modelname}}PreTrainedModel(TFPreTrainedModel): config_class = {{cookiecutter.camelcase_modelname}}Config base_model_prefix = "model" - @property - def dummy_inputs(self): - pad_token = 1 - input_ids = tf.cast(tf.convert_to_tensor(DUMMY_INPUTS), tf.int32) - decoder_input_ids = tf.cast(tf.convert_to_tensor(DUMMY_INPUTS), tf.int32) - dummy_inputs = { - "decoder_input_ids": decoder_input_ids, - "attention_mask": tf.math.not_equal(input_ids, pad_token), - "input_ids": input_ids, - } - return dummy_inputs - - @tf.function( - input_signature=[ - { - "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), - "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), - "decoder_input_ids": tf.TensorSpec((None, None), tf.int32, name="decoder_input_ids"), - "decoder_attention_mask": tf.TensorSpec((None, None), tf.int32, name="decoder_attention_mask"), - } - ] - ) - # Copied from transformers.models.bart.modeling_tf_bart.TFBartPretrainedModel.serving - def serving(self, inputs): - output = self.call(inputs) - - return self.serving_output(output) - {{cookiecutter.uppercase_modelname}}_START_DOCSTRING = r""" This model inherits from [`TFPreTrainedModel`]. 
Check the superclass documentation for the @@ -2777,26 +2644,6 @@ def call( return outputs - # Copied from transformers.models.bart.modeling_tf_bart.TFBartModel.serving_output - def serving_output(self, output): - pkv = tf.tuple(output.past_key_values)[1] if self.config.use_cache else None - dec_hs = tf.convert_to_tensor(output.decoder_hidden_states) if self.config.output_hidden_states else None - dec_attns = tf.convert_to_tensor(output.decoder_attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if self.config.output_attentions else None - enc_hs = tf.convert_to_tensor(output.encoder_hidden_states) if self.config.output_hidden_states else None - enc_attns = tf.convert_to_tensor(output.encoder_attentions) if self.config.output_attentions else None - - return TFSeq2SeqModelOutput( - last_hidden_state=output.last_hidden_state, - past_key_values=pkv, - decoder_hidden_states=dec_hs, - decoder_attentions=dec_attns, - cross_attentions=cross_attns, - encoder_last_hidden_state=output.encoder_last_hidden_state, - encoder_hidden_states=enc_hs, - encoder_attentions=enc_attns, - ) - # Copied from transformers.models.bart.modeling_tf_bart.BiasLayer class BiasLayer(tf.keras.layers.Layer): @@ -2944,26 +2791,6 @@ def call( encoder_attentions=outputs.encoder_attentions, # 2 of e out ) - # Copied from transformers.models.bart.modeling_tf_bart.TFBartForConditionalGeneration.serving_output - def serving_output(self, output): - pkv = tf.tuple(output.past_key_values)[1] if self.config.use_cache else None - dec_hs = tf.convert_to_tensor(output.decoder_hidden_states) if self.config.output_hidden_states else None - dec_attns = tf.convert_to_tensor(output.decoder_attentions) if self.config.output_attentions else None - cross_attns = tf.convert_to_tensor(output.cross_attentions) if self.config.output_attentions else None - enc_hs = tf.convert_to_tensor(output.encoder_hidden_states) if self.config.output_hidden_states else None - enc_attns = tf.convert_to_tensor(output.encoder_attentions) if self.config.output_attentions else None - - return TFSeq2SeqLMOutput( - logits=output.logits, - past_key_values=pkv, - decoder_hidden_states=dec_hs, - decoder_attentions=dec_attns, - cross_attentions=cross_attns, - encoder_last_hidden_state=output.encoder_last_hidden_state, - encoder_hidden_states=enc_hs, - encoder_attentions=enc_attns, - ) - def prepare_inputs_for_generation( self, decoder_input_ids, diff --git a/tests/mixed_int8/README.md b/tests/bitsandbytes/README.md similarity index 100% rename from tests/mixed_int8/README.md rename to tests/bitsandbytes/README.md diff --git a/tests/mixed_int8/__init__.py b/tests/bitsandbytes/__init__.py similarity index 100% rename from tests/mixed_int8/__init__.py rename to tests/bitsandbytes/__init__.py diff --git a/tests/bitsandbytes/test_4bit.py b/tests/bitsandbytes/test_4bit.py new file mode 100644 index 00000000000000..1d0ea6dc3de281 --- /dev/null +++ b/tests/bitsandbytes/test_4bit.py @@ -0,0 +1,460 @@ +# coding=utf-8 +# Copyright 2022 The HuggingFace Team Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. +import gc +import tempfile +import unittest + +from packaging import version + +from transformers import ( + AutoModel, + AutoModelForCausalLM, + AutoModelForSeq2SeqLM, + AutoModelForSequenceClassification, + AutoTokenizer, + BitsAndBytesConfig, + pipeline, +) +from transformers.testing_utils import ( + is_torch_available, + require_accelerate, + require_bitsandbytes, + require_torch, + require_torch_gpu, + require_torch_multi_gpu, + slow, +) +from transformers.utils.versions import importlib_metadata + + +if is_torch_available(): + import torch + import torch.nn as nn + + class LoRALayer(nn.Module): + """Wraps a linear layer with a LoRA-like adapter - Used for testing purposes only""" + + def __init__(self, module: nn.Module, rank: int): + super().__init__() + self.module = module + self.adapter = nn.Sequential( + nn.Linear(module.in_features, rank, bias=False), + nn.Linear(rank, module.out_features, bias=False), + ) + small_std = (2.0 / (5 * min(module.in_features, module.out_features))) ** 0.5 + nn.init.normal_(self.adapter[0].weight, std=small_std) + nn.init.zeros_(self.adapter[1].weight) + self.adapter.to(module.weight.device) + + def forward(self, input, *args, **kwargs): + return self.module(input, *args, **kwargs) + self.adapter(input) + + +@require_bitsandbytes +@require_accelerate +@require_torch +@require_torch_gpu +@slow +class Base4bitTest(unittest.TestCase): + # We keep the constants inside the init function and model loading inside setUp function + + # We need to test on relatively large models (aka >1b parameters, otherwise the quantization may not work as expected) + # Therefore here we use only bloom-1b7 to test our module + model_name = "bigscience/bloom-1b7" + + # Constant values + EXPECTED_RELATIVE_DIFFERENCE = ( + 2.109659552692574 # This was obtained on an RTX Titan so the number might slightly change + ) + + input_text = "Hello my name is" + EXPECTED_OUTPUTS = set() + EXPECTED_OUTPUTS.add("Hello my name is John and I am a professional photographer. I") + EXPECTED_OUTPUTS.add("Hello my name is John.\nI am a friend of your father.\n") + MAX_NEW_TOKENS = 10 + + def setUp(self): + # Models and tokenizer + self.tokenizer = AutoTokenizer.from_pretrained(self.model_name) + + +class Bnb4BitTest(Base4bitTest): + def setUp(self): + super().setUp() + + # Models and tokenizer + self.model_fp16 = AutoModelForCausalLM.from_pretrained( + self.model_name, torch_dtype=torch.float16, device_map="auto" + ) + self.model_4bit = AutoModelForCausalLM.from_pretrained(self.model_name, load_in_4bit=True, device_map="auto") + + def tearDown(self): + r""" + TearDown function needs to be called at the end of each test to free the GPU memory and cache, also to + avoid unexpected behaviors.
Please see: https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530/27 + """ + del self.model_fp16 + del self.model_4bit + + gc.collect() + torch.cuda.empty_cache() + + def test_memory_footprint(self): + r""" + A simple test to check if the model conversion has been done correctly by checking on the + memory footprint of the converted model and the class type of the linear layers of the converted models + """ + from bitsandbytes.nn import Params4bit + + mem_fp16 = self.model_fp16.get_memory_footprint() + mem_4bit = self.model_4bit.get_memory_footprint() + + self.assertAlmostEqual(mem_fp16 / mem_4bit, self.EXPECTED_RELATIVE_DIFFERENCE) + self.assertTrue(self.model_4bit.transformer.h[0].mlp.dense_4h_to_h.weight.__class__ == Params4bit) + + def test_linear_are_4bit(self): + r""" + A simple test to check if the model conversion has been done correctly by checking the dtype of the + linear layers of the converted model + """ + from transformers import T5PreTrainedModel + + self.model_fp16.get_memory_footprint() + self.model_4bit.get_memory_footprint() + + for name, module in self.model_4bit.named_modules(): + if isinstance(module, torch.nn.Linear): + if name not in ["lm_head"] + T5PreTrainedModel._keep_in_fp32_modules: + # 4-bit parameters are packed in uint8 variables + self.assertTrue(module.weight.dtype == torch.uint8) + + def test_generate_quality(self): + r""" + Test the generation quality of the quantized model and see that we are matching the expected output. + Given that we are operating on small numbers + the testing model is relatively small, we might not get + the same output across GPUs. So we'll generate a few tokens (5-10) and check their output. + """ + encoded_input = self.tokenizer(self.input_text, return_tensors="pt") + output_sequences = self.model_4bit.generate(input_ids=encoded_input["input_ids"].to(0), max_new_tokens=10) + + self.assertIn(self.tokenizer.decode(output_sequences[0], skip_special_tokens=True), self.EXPECTED_OUTPUTS) + + def test_generate_quality_config(self): + r""" + Test that loading the model with the config is equivalent + """ + bnb_config = BitsAndBytesConfig() + bnb_config.load_in_4bit = True + + model_4bit_from_config = AutoModelForCausalLM.from_pretrained( + self.model_name, quantization_config=bnb_config, device_map="auto" + ) + + encoded_input = self.tokenizer(self.input_text, return_tensors="pt") + output_sequences = model_4bit_from_config.generate( + input_ids=encoded_input["input_ids"].to(0), max_new_tokens=10 + ) + + self.assertIn(self.tokenizer.decode(output_sequences[0], skip_special_tokens=True), self.EXPECTED_OUTPUTS) + + def test_raise_on_save_pretrained(self): + r""" + Test whether trying to save a model after converting it to 4-bit raises an error. + """ + with self.assertRaises(NotImplementedError), tempfile.TemporaryDirectory() as tmpdirname: + self.model_4bit.save_pretrained(tmpdirname) + + def test_raise_if_config_and_load_in_4bit(self): + r""" + Test that loading the model with the config and `load_in_4bit` raises an error + """ + bnb_config = BitsAndBytesConfig() + + with self.assertRaises(ValueError): + _ = AutoModelForCausalLM.from_pretrained( + self.model_name, + quantization_config=bnb_config, + load_in_4bit=True, + device_map="auto", + bnb_4bit_quant_type="nf4", + ) + + def test_device_and_dtype_assignment(self): + r""" + Test whether trying to cast (or assigning a device to) a model after converting it to 4-bit will throw an error.
+ Also checks that other models are cast correctly. + """ + with self.assertRaises(ValueError): + # Tries with a `str` + self.model_4bit.to("cpu") + + with self.assertRaises(ValueError): + # Tries with a `dtype` + self.model_4bit.to(torch.float16) + + with self.assertRaises(ValueError): + # Tries with a `device` + self.model_4bit.to(torch.device("cuda:0")) + + with self.assertRaises(ValueError): + # Tries casting to fp32 + self.model_4bit.float() + + with self.assertRaises(ValueError): + # Tries casting to fp16 + self.model_4bit.half() + + # Test if we did not break anything + encoded_input = self.tokenizer(self.input_text, return_tensors="pt") + + self.model_fp16 = self.model_fp16.to(torch.float32) + _ = self.model_fp16.generate(input_ids=encoded_input["input_ids"].to(0), max_new_tokens=10) + + # Check this does not throw an error + _ = self.model_fp16.to("cpu") + + # Check this does not throw an error + _ = self.model_fp16.half() + + # Check this does not throw an error + _ = self.model_fp16.float() + + def test_fp32_4bit_conversion(self): + r""" + Test whether it is possible to mix both `4bit` and `fp32` weights when using `keep_in_fp32_modules` correctly. + """ + model = AutoModelForSeq2SeqLM.from_pretrained("t5-small", load_in_4bit=True, device_map="auto") + self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wo.weight.dtype == torch.float32) + + +@require_bitsandbytes +@require_accelerate +@require_torch +@require_torch_gpu +@slow +class Bnb4BitT5Test(unittest.TestCase): + @classmethod + def setUpClass(cls): + cls.model_name = "t5-small" + cls.dense_act_model_name = "google/flan-t5-small" # flan-t5 uses dense-act instead of dense-relu-dense + cls.tokenizer = AutoTokenizer.from_pretrained(cls.model_name) + cls.input_text = "Translate in German: Hello, my dog is cute" + + def tearDown(self): + r""" + TearDown function needs to be called at the end of each test to free the GPU memory and cache, also to + avoid unexpected behaviors. Please see: https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530/27 + """ + gc.collect() + torch.cuda.empty_cache() + + def test_inference_without_keep_in_fp32(self): + r""" + Test whether it is possible to mix both `4bit` and `fp32` weights when using `keep_in_fp32_modules` correctly. + `flan-t5-small` uses `T5DenseGatedActDense` whereas `t5-small` uses `T5DenseReluDense`. We need to test + both cases. + """ + from transformers import T5ForConditionalGeneration + + modules = T5ForConditionalGeneration._keep_in_fp32_modules + T5ForConditionalGeneration._keep_in_fp32_modules = None + + # test with `t5-small` + model = T5ForConditionalGeneration.from_pretrained(self.model_name, load_in_4bit=True, device_map="auto") + encoded_input = self.tokenizer(self.input_text, return_tensors="pt").to(0) + _ = model.generate(**encoded_input) + + # test with `flan-t5-small` + model = T5ForConditionalGeneration.from_pretrained( + self.dense_act_model_name, load_in_4bit=True, device_map="auto" + ) + encoded_input = self.tokenizer(self.input_text, return_tensors="pt").to(0) + _ = model.generate(**encoded_input) + T5ForConditionalGeneration._keep_in_fp32_modules = modules + + def test_inference_with_keep_in_fp32(self): + r""" + Test whether it is possible to mix both `4bit` and `fp32` weights when using `keep_in_fp32_modules` correctly. + `flan-t5-small` uses `T5DenseGatedActDense` whereas `t5-small` uses `T5DenseReluDense`. We need to test + both cases.
+ """ + import bitsandbytes as bnb + + from transformers import T5ForConditionalGeneration + + # test with `t5-small` + model = T5ForConditionalGeneration.from_pretrained(self.model_name, load_in_4bit=True, device_map="auto") + + # there was a bug with decoders - this test checks that it is fixed + self.assertTrue(isinstance(model.decoder.block[0].layer[0].SelfAttention.q, bnb.nn.Linear4bit)) + + encoded_input = self.tokenizer(self.input_text, return_tensors="pt").to(0) + _ = model.generate(**encoded_input) + + # test with `flan-t5-small` + model = T5ForConditionalGeneration.from_pretrained( + self.dense_act_model_name, load_in_4bit=True, device_map="auto" + ) + encoded_input = self.tokenizer(self.input_text, return_tensors="pt").to(0) + _ = model.generate(**encoded_input) + + +class Classes4BitModelTest(Base4bitTest): + def setUp(self): + super().setUp() + # model_name + self.model_name = "bigscience/bloom-560m" + self.seq_to_seq_name = "t5-small" + + # Different types of model + + self.base_model = AutoModel.from_pretrained(self.model_name, load_in_4bit=True, device_map="auto") + # Sequence classification model + self.sequence_model = AutoModelForSequenceClassification.from_pretrained( + self.model_name, load_in_4bit=True, device_map="auto" + ) + # CausalLM model + self.model_4bit = AutoModelForCausalLM.from_pretrained(self.model_name, load_in_4bit=True, device_map="auto") + # Seq2seq model + self.seq_to_seq_model = AutoModelForSeq2SeqLM.from_pretrained( + self.seq_to_seq_name, load_in_4bit=True, device_map="auto" + ) + + def tearDown(self): + r""" + TearDown function needs to be called at the end of each test to free the GPU memory and cache, also to + avoid unexpected behaviors. Please see: https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530/27 + """ + del self.base_model + del self.sequence_model + del self.model_4bit + del self.seq_to_seq_model + + gc.collect() + torch.cuda.empty_cache() + + def test_correct_head_class(self): + r""" + A simple test to check if the last modules for some classes (AutoModelForCausalLM or SequenceClassification) + are kept in their native class. + """ + from bitsandbytes.nn import Params4bit + + self.assertTrue(self.base_model.h[-1].mlp.dense_4h_to_h.weight.__class__ == Params4bit) + + # Other heads should be nn.Parameter + self.assertTrue(self.model_4bit.lm_head.weight.__class__ == torch.nn.Parameter) + self.assertTrue(self.sequence_model.score.weight.__class__ == torch.nn.Parameter) + self.assertTrue(self.seq_to_seq_model.lm_head.weight.__class__ == torch.nn.Parameter) + + +class Pipeline4BitTest(Base4bitTest): + def setUp(self): + super().setUp() + + def tearDown(self): + r""" + TearDown function needs to be called at the end of each test to free the GPU memory and cache, also to + avoid unexpected behaviors. Please see: https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530/27 + """ + del self.pipe + + gc.collect() + torch.cuda.empty_cache() + + def test_pipeline(self): + r""" + The aim of this test is to verify that the mixed 4bit is compatible with `pipeline` from transformers. Since + we used pipline for inference speed benchmarking we want to make sure that this feature does not break anything + on pipline. 
+ """ + # self._clear_cuda_cache() + self.pipe = pipeline( + "text-generation", + model=self.model_name, + model_kwargs={"device_map": "auto", "load_in_4bit": True, "torch_dtype": torch.float16}, + max_new_tokens=self.MAX_NEW_TOKENS, + ) + + # Real second forward pass + pipeline_output = self.pipe(self.input_text) + self.assertIn(pipeline_output[0]["generated_text"], self.EXPECTED_OUTPUTS) + + +@require_torch_multi_gpu +class Bnb4bitTestMultiGpu(Base4bitTest): + def setUp(self): + super().setUp() + + def test_multi_gpu_loading(self): + r""" + This tests that the model has been loaded and can be used correctly on a multi-GPU setup. + Let's just try to load a model on 2 GPUs and see if it works. The model we test has ~2GB of total, 3GB should suffice + """ + + model_parallel = AutoModelForCausalLM.from_pretrained( + self.model_name, load_in_4bit=True, device_map="balanced" + ) + + # Check correct device map + self.assertEqual(set(model_parallel.hf_device_map.values()), {0, 1}) + + # Check that inference pass works on the model + encoded_input = self.tokenizer(self.input_text, return_tensors="pt") + + # Second real batch + output_parallel = model_parallel.generate(input_ids=encoded_input["input_ids"].to(0), max_new_tokens=10) + self.assertIn(self.tokenizer.decode(output_parallel[0], skip_special_tokens=True), self.EXPECTED_OUTPUTS) + + +class Bnb4BitTestTraining(Base4bitTest): + def setUp(self): + self.model_name = "facebook/opt-350m" + super().setUp() + + def test_training(self): + if version.parse(importlib_metadata.version("bitsandbytes")) < version.parse("0.37.0"): + return + + # Step 1: freeze all parameters + model = AutoModelForCausalLM.from_pretrained(self.model_name, load_in_4bit=True, device_map="auto") + + for param in model.parameters(): + param.requires_grad = False # freeze the model - train adapters later + if param.ndim == 1: + # cast the small parameters (e.g. 
layernorm) to fp32 for stability + param.data = param.data.to(torch.float32) + + # Step 2: add adapters + for _, module in model.named_modules(): + if "OPTAttention" in repr(type(module)): + module.q_proj = LoRALayer(module.q_proj, rank=16) + module.k_proj = LoRALayer(module.k_proj, rank=16) + module.v_proj = LoRALayer(module.v_proj, rank=16) + + # Step 3: dummy batch + batch = self.tokenizer("Test batch ", return_tensors="pt").to(0) + + # Step 4: Check if the gradient is not None + with torch.cuda.amp.autocast(): + out = model.forward(**batch) + out.logits.norm().backward() + + for module in model.modules(): + if isinstance(module, LoRALayer): + self.assertTrue(module.adapter[1].weight.grad is not None) + self.assertTrue(module.adapter[1].weight.grad.norm().item() > 0) + elif isinstance(module, nn.Embedding): + self.assertTrue(module.weight.grad is None) diff --git a/tests/mixed_int8/test_mixed_int8.py b/tests/bitsandbytes/test_mixed_int8.py similarity index 96% rename from tests/mixed_int8/test_mixed_int8.py rename to tests/bitsandbytes/test_mixed_int8.py index 1628e08155d67e..b31aaa386ae043 100644 --- a/tests/mixed_int8/test_mixed_int8.py +++ b/tests/bitsandbytes/test_mixed_int8.py @@ -29,6 +29,7 @@ pipeline, ) from transformers.testing_utils import ( + is_accelerate_available, is_torch_available, require_accelerate, require_bitsandbytes, @@ -40,6 +41,13 @@ from transformers.utils.versions import importlib_metadata +if is_accelerate_available(): + from accelerate import PartialState + from accelerate.logging import get_logger + + logger = get_logger(__name__) + _ = PartialState() + if is_torch_available(): import torch import torch.nn as nn @@ -123,6 +131,21 @@ def test_memory_footprint(self): self.assertAlmostEqual(mem_fp16 / mem_8bit, self.EXPECTED_RELATIVE_DIFFERENCE) self.assertTrue(self.model_8bit.transformer.h[0].mlp.dense_4h_to_h.weight.__class__ == Int8Params) + def test_linear_are_8bit(self): + r""" + A simple test to check if the model conversion has been done correctly by checking on the + memory footprint of the converted model and the class type of the linear layers of the converted models + """ + from transformers import T5PreTrainedModel + + self.model_fp16.get_memory_footprint() + self.model_8bit.get_memory_footprint() + + for name, module in self.model_8bit.named_modules(): + if isinstance(module, torch.nn.Linear): + if name not in ["lm_head"] + T5PreTrainedModel._keep_in_fp32_modules: + self.assertTrue(module.weight.dtype == torch.int8) + def test_generate_quality(self): r""" Test the generation quality of the quantized model and see that we are matching the expected output. 
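The Bnb4BitTestTraining test above freezes the 4-bit base model and then wraps the OPT attention projections in a `LoRALayer` before checking that only the adapter weights receive gradients. That helper is defined elsewhere in the test module and is not shown in this excerpt; a minimal sketch of such a wrapper, assuming a plain two-matrix low-rank residual path, could look like this:

import torch
import torch.nn as nn


class LoRALayer(nn.Module):
    """Wraps a frozen linear projection and adds a small trainable low-rank residual path."""

    def __init__(self, module: nn.Linear, rank: int):
        super().__init__()
        self.module = module  # the frozen base projection, e.g. q_proj of an OPTAttention block
        self.adapter = nn.Sequential(
            nn.Linear(module.in_features, rank, bias=False),
            nn.Linear(rank, module.out_features, bias=False),
        )
        # zero-init the up-projection so the wrapped module initially computes the same output
        nn.init.normal_(self.adapter[0].weight, std=1.0 / rank)
        nn.init.zeros_(self.adapter[1].weight)
        self.adapter.to(module.weight.device)

    def forward(self, hidden_states):
        return self.module(hidden_states) + self.adapter(hidden_states)


# usage mirroring the test: the base projection stays frozen, only the adapter is trainable
base = nn.Linear(16, 16)
base.weight.requires_grad = False
wrapped = LoRALayer(base, rank=4)
wrapped(torch.randn(2, 16)).sum().backward()
assert wrapped.adapter[1].weight.grad is not None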
@@ -139,6 +162,7 @@ def test_generate_quality_config(self): Test that loading the model with the config is equivalent """ bnb_config = BitsAndBytesConfig() + bnb_config.load_in_8bit = True model_8bit_from_config = AutoModelForCausalLM.from_pretrained( self.model_name, quantization_config=bnb_config, device_map="auto" @@ -321,6 +345,7 @@ def test_inference_without_keep_in_fp32(self): """ from transformers import T5ForConditionalGeneration + modules = T5ForConditionalGeneration._keep_in_fp32_modules T5ForConditionalGeneration._keep_in_fp32_modules = None # test with `t5-small` @@ -334,6 +359,7 @@ def test_inference_without_keep_in_fp32(self): ) encoded_input = self.tokenizer(self.input_text, return_tensors="pt").to(0) _ = model.generate(**encoded_input) + T5ForConditionalGeneration._keep_in_fp32_modules = modules def test_inference_with_keep_in_fp32(self): r""" diff --git a/tests/generation/test_tf_logits_process.py b/tests/generation/test_tf_logits_process.py index a1f665c9a761fc..e87c843d9cb4de 100644 --- a/tests/generation/test_tf_logits_process.py +++ b/tests/generation/test_tf_logits_process.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest import numpy as np diff --git a/tests/generation/test_tf_utils.py b/tests/generation/test_tf_utils.py index 6fdad1ef636531..186e0c8d4327f3 100644 --- a/tests/generation/test_tf_utils.py +++ b/tests/generation/test_tf_utils.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import os import tempfile import unittest diff --git a/tests/models/albert/test_modeling_tf_albert.py b/tests/models/albert/test_modeling_tf_albert.py index 104fb092529a80..97b073e850e207 100644 --- a/tests/models/albert/test_modeling_tf_albert.py +++ b/tests/models/albert/test_modeling_tf_albert.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import AlbertConfig, is_tf_available diff --git a/tests/models/audio_spectrogram_transformer/test_feature_extraction_audio_spectrogram_transformer.py b/tests/models/audio_spectrogram_transformer/test_feature_extraction_audio_spectrogram_transformer.py index a7a81dceb15314..69a1bddc825080 100644 --- a/tests/models/audio_spectrogram_transformer/test_feature_extraction_audio_spectrogram_transformer.py +++ b/tests/models/audio_spectrogram_transformer/test_feature_extraction_audio_spectrogram_transformer.py @@ -125,6 +125,14 @@ def test_call(self): for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + # Test 2-D numpy arrays are batched. 
+ speech_inputs = [floats_list((1, x))[0] for x in (800, 800, 800)] + np_speech_inputs = np.asarray(speech_inputs) + encoded_sequences_1 = feat_extract(speech_inputs, return_tensors="np").input_values + encoded_sequences_2 = feat_extract(np_speech_inputs, return_tensors="np").input_values + for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): + self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + @require_torch def test_double_precision_pad(self): import torch diff --git a/tests/models/auto/test_modeling_tf_auto.py b/tests/models/auto/test_modeling_tf_auto.py index 1a355d88bb5ad3..c8754ca42702fc 100644 --- a/tests/models/auto/test_modeling_tf_auto.py +++ b/tests/models/auto/test_modeling_tf_auto.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import copy import tempfile import unittest diff --git a/tests/models/auto/test_modeling_tf_pytorch.py b/tests/models/auto/test_modeling_tf_pytorch.py index c60b8fc2f517f7..3e213f29562ab2 100644 --- a/tests/models/auto/test_modeling_tf_pytorch.py +++ b/tests/models/auto/test_modeling_tf_pytorch.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import is_tf_available, is_torch_available diff --git a/tests/models/autoformer/__init__.py b/tests/models/autoformer/__init__.py new file mode 100644 index 00000000000000..e69de29bb2d1d6 diff --git a/tests/models/autoformer/test_modeling_autoformer.py b/tests/models/autoformer/test_modeling_autoformer.py new file mode 100644 index 00000000000000..9df5bf236e089d --- /dev/null +++ b/tests/models/autoformer/test_modeling_autoformer.py @@ -0,0 +1,449 @@ +# coding=utf-8 +# Copyright 2023 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" Testing suite for the PyTorch Autoformer model. 
""" + +import inspect +import tempfile +import unittest + +from huggingface_hub import hf_hub_download + +from transformers import is_torch_available +from transformers.testing_utils import require_torch, slow, torch_device + +from ...test_configuration_common import ConfigTester +from ...test_modeling_common import ModelTesterMixin, floats_tensor, ids_tensor + + +TOLERANCE = 1e-4 + +if is_torch_available(): + import torch + + from transformers import AutoformerConfig, AutoformerForPrediction, AutoformerModel + from transformers.models.autoformer.modeling_autoformer import AutoformerDecoder, AutoformerEncoder + + +@require_torch +class AutoformerModelTester: + def __init__( + self, + parent, + d_model=16, + batch_size=13, + prediction_length=7, + context_length=14, + label_length=10, + cardinality=19, + embedding_dimension=5, + num_time_features=4, + is_training=True, + hidden_size=16, + num_hidden_layers=2, + num_attention_heads=4, + intermediate_size=4, + hidden_act="gelu", + hidden_dropout_prob=0.1, + attention_probs_dropout_prob=0.1, + lags_sequence=[1, 2, 3, 4, 5], + moving_average=25, + autocorrelation_factor=5, + ): + self.d_model = d_model + self.parent = parent + self.batch_size = batch_size + self.prediction_length = prediction_length + self.context_length = context_length + self.cardinality = cardinality + self.num_time_features = num_time_features + self.lags_sequence = lags_sequence + self.embedding_dimension = embedding_dimension + self.is_training = is_training + self.hidden_size = hidden_size + self.num_hidden_layers = num_hidden_layers + self.num_attention_heads = num_attention_heads + self.intermediate_size = intermediate_size + self.hidden_act = hidden_act + self.hidden_dropout_prob = hidden_dropout_prob + self.attention_probs_dropout_prob = attention_probs_dropout_prob + + self.encoder_seq_length = context_length + self.decoder_seq_length = prediction_length + label_length + self.label_length = label_length + + self.moving_average = moving_average + self.autocorrelation_factor = autocorrelation_factor + + def get_config(self): + return AutoformerConfig( + d_model=self.d_model, + encoder_layers=self.num_hidden_layers, + decoder_layers=self.num_hidden_layers, + encoder_attention_heads=self.num_attention_heads, + decoder_attention_heads=self.num_attention_heads, + encoder_ffn_dim=self.intermediate_size, + decoder_ffn_dim=self.intermediate_size, + dropout=self.hidden_dropout_prob, + attention_dropout=self.attention_probs_dropout_prob, + prediction_length=self.prediction_length, + context_length=self.context_length, + label_length=self.label_length, + lags_sequence=self.lags_sequence, + num_time_features=self.num_time_features, + num_static_categorical_features=1, + cardinality=[self.cardinality], + embedding_dimension=[self.embedding_dimension], + moving_average=self.moving_average, + ) + + def prepare_autoformer_inputs_dict(self, config): + _past_length = config.context_length + max(config.lags_sequence) + + static_categorical_features = ids_tensor([self.batch_size, 1], config.cardinality[0]) + past_time_features = floats_tensor([self.batch_size, _past_length, config.num_time_features]) + past_values = floats_tensor([self.batch_size, _past_length]) + past_observed_mask = floats_tensor([self.batch_size, _past_length]) > 0.5 + + # decoder inputs + future_time_features = floats_tensor([self.batch_size, config.prediction_length, config.num_time_features]) + future_values = floats_tensor([self.batch_size, config.prediction_length]) + + inputs_dict = { + "past_values": 
past_values, + "static_categorical_features": static_categorical_features, + "past_time_features": past_time_features, + "past_observed_mask": past_observed_mask, + "future_time_features": future_time_features, + "future_values": future_values, + } + return inputs_dict + + def prepare_config_and_inputs(self): + config = self.get_config() + inputs_dict = self.prepare_autoformer_inputs_dict(config) + return config, inputs_dict + + def prepare_config_and_inputs_for_common(self): + config, inputs_dict = self.prepare_config_and_inputs() + return config, inputs_dict + + def check_encoder_decoder_model_standalone(self, config, inputs_dict): + model = AutoformerModel(config=config).to(torch_device).eval() + outputs = model(**inputs_dict) + + encoder_last_hidden_state = outputs.encoder_last_hidden_state + last_hidden_state = outputs.last_hidden_state + + with tempfile.TemporaryDirectory() as tmpdirname: + encoder = model.get_encoder() + encoder.save_pretrained(tmpdirname) + encoder = AutoformerEncoder.from_pretrained(tmpdirname).to(torch_device) + + transformer_inputs, feature, _, _, _ = model.create_network_inputs(**inputs_dict) + seasonal_input, trend_input = model.decomposition_layer(transformer_inputs[:, : config.context_length, ...]) + + enc_input = torch.cat( + (transformer_inputs[:, : config.context_length, ...], feature[:, : config.context_length, ...]), + dim=-1, + ) + encoder_last_hidden_state_2 = encoder(inputs_embeds=enc_input)[0] + self.parent.assertTrue((encoder_last_hidden_state_2 - encoder_last_hidden_state).abs().max().item() < 1e-3) + + mean = ( + torch.mean(transformer_inputs[:, : config.context_length, ...], dim=1) + .unsqueeze(1) + .repeat(1, config.prediction_length, 1) + ) + zeros = torch.zeros( + [transformer_inputs.shape[0], config.prediction_length, transformer_inputs.shape[2]], + device=enc_input.device, + ) + + dec_input = torch.cat( + ( + torch.cat((seasonal_input[:, -config.label_length :, ...], zeros), dim=1), + feature[:, config.context_length - config.label_length :, ...], + ), + dim=-1, + ) + trend_init = torch.cat( + ( + torch.cat((trend_input[:, -config.label_length :, ...], mean), dim=1), + feature[:, config.context_length - config.label_length :, ...], + ), + dim=-1, + ) + + with tempfile.TemporaryDirectory() as tmpdirname: + decoder = model.get_decoder() + decoder.save_pretrained(tmpdirname) + decoder = AutoformerDecoder.from_pretrained(tmpdirname).to(torch_device) + + last_hidden_state_2 = decoder( + trend=trend_init, + inputs_embeds=dec_input, + encoder_hidden_states=encoder_last_hidden_state, + )[0] + + self.parent.assertTrue((last_hidden_state_2 - last_hidden_state).abs().max().item() < 1e-3) + + +@require_torch +class AutoformerModelTest(ModelTesterMixin, unittest.TestCase): + all_model_classes = (AutoformerModel, AutoformerForPrediction) if is_torch_available() else () + all_generative_model_classes = (AutoformerForPrediction,) if is_torch_available() else () + test_pruning = False + test_head_masking = False + test_missing_keys = False + test_torchscript = False + test_inputs_embeds = False + test_model_common_attributes = False + + def setUp(self): + self.model_tester = AutoformerModelTester(self) + self.config_tester = ConfigTester(self, config_class=AutoformerConfig, has_text_modality=False) + + def test_config(self): + self.config_tester.run_common_tests() + + def test_save_load_strict(self): + config, inputs_dict = self.model_tester.prepare_config_and_inputs() + for model_class in self.all_model_classes: + model = model_class(config) + + with 
tempfile.TemporaryDirectory() as tmpdirname: + model.save_pretrained(tmpdirname) + model2, info = model_class.from_pretrained(tmpdirname, output_loading_info=True) + self.assertEqual(info["missing_keys"], []) + + def test_encoder_decoder_model_standalone(self): + config_and_inputs = self.model_tester.prepare_config_and_inputs_for_common() + self.model_tester.check_encoder_decoder_model_standalone(*config_and_inputs) + + @unittest.skip(reason="Model has no tokens embeddings") + def test_resize_tokens_embeddings(self): + pass + + # # Input is 'static_categorical_features' not 'input_ids' + def test_model_main_input_name(self): + model_signature = inspect.signature(getattr(AutoformerModel, "forward")) + # The main input is the name of the argument after `self` + observed_main_input_name = list(model_signature.parameters.keys())[1] + self.assertEqual(AutoformerModel.main_input_name, observed_main_input_name) + + def test_forward_signature(self): + config, _ = self.model_tester.prepare_config_and_inputs_for_common() + + for model_class in self.all_model_classes: + model = model_class(config) + signature = inspect.signature(model.forward) + # signature.parameters is an OrderedDict => so arg_names order is deterministic + arg_names = [*signature.parameters.keys()] + + expected_arg_names = [ + "past_values", + "past_time_features", + "past_observed_mask", + "static_categorical_features", + "static_real_features", + "future_values", + "future_time_features", + ] + + if model.__class__.__name__ in ["AutoformerForPrediction"]: + expected_arg_names.append("future_observed_mask") + + expected_arg_names.extend( + [ + "decoder_attention_mask", + "head_mask", + "decoder_head_mask", + "cross_attn_head_mask", + "encoder_outputs", + "past_key_values", + "output_hidden_states", + "output_attentions", + "use_cache", + "return_dict", + ] + ) + + self.assertListEqual(arg_names[: len(expected_arg_names)], expected_arg_names) + + def test_attention_outputs(self): + config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common() + config.return_dict = True + + seq_len = getattr(self.model_tester, "seq_length", None) + decoder_seq_length = getattr(self.model_tester, "decoder_seq_length", seq_len) + encoder_seq_length = getattr(self.model_tester, "encoder_seq_length", seq_len) + d_model = getattr(self.model_tester, "d_model", None) + num_attention_heads = getattr(self.model_tester, "num_attention_heads", None) + dim = d_model // num_attention_heads + + for model_class in self.all_model_classes: + inputs_dict["output_attentions"] = True + inputs_dict["output_hidden_states"] = False + config.return_dict = True + model = model_class(config) + model.to(torch_device) + model.eval() + with torch.no_grad(): + outputs = model(**self._prepare_for_class(inputs_dict, model_class)) + attentions = outputs.encoder_attentions if config.is_encoder_decoder else outputs.attentions + self.assertEqual(len(attentions), self.model_tester.num_hidden_layers) + + # check that output_attentions also work using config + del inputs_dict["output_attentions"] + config.output_attentions = True + model = model_class(config) + model.to(torch_device) + model.eval() + with torch.no_grad(): + outputs = model(**self._prepare_for_class(inputs_dict, model_class)) + attentions = outputs.encoder_attentions + self.assertEqual(len(attentions), self.model_tester.num_hidden_layers) + + self.assertListEqual( + list(attentions[0].shape[-3:]), + [self.model_tester.num_attention_heads, encoder_seq_length, dim], + ) + out_len = len(outputs) + + 
correct_outlen = 7 + + if "last_hidden_state" in outputs: + correct_outlen += 1 + + if "trend" in outputs: + correct_outlen += 1 + + if "past_key_values" in outputs: + correct_outlen += 1 # past_key_values have been returned + + if "loss" in outputs: + correct_outlen += 1 + + if "params" in outputs: + correct_outlen += 1 + + self.assertEqual(out_len, correct_outlen) + + # decoder attentions + decoder_attentions = outputs.decoder_attentions + self.assertIsInstance(decoder_attentions, (list, tuple)) + self.assertEqual(len(decoder_attentions), self.model_tester.num_hidden_layers) + self.assertListEqual( + list(decoder_attentions[0].shape[-3:]), + [self.model_tester.num_attention_heads, decoder_seq_length, dim], + ) + + # cross attentions + cross_attentions = outputs.cross_attentions + self.assertIsInstance(cross_attentions, (list, tuple)) + self.assertEqual(len(cross_attentions), self.model_tester.num_hidden_layers) + self.assertListEqual( + list(cross_attentions[0].shape[-3:]), + [self.model_tester.num_attention_heads, decoder_seq_length, dim], + ) + + # Check attention is always last and order is fine + inputs_dict["output_attentions"] = True + inputs_dict["output_hidden_states"] = True + model = model_class(config) + model.to(torch_device) + model.eval() + with torch.no_grad(): + outputs = model(**self._prepare_for_class(inputs_dict, model_class)) + + self.assertEqual(out_len + 2, len(outputs)) + + self_attentions = outputs.encoder_attentions if config.is_encoder_decoder else outputs.attentions + + self.assertEqual(len(self_attentions), self.model_tester.num_hidden_layers) + self.assertListEqual( + list(self_attentions[0].shape[-3:]), + [self.model_tester.num_attention_heads, encoder_seq_length, dim], + ) + + +def prepare_batch(filename="train-batch.pt"): + file = hf_hub_download(repo_id="hf-internal-testing/tourism-monthly-batch", filename=filename, repo_type="dataset") + batch = torch.load(file, map_location=torch_device) + return batch + + +@require_torch +@slow +class AutoformerModelIntegrationTests(unittest.TestCase): + def test_inference_no_head(self): + model = AutoformerModel.from_pretrained("huggingface/autoformer-tourism-monthly").to(torch_device) + batch = prepare_batch() + + with torch.no_grad(): + output = model( + past_values=batch["past_values"], + past_time_features=batch["past_time_features"], + past_observed_mask=batch["past_observed_mask"], + static_categorical_features=batch["static_categorical_features"], + future_values=batch["future_values"], + future_time_features=batch["future_time_features"], + )[0] + + expected_shape = torch.Size( + (64, model.config.prediction_length + model.config.label_length, model.config.feature_size) + ) + self.assertEqual(output.shape, expected_shape) + + expected_slice = torch.tensor( + [[0.3593, -1.3398, 0.6330], [0.2279, 1.5396, -0.1792], [0.0450, 1.3225, -0.2335]], device=torch_device + ) + self.assertTrue(torch.allclose(output[0, :3, :3], expected_slice, atol=TOLERANCE)) + + def test_inference_head(self): + model = AutoformerForPrediction.from_pretrained("huggingface/autoformer-tourism-monthly").to(torch_device) + batch = prepare_batch("val-batch.pt") + with torch.no_grad(): + output = model( + past_values=batch["past_values"], + past_time_features=batch["past_time_features"], + past_observed_mask=batch["past_observed_mask"], + static_categorical_features=batch["static_categorical_features"], + ).encoder_last_hidden_state + expected_shape = torch.Size((64, model.config.context_length, model.config.d_model)) + 
self.assertEqual(output.shape, expected_shape) + + expected_slice = torch.tensor( + [[-0.0734, -0.9036, 0.8358], [4.7186, 2.4113, 1.9581], [1.7953, 2.3558, 1.2970]], device=torch_device + ) + self.assertTrue(torch.allclose(output[0, :3, :3], expected_slice, atol=TOLERANCE)) + + def test_seq_to_seq_generation(self): + model = AutoformerForPrediction.from_pretrained("huggingface/autoformer-tourism-monthly").to(torch_device) + batch = prepare_batch("val-batch.pt") + with torch.no_grad(): + outputs = model.generate( + static_categorical_features=batch["static_categorical_features"], + past_time_features=batch["past_time_features"], + past_values=batch["past_values"], + future_time_features=batch["future_time_features"], + past_observed_mask=batch["past_observed_mask"], + ) + expected_shape = torch.Size((64, model.config.num_parallel_samples, model.config.prediction_length)) + self.assertEqual(outputs.sequences.shape, expected_shape) + + expected_slice = torch.tensor([3130.6763, 4056.5293, 7053.0786], device=torch_device) + mean_prediction = outputs.sequences.mean(dim=1) + self.assertTrue(torch.allclose(mean_prediction[0, -3:], expected_slice, rtol=1e-1)) diff --git a/tests/models/bart/test_modeling_tf_bart.py b/tests/models/bart/test_modeling_tf_bart.py index 0f0f8f9793c057..c113011c567d0d 100644 --- a/tests/models/bart/test_modeling_tf_bart.py +++ b/tests/models/bart/test_modeling_tf_bart.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import copy import tempfile import unittest diff --git a/tests/models/bert/test_modeling_tf_bert.py b/tests/models/bert/test_modeling_tf_bert.py index 59521acec398a5..a8a2159fe13bb1 100644 --- a/tests/models/bert/test_modeling_tf_bert.py +++ b/tests/models/bert/test_modeling_tf_bert.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import BertConfig, is_tf_available diff --git a/tests/models/blenderbot/test_modeling_tf_blenderbot.py b/tests/models/blenderbot/test_modeling_tf_blenderbot.py index 2db959e9f7f5c9..5fd6faefec865b 100644 --- a/tests/models/blenderbot/test_modeling_tf_blenderbot.py +++ b/tests/models/blenderbot/test_modeling_tf_blenderbot.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import BlenderbotConfig, BlenderbotTokenizer, is_tf_available diff --git a/tests/models/blenderbot_small/test_modeling_tf_blenderbot_small.py b/tests/models/blenderbot_small/test_modeling_tf_blenderbot_small.py index 67a4f7ad7bfb1b..5bc5c4afe91dea 100644 --- a/tests/models/blenderbot_small/test_modeling_tf_blenderbot_small.py +++ b/tests/models/blenderbot_small/test_modeling_tf_blenderbot_small.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import BlenderbotSmallConfig, BlenderbotSmallTokenizer, is_tf_available diff --git a/tests/models/blip/test_modeling_tf_blip.py b/tests/models/blip/test_modeling_tf_blip.py index 3bb7b87edbb5c6..af7533c6989365 100644 --- a/tests/models/blip/test_modeling_tf_blip.py +++ b/tests/models/blip/test_modeling_tf_blip.py @@ -15,6 +15,8 @@ """ Testing suite for the TensorFlow Blip model. 
""" +from __future__ import annotations + import inspect import tempfile import unittest diff --git a/tests/models/blip/test_modeling_tf_blip_text.py b/tests/models/blip/test_modeling_tf_blip_text.py index 261056e918eec6..2733a9fa6a4cb6 100644 --- a/tests/models/blip/test_modeling_tf_blip_text.py +++ b/tests/models/blip/test_modeling_tf_blip_text.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. """ Testing suite for the TensorFlow Blip model. """ +from __future__ import annotations + import unittest import numpy as np diff --git a/tests/models/bort/test_modeling_tf_bort.py b/tests/models/bort/test_modeling_tf_bort.py index 8053afbd30cfc0..35abe53d89d73f 100644 --- a/tests/models/bort/test_modeling_tf_bort.py +++ b/tests/models/bort/test_modeling_tf_bort.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import unittest from transformers import is_tf_available diff --git a/tests/models/bridgetower/test_modeling_bridgetower.py b/tests/models/bridgetower/test_modeling_bridgetower.py index 1a4d34d3967f98..9c40f376a7b573 100644 --- a/tests/models/bridgetower/test_modeling_bridgetower.py +++ b/tests/models/bridgetower/test_modeling_bridgetower.py @@ -627,7 +627,8 @@ def _get_non_used_layer_names(self, model_class): non_used_layer_names = ["text_model.pooler"] if model_class == BridgeTowerForMaskedLM: non_used_layer_names = non_used_layer_names + [ - "cross_modal_image_layers.5", + # This number `1` actually depends on the number of layers in `cross_modal_image_layers` (by minus 1) + "cross_modal_image_layers.1", "cross_modal_image_pooler", "cross_modal_text_pooler", ] diff --git a/tests/models/camembert/test_modeling_tf_camembert.py b/tests/models/camembert/test_modeling_tf_camembert.py index dc542526852de7..425bdbc4b0acc6 100644 --- a/tests/models/camembert/test_modeling_tf_camembert.py +++ b/tests/models/camembert/test_modeling_tf_camembert.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import unittest from transformers import is_tf_available diff --git a/tests/models/clap/test_feature_extraction_clap.py b/tests/models/clap/test_feature_extraction_clap.py index 733dd66681c473..c49d045ba87407 100644 --- a/tests/models/clap/test_feature_extraction_clap.py +++ b/tests/models/clap/test_feature_extraction_clap.py @@ -139,6 +139,14 @@ def test_call(self): for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + # Test 2-D numpy arrays are batched. 
+ speech_inputs = [floats_list((1, x))[0] for x in (800, 800, 800)] + np_speech_inputs = np.asarray(speech_inputs) + encoded_sequences_1 = feature_extractor(speech_inputs, return_tensors="np").input_features + encoded_sequences_2 = feature_extractor(np_speech_inputs, return_tensors="np").input_features + for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): + self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + def test_double_precision_pad(self): import torch diff --git a/tests/models/clip/test_modeling_tf_clip.py b/tests/models/clip/test_modeling_tf_clip.py index 6cd20a47a7f058..10b9954fc88423 100644 --- a/tests/models/clip/test_modeling_tf_clip.py +++ b/tests/models/clip/test_modeling_tf_clip.py @@ -15,6 +15,8 @@ """ Testing suite for the TensorFlow CLIP model. """ +from __future__ import annotations + import inspect import os import tempfile diff --git a/tests/models/convbert/test_modeling_tf_convbert.py b/tests/models/convbert/test_modeling_tf_convbert.py index 0c259110e71626..84ed4de818f699 100644 --- a/tests/models/convbert/test_modeling_tf_convbert.py +++ b/tests/models/convbert/test_modeling_tf_convbert.py @@ -12,6 +12,8 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import os import tempfile import unittest diff --git a/tests/models/convnext/test_modeling_tf_convnext.py b/tests/models/convnext/test_modeling_tf_convnext.py index 72981c09d65e45..8d049cf9f501a0 100644 --- a/tests/models/convnext/test_modeling_tf_convnext.py +++ b/tests/models/convnext/test_modeling_tf_convnext.py @@ -14,6 +14,8 @@ # limitations under the License. """ Testing suite for the TensorFlow ConvNext model. """ +from __future__ import annotations + import inspect import unittest from typing import List, Tuple diff --git a/tests/models/ctrl/test_modeling_tf_ctrl.py b/tests/models/ctrl/test_modeling_tf_ctrl.py index c71c96bc9da982..4d94a97828cf85 100644 --- a/tests/models/ctrl/test_modeling_tf_ctrl.py +++ b/tests/models/ctrl/test_modeling_tf_ctrl.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import CTRLConfig, is_tf_available diff --git a/tests/models/cvt/test_modeling_tf_cvt.py b/tests/models/cvt/test_modeling_tf_cvt.py index 484bd295d17291..78d95931b3b522 100644 --- a/tests/models/cvt/test_modeling_tf_cvt.py +++ b/tests/models/cvt/test_modeling_tf_cvt.py @@ -1,6 +1,8 @@ """ Testing suite for the Tensorflow CvT model. """ +from __future__ import annotations + import inspect import unittest from math import floor @@ -186,6 +188,7 @@ def test_dataset_conversion(self): def test_keras_fit(self): super().test_keras_fit() + @unittest.skip(reason="Get `Failed to determine best cudnn convolution algo.` error after using TF 2.12+cuda 11.8") def test_keras_fit_mixed_precision(self): policy = tf.keras.mixed_precision.Policy("mixed_float16") tf.keras.mixed_precision.set_global_policy(policy) diff --git a/tests/models/data2vec/test_modeling_tf_data2vec_vision.py b/tests/models/data2vec/test_modeling_tf_data2vec_vision.py index dfa890d25a9e90..6a30c83ebaf941 100644 --- a/tests/models/data2vec/test_modeling_tf_data2vec_vision.py +++ b/tests/models/data2vec/test_modeling_tf_data2vec_vision.py @@ -14,6 +14,8 @@ # limitations under the License. """ Testing suite for the TensorFlow Data2VecVision model. 
""" +from __future__ import annotations + import collections.abc import inspect import unittest diff --git a/tests/models/deberta/test_modeling_tf_deberta.py b/tests/models/deberta/test_modeling_tf_deberta.py index 424d9e0b2b4100..9b69d55001cf02 100644 --- a/tests/models/deberta/test_modeling_tf_deberta.py +++ b/tests/models/deberta/test_modeling_tf_deberta.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import DebertaConfig, is_tf_available diff --git a/tests/models/deberta_v2/test_modeling_tf_deberta_v2.py b/tests/models/deberta_v2/test_modeling_tf_deberta_v2.py index 60391635eedfad..96ebe375d97b3d 100644 --- a/tests/models/deberta_v2/test_modeling_tf_deberta_v2.py +++ b/tests/models/deberta_v2/test_modeling_tf_deberta_v2.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import DebertaV2Config, is_tf_available diff --git a/tests/models/deit/test_modeling_tf_deit.py b/tests/models/deit/test_modeling_tf_deit.py index 223d164d4aaf3f..b350a5d546b08d 100644 --- a/tests/models/deit/test_modeling_tf_deit.py +++ b/tests/models/deit/test_modeling_tf_deit.py @@ -15,6 +15,8 @@ """ Testing suite for the TensorFlow DeiT model. """ +from __future__ import annotations + import inspect import unittest @@ -242,7 +244,7 @@ def _prepare_for_class(self, inputs_dict, model_class, return_labels=False): inputs_dict = super()._prepare_for_class(inputs_dict, model_class, return_labels=return_labels) if return_labels: - if model_class.__name__ == "DeiTForImageClassificationWithTeacher": + if "labels" in inputs_dict and "labels" not in inspect.signature(model_class.call).parameters: del inputs_dict["labels"] return inputs_dict diff --git a/tests/models/distilbert/test_modeling_tf_distilbert.py b/tests/models/distilbert/test_modeling_tf_distilbert.py index 1f4f3c2b469716..4e96c909765a49 100644 --- a/tests/models/distilbert/test_modeling_tf_distilbert.py +++ b/tests/models/distilbert/test_modeling_tf_distilbert.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import DistilBertConfig, is_tf_available diff --git a/tests/models/dpr/test_modeling_tf_dpr.py b/tests/models/dpr/test_modeling_tf_dpr.py index 64dea041b53f38..f788a5163398ac 100644 --- a/tests/models/dpr/test_modeling_tf_dpr.py +++ b/tests/models/dpr/test_modeling_tf_dpr.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import unittest from transformers import is_tf_available diff --git a/tests/models/electra/test_modeling_tf_electra.py b/tests/models/electra/test_modeling_tf_electra.py index ae092e8a17dc5e..fe60c5627103c4 100644 --- a/tests/models/electra/test_modeling_tf_electra.py +++ b/tests/models/electra/test_modeling_tf_electra.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import ElectraConfig, is_tf_available diff --git a/tests/models/encoder_decoder/test_modeling_tf_encoder_decoder.py b/tests/models/encoder_decoder/test_modeling_tf_encoder_decoder.py index 76ebd687f77edf..aa22e961f65344 100644 --- a/tests/models/encoder_decoder/test_modeling_tf_encoder_decoder.py +++ b/tests/models/encoder_decoder/test_modeling_tf_encoder_decoder.py @@ -14,6 +14,8 @@ # limitations under the License. 
+from __future__ import annotations + import copy import os import tempfile diff --git a/tests/models/esm/test_modeling_tf_esm.py b/tests/models/esm/test_modeling_tf_esm.py index dc9d430d07edae..d06e3c59ba8778 100644 --- a/tests/models/esm/test_modeling_tf_esm.py +++ b/tests/models/esm/test_modeling_tf_esm.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import EsmConfig, is_tf_available diff --git a/tests/models/flaubert/test_modeling_tf_flaubert.py b/tests/models/flaubert/test_modeling_tf_flaubert.py index 6b7f4fc0316ea7..b751445d12cc98 100644 --- a/tests/models/flaubert/test_modeling_tf_flaubert.py +++ b/tests/models/flaubert/test_modeling_tf_flaubert.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import unittest from transformers import is_tf_available diff --git a/tests/models/funnel/test_modeling_tf_funnel.py b/tests/models/funnel/test_modeling_tf_funnel.py index 6780605e893644..5aea7e4309b51e 100644 --- a/tests/models/funnel/test_modeling_tf_funnel.py +++ b/tests/models/funnel/test_modeling_tf_funnel.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import FunnelConfig, is_tf_available diff --git a/tests/models/gpt2/test_modeling_tf_gpt2.py b/tests/models/gpt2/test_modeling_tf_gpt2.py index 7171997546d6b4..c69ab863373d93 100644 --- a/tests/models/gpt2/test_modeling_tf_gpt2.py +++ b/tests/models/gpt2/test_modeling_tf_gpt2.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import unittest from transformers import GPT2Config, is_tf_available diff --git a/tests/models/gptj/test_modeling_tf_gptj.py b/tests/models/gptj/test_modeling_tf_gptj.py index 3aa63d2790a431..0e4dc9f5831dec 100644 --- a/tests/models/gptj/test_modeling_tf_gptj.py +++ b/tests/models/gptj/test_modeling_tf_gptj.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import unittest from transformers import AutoTokenizer, GPTJConfig, is_tf_available diff --git a/tests/models/groupvit/test_modeling_tf_groupvit.py b/tests/models/groupvit/test_modeling_tf_groupvit.py index bd499a50fb6cc4..a80ef606e5fcb2 100644 --- a/tests/models/groupvit/test_modeling_tf_groupvit.py +++ b/tests/models/groupvit/test_modeling_tf_groupvit.py @@ -15,6 +15,8 @@ """ Testing suite for the TensorFlow GroupViT model. """ +from __future__ import annotations + import inspect import os import random diff --git a/tests/models/hubert/test_modeling_tf_hubert.py b/tests/models/hubert/test_modeling_tf_hubert.py index a48ed0634e8489..0b8e1e2df94ad1 100644 --- a/tests/models/hubert/test_modeling_tf_hubert.py +++ b/tests/models/hubert/test_modeling_tf_hubert.py @@ -14,6 +14,8 @@ # limitations under the License. 
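Stepping back to the Autoformer integration tests added earlier in this patch: they download a prepared batch from the Hub and run both plain inference and probabilistic generation. A minimal end-to-end sketch of the generation path, assuming the same `huggingface/autoformer-tourism-monthly` checkpoint and `hf-internal-testing/tourism-monthly-batch` dataset used by those tests, looks roughly like this:

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoformerForPrediction

# the tests load their inputs from a small dataset repo (note repo_type="dataset")
file = hf_hub_download(
    repo_id="hf-internal-testing/tourism-monthly-batch", filename="val-batch.pt", repo_type="dataset"
)
batch = torch.load(file, map_location="cpu")

model = AutoformerForPrediction.from_pretrained("huggingface/autoformer-tourism-monthly")

with torch.no_grad():
    outputs = model.generate(
        static_categorical_features=batch["static_categorical_features"],
        past_time_features=batch["past_time_features"],
        past_values=batch["past_values"],
        future_time_features=batch["future_time_features"],
        past_observed_mask=batch["past_observed_mask"],
    )

# sequences has shape (batch, num_parallel_samples, prediction_length);
# averaging over the sampled trajectories gives the point forecast the test compares against
mean_prediction = outputs.sequences.mean(dim=1)
print(mean_prediction.shape)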
+from __future__ import annotations + import copy import inspect import math diff --git a/tests/models/informer/test_modeling_informer.py b/tests/models/informer/test_modeling_informer.py index 493846b6708b82..f3c8539d845049 100644 --- a/tests/models/informer/test_modeling_informer.py +++ b/tests/models/informer/test_modeling_informer.py @@ -438,7 +438,7 @@ def test_retain_grad_hidden_states_attentions(self): def prepare_batch(filename="train-batch.pt"): - file = hf_hub_download(repo_id="kashif/tourism-monthly-batch", filename=filename, repo_type="dataset") + file = hf_hub_download(repo_id="hf-internal-testing/tourism-monthly-batch", filename=filename, repo_type="dataset") batch = torch.load(file, map_location=torch_device) return batch diff --git a/tests/models/layoutlm/test_modeling_tf_layoutlm.py b/tests/models/layoutlm/test_modeling_tf_layoutlm.py index 95e24023bb23c3..2d134f23d42c7d 100644 --- a/tests/models/layoutlm/test_modeling_tf_layoutlm.py +++ b/tests/models/layoutlm/test_modeling_tf_layoutlm.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import unittest import numpy as np diff --git a/tests/models/layoutlmv3/test_modeling_tf_layoutlmv3.py b/tests/models/layoutlmv3/test_modeling_tf_layoutlmv3.py index df103194ab25c9..a1e2cd590836fd 100644 --- a/tests/models/layoutlmv3/test_modeling_tf_layoutlmv3.py +++ b/tests/models/layoutlmv3/test_modeling_tf_layoutlmv3.py @@ -14,6 +14,8 @@ # limitations under the License. """ Testing suite for the TensorFlow LayoutLMv3 model. """ +from __future__ import annotations + import copy import inspect import unittest diff --git a/tests/models/led/test_modeling_tf_led.py b/tests/models/led/test_modeling_tf_led.py index 7bac1ced835b09..8735aeb721d7aa 100644 --- a/tests/models/led/test_modeling_tf_led.py +++ b/tests/models/led/test_modeling_tf_led.py @@ -14,6 +14,8 @@ # limitations under the License. 
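Most of the remaining hunks only add `from __future__ import annotations` to the TensorFlow test files, presumably as part of the TF compatibility work in this merge. With that import (PEP 563) every annotation in the module is stored as a string and never evaluated at import time, so a test module can annotate with names that are optional or defined later. A tiny self-contained illustration:

from __future__ import annotations


def embed(batch: DefinedLater) -> list[int]:
    # the annotations above are kept as plain strings and are never executed,
    # so referencing DefinedLater before its definition is harmless
    return [0]


class DefinedLater:
    pass


print(embed.__annotations__)  # {'batch': 'DefinedLater', 'return': 'list[int]'}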
+from __future__ import annotations + import unittest from transformers import LEDConfig, is_tf_available diff --git a/tests/models/llama/test_tokenization_llama.py b/tests/models/llama/test_tokenization_llama.py index 6ce1bb44c03db2..3a1ec2be93bf4e 100644 --- a/tests/models/llama/test_tokenization_llama.py +++ b/tests/models/llama/test_tokenization_llama.py @@ -315,6 +315,39 @@ def integration_tests(self): }, ) + def test_fast_special_tokens(self): + slow_tokenizer = self.tokenizer + fast_tokenizer = self.rust_tokenizer + slow = slow_tokenizer.encode("A sample test", add_special_tokens=True) + assert slow == [1, 319, 4559, 1243] + + fast_tokenizer.add_eos_token = False + fast = fast_tokenizer.encode("A sample test", add_special_tokens=True) + assert fast == [1, 319, 4559, 1243] + + fast_tokenizer.add_eos_token = True + fast = fast_tokenizer.encode("A sample test", add_special_tokens=True) + assert fast == [1, 319, 4559, 1243, 2] + + slow_tokenizer.add_eos_token = True + slow = slow_tokenizer.encode("A sample test", add_special_tokens=True) + assert slow == [1, 319, 4559, 1243, 2] + + fast_tokenizer = LlamaTokenizerFast.from_pretrained( + "hf-internal-testing/llama-tokenizer", add_eos_token=True, add_bos_token=False + ) + fast = fast_tokenizer.encode("A sample test", add_special_tokens=True) + assert fast == [319, 4559, 1243, 2] + + slow_tokenzier = LlamaTokenizer.from_pretrained( + "hf-internal-testing/llama-tokenizer", add_eos_token=True, add_bos_token=False + ) + slow = slow_tokenzier.encode("A sample test", add_special_tokens=True) + assert slow == [319, 4559, 1243, 2] + + self.tokenizer.add_eos_token = False + self.rust_tokenizer.add_eos_token = False + @slow def test_conversion(self): # This is excruciatingly slow since it has to recreate the entire merge diff --git a/tests/models/longformer/test_modeling_tf_longformer.py b/tests/models/longformer/test_modeling_tf_longformer.py index b5452bc80ac502..dcdd68b18ffd0b 100644 --- a/tests/models/longformer/test_modeling_tf_longformer.py +++ b/tests/models/longformer/test_modeling_tf_longformer.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import is_tf_available diff --git a/tests/models/lxmert/test_modeling_tf_lxmert.py b/tests/models/lxmert/test_modeling_tf_lxmert.py index cd2095f693d62a..411de960f31dae 100644 --- a/tests/models/lxmert/test_modeling_tf_lxmert.py +++ b/tests/models/lxmert/test_modeling_tf_lxmert.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import os import tempfile import unittest diff --git a/tests/models/marian/test_modeling_tf_marian.py b/tests/models/marian/test_modeling_tf_marian.py index 16b19b0f9763a2..5a624fda9a38bb 100644 --- a/tests/models/marian/test_modeling_tf_marian.py +++ b/tests/models/marian/test_modeling_tf_marian.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import tempfile import unittest import warnings diff --git a/tests/models/mbart/test_modeling_tf_mbart.py b/tests/models/mbart/test_modeling_tf_mbart.py index b143fc6877b56f..6c36d705e81bcc 100644 --- a/tests/models/mbart/test_modeling_tf_mbart.py +++ b/tests/models/mbart/test_modeling_tf_mbart.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. 
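The new `test_fast_special_tokens` above pins down how `add_bos_token` and `add_eos_token` change LLaMA encodings for both the slow and the fast tokenizer. A condensed usage sketch against the same `hf-internal-testing/llama-tokenizer` checkpoint, with the expected ids taken from the test asserts:

from transformers import LlamaTokenizerFast

tokenizer = LlamaTokenizerFast.from_pretrained("hf-internal-testing/llama-tokenizer")

# default behaviour: BOS (id 1) is prepended, no EOS is appended
print(tokenizer.encode("A sample test"))  # [1, 319, 4559, 1243]

# toggling add_eos_token on an existing tokenizer appends EOS (id 2) as well
tokenizer.add_eos_token = True
print(tokenizer.encode("A sample test"))  # [1, 319, 4559, 1243, 2]

# the flags can also be passed at load time, e.g. EOS only and no BOS
tokenizer = LlamaTokenizerFast.from_pretrained(
    "hf-internal-testing/llama-tokenizer", add_eos_token=True, add_bos_token=False
)
print(tokenizer.encode("A sample test"))  # [319, 4559, 1243, 2]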
+from __future__ import annotations + import tempfile import unittest diff --git a/tests/models/mctct/test_feature_extraction_mctct.py b/tests/models/mctct/test_feature_extraction_mctct.py index cab2911fdd40f4..f3d8f0fea940e9 100644 --- a/tests/models/mctct/test_feature_extraction_mctct.py +++ b/tests/models/mctct/test_feature_extraction_mctct.py @@ -134,6 +134,14 @@ def test_call(self): for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + # Test 2-D numpy arrays are batched. + speech_inputs = [floats_list((1, x))[0] for x in (800, 800, 800)] + np_speech_inputs = np.asarray(speech_inputs) + encoded_sequences_1 = feature_extractor(speech_inputs, return_tensors="np").input_features + encoded_sequences_2 = feature_extractor(np_speech_inputs, return_tensors="np").input_features + for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): + self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + def test_cepstral_mean_and_variance_normalization(self): feature_extractor = self.feature_extraction_class(**self.feat_extract_tester.prepare_feat_extract_dict()) speech_inputs = [floats_list((1, x))[0] for x in range(8000, 14000, 2000)] diff --git a/tests/models/mobilebert/test_modeling_tf_mobilebert.py b/tests/models/mobilebert/test_modeling_tf_mobilebert.py index 69d2fc6768e4e2..293126ab614727 100644 --- a/tests/models/mobilebert/test_modeling_tf_mobilebert.py +++ b/tests/models/mobilebert/test_modeling_tf_mobilebert.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import MobileBertConfig, is_tf_available diff --git a/tests/models/mobilevit/test_modeling_tf_mobilevit.py b/tests/models/mobilevit/test_modeling_tf_mobilevit.py index e4a956dff2c377..37d7db39e68d3e 100644 --- a/tests/models/mobilevit/test_modeling_tf_mobilevit.py +++ b/tests/models/mobilevit/test_modeling_tf_mobilevit.py @@ -15,6 +15,8 @@ """ Testing suite for the TensorFlow MobileViT model. """ +from __future__ import annotations + import inspect import unittest diff --git a/tests/models/mpnet/test_modeling_tf_mpnet.py b/tests/models/mpnet/test_modeling_tf_mpnet.py index 4936a52899034c..381b6e81dd8ddf 100644 --- a/tests/models/mpnet/test_modeling_tf_mpnet.py +++ b/tests/models/mpnet/test_modeling_tf_mpnet.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import MPNetConfig, is_tf_available diff --git a/tests/models/mt5/test_modeling_tf_mt5.py b/tests/models/mt5/test_modeling_tf_mt5.py index 0c934f0314c87a..facb63dd7931c7 100644 --- a/tests/models/mt5/test_modeling_tf_mt5.py +++ b/tests/models/mt5/test_modeling_tf_mt5.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import unittest from transformers import is_tf_available diff --git a/tests/models/openai/test_modeling_tf_openai.py b/tests/models/openai/test_modeling_tf_openai.py index a4cf71bf1a9f66..a82da911a4c8cb 100644 --- a/tests/models/openai/test_modeling_tf_openai.py +++ b/tests/models/openai/test_modeling_tf_openai.py @@ -14,6 +14,8 @@ # limitations under the License. 
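The M-CTC-T hunk above, like the AST and CLAP ones before it, asserts that a 2-D numpy array is treated as a batch of mono waveforms and produces the same features as the equivalent list of 1-D arrays. The checkpoint below is only an illustrative example of an audio feature extractor covered by the is_batched fixes; the equivalence it demonstrates is exactly what the added tests check:

import numpy as np
from transformers import AutoFeatureExtractor

# illustrative checkpoint; any audio feature extractor touched by the batching fixes behaves the same
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")

# three equal-length mono waveforms, once as a list of 1-D arrays and once stacked into shape (3, 800)
speech_inputs = [np.random.randn(800).astype(np.float32) for _ in range(3)]
np_speech_inputs = np.asarray(speech_inputs)

features_from_list = feature_extractor(speech_inputs, return_tensors="np").input_values
features_from_2d = feature_extractor(np_speech_inputs, return_tensors="np").input_values

# both forms are recognised as a batch of three examples and yield numerically identical features
assert features_from_list.shape == features_from_2d.shape
np.testing.assert_allclose(features_from_list, features_from_2d, atol=1e-3)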
+from __future__ import annotations + import unittest from transformers import OpenAIGPTConfig, is_tf_available diff --git a/tests/models/opt/test_modeling_tf_opt.py b/tests/models/opt/test_modeling_tf_opt.py index 0ae3411812dc19..85514c9d7204da 100644 --- a/tests/models/opt/test_modeling_tf_opt.py +++ b/tests/models/opt/test_modeling_tf_opt.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import unittest import numpy as np diff --git a/tests/models/pegasus/test_modeling_tf_pegasus.py b/tests/models/pegasus/test_modeling_tf_pegasus.py index 6816cc34ef82ec..b34a3dcfb5cc0a 100644 --- a/tests/models/pegasus/test_modeling_tf_pegasus.py +++ b/tests/models/pegasus/test_modeling_tf_pegasus.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import tempfile import unittest diff --git a/tests/models/rag/test_modeling_tf_rag.py b/tests/models/rag/test_modeling_tf_rag.py index 4a0e4176b4e39c..b4720f7c7f0dde 100644 --- a/tests/models/rag/test_modeling_tf_rag.py +++ b/tests/models/rag/test_modeling_tf_rag.py @@ -1,3 +1,5 @@ +from __future__ import annotations + import json import os import shutil diff --git a/tests/models/regnet/test_modeling_tf_regnet.py b/tests/models/regnet/test_modeling_tf_regnet.py index f5f5cfd4b99db4..cee3995d210005 100644 --- a/tests/models/regnet/test_modeling_tf_regnet.py +++ b/tests/models/regnet/test_modeling_tf_regnet.py @@ -14,6 +14,8 @@ # limitations under the License. """ Testing suite for the TensorFlow RegNet model. """ +from __future__ import annotations + import inspect import unittest from typing import List, Tuple diff --git a/tests/models/rembert/test_modeling_tf_rembert.py b/tests/models/rembert/test_modeling_tf_rembert.py index 7ab71e9c6e24be..e70bd7033fc1f4 100644 --- a/tests/models/rembert/test_modeling_tf_rembert.py +++ b/tests/models/rembert/test_modeling_tf_rembert.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import RemBertConfig, is_tf_available diff --git a/tests/models/resnet/test_modeling_tf_resnet.py b/tests/models/resnet/test_modeling_tf_resnet.py index 0a8ccc00415c91..e6f8d121c27875 100644 --- a/tests/models/resnet/test_modeling_tf_resnet.py +++ b/tests/models/resnet/test_modeling_tf_resnet.py @@ -15,6 +15,8 @@ """ Testing suite for the Tensorflow ResNet model. """ +from __future__ import annotations + import inspect import unittest diff --git a/tests/models/roberta/test_modeling_tf_roberta.py b/tests/models/roberta/test_modeling_tf_roberta.py index efa54ba45f9cad..3d7b6953c085c7 100644 --- a/tests/models/roberta/test_modeling_tf_roberta.py +++ b/tests/models/roberta/test_modeling_tf_roberta.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import RobertaConfig, is_tf_available diff --git a/tests/models/roberta_prelayernorm/test_modeling_tf_roberta_prelayernorm.py b/tests/models/roberta_prelayernorm/test_modeling_tf_roberta_prelayernorm.py index 6de20c0e1d0446..4e1bd2e319af1b 100644 --- a/tests/models/roberta_prelayernorm/test_modeling_tf_roberta_prelayernorm.py +++ b/tests/models/roberta_prelayernorm/test_modeling_tf_roberta_prelayernorm.py @@ -14,6 +14,8 @@ # limitations under the License. 
+from __future__ import annotations + import unittest from transformers import RobertaPreLayerNormConfig, is_tf_available diff --git a/tests/models/roformer/test_modeling_tf_roformer.py b/tests/models/roformer/test_modeling_tf_roformer.py index 0a632e39a2ed97..52c630e2beaede 100644 --- a/tests/models/roformer/test_modeling_tf_roformer.py +++ b/tests/models/roformer/test_modeling_tf_roformer.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import RoFormerConfig, is_tf_available diff --git a/tests/models/sam/test_modeling_sam.py b/tests/models/sam/test_modeling_sam.py index 8507c0a6381b60..a7015145222930 100644 --- a/tests/models/sam/test_modeling_sam.py +++ b/tests/models/sam/test_modeling_sam.py @@ -20,7 +20,7 @@ import requests -from transformers import SamConfig, SamMaskDecoderConfig, SamPromptEncoderConfig, SamVisionConfig +from transformers import SamConfig, SamMaskDecoderConfig, SamPromptEncoderConfig, SamVisionConfig, pipeline from transformers.testing_utils import require_torch, slow, torch_device from transformers.utils import is_torch_available, is_vision_available @@ -436,8 +436,9 @@ def test_retain_grad_hidden_states_attentions(self): def test_hidden_states_output(self): pass - def test_pt_tf_model_equivalence(self, allow_missing_keys=True, tol=5e-4): - super().test_pt_tf_model_equivalence(allow_missing_keys=True, tol=tol) + def check_pt_tf_outputs(self, tf_outputs, pt_outputs, model_class, tol=5e-5, name="outputs", attributes=None): + # Use a slightly higher default tol to make the tests non-flaky + super().check_pt_tf_outputs(tf_outputs, pt_outputs, model_class, tol=tol, name=name, attributes=attributes) @slow def test_model_from_pretrained(self): @@ -461,8 +462,8 @@ def prepare_dog_img(): @slow class SamModelIntegrationTest(unittest.TestCase): def test_inference_mask_generation_no_point(self): - model = SamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = SamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") model.to(torch_device) model.eval() @@ -474,13 +475,12 @@ def test_inference_mask_generation_no_point(self): outputs = model(**inputs) scores = outputs.iou_scores.squeeze() masks = outputs.pred_masks[0, 0, 0, 0, :3] - - self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.5798), atol=2e-4)) - self.assertTrue(torch.allclose(masks, torch.tensor([-6.6381, -6.0734, -7.5308]).to(torch_device), atol=2e-4)) + self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.4515), atol=2e-4)) + self.assertTrue(torch.allclose(masks, torch.tensor([-4.1800, -3.4948, -3.4481]).to(torch_device), atol=2e-4)) def test_inference_mask_generation_one_point_one_bb(self): - model = SamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = SamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") model.to(torch_device) model.eval() @@ -497,15 +497,14 @@ def test_inference_mask_generation_one_point_one_bb(self): outputs = model(**inputs) scores = outputs.iou_scores.squeeze() masks = outputs.pred_masks[0, 0, 0, 0, :3] - - self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.9935), atol=2e-4)) + self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.9566), atol=2e-4)) self.assertTrue( - torch.allclose(masks, torch.tensor([-21.5465, -23.1122, 
-22.3331]).to(torch_device), atol=2e-4) + torch.allclose(masks, torch.tensor([-12.7729, -12.3665, -12.6061]).to(torch_device), atol=2e-4) ) def test_inference_mask_generation_batched_points_batched_images(self): - model = SamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = SamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") model.to(torch_device) model.eval() @@ -528,26 +527,26 @@ def test_inference_mask_generation_batched_points_batched_images(self): EXPECTED_SCORES = torch.tensor( [ [ - [0.9673, 0.9441, 0.9084], - [0.9673, 0.9441, 0.9084], - [0.9673, 0.9441, 0.9084], - [0.9673, 0.9441, 0.9084], + [0.6765, 0.9379, 0.8803], + [0.6765, 0.9379, 0.8803], + [0.6765, 0.9379, 0.8803], + [0.6765, 0.9379, 0.8803], ], [ - [0.8405, 0.6292, 0.3840], - [0.9673, 0.9441, 0.9084], - [0.9673, 0.9441, 0.9084], - [0.9673, 0.9441, 0.9084], + [0.3317, 0.7264, 0.7646], + [0.6765, 0.9379, 0.8803], + [0.6765, 0.9379, 0.8803], + [0.6765, 0.9379, 0.8803], ], ] ) - EXPECTED_MASKS = torch.tensor([-26.5424, -34.0901, -30.6406]) + EXPECTED_MASKS = torch.tensor([-2.8550, -2.7988, -2.9625]) self.assertTrue(torch.allclose(scores, EXPECTED_SCORES, atol=1e-3)) self.assertTrue(torch.allclose(masks, EXPECTED_MASKS, atol=1e-3)) def test_inference_mask_generation_one_point_one_bb_zero(self): - model = SamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = SamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") model.to(torch_device) model.eval() @@ -569,11 +568,11 @@ def test_inference_mask_generation_one_point_one_bb_zero(self): outputs = model(**inputs) scores = outputs.iou_scores.squeeze() - self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.9689), atol=1e-4)) + self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.7894), atol=1e-4)) def test_inference_mask_generation_one_point(self): - model = SamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = SamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") model.to(torch_device) model.eval() @@ -590,8 +589,7 @@ def test_inference_mask_generation_one_point(self): with torch.no_grad(): outputs = model(**inputs) scores = outputs.iou_scores.squeeze() - - self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.9712), atol=1e-4)) + self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.9675), atol=1e-4)) # With no label input_points = [[[400, 650]]] @@ -601,12 +599,11 @@ def test_inference_mask_generation_one_point(self): with torch.no_grad(): outputs = model(**inputs) scores = outputs.iou_scores.squeeze() - - self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.9712), atol=1e-4)) + self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.9675), atol=1e-4)) def test_inference_mask_generation_two_points(self): - model = SamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = SamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") model.to(torch_device) model.eval() @@ -623,8 +620,7 @@ def test_inference_mask_generation_two_points(self): with torch.no_grad(): outputs = model(**inputs) scores = outputs.iou_scores.squeeze() - - 
self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.9936), atol=1e-4)) + self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.9762), atol=1e-4)) # no labels inputs = processor(images=raw_image, input_points=input_points, return_tensors="pt").to(torch_device) @@ -633,11 +629,11 @@ def test_inference_mask_generation_two_points(self): outputs = model(**inputs) scores = outputs.iou_scores.squeeze() - self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.9936), atol=1e-4)) + self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.9762), atol=1e-4)) def test_inference_mask_generation_two_points_batched(self): - model = SamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = SamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") model.to(torch_device) model.eval() @@ -654,13 +650,12 @@ def test_inference_mask_generation_two_points_batched(self): with torch.no_grad(): outputs = model(**inputs) scores = outputs.iou_scores.squeeze() - - self.assertTrue(torch.allclose(scores[0][-1], torch.tensor(0.9936), atol=1e-4)) - self.assertTrue(torch.allclose(scores[1][-1], torch.tensor(0.9716), atol=1e-4)) + self.assertTrue(torch.allclose(scores[0][-1], torch.tensor(0.9762), atol=1e-4)) + self.assertTrue(torch.allclose(scores[1][-1], torch.tensor(0.9637), atol=1e-4)) def test_inference_mask_generation_one_box(self): - model = SamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = SamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") model.to(torch_device) model.eval() @@ -674,12 +669,11 @@ def test_inference_mask_generation_one_box(self): with torch.no_grad(): outputs = model(**inputs) scores = outputs.iou_scores.squeeze() - - self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.8686), atol=1e-4)) + self.assertTrue(torch.allclose(scores[-1], torch.tensor(0.7937), atol=1e-4)) def test_inference_mask_generation_batched_image_one_point(self): - model = SamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = SamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") model.to(torch_device) model.eval() @@ -707,8 +701,8 @@ def test_inference_mask_generation_batched_image_one_point(self): self.assertTrue(torch.allclose(scores_batched[1, :], scores_single, atol=1e-4)) def test_inference_mask_generation_two_points_point_batch(self): - model = SamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = SamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") model.to(torch_device) model.eval() @@ -729,12 +723,12 @@ def test_inference_mask_generation_two_points_point_batch(self): iou_scores = outputs.iou_scores.cpu() self.assertTrue(iou_scores.shape == (1, 2, 3)) torch.testing.assert_allclose( - iou_scores, torch.tensor([[[0.9848, 0.9788, 0.9713], [0.9211, 0.9128, 0.7427]]]), atol=1e-4, rtol=1e-4 + iou_scores, torch.tensor([[[0.9105, 0.9825, 0.9675], [0.7646, 0.7943, 0.7774]]]), atol=1e-4, rtol=1e-4 ) def test_inference_mask_generation_three_boxes_point_batch(self): - model = SamModel.from_pretrained("facebook/sam-vit-huge") - processor = 
SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = SamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") model.to(torch_device) model.eval() @@ -743,7 +737,9 @@ def test_inference_mask_generation_three_boxes_point_batch(self): # fmt: off input_boxes = torch.Tensor([[[620, 900, 1000, 1255]], [[75, 275, 1725, 850]], [[75, 275, 1725, 850]]]).cpu() - EXPECTED_IOU = torch.tensor([[[1.0071, 1.0032, 0.9946], [0.4962, 0.8770, 0.8686], [0.4962, 0.8770, 0.8686]]]) + EXPECTED_IOU = torch.tensor([[[0.9773, 0.9881, 0.9522], + [0.5996, 0.7661, 0.7937], + [0.5996, 0.7661, 0.7937]]]) # fmt: on input_boxes = input_boxes.unsqueeze(0) @@ -755,3 +751,9 @@ def test_inference_mask_generation_three_boxes_point_batch(self): iou_scores = outputs.iou_scores.cpu() self.assertTrue(iou_scores.shape == (1, 3, 3)) torch.testing.assert_allclose(iou_scores, EXPECTED_IOU, atol=1e-4, rtol=1e-4) + + def test_dummy_pipeline_generation(self): + generator = pipeline("mask-generation", model="facebook/sam-vit-base", device=torch_device) + raw_image = prepare_image() + + _ = generator(raw_image, points_per_batch=64) diff --git a/tests/models/sam/test_modeling_tf_sam.py b/tests/models/sam/test_modeling_tf_sam.py index a07398365fff78..4e918a1cd13ebc 100644 --- a/tests/models/sam/test_modeling_tf_sam.py +++ b/tests/models/sam/test_modeling_tf_sam.py @@ -15,6 +15,8 @@ """ Testing suite for the TensorFlow SAM model. """ +from __future__ import annotations + import inspect import unittest @@ -34,7 +36,6 @@ import tensorflow as tf from transformers import SamProcessor, TFSamModel - from transformers.models.sam.modeling_tf_sam import TF_SAM_PRETRAINED_MODEL_ARCHIVE_LIST if is_vision_available(): from PIL import Image @@ -400,9 +401,8 @@ def test_hidden_states_output(self): @slow def test_model_from_pretrained(self): - for model_name in TF_SAM_PRETRAINED_MODEL_ARCHIVE_LIST[:1]: - model = TFSamModel.from_pretrained(model_name) - self.assertIsNotNone(model) + model = TFSamModel.from_pretrained("facebook/sam-vit-base") # sam-vit-huge blows out our memory + self.assertIsNotNone(model) def check_pt_tf_outputs(self, tf_outputs, pt_outputs, model_class, tol=5e-4, name="outputs", attributes=None): super().check_pt_tf_outputs( @@ -430,8 +430,8 @@ def prepare_dog_img(): @slow class SamModelIntegrationTest(unittest.TestCase): def test_inference_mask_generation_no_point(self): - model = TFSamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = TFSamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") raw_image = prepare_image() inputs = processor(images=raw_image, return_tensors="tf") @@ -439,13 +439,12 @@ def test_inference_mask_generation_no_point(self): outputs = model(**inputs) scores = tf.squeeze(outputs.iou_scores) masks = outputs.pred_masks[0, 0, 0, 0, :3] - - self.assertTrue(np.allclose(scores[-1].numpy(), np.array(0.5798), atol=2e-4)) - self.assertTrue(np.allclose(masks.numpy(), np.array([-6.6381, -6.0734, -7.5308]), atol=1e-2)) + self.assertTrue(np.allclose(scores[-1].numpy(), np.array(0.4515), atol=2e-4)) + self.assertTrue(np.allclose(masks.numpy(), np.array([-4.1807, -3.4949, -3.4483]), atol=1e-2)) def test_inference_mask_generation_one_point_one_bb(self): - model = TFSamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = 
TFSamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") raw_image = prepare_image() input_boxes = [[[650, 900, 1000, 1250]]] @@ -457,12 +456,12 @@ def test_inference_mask_generation_one_point_one_bb(self): scores = tf.squeeze(outputs.iou_scores) masks = outputs.pred_masks[0, 0, 0, 0, :3] - self.assertTrue(np.allclose(scores[-1], np.array(0.9935), atol=2e-4)) - self.assertTrue(np.allclose(masks.numpy(), np.array([-21.5465, -23.1122, -22.3331]), atol=2e-2)) + self.assertTrue(np.allclose(scores[-1], np.array(0.9566), atol=2e-4)) + self.assertTrue(np.allclose(masks.numpy(), np.array([-12.7657, -12.3683, -12.5985]), atol=2e-2)) def test_inference_mask_generation_batched_points_batched_images(self): - model = TFSamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = TFSamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") raw_image = prepare_image() input_points = [ @@ -479,26 +478,26 @@ def test_inference_mask_generation_batched_points_batched_images(self): EXPECTED_SCORES = np.array( [ [ - [0.9673, 0.9441, 0.9084], - [0.9673, 0.9441, 0.9084], - [0.9673, 0.9441, 0.9084], - [0.9673, 0.9441, 0.9084], + [0.6765, 0.9379, 0.8803], + [0.6765, 0.9379, 0.8803], + [0.6765, 0.9379, 0.8803], + [0.6765, 0.9379, 0.8803], ], [ - [0.8405, 0.6292, 0.3840], - [0.9673, 0.9441, 0.9084], - [0.9673, 0.9441, 0.9084], - [0.9673, 0.9441, 0.9084], + [0.3317, 0.7264, 0.7646], + [0.6765, 0.9379, 0.8803], + [0.6765, 0.9379, 0.8803], + [0.6765, 0.9379, 0.8803], ], ] ) - EXPECTED_MASKS = np.array([-26.5424, -34.0901, -30.6406]) + EXPECTED_MASKS = np.array([-2.8552, -2.7990, -2.9612]) self.assertTrue(np.allclose(scores.numpy(), EXPECTED_SCORES, atol=1e-3)) self.assertTrue(np.allclose(masks.numpy(), EXPECTED_MASKS, atol=3e-2)) def test_inference_mask_generation_one_point_one_bb_zero(self): - model = TFSamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = TFSamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") raw_image = prepare_image() input_boxes = [[[620, 900, 1000, 1255]]] @@ -515,12 +514,11 @@ def test_inference_mask_generation_one_point_one_bb_zero(self): outputs = model(**inputs) scores = tf.squeeze(outputs.iou_scores) - - self.assertTrue(np.allclose(scores[-1].numpy(), np.array(0.9689), atol=1e-4)) + self.assertTrue(np.allclose(scores[-1].numpy(), np.array(0.7894), atol=1e-4)) def test_inference_mask_generation_one_point(self): - model = TFSamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = TFSamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") raw_image = prepare_image() @@ -532,7 +530,7 @@ def test_inference_mask_generation_one_point(self): outputs = model(**inputs) scores = tf.squeeze(outputs.iou_scores) - self.assertTrue(np.allclose(scores[-1], np.array(0.9712), atol=1e-4)) + self.assertTrue(np.allclose(scores[-1], np.array(0.9675), atol=1e-4)) # With no label input_points = [[[400, 650]]] @@ -542,11 +540,11 @@ def test_inference_mask_generation_one_point(self): outputs = model(**inputs) scores = tf.squeeze(outputs.iou_scores) - self.assertTrue(np.allclose(scores[-1].numpy(), np.array(0.9712), atol=1e-4)) + 
self.assertTrue(np.allclose(scores[-1].numpy(), np.array(0.9675), atol=1e-4)) def test_inference_mask_generation_two_points(self): - model = TFSamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = TFSamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") raw_image = prepare_image() input_points = [[[400, 650], [800, 650]]] @@ -557,7 +555,7 @@ def test_inference_mask_generation_two_points(self): outputs = model(**inputs) scores = tf.squeeze(outputs.iou_scores) - self.assertTrue(np.allclose(scores[-1].numpy(), np.array(0.9936), atol=1e-4)) + self.assertTrue(np.allclose(scores[-1].numpy(), np.array(0.9762), atol=1e-4)) # no labels inputs = processor(images=raw_image, input_points=input_points, return_tensors="tf") @@ -565,11 +563,11 @@ def test_inference_mask_generation_two_points(self): outputs = model(**inputs) scores = tf.squeeze(outputs.iou_scores) - self.assertTrue(np.allclose(scores[-1].numpy(), np.array(0.9936), atol=1e-4)) + self.assertTrue(np.allclose(scores[-1].numpy(), np.array(0.9762), atol=1e-4)) def test_inference_mask_generation_two_points_batched(self): - model = TFSamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = TFSamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") raw_image = prepare_image() @@ -583,12 +581,12 @@ def test_inference_mask_generation_two_points_batched(self): outputs = model(**inputs) scores = tf.squeeze(outputs.iou_scores) - self.assertTrue(np.allclose(scores[0][-1].numpy(), np.array(0.9936), atol=1e-4)) - self.assertTrue(np.allclose(scores[1][-1], np.array(0.9716), atol=1e-4)) + self.assertTrue(np.allclose(scores[0][-1].numpy(), np.array(0.9762), atol=1e-4)) + self.assertTrue(np.allclose(scores[1][-1], np.array(0.9637), atol=1e-4)) def test_inference_mask_generation_one_box(self): - model = TFSamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = TFSamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") raw_image = prepare_image() @@ -599,11 +597,11 @@ def test_inference_mask_generation_one_box(self): outputs = model(**inputs) scores = tf.squeeze(outputs.iou_scores) - self.assertTrue(np.allclose(scores[-1].numpy(), np.array(0.8686), atol=1e-4)) + self.assertTrue(np.allclose(scores[-1].numpy(), np.array(0.7937), atol=1e-4)) def test_inference_mask_generation_batched_image_one_point(self): - model = TFSamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = TFSamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") raw_image = prepare_image() raw_dog_image = prepare_dog_img() @@ -624,8 +622,8 @@ def test_inference_mask_generation_batched_image_one_point(self): self.assertTrue(np.allclose(scores_batched[1, :].numpy(), scores_single.numpy(), atol=1e-4)) def test_inference_mask_generation_two_points_point_batch(self): - model = TFSamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = TFSamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") raw_image = prepare_image() @@ -644,21 +642,23 @@ def 
test_inference_mask_generation_two_points_point_batch(self): self.assertTrue( np.allclose( iou_scores.numpy(), - np.array([[[0.9848, 0.9788, 0.9713], [0.9211, 0.9128, 0.7427]]]), + np.array([[[0.9105, 0.9825, 0.9675], [0.7646, 0.7943, 0.7774]]]), atol=1e-4, rtol=1e-4, ) ) def test_inference_mask_generation_three_boxes_point_batch(self): - model = TFSamModel.from_pretrained("facebook/sam-vit-huge") - processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") + model = TFSamModel.from_pretrained("facebook/sam-vit-base") + processor = SamProcessor.from_pretrained("facebook/sam-vit-base") raw_image = prepare_image() # fmt: off input_boxes = tf.convert_to_tensor([[[620, 900, 1000, 1255]], [[75, 275, 1725, 850]], [[75, 275, 1725, 850]]]) - EXPECTED_IOU = np.array([[[1.0071, 1.0032, 0.9946], [0.4962, 0.8770, 0.8686], [0.4962, 0.8770, 0.8686]]]) + EXPECTED_IOU = np.array([[[0.9773, 0.9881, 0.9522], + [0.5996, 0.7661, 0.7937], + [0.5996, 0.7661, 0.7937]]]) # fmt: on input_boxes = tf.expand_dims(input_boxes, 0) diff --git a/tests/models/segformer/test_modeling_tf_segformer.py b/tests/models/segformer/test_modeling_tf_segformer.py index 79c58ebe401b2c..b831e8ddbc2b14 100644 --- a/tests/models/segformer/test_modeling_tf_segformer.py +++ b/tests/models/segformer/test_modeling_tf_segformer.py @@ -14,6 +14,8 @@ # limitations under the License. """ Testing suite for the TensorFlow SegFormer model. """ +from __future__ import annotations + import inspect import unittest from typing import List, Tuple diff --git a/tests/models/speech_to_text/test_feature_extraction_speech_to_text.py b/tests/models/speech_to_text/test_feature_extraction_speech_to_text.py index aedd445e5d6393..293b33fde80e3a 100644 --- a/tests/models/speech_to_text/test_feature_extraction_speech_to_text.py +++ b/tests/models/speech_to_text/test_feature_extraction_speech_to_text.py @@ -136,6 +136,14 @@ def test_call(self): for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + # Test 2-D numpy arrays are batched. + speech_inputs = [floats_list((1, x))[0] for x in (800, 800, 800)] + np_speech_inputs = np.asarray(speech_inputs) + encoded_sequences_1 = feature_extractor(speech_inputs, return_tensors="np").input_features + encoded_sequences_2 = feature_extractor(np_speech_inputs, return_tensors="np").input_features + for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): + self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + def test_cepstral_mean_and_variance_normalization(self): feature_extractor = self.feature_extraction_class(**self.feat_extract_tester.prepare_feat_extract_dict()) speech_inputs = [floats_list((1, x))[0] for x in range(800, 1400, 200)] diff --git a/tests/models/speech_to_text/test_modeling_tf_speech_to_text.py b/tests/models/speech_to_text/test_modeling_tf_speech_to_text.py index 75789fd6d9b87a..b283b4478bda03 100644 --- a/tests/models/speech_to_text/test_modeling_tf_speech_to_text.py +++ b/tests/models/speech_to_text/test_modeling_tf_speech_to_text.py @@ -14,6 +14,8 @@ # limitations under the License. """ Testing suite for the TensorFlow Speech2Text model. 
""" +from __future__ import annotations + import inspect import unittest diff --git a/tests/models/speecht5/test_feature_extraction_speecht5.py b/tests/models/speecht5/test_feature_extraction_speecht5.py index 11ed50de4bbc3a..a09bf7f8ae58d2 100644 --- a/tests/models/speecht5/test_feature_extraction_speecht5.py +++ b/tests/models/speecht5/test_feature_extraction_speecht5.py @@ -275,6 +275,14 @@ def test_call_target(self): for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + # Test 2-D numpy arrays are batched. + speech_inputs = [floats_list((1, x))[0] for x in (800, 800, 800)] + np_speech_inputs = np.asarray(speech_inputs) + encoded_sequences_1 = feature_extractor(speech_inputs, return_tensors="np").input_values + encoded_sequences_2 = feature_extractor(np_speech_inputs, return_tensors="np").input_values + for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): + self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + def test_batch_feature_target(self): speech_inputs = self.feat_extract_tester.prepare_inputs_for_target() feat_extract = self.feature_extraction_class(**self.feat_extract_dict) diff --git a/tests/models/swin/test_modeling_tf_swin.py b/tests/models/swin/test_modeling_tf_swin.py index 32de917a11cb1c..a898d22fb132de 100644 --- a/tests/models/swin/test_modeling_tf_swin.py +++ b/tests/models/swin/test_modeling_tf_swin.py @@ -15,6 +15,8 @@ """ Testing suite for the TF 2.0 Swin model. """ +from __future__ import annotations + import inspect import unittest diff --git a/tests/models/t5/test_modeling_tf_t5.py b/tests/models/t5/test_modeling_tf_t5.py index a1d784ae2f9a8b..7a75f51cd7688d 100644 --- a/tests/models/t5/test_modeling_tf_t5.py +++ b/tests/models/t5/test_modeling_tf_t5.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import unittest from transformers import T5Config, is_tf_available diff --git a/tests/models/tapas/test_modeling_tf_tapas.py b/tests/models/tapas/test_modeling_tf_tapas.py index c3cc5fae3a9c5c..ce98394cb8688c 100644 --- a/tests/models/tapas/test_modeling_tf_tapas.py +++ b/tests/models/tapas/test_modeling_tf_tapas.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. 
+from __future__ import annotations + import copy import unittest diff --git a/tests/models/time_series_transformer/test_modeling_time_series_transformer.py b/tests/models/time_series_transformer/test_modeling_time_series_transformer.py index 65834dac42f78c..42319a1dd0a242 100644 --- a/tests/models/time_series_transformer/test_modeling_time_series_transformer.py +++ b/tests/models/time_series_transformer/test_modeling_time_series_transformer.py @@ -459,7 +459,7 @@ def test_retain_grad_hidden_states_attentions(self): def prepare_batch(filename="train-batch.pt"): - file = hf_hub_download(repo_id="kashif/tourism-monthly-batch", filename=filename, repo_type="dataset") + file = hf_hub_download(repo_id="hf-internal-testing/tourism-monthly-batch", filename=filename, repo_type="dataset") batch = torch.load(file, map_location=torch_device) return batch diff --git a/tests/models/transfo_xl/test_modeling_tf_transfo_xl.py b/tests/models/transfo_xl/test_modeling_tf_transfo_xl.py index 47880013b97ef8..ac820ea8fab08d 100644 --- a/tests/models/transfo_xl/test_modeling_tf_transfo_xl.py +++ b/tests/models/transfo_xl/test_modeling_tf_transfo_xl.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import random import unittest diff --git a/tests/models/tvlt/test_feature_extraction_tvlt.py b/tests/models/tvlt/test_feature_extraction_tvlt.py index a76f3c9dca08f7..051708a306981f 100644 --- a/tests/models/tvlt/test_feature_extraction_tvlt.py +++ b/tests/models/tvlt/test_feature_extraction_tvlt.py @@ -189,6 +189,15 @@ def test_call(self): self.assertTrue(encoded_audios.shape[-2] <= feature_extractor.spectrogram_length) self.assertTrue(encoded_audios.shape[-3] == feature_extractor.num_channels) + # Test 2-D numpy arrays are batched. + speech_inputs = [floats_list((1, x))[0] for x in (800, 800, 800)] + np_speech_inputs = np.asarray(speech_inputs) + encoded_audios = feature_extractor(np_speech_inputs, return_tensors="np", sampling_rate=44100).audio_values + self.assertTrue(encoded_audios.ndim == 4) + self.assertTrue(encoded_audios.shape[-1] == feature_extractor.feature_size) + self.assertTrue(encoded_audios.shape[-2] <= feature_extractor.spectrogram_length) + self.assertTrue(encoded_audios.shape[-3] == feature_extractor.num_channels) + def _load_datasamples(self, num_samples): ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation") # automatic decoding with librispeech diff --git a/tests/models/vision_encoder_decoder/test_modeling_tf_vision_encoder_decoder.py b/tests/models/vision_encoder_decoder/test_modeling_tf_vision_encoder_decoder.py index 1e594f5de551a2..04062014b84697 100644 --- a/tests/models/vision_encoder_decoder/test_modeling_tf_vision_encoder_decoder.py +++ b/tests/models/vision_encoder_decoder/test_modeling_tf_vision_encoder_decoder.py @@ -15,6 +15,8 @@ """ Testing suite for the TensorFlow VisionEncoderDecoder model. """ +from __future__ import annotations + import copy import os import tempfile diff --git a/tests/models/vision_text_dual_encoder/test_modeling_tf_vision_text_dual_encoder.py b/tests/models/vision_text_dual_encoder/test_modeling_tf_vision_text_dual_encoder.py index 696a302722ab41..1f27f831e8d700 100644 --- a/tests/models/vision_text_dual_encoder/test_modeling_tf_vision_text_dual_encoder.py +++ b/tests/models/vision_text_dual_encoder/test_modeling_tf_vision_text_dual_encoder.py @@ -15,6 +15,8 @@ """ Testing suite for the PyTorch VisionTextDualEncoder model. 
""" +from __future__ import annotations + import collections import tempfile import unittest diff --git a/tests/models/vit/test_modeling_tf_vit.py b/tests/models/vit/test_modeling_tf_vit.py index 111223de323bd5..72ca1b19dc2b7c 100644 --- a/tests/models/vit/test_modeling_tf_vit.py +++ b/tests/models/vit/test_modeling_tf_vit.py @@ -15,6 +15,8 @@ """ Testing suite for the TensorFlow ViT model. """ +from __future__ import annotations + import inspect import unittest diff --git a/tests/models/vit_mae/test_modeling_tf_vit_mae.py b/tests/models/vit_mae/test_modeling_tf_vit_mae.py index 53d68b644ac8f6..d5e16e96385068 100644 --- a/tests/models/vit_mae/test_modeling_tf_vit_mae.py +++ b/tests/models/vit_mae/test_modeling_tf_vit_mae.py @@ -15,6 +15,8 @@ """ Testing suite for the TensorFlow ViTMAE model. """ +from __future__ import annotations + import copy import inspect import json diff --git a/tests/models/wav2vec2/test_feature_extraction_wav2vec2.py b/tests/models/wav2vec2/test_feature_extraction_wav2vec2.py index 44f2ed5b87362d..556f01c6b2ee9f 100644 --- a/tests/models/wav2vec2/test_feature_extraction_wav2vec2.py +++ b/tests/models/wav2vec2/test_feature_extraction_wav2vec2.py @@ -123,6 +123,14 @@ def test_call(self): for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + # Test 2-D numpy arrays are batched. + speech_inputs = [floats_list((1, x))[0] for x in (800, 800, 800)] + np_speech_inputs = np.asarray(speech_inputs) + encoded_sequences_1 = feat_extract(speech_inputs, return_tensors="np").input_values + encoded_sequences_2 = feat_extract(np_speech_inputs, return_tensors="np").input_values + for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): + self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + def test_zero_mean_unit_variance_normalization_np(self): feat_extract = self.feature_extraction_class(**self.feat_extract_tester.prepare_feat_extract_dict()) speech_inputs = [floats_list((1, x))[0] for x in range(800, 1400, 200)] diff --git a/tests/models/wav2vec2/test_modeling_tf_wav2vec2.py b/tests/models/wav2vec2/test_modeling_tf_wav2vec2.py index e30aeb6aaa82e6..ef4c38e2a3039f 100644 --- a/tests/models/wav2vec2/test_modeling_tf_wav2vec2.py +++ b/tests/models/wav2vec2/test_modeling_tf_wav2vec2.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import copy import glob import inspect diff --git a/tests/models/wav2vec2/test_tokenization_wav2vec2.py b/tests/models/wav2vec2/test_tokenization_wav2vec2.py index cf5dc100c2a7ae..9715680e27bf38 100644 --- a/tests/models/wav2vec2/test_tokenization_wav2vec2.py +++ b/tests/models/wav2vec2/test_tokenization_wav2vec2.py @@ -164,6 +164,14 @@ def test_call(self): for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + # Test 2-D numpy arrays are batched. 
+ speech_inputs = [floats_list((1, x))[0] for x in (800, 800, 800)] + np_speech_inputs = np.asarray(speech_inputs) + encoded_sequences_1 = tokenizer(speech_inputs, return_tensors="np").input_values + encoded_sequences_2 = tokenizer(np_speech_inputs, return_tensors="np").input_values + for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): + self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + def test_padding(self, max_length=50): def _input_values_have_equal_length(input_values): length = len(input_values[0]) diff --git a/tests/models/whisper/test_feature_extraction_whisper.py b/tests/models/whisper/test_feature_extraction_whisper.py index 31ea28b9ad628c..90cbfc21c04f35 100644 --- a/tests/models/whisper/test_feature_extraction_whisper.py +++ b/tests/models/whisper/test_feature_extraction_whisper.py @@ -173,6 +173,14 @@ def test_call(self): for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + # Test 2-D numpy arrays are batched. + speech_inputs = [floats_list((1, x))[0] for x in (800, 800, 800)] + np_speech_inputs = np.asarray(speech_inputs) + encoded_sequences_1 = feature_extractor(speech_inputs, return_tensors="np").input_features + encoded_sequences_2 = feature_extractor(np_speech_inputs, return_tensors="np").input_features + for enc_seq_1, enc_seq_2 in zip(encoded_sequences_1, encoded_sequences_2): + self.assertTrue(np.allclose(enc_seq_1, enc_seq_2, atol=1e-3)) + # Test truncation required speech_inputs = [floats_list((1, x))[0] for x in range(200, (feature_extractor.n_samples + 500), 200)] np_speech_inputs = [np.asarray(speech_input) for speech_input in speech_inputs] diff --git a/tests/models/whisper/test_modeling_tf_whisper.py b/tests/models/whisper/test_modeling_tf_whisper.py index a52994899ad819..b9ad982176efd2 100644 --- a/tests/models/whisper/test_modeling_tf_whisper.py +++ b/tests/models/whisper/test_modeling_tf_whisper.py @@ -14,6 +14,8 @@ # limitations under the License. """ Testing suite for the TensorFlow Whisper model. 
""" +from __future__ import annotations + import inspect import tempfile import traceback diff --git a/tests/models/whisper/test_modeling_whisper.py b/tests/models/whisper/test_modeling_whisper.py index 98bbbb3214a7be..3eee5ad4967c07 100644 --- a/tests/models/whisper/test_modeling_whisper.py +++ b/tests/models/whisper/test_modeling_whisper.py @@ -94,7 +94,7 @@ class WhisperModelTester: def __init__( self, parent, - batch_size=13, + batch_size=2, seq_length=1500, is_training=True, use_labels=False, @@ -1477,7 +1477,7 @@ def test_generate_with_prompt_ids(self): model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny") model.to(torch_device) input_speech = self._load_datasamples(4)[-1:] - input_features = processor(input_speech, return_tensors="pt").input_features + input_features = processor(input_speech, return_tensors="pt").input_features.to(torch_device) output_without_prompt = model.generate(input_features) prompt_ids = processor.get_prompt_ids("Leighton") @@ -1494,7 +1494,7 @@ def test_generate_with_prompt_ids_and_forced_decoder_ids(self): model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny") model.to(torch_device) input_speech = self._load_datasamples(1) - input_features = processor(input_speech, return_tensors="pt").input_features + input_features = processor(input_speech, return_tensors="pt").input_features.to(torch_device) task = "translate" language = "de" expected_tokens = [f"<|{task}|>", f"<|{language}|>"] @@ -1513,7 +1513,7 @@ def test_generate_with_prompt_ids_and_no_non_prompt_forced_decoder_ids(self): model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en") model.to(torch_device) input_speech = self._load_datasamples(1) - input_features = processor(input_speech, return_tensors="pt").input_features + input_features = processor(input_speech, return_tensors="pt").input_features.to(torch_device) prompt = "test prompt" prompt_ids = processor.get_prompt_ids(prompt) @@ -1537,7 +1537,7 @@ class WhisperEncoderModelTester: def __init__( self, parent, - batch_size=13, + batch_size=2, seq_length=3000, is_training=True, use_labels=True, diff --git a/tests/models/xglm/test_modeling_tf_xglm.py b/tests/models/xglm/test_modeling_tf_xglm.py index 61fd8057259464..e2b8cc2e6cbcfd 100644 --- a/tests/models/xglm/test_modeling_tf_xglm.py +++ b/tests/models/xglm/test_modeling_tf_xglm.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import unittest from transformers import XGLMConfig, XGLMTokenizer, is_tf_available diff --git a/tests/models/xlm/test_modeling_tf_xlm.py b/tests/models/xlm/test_modeling_tf_xlm.py index 2b1fb2f963c0a7..5b576f02c91e34 100644 --- a/tests/models/xlm/test_modeling_tf_xlm.py +++ b/tests/models/xlm/test_modeling_tf_xlm.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import unittest from transformers import is_tf_available diff --git a/tests/models/xlm_roberta/test_modeling_tf_xlm_roberta.py b/tests/models/xlm_roberta/test_modeling_tf_xlm_roberta.py index 695a403b7b0bb0..1ecac55310fb04 100644 --- a/tests/models/xlm_roberta/test_modeling_tf_xlm_roberta.py +++ b/tests/models/xlm_roberta/test_modeling_tf_xlm_roberta.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. 
+from __future__ import annotations + import unittest from transformers import is_tf_available diff --git a/tests/models/xlnet/test_modeling_tf_xlnet.py b/tests/models/xlnet/test_modeling_tf_xlnet.py index bbc310aa8b1c93..6d76462fda9e6f 100644 --- a/tests/models/xlnet/test_modeling_tf_xlnet.py +++ b/tests/models/xlnet/test_modeling_tf_xlnet.py @@ -14,6 +14,8 @@ # limitations under the License. +from __future__ import annotations + import inspect import random import unittest diff --git a/tests/pipelines/test_pipelines_image_to_text.py b/tests/pipelines/test_pipelines_image_to_text.py index 97fe3a398f5813..2a73206f1ba600 100644 --- a/tests/pipelines/test_pipelines_image_to_text.py +++ b/tests/pipelines/test_pipelines_image_to_text.py @@ -14,6 +14,8 @@ import unittest +import requests + from transformers import MODEL_FOR_VISION_2_SEQ_MAPPING, TF_MODEL_FOR_VISION_2_SEQ_MAPPING, is_vision_available from transformers.pipelines import pipeline from transformers.testing_utils import is_pipeline_test, require_tf, require_torch, require_vision, slow @@ -125,6 +127,15 @@ def test_small_model_pt(self): ], ) + @require_torch + def test_small_model_pt_conditional(self): + pipe = pipeline("image-to-text", model="hf-internal-testing/tiny-random-BlipForConditionalGeneration") + image = "./tests/fixtures/tests_samples/COCO/000000039769.png" + prompt = "a photo of" + + outputs = pipe(image, prompt=prompt) + self.assertTrue(outputs[0]["generated_text"].startswith(prompt)) + @slow @require_torch def test_large_model_pt(self): @@ -143,6 +154,71 @@ def test_large_model_pt(self): ], ) + @slow + @require_torch + def test_generation_pt_blip(self): + pipe = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base") + url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/pokemon.png" + image = Image.open(requests.get(url, stream=True).raw) + + outputs = pipe(image) + self.assertEqual(outputs, [{"generated_text": "a pink pokemon pokemon with a blue shirt and a blue shirt"}]) + + @slow + @require_torch + def test_generation_pt_git(self): + pipe = pipeline("image-to-text", model="microsoft/git-base-coco") + url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/pokemon.png" + image = Image.open(requests.get(url, stream=True).raw) + + outputs = pipe(image) + self.assertEqual(outputs, [{"generated_text": "a cartoon of a purple character."}]) + + @slow + @require_torch + def test_conditional_generation_pt_blip(self): + pipe = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base") + url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg" + image = Image.open(requests.get(url, stream=True).raw) + + prompt = "a photography of" + + outputs = pipe(image, prompt=prompt) + self.assertEqual(outputs, [{"generated_text": "a photography of a volcano"}]) + + with self.assertRaises(ValueError): + outputs = pipe([image, image], prompt=[prompt, prompt]) + + @slow + @require_torch + def test_conditional_generation_pt_git(self): + pipe = pipeline("image-to-text", model="microsoft/git-base-coco") + url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg" + image = Image.open(requests.get(url, stream=True).raw) + + prompt = "a photo of a" + + outputs = pipe(image, prompt=prompt) + self.assertEqual(outputs, [{"generated_text": "a photo of a tent with a tent and a tent in the background."}]) + + with self.assertRaises(ValueError): + outputs = 
pipe([image, image], prompt=[prompt, prompt]) + + @slow + @require_torch + def test_conditional_generation_pt_pix2struct(self): + pipe = pipeline("image-to-text", model="google/pix2struct-ai2d-base") + url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg" + image = Image.open(requests.get(url, stream=True).raw) + + prompt = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud" + + outputs = pipe(image, prompt=prompt) + self.assertEqual(outputs, [{"generated_text": "ash cloud"}]) + + with self.assertRaises(ValueError): + outputs = pipe([image, image], prompt=[prompt, prompt]) + @slow @require_tf def test_large_model_tf(self): diff --git a/tests/repo_utils/test_tests_fetcher.py b/tests/repo_utils/test_tests_fetcher.py index e02a917700dd2f..6ab213b70cae64 100644 --- a/tests/repo_utils/test_tests_fetcher.py +++ b/tests/repo_utils/test_tests_fetcher.py @@ -42,6 +42,7 @@ get_module_dependencies, get_tree_starting_at, infer_tests_to_run, + init_test_examples_dependencies, parse_commit_message, print_tree_deps_of, ) @@ -149,7 +150,19 @@ def create_tmp_repo(tmp_dir, models=None): f"from transformers import {cls}Config, {cls}Model\nfrom ...test_modeling_common import ModelTesterMixin\n\ncode" ) - repo.index.add(["src", "tests"]) + example_dir = tmp_dir / "examples" + example_dir.mkdir(exist_ok=True) + for framework in ["flax", "pytorch", "tensorflow"]: + framework_dir = example_dir / framework + framework_dir.mkdir(exist_ok=True) + with open(framework_dir / f"test_{framework}_examples.py", "w") as f: + f.write("""test_args = "run_glue.py"\n""") + glue_dir = framework_dir / "text-classification" + glue_dir.mkdir(exist_ok=True) + with open(glue_dir / "run_glue.py", "w") as f: + f.write("from transformers import BertModel\n\ncode") + + repo.index.add(["examples", "src", "tests"]) repo.index.commit("Initial commit") repo.create_head("main") repo.head.reference = repo.refs.main @@ -164,12 +177,14 @@ def patch_transformer_repo_path(new_folder): """ old_repo_path = tests_fetcher.PATH_TO_REPO tests_fetcher.PATH_TO_REPO = Path(new_folder).resolve() + tests_fetcher.PATH_TO_EXAMPLES = tests_fetcher.PATH_TO_REPO / "examples" tests_fetcher.PATH_TO_TRANFORMERS = tests_fetcher.PATH_TO_REPO / "src/transformers" tests_fetcher.PATH_TO_TESTS = tests_fetcher.PATH_TO_REPO / "tests" try: yield finally: tests_fetcher.PATH_TO_REPO = old_repo_path + tests_fetcher.PATH_TO_EXAMPLES = tests_fetcher.PATH_TO_REPO / "examples" tests_fetcher.PATH_TO_TRANFORMERS = tests_fetcher.PATH_TO_REPO / "src/transformers" tests_fetcher.PATH_TO_TESTS = tests_fetcher.PATH_TO_REPO / "tests" @@ -409,6 +424,17 @@ def test_get_module_dependencies(self): with patch_transformer_repo_path(tmp_folder): assert get_module_dependencies(BERT_MODELING_FILE) == expected_bert_dependencies + # Test with an example + create_tmp_repo(tmp_folder) + + expected_example_dependencies = ["src/transformers/models/bert/modeling_bert.py"] + + with patch_transformer_repo_path(tmp_folder): + assert ( + get_module_dependencies("examples/pytorch/text-classification/run_glue.py") + == expected_example_dependencies + ) + def test_create_reverse_dependency_tree(self): with tempfile.TemporaryDirectory() as tmp_folder: tmp_folder = Path(tmp_folder) @@ -494,6 +520,33 @@ def test_print_tree_deps_of(self): assert cs.out.strip() in [expected_std_out, expected_std_out_2] + def test_init_test_examples_dependencies(self): + with tempfile.TemporaryDirectory() as tmp_folder: + tmp_folder = 
Path(tmp_folder) + create_tmp_repo(tmp_folder) + + expected_example_deps = { + "examples/flax/test_flax_examples.py": ["examples/flax/text-classification/run_glue.py"], + "examples/pytorch/test_pytorch_examples.py": ["examples/pytorch/text-classification/run_glue.py"], + "examples/tensorflow/test_tensorflow_examples.py": [ + "examples/tensorflow/text-classification/run_glue.py" + ], + } + + expected_examples = { + "examples/flax/test_flax_examples.py", + "examples/flax/text-classification/run_glue.py", + "examples/pytorch/test_pytorch_examples.py", + "examples/pytorch/text-classification/run_glue.py", + "examples/tensorflow/test_tensorflow_examples.py", + "examples/tensorflow/text-classification/run_glue.py", + } + + with patch_transformer_repo_path(tmp_folder): + example_deps, all_examples = init_test_examples_dependencies() + assert example_deps == expected_example_deps + assert {str(f.relative_to(tmp_folder)) for f in all_examples} == expected_examples + def test_create_reverse_dependency_map(self): with tempfile.TemporaryDirectory() as tmp_folder: tmp_folder = Path(tmp_folder) @@ -506,6 +559,12 @@ def test_create_reverse_dependency_map(self): "src/transformers/__init__.py", "src/transformers/models/bert/__init__.py", "tests/models/bert/test_modeling_bert.py", + "examples/flax/test_flax_examples.py", + "examples/flax/text-classification/run_glue.py", + "examples/pytorch/test_pytorch_examples.py", + "examples/pytorch/text-classification/run_glue.py", + "examples/tensorflow/test_tensorflow_examples.py", + "examples/tensorflow/text-classification/run_glue.py", } assert set(reverse_map["src/transformers/models/bert/modeling_bert.py"]) == expected_bert_deps @@ -521,6 +580,12 @@ def test_create_reverse_dependency_map(self): "src/transformers/modeling_utils.py", "tests/test_modeling_common.py", "tests/models/bert/test_modeling_bert.py", + "examples/flax/test_flax_examples.py", + "examples/flax/text-classification/run_glue.py", + "examples/pytorch/test_pytorch_examples.py", + "examples/pytorch/text-classification/run_glue.py", + "examples/tensorflow/test_tensorflow_examples.py", + "examples/tensorflow/text-classification/run_glue.py", } assert set(reverse_map["src/transformers/__init__.py"]) == expected_init_deps @@ -529,6 +594,12 @@ def test_create_reverse_dependency_map(self): "src/transformers/models/bert/configuration_bert.py", "src/transformers/models/bert/modeling_bert.py", "tests/models/bert/test_modeling_bert.py", + "examples/flax/test_flax_examples.py", + "examples/flax/text-classification/run_glue.py", + "examples/pytorch/test_pytorch_examples.py", + "examples/pytorch/text-classification/run_glue.py", + "examples/tensorflow/test_tensorflow_examples.py", + "examples/tensorflow/text-classification/run_glue.py", } assert set(reverse_map["src/transformers/models/bert/__init__.py"]) == expected_init_deps @@ -543,6 +614,12 @@ def test_create_reverse_dependency_map(self): "src/transformers/models/bert/configuration_bert.py", "src/transformers/models/bert/modeling_bert.py", "tests/models/bert/test_modeling_bert.py", + "examples/flax/test_flax_examples.py", + "examples/flax/text-classification/run_glue.py", + "examples/pytorch/test_pytorch_examples.py", + "examples/pytorch/text-classification/run_glue.py", + "examples/tensorflow/test_tensorflow_examples.py", + "examples/tensorflow/text-classification/run_glue.py", } assert set(reverse_map["src/transformers/models/bert/__init__.py"]) == expected_init_deps @@ -554,13 +631,26 @@ def test_create_module_to_test_map(self): with 
patch_transformer_repo_path(tmp_folder): test_map = create_module_to_test_map(filter_models=True) + expected_bert_tests = { + "examples/flax/test_flax_examples.py", + "examples/pytorch/test_pytorch_examples.py", + "examples/tensorflow/test_tensorflow_examples.py", + "tests/models/bert/test_modeling_bert.py", + } + for model in models: - assert test_map[f"src/transformers/models/{model}/modeling_{model}.py"] == [ - f"tests/models/{model}/test_modeling_{model}.py" - ] + if model != "bert": + assert test_map[f"src/transformers/models/{model}/modeling_{model}.py"] == [ + f"tests/models/{model}/test_modeling_{model}.py" + ] + else: + assert set(test_map[f"src/transformers/models/{model}/modeling_{model}.py"]) == expected_bert_tests # Init got filtered expected_init_tests = { + "examples/flax/test_flax_examples.py", + "examples/pytorch/test_pytorch_examples.py", + "examples/tensorflow/test_tensorflow_examples.py", "tests/test_modeling_common.py", "tests/models/bert/test_modeling_bert.py", "tests/models/gpt2/test_modeling_gpt2.py", @@ -575,12 +665,21 @@ def test_infer_tests_to_run(self): commit_changes("src/transformers/models/bert/modeling_bert.py", BERT_MODEL_FILE_NEW_CODE, repo) + example_tests = { + "examples/flax/test_flax_examples.py", + "examples/pytorch/test_pytorch_examples.py", + "examples/tensorflow/test_tensorflow_examples.py", + } + with patch_transformer_repo_path(tmp_folder): infer_tests_to_run(tmp_folder / "test-output.txt", diff_with_last_commit=True) with open(tmp_folder / "test-output.txt", "r") as f: tests_to_run = f.read() + with open(tmp_folder / "examples_test_list.txt", "r") as f: + example_tests_to_run = f.read() assert tests_to_run == "tests/models/bert/test_modeling_bert.py" + assert set(example_tests_to_run.split(" ")) == example_tests # Fake a new model addition repo = create_tmp_repo(tmp_folder, models=models) @@ -617,6 +716,8 @@ def test_infer_tests_to_run(self): infer_tests_to_run(tmp_folder / "test-output.txt") with open(tmp_folder / "test-output.txt", "r") as f: tests_to_run = f.read() + with open(tmp_folder / "examples_test_list.txt", "r") as f: + example_tests_to_run = f.read() expected_tests = { "tests/models/bert/test_modeling_bert.py", @@ -625,15 +726,19 @@ def test_infer_tests_to_run(self): "tests/test_modeling_common.py", } assert set(tests_to_run.split(" ")) == expected_tests + assert set(example_tests_to_run.split(" ")) == example_tests with patch_transformer_repo_path(tmp_folder): infer_tests_to_run(tmp_folder / "test-output.txt", filter_models=False) with open(tmp_folder / "test-output.txt", "r") as f: tests_to_run = f.read() + with open(tmp_folder / "examples_test_list.txt", "r") as f: + example_tests_to_run = f.read() expected_tests = [f"tests/models/{name}/test_modeling_{name}.py" for name in models + ["t5"]] expected_tests = set(expected_tests + ["tests/test_modeling_common.py"]) assert set(tests_to_run.split(" ")) == expected_tests + assert set(example_tests_to_run.split(" ")) == example_tests def test_infer_tests_to_run_with_test_modifs(self): with tempfile.TemporaryDirectory() as tmp_folder: diff --git a/tests/test_modeling_tf_common.py b/tests/test_modeling_tf_common.py index 220560d9238a90..69363686837b8a 100644 --- a/tests/test_modeling_tf_common.py +++ b/tests/test_modeling_tf_common.py @@ -14,6 +14,8 @@ # limitations under the License. 
+from __future__ import annotations + import copy import inspect import json @@ -22,10 +24,9 @@ import tempfile import unittest import unittest.mock as mock -from dataclasses import fields from importlib import import_module from math import isnan -from typing import List, Tuple, get_type_hints +from typing import List, Tuple from datasets import Dataset from huggingface_hub import HfFolder, Repository, delete_repo @@ -140,26 +141,6 @@ def _config_zero_init(config): return configs_no_init -def _return_type_has_loss(model): - return_type = get_type_hints(model.call) - if "return" not in return_type: - return False - return_type = return_type["return"] - if hasattr(return_type, "__args__"): # Awkward check for union because UnionType only turns up in 3.10 - for type_annotation in return_type.__args__: - if inspect.isclass(type_annotation) and issubclass(type_annotation, ModelOutput): - field_names = [field.name for field in fields(type_annotation)] - if "loss" in field_names: - return True - return False - elif isinstance(return_type, tuple): - return False - elif isinstance(return_type, ModelOutput): - class_fields = fields(return_type) - return "loss" in class_fields - return False - - @require_tf class TFModelTesterMixin: model_tester = None @@ -1464,8 +1445,6 @@ def test_loss_computation(self): config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common() for model_class in self.all_model_classes: model = model_class(config) - if not getattr(model, "hf_compute_loss", None) and not _return_type_has_loss(model): - continue # The number of elements in the loss should be the same as the number of elements in the label prepared_for_class = self._prepare_for_class(inputs_dict.copy(), model_class, return_labels=True) added_label_names = sorted(prepared_for_class.keys() - inputs_dict.keys(), reverse=True) @@ -1480,7 +1459,11 @@ def test_loss_computation(self): input_name = possible_input_names.intersection(set(prepared_for_class)).pop() model_input = prepared_for_class.pop(input_name) - loss = model(model_input, **prepared_for_class)[0] + outputs = model(model_input, **prepared_for_class) + if not isinstance(outputs, ModelOutput) or not hasattr(outputs, "loss"): + continue + + loss = outputs.loss self.assertTrue(loss.shape.as_list() == expected_loss_size or loss.shape.as_list() == [1]) # Test that model correctly compute the loss when we mask some positions @@ -1540,18 +1523,16 @@ def test_keras_fit(self): config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common() for model_class in self.all_model_classes: model = model_class(config) - if not getattr(model, "hf_compute_loss", False) and not _return_type_has_loss(model): - continue # Test that model correctly compute the loss with kwargs prepared_for_class = self._prepare_for_class(inputs_dict.copy(), model_class, return_labels=True) - # Is there a better way to remove these decoder inputs? 
# We also remove "return_loss" as this is covered by the train_step when using fit() prepared_for_class = { key: val for key, val in prepared_for_class.items() - if key - not in ("head_mask", "decoder_head_mask", "cross_attn_head_mask", "decoder_input_ids", "return_loss") + if key not in ("head_mask", "decoder_head_mask", "cross_attn_head_mask", "return_loss") } + if "labels" in prepared_for_class and "decoder_input_ids" in prepared_for_class: + del prepared_for_class["decoder_input_ids"] accuracy_classes = [ "ForPreTraining", @@ -1575,8 +1556,10 @@ def test_keras_fit(self): sample_weight = tf.convert_to_tensor([0.5] * self.model_tester.batch_size, dtype=tf.float32) else: sample_weight = None - - model(model.dummy_inputs) # Build the model so we can get some constant weights + # Build the model so we can get some constant weights and check outputs + outputs = model(prepared_for_class) + if getattr(outputs, "loss", None) is None: + continue model_weights = model.get_weights() # Run eagerly to save some expensive compilation times @@ -1648,7 +1631,6 @@ def test_keras_fit(self): # Pass in all samples as a batch to match other `fit` calls weighted_dataset = weighted_dataset.batch(len(dataset)) dataset = dataset.batch(len(dataset)) - # Reinitialize to fix batchnorm again model.set_weights(model_weights) @@ -1695,7 +1677,10 @@ def test_int_support(self): # After testing that the model accepts all int inputs, confirm that its dummies are int32 for key, tensor in model.dummy_inputs.items(): - self.assertTrue(isinstance(tensor, tf.Tensor), "Dummy inputs should be tf.Tensor!") + self.assertTrue( + isinstance(tensor, tf.Tensor) or tf.keras.backend.is_keras_tensor(tensor), + "Dummy inputs should be tf.Tensor!", + ) if tensor.dtype.is_integer: self.assertTrue(tensor.dtype == tf.int32, "Integer dummy inputs should be tf.int32!") diff --git a/tests/trainer/test_trainer.py b/tests/trainer/test_trainer.py index 63a12635880419..95b92d5295d024 100644 --- a/tests/trainer/test_trainer.py +++ b/tests/trainer/test_trainer.py @@ -2474,6 +2474,11 @@ def hp_name(trial): "lr": TrainingArguments.learning_rate, } + default_lion_kwargs = { + "betas": (TrainingArguments.adam_beta1, TrainingArguments.adam_beta2), + "lr": TrainingArguments.learning_rate, + } + default_anyprecision_kwargs = { "use_kahan_summation": False, "momentum_dtype": torch.float32, @@ -2525,11 +2530,59 @@ def hp_name(trial): optim_test_params.append( ( TrainingArguments(optim=OptimizerNames.ADAMW_BNB, output_dir="None"), - bnb.optim.Adam8bit, + bnb.optim.AdamW, default_adam_kwargs, ) ) + optim_test_params.append( + ( + TrainingArguments(optim=OptimizerNames.ADAMW_8BIT, output_dir="None"), + bnb.optim.AdamW, + default_adam_kwargs, + ) + ) + + optim_test_params.append( + ( + TrainingArguments(optim=OptimizerNames.PAGED_ADAMW, output_dir="None"), + bnb.optim.AdamW, + default_adam_kwargs, + ) + ) + + optim_test_params.append( + ( + TrainingArguments(optim=OptimizerNames.PAGED_ADAMW_8BIT, output_dir="None"), + bnb.optim.AdamW, + default_adam_kwargs, + ) + ) + + optim_test_params.append( + ( + TrainingArguments(optim=OptimizerNames.LION, output_dir="None"), + bnb.optim.Lion, + default_lion_kwargs, + ) + ) + + optim_test_params.append( + ( + TrainingArguments(optim=OptimizerNames.LION_8BIT, output_dir="None"), + bnb.optim.Lion, + default_lion_kwargs, + ) + ) + + optim_test_params.append( + ( + TrainingArguments(optim=OptimizerNames.PAGED_LION_8BIT, output_dir="None"), + bnb.optim.Lion, + default_lion_kwargs, + ) + ) + if is_torchdistx_available(): import 
torchdistx @@ -2598,15 +2651,113 @@ def test_bnb_adam8bit(self): modules = { "bitsandbytes": mock, "bitsandbytes.optim": mock.optim, - "bitsandbytes.optim.Adam8bit": mock.optim.Adam8bit, + "bitsandbytes.optim.AdamW": mock.optim.AdamW, } with patch.dict("sys.modules", modules): self.check_optim_and_kwargs( TrainingArguments(optim=OptimizerNames.ADAMW_BNB, output_dir="None"), - mock.optim.Adam8bit, + mock.optim.AdamW, default_adam_kwargs, ) + def test_bnb_paged_adam8bit_alias(self): + mock = Mock() + modules = { + "bitsandbytes": mock, + "bitsandbytes.optim": mock.optim, + "bitsandbytes.optim.AdamW": mock.optim.AdamW, + } + with patch.dict("sys.modules", modules): + self.check_optim_and_kwargs( + TrainingArguments(optim=OptimizerNames.ADAMW_8BIT, output_dir="None"), + mock.optim.AdamW, + default_adam_kwargs, + ) + + def test_bnb_paged_adam(self): + mock = Mock() + modules = { + "bitsandbytes": mock, + "bitsandbytes.optim": mock.optim, + "bitsandbytes.optim.AdamW": mock.optim.AdamW, + } + with patch.dict("sys.modules", modules): + self.check_optim_and_kwargs( + TrainingArguments(optim=OptimizerNames.PAGED_ADAMW, output_dir="None"), + mock.optim.AdamW, + default_adam_kwargs, + ) + + def test_bnb_paged_adam8bit(self): + mock = Mock() + modules = { + "bitsandbytes": mock, + "bitsandbytes.optim": mock.optim, + "bitsandbytes.optim.AdamW": mock.optim.AdamW, + } + with patch.dict("sys.modules", modules): + self.check_optim_and_kwargs( + TrainingArguments(optim=OptimizerNames.PAGED_ADAMW_8BIT, output_dir="None"), + mock.optim.AdamW, + default_adam_kwargs, + ) + + def test_bnb_lion(self): + mock = Mock() + modules = { + "bitsandbytes": mock, + "bitsandbytes.optim": mock.optim, + "bitsandbytes.optim.Lion": mock.optim.Lion, + } + with patch.dict("sys.modules", modules): + self.check_optim_and_kwargs( + TrainingArguments(optim=OptimizerNames.LION, output_dir="None"), + mock.optim.Lion, + default_lion_kwargs, + ) + + def test_bnb_lion8bit(self): + mock = Mock() + modules = { + "bitsandbytes": mock, + "bitsandbytes.optim": mock.optim, + "bitsandbytes.optim.Lion": mock.optim.Lion, + } + with patch.dict("sys.modules", modules): + self.check_optim_and_kwargs( + TrainingArguments(optim=OptimizerNames.LION_8BIT, output_dir="None"), + mock.optim.Lion, + default_lion_kwargs, + ) + + def test_bnb_paged_lion8bit(self): + mock = Mock() + modules = { + "bitsandbytes": mock, + "bitsandbytes.optim": mock.optim, + "bitsandbytes.optim.Lion": mock.optim.Lion, + } + with patch.dict("sys.modules", modules): + self.check_optim_and_kwargs( + TrainingArguments(optim=OptimizerNames.PAGED_LION_8BIT, output_dir="None"), + mock.optim.Lion, + default_lion_kwargs, + ) + + def test_bnb_paged_lion(self): + mock = Mock() + modules = { + "bitsandbytes": mock, + "bitsandbytes.optim": mock.optim, + "bitsandbytes.optim.Lion": mock.optim.Lion, + } + with patch.dict("sys.modules", modules): + self.check_optim_and_kwargs( + TrainingArguments(optim=OptimizerNames.PAGED_LION, output_dir="None"), + mock.optim.Lion, + default_lion_kwargs, + ) + def test_bnb_adam8bit_no_bnb(self): args = TrainingArguments(optim=OptimizerNames.ADAMW_BNB, output_dir="None") @@ -2616,6 +2767,42 @@ def test_bnb_adam8bit_no_bnb(self): with self.assertRaises(ValueError): Trainer.get_optimizer_cls_and_kwargs(args) + def test_bnb_paged_adam_no_bnb(self): + args = TrainingArguments(optim=OptimizerNames.PAGED_ADAMW, output_dir="None") + + # Pretend that bnb does not exist, even if installed. By setting bnb to None, importing + # bnb will fail even if bnb is installed. 
+ with patch.dict("sys.modules", {"bitsandbytes.optim": None}): + with self.assertRaises(ValueError): + Trainer.get_optimizer_cls_and_kwargs(args) + + def test_bnb_paged_adam8bit_no_bnb(self): + args = TrainingArguments(optim=OptimizerNames.PAGED_ADAMW_8BIT, output_dir="None") + + # Pretend that bnb does not exist, even if installed. By setting bnb to None, importing + # bnb will fail even if bnb is installed. + with patch.dict("sys.modules", {"bitsandbytes.optim": None}): + with self.assertRaises(ValueError): + Trainer.get_optimizer_cls_and_kwargs(args) + + def test_bnb_paged_lion_no_bnb(self): + args = TrainingArguments(optim=OptimizerNames.PAGED_LION, output_dir="None") + + # Pretend that bnb does not exist, even if installed. By setting bnb to None, importing + # bnb will fail even if bnb is installed. + with patch.dict("sys.modules", {"bitsandbytes.optim": None}): + with self.assertRaises(ValueError): + Trainer.get_optimizer_cls_and_kwargs(args) + + def test_bnb_paged_lion8bit_no_bnb(self): + args = TrainingArguments(optim=OptimizerNames.PAGED_LION_8BIT, output_dir="None") + + # Pretend that bnb does not exist, even if installed. By setting bnb to None, importing + # bnb will fail even if bnb is installed. + with patch.dict("sys.modules", {"bitsandbytes.optim": None}): + with self.assertRaises(ValueError): + Trainer.get_optimizer_cls_and_kwargs(args) + def test_anyprecision_adamw(self): # Pretend that torchdistx is installed and mock torchdistx.optimizers.AnyPrecisionAdamW exists. # Trainer.get_optimizer_cls_and_kwargs does not use AnyPrecisioinAdamW. It only has to return the diff --git a/tests/utils/test_dynamic_module_utils.py b/tests/utils/test_dynamic_module_utils.py new file mode 100644 index 00000000000000..dfdc63460cd346 --- /dev/null +++ b/tests/utils/test_dynamic_module_utils.py @@ -0,0 +1,129 @@ +# Copyright 2023 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+import os
+
+import pytest
+
+from transformers.dynamic_module_utils import get_imports
+
+
+TOP_LEVEL_IMPORT = """
+import os
+"""
+
+IMPORT_IN_FUNCTION = """
+def foo():
+    import os
+    return False
+"""
+
+DEEPLY_NESTED_IMPORT = """
+def foo():
+    def bar():
+        if True:
+            import os
+        return False
+    return bar()
+"""
+
+TOP_LEVEL_TRY_IMPORT = """
+import os
+
+try:
+    import bar
+except ImportError:
+    raise ValueError()
+"""
+
+TRY_IMPORT_IN_FUNCTION = """
+import os
+
+def foo():
+    try:
+        import bar
+    except ImportError:
+        raise ValueError()
+"""
+
+MULTIPLE_EXCEPTS_IMPORT = """
+import os
+
+try:
+    import bar
+except (ImportError, AttributeError):
+    raise ValueError()
+"""
+
+EXCEPT_AS_IMPORT = """
+import os
+
+try:
+    import bar
+except ImportError as e:
+    raise ValueError()
+"""
+
+GENERIC_EXCEPT_IMPORT = """
+import os
+
+try:
+    import bar
+except:
+    raise ValueError()
+"""
+
+MULTILINE_TRY_IMPORT = """
+import os
+
+try:
+    import bar
+    import baz
+except ImportError:
+    raise ValueError()
+"""
+
+MULTILINE_BOTH_IMPORT = """
+import os
+
+try:
+    import bar
+    import baz
+except ImportError:
+    x = 1
+    raise ValueError()
+"""
+
+CASES = [
+    TOP_LEVEL_IMPORT,
+    IMPORT_IN_FUNCTION,
+    DEEPLY_NESTED_IMPORT,
+    TOP_LEVEL_TRY_IMPORT,
+    GENERIC_EXCEPT_IMPORT,
+    MULTILINE_TRY_IMPORT,
+    MULTILINE_BOTH_IMPORT,
+    MULTIPLE_EXCEPTS_IMPORT,
+    EXCEPT_AS_IMPORT,
+    TRY_IMPORT_IN_FUNCTION,
+]
+
+
+@pytest.mark.parametrize("case", CASES)
+def test_import_parsing(tmp_path, case):
+    tmp_file_path = os.path.join(tmp_path, "test_file.py")
+    with open(tmp_file_path, "w") as _tmp_file:
+        _tmp_file.write(case)
+
+    parsed_imports = get_imports(tmp_file_path)
+    assert parsed_imports == ["os"]
diff --git a/tests/utils/test_modeling_tf_core.py b/tests/utils/test_modeling_tf_core.py
index f144a7b8d93392..ea5bc26986ae8b 100644
--- a/tests/utils/test_modeling_tf_core.py
+++ b/tests/utils/test_modeling_tf_core.py
@@ -14,6 +14,8 @@
 # limitations under the License.


+from __future__ import annotations
+
 import copy
 import os
 import tempfile
diff --git a/utils/check_config_attributes.py b/utils/check_config_attributes.py
index 589a94ba6d0762..929f3a51b1c648 100644
--- a/utils/check_config_attributes.py
+++ b/utils/check_config_attributes.py
@@ -73,6 +73,8 @@
     "InformerConfig": ["num_static_real_features", "num_time_features"],
     # used internally to calculate the feature size
     "TimeSeriesTransformerConfig": ["num_static_real_features", "num_time_features"],
+    # used internally to calculate the feature size
+    "AutoformerConfig": ["num_static_real_features", "num_time_features"],
 }

 # TODO (ydshieh): Check the failing cases, try to fix them or move some cases to the above block once we are sure
diff --git a/utils/check_repo.py b/utils/check_repo.py
index 7280381faf977a..8d1760d13352b1 100644
--- a/utils/check_repo.py
+++ b/utils/check_repo.py
@@ -73,6 +73,8 @@
     "TimeSeriesTransformerDecoder",  # Building part of bigger (tested) model.
     "InformerEncoder",  # Building part of bigger (tested) model.
     "InformerDecoder",  # Building part of bigger (tested) model.
+    "AutoformerEncoder",  # Building part of bigger (tested) model.
+    "AutoformerDecoder",  # Building part of bigger (tested) model.
     "JukeboxVQVAE",  # Building part of bigger (tested) model.
     "JukeboxPrior",  # Building part of bigger (tested) model.
     "DeformableDetrEncoder",  # Building part of bigger (tested) model.
@@ -223,6 +225,7 @@
     "GPTSanJapaneseModel",
     "TimeSeriesTransformerForPrediction",
     "InformerForPrediction",
+    "AutoformerForPrediction",
     "JukeboxVQVAE",
     "JukeboxPrior",
     "PegasusXEncoder",
diff --git a/utils/check_table.py b/utils/check_table.py
index e7e31cfee3bc79..80593881a39ccc 100644
--- a/utils/check_table.py
+++ b/utils/check_table.py
@@ -173,56 +173,9 @@ def check_model_table(overwrite=False):
         )


-def has_onnx(model_type):
-    """
-    Returns whether `model_type` is supported by ONNX (by checking if there is an ONNX config) or not.
-    """
-    config_mapping = transformers_module.models.auto.configuration_auto.CONFIG_MAPPING
-    if model_type not in config_mapping:
-        return False
-    config = config_mapping[model_type]
-    config_module = config.__module__
-    module = transformers_module
-    for part in config_module.split(".")[1:]:
-        module = getattr(module, part)
-    config_name = config.__name__
-    onnx_config_name = config_name.replace("Config", "OnnxConfig")
-    return hasattr(module, onnx_config_name)
-
-
-def get_onnx_model_list():
-    """
-    Return the list of models supporting ONNX.
-    """
-    config_mapping = transformers_module.models.auto.configuration_auto.CONFIG_MAPPING
-    model_names = config_mapping = transformers_module.models.auto.configuration_auto.MODEL_NAMES_MAPPING
-    onnx_model_types = [model_type for model_type in config_mapping.keys() if has_onnx(model_type)]
-    onnx_model_names = [model_names[model_type] for model_type in onnx_model_types]
-    onnx_model_names.sort(key=lambda x: x.upper())
-    return "\n".join([f"- {name}" for name in onnx_model_names]) + "\n"
-
-
-def check_onnx_model_list(overwrite=False):
-    """Check the model list in the serialization.mdx is consistent with the state of the lib and maybe `overwrite`."""
-    current_list, start_index, end_index, lines = _find_text_in_file(
-        filename=os.path.join(PATH_TO_DOCS, "serialization.mdx"),
-        start_prompt="",
-        end_prompt="In the next two sections, we'll show you how to:",
-    )
-    new_list = get_onnx_model_list()
-
-    if current_list != new_list:
-        if overwrite:
-            with open(os.path.join(PATH_TO_DOCS, "serialization.mdx"), "w", encoding="utf-8", newline="\n") as f:
-                f.writelines(lines[:start_index] + [new_list] + lines[end_index:])
-        else:
-            raise ValueError("The list of ONNX-supported models needs an update. Run `make fix-copies` to fix this.")
-
-
 if __name__ == "__main__":
     parser = argparse.ArgumentParser()
     parser.add_argument("--fix_and_overwrite", action="store_true", help="Whether to fix inconsistencies.")
     args = parser.parse_args()

     check_model_table(args.fix_and_overwrite)
-    check_onnx_model_list(args.fix_and_overwrite)
diff --git a/utils/tests_fetcher.py b/utils/tests_fetcher.py
index 05009e9759fb91..d8373da5ef5b96 100644
--- a/utils/tests_fetcher.py
+++ b/utils/tests_fetcher.py
@@ -46,6 +46,7 @@
 PATH_TO_REPO = Path(__file__).parent.parent.resolve()
+PATH_TO_EXAMPLES = PATH_TO_REPO / "examples"
 PATH_TO_TRANFORMERS = PATH_TO_REPO / "src/transformers"
 PATH_TO_TESTS = PATH_TO_REPO / "tests"
@@ -512,15 +513,40 @@ def print_tree_deps_of(module, all_edges=None):
         print(line[0])


+def init_test_examples_dependencies():
+    """
+    The test examples do not import from the examples (which are just scripts, not modules) so we need some extra
+    care initializing the dependency map there.
+ """ + test_example_deps = {} + all_examples = [] + for framework in ["flax", "pytorch", "tensorflow"]: + test_files = list((PATH_TO_EXAMPLES / framework).glob("test_*.py")) + all_examples.extend(test_files) + examples = [ + f for f in (PATH_TO_EXAMPLES / framework).glob("**/*.py") if f.parent != PATH_TO_EXAMPLES / framework + ] + all_examples.extend(examples) + for test_file in test_files: + with open(test_file, "r", encoding="utf-8") as f: + content = f.read() + test_example_deps[str(test_file.relative_to(PATH_TO_REPO))] = [ + str(e.relative_to(PATH_TO_REPO)) for e in examples if e.name in content + ] + return test_example_deps, all_examples + + def create_reverse_dependency_map(): """ Create the dependency map from module/test filename to the list of modules/tests that depend on it (even recursively). """ cache = {} - all_modules = list(PATH_TO_TRANFORMERS.glob("**/*.py")) + list(PATH_TO_TESTS.glob("**/*.py")) + example_deps, examples = init_test_examples_dependencies() + all_modules = list(PATH_TO_TRANFORMERS.glob("**/*.py")) + list(PATH_TO_TESTS.glob("**/*.py")) + examples all_modules = [str(mod.relative_to(PATH_TO_REPO)) for mod in all_modules] direct_deps = {m: get_module_dependencies(m, cache=cache) for m in all_modules} + direct_deps.update(example_deps) # This recurses the dependencies something_changed = True @@ -557,7 +583,15 @@ def create_module_to_test_map(reverse_map=None, filter_models=False): """ if reverse_map is None: reverse_map = create_reverse_dependency_map() - test_map = {module: [f for f in deps if f.startswith("tests")] for module, deps in reverse_map.items()} + + def is_test(fname): + if fname.startswith("tests"): + return True + if fname.startswith("examples") and fname.split(os.path.sep)[-1].startswith("test"): + return True + return False + + test_map = {module: [f for f in deps if is_test(f)] for module, deps in reverse_map.items()} if not filter_models: return test_map @@ -627,9 +661,7 @@ def create_json_map(test_files_to_run, json_output_file): json.dump(test_map, fp, ensure_ascii=False) -def infer_tests_to_run( - output_file, diff_with_last_commit=False, filters=None, filter_models=True, json_output_file=None -): +def infer_tests_to_run(output_file, diff_with_last_commit=False, filter_models=True, json_output_file=None): modified_files = get_modified_python_files(diff_with_last_commit=diff_with_last_commit) print(f"\n### MODIFIED FILES ###\n{_print_list(modified_files)}") @@ -663,11 +695,6 @@ def infer_tests_to_run( test_files_to_run = [f for f in test_files_to_run if not f.split(os.path.sep)[1] == "sagemaker"] # Make sure we did not end up with a test file that was removed test_files_to_run = [f for f in test_files_to_run if (PATH_TO_REPO / f).exists()] - if filters is not None: - filtered_files = [] - for _filter in filters: - filtered_files.extend([f for f in test_files_to_run if f.startswith(_filter)]) - test_files_to_run = filtered_files repo_utils_launch = any(f.split(os.path.sep)[1] == "repo_utils" for f in modified_files) @@ -676,6 +703,8 @@ def infer_tests_to_run( with open(repo_util_file, "w", encoding="utf-8") as f: f.write("tests/repo_utils") + examples_tests_to_run = [f for f in test_files_to_run if f.startswith("examples")] + test_files_to_run = [f for f in test_files_to_run if not f.startswith("examples")] print(f"\n### TEST TO RUN ###\n{_print_list(test_files_to_run)}") if len(test_files_to_run) > 0: with open(output_file, "w", encoding="utf-8") as f: @@ -690,6 +719,12 @@ def infer_tests_to_run( create_json_map(test_files_to_run, 
json_output_file) + print(f"\n### EXAMPLES TEST TO RUN ###\n{_print_list(examples_tests_to_run)}") + if len(examples_tests_to_run) > 0: + example_file = Path(output_file).parent / "examples_test_list.txt" + with open(example_file, "w", encoding="utf-8") as f: + f.write(" ".join(examples_tests_to_run)) + doctest_list = get_doctest_files() print(f"\n### DOCTEST TO RUN ###\n{_print_list(doctest_list)}") @@ -763,13 +798,6 @@ def parse_commit_message(commit_message): action="store_true", help="To fetch the tests between the current commit and the last commit", ) - parser.add_argument( - "--filters", - type=str, - nargs="*", - default=["tests"], - help="Only keep the test files matching one of those filters.", - ) parser.add_argument( "--filter_tests", action="store_true", @@ -814,7 +842,6 @@ def parse_commit_message(commit_message): infer_tests_to_run( args.output_file, diff_with_last_commit=diff_with_last_commit, - filters=args.filters, json_output_file=args.json_output_file, filter_models=not commit_flags["no_filter"], )