
Fix pr 32013 #2

Closed
wants to merge 350 commits into from
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Jul 24, 2024

  1. Remove conversational pipeline tests (huggingface#32099)

    Remove conversation pipeline tests
    amyeroberts committed Jul 24, 2024
    165116b
  2. RoPE: relaxed rope validation (huggingface#32182)

    * relaxed rope check
    
    * lets also accept rope_type=None, defaulting to the original implementation
    
    * type and rope_type can coexist
    gante committed Jul 24, 2024
    e0182f3
  3. let's not warn when someone is running a forward (huggingface#32176)

    * let's not warn when someone is running a forward without cache + self.training
    
    * more models
    
    * fixup
    ArthurZucker committed Jul 24, 2024
    8d2534c
  4. Fix resize embedding with Deepspeed (huggingface#32192)

    fix resize when deepspeed
    zucchini-nlp committed Jul 24, 2024
    1392a68
  5. Fix float8_e4m3fn in modeling_utils (huggingface#32193)

    * Fix float8_e4m3fn in modeling_utils
    
    * style
    
    * fix
    
    * comment
    SunMarc committed Jul 24, 2024
    af0e4b7
  6. Support dequantizing GGUF FP16 format (huggingface#31783)

    * support gguf fp16
    
    * support gguf bf16 with pytorch
    
    * add gguf f16 test
    
    * remove bf16
    PenutChen committed Jul 24, 2024
    1c122a4
  7. 🚨 No more default chat templates (huggingface#31733)

    * No more default chat templates
    
    * Add the template to the GPT-SW3 tests since it's not available by default now
    
    * Fix GPT2 test
    
    * Fix Bloom test
    
    * Fix Bloom test
    
    * Remove default templates again
    Rocketknight1 committed Jul 24, 2024
    edd68f4
  8. fix: Replaced deprecated unittest method with the correct one (huggingface#32198)
    
    Replaced deprecated unittest method with the correct one.
    Sai-Suraj-27 committed Jul 24, 2024
    85a1269
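The entry above does not name the deprecated method, but the classic case of this cleanup is the old `assertEquals` alias, removed in Python 3.12 in favor of `assertEqual`. A minimal sketch (example test, not the repository's actual test):

```python
import unittest

class ExampleTest(unittest.TestCase):
    def test_sum(self):
        # Deprecated alias (removed in Python 3.12): self.assertEquals(...)
        # Current spelling:
        self.assertEqual(sum([1, 2, 3]), 6)

# Run the case programmatically so the result can be inspected.
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
)
```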

Commits on Jul 25, 2024

  1. [whisper] fix short-form output type (huggingface#32178)

    * [whisper] fix short-form output type
    
    * add test
    
    * make style
    
    * update long-form tests
    
    * fixes
    
    * last fix
    
    * finalise test
    sanchit-gandhi committed Jul 25, 2024
    5658e74
  2. remove unnecessary guard code related to pytorch versions 1.4.2 ~ 1.7.0 (huggingface#32210)

    remove unnecessary guard code related to pytorch versions 1.4.2 ~ 1.7.0
    statelesshz committed Jul 25, 2024
    f53a5de
  3. 1ecedf1
  4. [BigBird Pegasus] set _supports_param_buffer_assignment to False (huggingface#32222)
    
    set _supports_param_buffer_assignment to False
    kashif committed Jul 25, 2024
    9b9a54e
  5. [warnings] fix E721 warnings (huggingface#32223)

    fix E721 warnings
    kashif committed Jul 25, 2024
    de23188
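For context on the entry above: flake8/ruff rule E721 flags direct type comparison with `==`. Since classes are singletons, identity (`is`) or `isinstance` is preferred. A small sketch of the before/after pattern (illustrative functions, not the repository's code):

```python
def same_concrete_type(a, b):
    # Before (triggers E721): type(a) == type(b)
    # After: compare with `is`, since type objects are singletons.
    return type(a) is type(b)

def is_bool(x):
    # When subclasses should also count, use isinstance instead.
    return isinstance(x, bool)
```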
  6. Follow up for huggingface#31973 (huggingface#32025)

    * fix
    
    * [test_all] trigger full CI
    
    ---------
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh committed Jul 25, 2024
    df6eee9
  7. translate philosophy.md to chinese (huggingface#32177)

    * translate philosophy.md to chinese
    
    * add the missing link
    statelesshz committed Jul 25, 2024
    6ed0bf1
  8. Allow a specific microphone to be used by the ffmpeg audio pipeline utility functions. Default to using the currently active microphone on Mac (huggingface#31846)
    
    * use currently active microphone on mac for ffmpeg_microphone
    
    * Allow ffmpeg_microphone device to be specified
    
    Co-authored-by: amyeroberts <[email protected]>
    
    ---------
    
    Co-authored-by: amyeroberts <[email protected]>
    jrhe and amyeroberts committed Jul 25, 2024
    3a83ec4
  9. Fix code snippet for Grounding DINO (huggingface#32229)

    Fix code snippet for grounding-dino
    qubvel committed Jul 25, 2024
    9d6c064

Commits on Jul 26, 2024

  1. Generation: stop at eos for assisted decoding (huggingface#31301)

    * fix
    
    * move changes to prompt lookup
    
    * add test
    
    * set eos in assistant model
    
    * style
    
    * fix flakiness
    
    * changes for new `main`
    
    * Update tests/generation/test_utils.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update tests/generation/test_utils.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * add comment to explain
    
    ---------
    
    Co-authored-by: amyeroberts <[email protected]>
    zucchini-nlp and amyeroberts committed Jul 26, 2024
    4ab33c2
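The idea behind "stop at eos for assisted decoding" can be sketched in plain Python: when the assistant (or prompt-lookup) model proposes a block of candidate tokens, the candidates should be truncated at the first EOS rather than proposed past it. This is a hypothetical helper for illustration, not the actual candidate-generator code in transformers:

```python
def trim_candidates_at_eos(candidate_ids, eos_token_id):
    # Keep candidates only up to and including the first EOS token,
    # so assisted decoding does not speculate past end-of-sequence.
    if eos_token_id in candidate_ids:
        return candidate_ids[: candidate_ids.index(eos_token_id) + 1]
    return candidate_ids
```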
  2. Llava: generate without images (huggingface#32183)

    * llava w/o images
    
    * tests
    zucchini-nlp committed Jul 26, 2024
    fad15fb
  3. Resize embeds with DeepSpeed (huggingface#32214)

    * fix resize when deepspeed
    
    * deepspeed uses new embeds
    
    * we needed this
    zucchini-nlp committed Jul 26, 2024
    c46edfb
  4. don't log base model architecture in wandb if log model is false (huggingface#32143)

    * don't log base model architecture in wandb if log model is false
    
    * Update src/transformers/integrations/integration_utils.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * convert log model setting into an enum
    
    * fix formatting
    
    ---------
    
    Co-authored-by: amyeroberts <[email protected]>
    joaonadkarni and amyeroberts committed Jul 26, 2024
    1c7ebf1
  5. Refactor: Removed un-necessary object base class (huggingface#32230)

    * Refactored to remove un-necessary object base class.
    
    * small fix.
    Sai-Suraj-27 committed Jul 26, 2024
    b8e5cd5
  6. Adds: extra_repr for RMSNorm layers in most models (huggingface#32204)

    * adds: extra_repr() to RMSNorm layers in multiple models
    
    * adds: extra_repr for deprecated models as well
    
    * formatting as per style guide
    rohitdwivedula committed Jul 26, 2024
    f9756d9
  7. Add check for target_sizes is None in `post_process_image_guided_detection` for owlv2 (huggingface#31934)
    
    * Add check for target_sizes is None in post_process_image_guided_detection
    
    * Make sure Owlvit and Owlv2 in sync
    
    * Fix incorrect indentation; add check for correct size of target_sizes
    catalys1 committed Jul 26, 2024
    5f841c7
  8. [tests] fix static cache implementation is not compatible with `attn_implementation==flash_attention_2` (huggingface#32039)
    
    * add flash attention check
    
    * fix
    
    * fix
    faaany committed Jul 26, 2024
    27c7f97
  9. Flash-Attn: fix generation when no attention mask or no padding (huggingface#32241)
    
    * fix
    
    * fix prev test (half of failures)
    
    * [run-slow] llama, gemma2
    
    * [run-slow] llama, gemma2
    zucchini-nlp committed Jul 26, 2024
    81233c0
  10. More flexible trigger condition (huggingface#32251)

    update
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh committed Jul 26, 2024
    8da9068

Commits on Jul 27, 2024

  1. Llama 3.1: replace for loop by tensor ops at inv_freq initialization (huggingface#32244)
    
    * replace for loop by tensor ops
    
    * rm assert; readability
    gante committed Jul 27, 2024
    44f6fdd
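The RoPE `inv_freq` values can be built either element by element or in one vectorized expression. A pure-Python stand-in for the two styles (in PyTorch the vectorized form would be `1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))`; this is a sketch, not the Llama source):

```python
def inv_freq_loop(base: float, dim: int):
    # original style: a Python for loop, one frequency at a time
    out = []
    for i in range(0, dim, 2):
        out.append(1.0 / (base ** (i / dim)))
    return out

def inv_freq_vectorized(base: float, dim: int):
    # "tensor op" style, written here as a comprehension over the
    # same index range so the two are directly comparable
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]
```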

Commits on Jul 29, 2024

  1. 🚨 Bloom support for cache class (huggingface#31445)

    * bloom dynamic cache
    
    * bloom follows standard cache format
    
    * no skips for bloom anymore
    
    * use cache position when possible
    
    * clean up
    
    * codestyle
    
    * Update src/transformers/models/bloom/modeling_bloom.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/bloom/modeling_bloom.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/bloom/modeling_bloom.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * pr comments
    
    * isinstance fix
    
    * address comments
    
    * make musicgen test happy
    
    * [run-slow] bloom
    
    ---------
    
    Co-authored-by: amyeroberts <[email protected]>
    zucchini-nlp and amyeroberts committed Jul 29, 2024
    f739687
  2. Upload new model failure report to Hub (huggingface#32264)

    upload
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh committed Jul 29, 2024
    f2122cc
  3. Optimize t5 tokenize logic to avoid redundant calls (huggingface#32270)

    * Optimize t5 tokenize logic to avoid redundant calls
    
    * fix and overwrite copies
    leejet committed Jul 29, 2024
    5019aab
  4. fix: Fixed wrong argument passed to convert_blip_checkpoint function call (huggingface#32262)
    
    Removed one wrong argument passed to convert_blip_checkpoint function call.
    Sai-Suraj-27 committed Jul 29, 2024
    a2ad9d5
  5. Repo: remove exceptions in check_docstrings (huggingface#32259)

    remove exceptions
    gante committed Jul 29, 2024
    535fe78
  6. make p_mask a numpy array before passing to select_starts_ends (huggingface#32076)
    
    * fix
    
    * bug fix
    
    * refine
    
    * fix
    faaany committed Jul 29, 2024
    6494479
  7. fix(docs): Fixed a link in docs (huggingface#32274)

    Fixed a link in docs.
    Sai-Suraj-27 committed Jul 29, 2024
    4992889
  8. Generate: end-to-end compilation (huggingface#30788)

    * mvp
    
    * added test (a few models need fixes)
    
    * fix a few test cases
    
    * test nits
    
    * harder test 😈
    
    * revert changes in stablelm
    
    * test with improved condition
    
    * add todo
    
    * tmp commit
    
    * merged with main
    
    * nits
    
    * add todo
    
    * final corrections
    
    * add docs for generation compilation
    
    * docs nits
    
    * add  tip
    
    * PR suggestions
    
    * add more details to the compilation docs
    
    * fix cache positions
    
    * cache is now init in generate; update docs
    
    * tag test as flaky
    
    * docs
    
    * post rebase make fixup and other nits
    
    * remove unintended changes
    
    * whisper (encoder-decoder) not supported
    
    * move token default updates to ; add tests for token defaults
    
    * push changes
    
    * manual rebase
    
    * chameleon doesn't support this
    
    * fix test_static_cache_mha_mqa_gqa (broken in another PR)
    
    * docs: dynamic is better with end-to-end compilation
    gante committed Jul 29, 2024
    7ffe25f
  9. Whisper tokenizer word level timestamps (huggingface#32197)

    * fix _fix_key in PreTrainedModel
    
    * fix _find_longest_common_sequence
    
    * add test
    
    * remove result.json
    
    * nit
    
    * update test
    kamilakesbi committed Jul 29, 2024
    3fbaaaa
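The `_find_longest_common_sequence` fix above concerns merging overlapping decoded windows when stitching Whisper chunks for word-level timestamps. A simplified sketch of the merging idea (a generic overlap merge, not the library's exact algorithm, which also tolerates small mismatches):

```python
def merge_by_longest_overlap(left, right):
    # Merge two overlapping token windows by finding the longest
    # suffix of `left` that equals a prefix of `right`, then
    # appending only the non-overlapping tail of `right`.
    best = 0
    for k in range(1, min(len(left), len(right)) + 1):
        if left[-k:] == right[:k]:
            best = k
    return left + right[best:]
```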
  10. [pipeline] fix padding for 1-d tensors (huggingface#31776)

    * [pipeline] fix padding for 1-d tensors
    
    * add test
    
    * make style
    
    * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
    
    Co-authored-by: Kamil Akesbi <[email protected]>
    
    * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
    
    ---------
    
    Co-authored-by: Kamil Akesbi <[email protected]>
    sanchit-gandhi and kamilakesbi committed Jul 29, 2024
    7f5d644
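Padding 1-d sequences in a pipeline batch amounts to right-padding each sequence to the batch maximum and tracking a mask. A minimal pure-Python sketch of that behavior (hypothetical helper; the real pipeline pads tensors and uses the feature extractor's padding value):

```python
def pad_1d(sequences, pad_value=0):
    # Right-pad each 1-d sequence to the batch max length and
    # return the padded batch plus an attention-style mask.
    max_len = max(len(s) for s in sequences)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, mask
```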
  11. 811a9ca
  12. Add stream messages from agent run for gradio chatbot (huggingface#32142)
    
    * Add stream_to_gradio method for running agent in gradio demo
    aymeric-roucher committed Jul 29, 2024
    a24a9a6
  13. use torch 2.4 in 2 CI jobs (huggingface#32302)

    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh committed Jul 29, 2024
    f0bc49e

Commits on Jul 30, 2024

  1. Docs: fix GaLore optimizer code example (huggingface#32249)

    Docs: fix GaLore optimizer example
    
    Fix incorrect usage of GaLore optimizer in Transformers trainer code example.
    
    The GaLore optimizer uses low-rank gradient updates to reduce memory usage. GaLore is quite popular and is implemented by the authors in [https://github.com/jiaweizzhao/GaLore](https://github.com/jiaweizzhao/GaLore). A few months ago GaLore was added to the HuggingFace Transformers library in huggingface#29588.
    
    Documentation of the Trainer module includes a few code examples of how to use GaLore. However, the `optim_target_modules` argument to the `TrainingArguments` function is incorrect, as discussed in huggingface#29588 (comment). This pull request fixes this issue.
    gil2rok committed Jul 30, 2024
    3e8106d
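As a concrete illustration of the argument fix described above, the corrected kwargs might look like the following. This is a hedged sketch: `"galore_adamw"` is a real optimizer name in transformers' GaLore support, but the module patterns are example values, and actually training with these requires a transformers version with GaLore support plus the `galore_torch` package:

```python
# Example-only kwargs for transformers' TrainingArguments; the fix is the
# argument spelling: `optim_target_modules` (the docs had a typo).
galore_args = {
    "optim": "galore_adamw",
    "optim_target_modules": ["attn", "mlp"],  # example patterns, not canonical
}
```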
  2. Fix GGUF dequantize for gguf==0.9.1 (huggingface#32298)

    * fix gguf dequantize for gguf==0.9.1
    
    * fix old version
    
    * make style
    Isotr0py committed Jul 30, 2024
    934fe15
  3. Cast epochs_trained to int when resuming training (huggingface#32286)

    * fix epochs_trained as int when resuming training
    
    * refactor
    
    ---------
    
    Co-authored-by: teddyferdinan <[email protected]>
    teddy-f-47 and teddyferdinan committed Jul 30, 2024
    20528f0
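The bug class behind "cast epochs_trained to int" is that true division (`/`) yields a float, which breaks anything expecting a count (e.g. `range(epochs_trained)`); floor division plus an explicit cast keeps it an integer. Illustrative numbers, not the Trainer's actual variables:

```python
completed_steps = 1250
steps_per_epoch = 500

# `completed_steps / steps_per_epoch` would give 2.5 (a float);
# floor division + int() yields a usable whole-epoch count.
epochs_trained = int(completed_steps // steps_per_epoch)
steps_into_epoch = completed_steps % steps_per_epoch
```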
  4. 084b509
  5. Fix M4T for ASR pipeline (huggingface#32296)

    * tentative fix
    
    * do the same for M4T
    ylacombe committed Jul 30, 2024
    2fbbcf5
  6. Docs: formatting nits (huggingface#32247)

    * doc formatting nits
    
    * ignore non-autodocs
    
    * Apply suggestions from code review
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/esm/modeling_esm.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/esm/modeling_esm.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * make fixup
    
    ---------
    
    Co-authored-by: amyeroberts <[email protected]>
    gante and amyeroberts committed Jul 30, 2024
    e68ec18
  7. Alternative agent plan (huggingface#32295)

    * new agent plan
    
    * plan type assertion
    
    * style corrections
    
    * better prompt naming
    
    * make fixup
    plaggy committed Jul 30, 2024
    bd54ed2
  8. fix: Added missing raise keyword for few exceptions (huggingface#32333)

    Fixed raising of few exceptions.
    Sai-Suraj-27 committed Jul 30, 2024
    1627108
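The bug pattern fixed above is easy to miss in review: writing `ValueError(...)` on its own line constructs the exception object and silently discards it; only `raise` makes it propagate. A minimal sketch (hypothetical function, not the repository's code):

```python
def check_positive(value):
    if value <= 0:
        # Bug pattern: `ValueError(...)` alone creates the exception and
        # throws it away. `raise` is required for it to actually fire.
        raise ValueError(f"expected a positive value, got {value}")
    return value
```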
  9. 62c60a3
  10. fixes huggingface#32329 : The Torch code is correct - to get an average of 10% o… (huggingface#32335)
    
    fixes huggingface#32329 : The Torch code is correct - to get an average of 10% of the total, we want to take 50% of the remainder after we've already masked 80% with [MASK] in the previous step.
    fkrasnov2 committed Jul 30, 2024
    516af4b
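The arithmetic in the commit message above is the classic BERT-style MLM split: of the selected tokens, 80% become [MASK]; of the remaining 20%, half (i.e. 10% of the total) become random tokens, and the rest stay unchanged, which is why the second draw uses 50%. A self-contained simulation of that two-draw logic (illustration only, not the library's code):

```python
import random

def choose_mlm_action(rng: random.Random) -> str:
    # First draw: 80% of selected tokens are replaced with [MASK].
    if rng.random() < 0.8:
        return "mask"
    # Second draw: 50% of the remaining 20% (10% of the total)
    # are replaced with a random token; the rest are kept as-is.
    if rng.random() < 0.5:
        return "random"
    return "keep"

rng = random.Random(0)
counts = {"mask": 0, "random": 0, "keep": 0}
n = 100_000
for _ in range(n):
    counts[choose_mlm_action(rng)] += 1
```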
  11. Repo checks: skip docstring checks if not in the diff (huggingface#32328)
    
    * tmp
    
    * skip files not in the diff
    
    * use git.Repo instead of an external subprocess
    
    * add tiny change to confirm that the diff is working on pushed changes
    
    * add make quality task
    
    * more profesh main commit reference
    gante committed Jul 30, 2024
    026a173
  12. Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (huggingface#32191)
    
    * Remove user-defined tokens which can be obtained through merges
    
    * Remove debug line
    
    * formatting
    
    * Refactor spm slow -> fast converter
    
    * revert unnecessary refactor
    
    * set comprehension
    
    * remove test files
    
    * Use `vocab_scores`
    
    * Always replace spiece underline with space in decode
    
    * we no longer need token filtering
    
    * Add save fast load slow unit test
    
    * Remove tokenizers version check
    
    * Remove duplicate code
    
    * Make `<start_of_turn>` and `<end_of_turn>` special tokens
    
    * Bias merge priority with length if score is the same
    
    * Add unit test for merge priority
    
    * CI
    xenova committed Jul 30, 2024
    6e2d04e

Commits on Jul 31, 2024

  1. a326433
  2. Gemma2 and flash-attention (huggingface#32188)

    * enable flash-attn & static cache
    
    * this works, not the prev
    
    * fix for sliding window layers
    
    * not needed anymore
    zucchini-nlp committed Jul 31, 2024
    7f552e2
  3. b75ad56
  4. [Idefics2] - Fix FA2 call for Perceiver layer (huggingface#32275)

    * Fix FA2 call for Perceiver layer
    
    * [run_slow] idefics2
    
    * [run_slow] idefics2
    
    * [run_slow] idefics2
    
    * Fix up
    
    * [run_slow] idefics2
    
    * [run_slow] idefics2
    
    * [run_slow] idefics2
    amyeroberts committed Jul 31, 2024
    5f1fcc2
  5. ef177a5
  6. Fix error when streaming to gradio with non-string tool arguments (huggingface#32360)
    
    Fix error when streaming agent run to gradio with non-string tool arguments
    aymeric-roucher committed Jul 31, 2024
    b46bd8b
  7. >3-5x faster torch.compile forward compilation for autoregressive decoder models (huggingface#32227)
    
    * draft
    
    * apply changes to all relevant archs
    
    * rerun ci - check_docstrings.py failing?
    
    * fix docstring
    
    * move 2D->4D mask creation to modeling file
    
    * repo consistency
    
    * fix the batch size = 1 case - calling contiguous is not enough
    
    * nit
    
    * style
    
    * propagate to gemma/gemma-2
    
    * prepare inputs for gemma generation
    
    * implement test and tiny fix in gemma2
    
    * Update src/transformers/models/bloom/modeling_bloom.py
    
    Co-authored-by: Arthur <[email protected]>
    
    * fix copies
    
    * ci pass
    
    * fix gemma's test_compile_static_cache tests
    
    * flacky
    
    * retrigger ci
    
    ---------
    
    Co-authored-by: sanchit-gandhi <[email protected]>
    Co-authored-by: Arthur <[email protected]>
    3 people committed Jul 31, 2024
    92abe60
  8. fix: Removed unnecessary @staticmethod decorator (huggingface#32361)

    * Fixed staticmethods with self as first argument.
    
    * Fixed staticmethods with self as first argument.
    
    * Fixed staticmethods with self as first argument.
    
    * Fixed staticmethods with self as first argument.
    Sai-Suraj-27 committed Jul 31, 2024
    53f0c9c
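The pattern removed above is a `@staticmethod` whose first parameter is named `self`: the decorator silently turns `self` into an ordinary positional argument, so the method cannot see instance state. Dropping the decorator restores a normal instance method. Hypothetical class for illustration:

```python
class Tokenizer:
    def __init__(self, lowercase=True):
        self.lowercase = lowercase

    # Bug pattern: decorating this with @staticmethod would make `self`
    # just another positional argument, disconnecting it from the
    # instance. Without the decorator it is a regular instance method.
    def normalize(self, text):
        return text.lower() if self.lowercase else text
```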
  9. 14ee232

Commits on Aug 1, 2024

  1. LLaVa: add cache class attribute (huggingface#32278)

    cache class flag
    zucchini-nlp committed Aug 1, 2024
    453e748
  2. 9451a38
  3. [whisper] compile compatibility with long-form decoding (huggingface#31772)
    
    * [whisper] compile compatibility with long-form decoding
    
    * clarify comment
    
    * fix after rebase
    
    * finalise
    
    * fix bsz
    
    * fix cache split
    
    * remove contiguous
    
    * style
    
    * finish
    
    * update doc
    
    * prevent cuda graph trace
    sanchit-gandhi committed Aug 1, 2024
    e234061
  4. Remove size check between attn_weights and kv_seq_len for phi3 (huggingface#32339)
    
    * Remove size check between attn_weights and kv_seq_len
    
    * add unit tests
    helunwencser committed Aug 1, 2024
    48ed24c
  5. add missing attribute _supports_param_buffer_assignment for gpt-j. (huggingface#32359)
    
    Co-authored-by: Guoming Zhang <[email protected]>
    nv-guomingz and Guoming Zhang committed Aug 1, 2024
    9e28284
  6. Check device map for saving tokenizer config on TPU (fix for issue huggingface#31971) (huggingface#32043)
    
    * Remove TPU device map for saving tokenizer config
    
    * Update tokenization_utils_base.py
    
    * Fix error msg when passing non-string device into tokenizer
    
    * Fix error message for non-string tokenizer device
    
    * Print out tokenizer device type in error msg
    
    * Update tokenization_utils_base.py
    ayukh committed Aug 1, 2024
    05c1f9a
  7. 2229ebe
  8. db8c7ca
  9. Fix conflicting key in init kwargs in PreTrainedTokenizerBase (huggingface#31233)
    
    * Fix conflicting key in init kwargs in PreTrainedTokenizerBase
    
    * Update code to check for callable key in save_pretrained
    
    * Apply PR suggestions
    
    * Invoke CI
    
    * Updates based on PR suggestion
    OmarManzoor committed Aug 1, 2024
    b4727a1
  10. Offloaded KV Cache (huggingface#31325)

    * Initial implementation of OffloadedCache
    
    * enable usage via cache_implementation
    
    * Address feedback, add tests, remove legacy methods.
    
    * Remove flash-attn, discover synchronization bugs, fix bugs
    
    * Prevent usage in CPU only mode
    
    * Add a section about offloaded KV cache to the docs
    
    * Fix typos in docs
    
    * Clarifications and better explanation of streams
    n17s committed Aug 1, 2024
    ca59d6f
  11. e3d8285
  12. Fixed Hybrid Cache Shape Initialization. (huggingface#32163)

    * fixed hybrid cache init, added test
    
    * Fix Test Typo
    
    ---------
    
    Co-authored-by: Aaron Haag <[email protected]>
    OsamaS99 and Aaron Haag committed Aug 1, 2024
    51ab25e
  13. Yell at the user if zero-3 init wasn't performed, but expected to have been done (huggingface#32299)
    
    * Test this zach
    
    * Test for improper init w/o zero3
    
    * Move back
    
    * Apply suggestions from code review
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Get rid of stars in warning
    
    * Make private
    
    * Make clear
    
    ---------
    
    Co-authored-by: amyeroberts <[email protected]>
    muellerzr and amyeroberts committed Aug 1, 2024
    82efc53

Commits on Aug 2, 2024

  1. Update docs (huggingface#32368)

    nits
    zucchini-nlp committed Aug 2, 2024
    2af199c
  2. RoPE: Add numerical tests ✨ (huggingface#32380)

    tests! :D
    gante committed Aug 2, 2024
    083e13b
  3. c1aa0ed

Commits on Aug 3, 2024

  1. fix: (issue huggingface#32124) Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`. (huggingface#32157)
    
    fix: Exception raised when running .
    fshp971 committed Aug 3, 2024
    7c31d05
  2. MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. (huggingface#31500)
    
    * Mixtral: remove unnecessary plus 1 when calculating rotary_seq_len, allowing position_ids=None (no auto position_ids generation could be unsafe)
    
    * fix typo [:-1] to [:, -1]
    
    * to meet formatting requirement
    
    * to meet formatting requirement
    
    * remove white space
    
    * MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. Fix format/style issue.
    
    * propagate to startcoder2, phi3, mixtral and qwen2
    
    * update qwen2_moe
    Luke20000429 committed Aug 3, 2024
    Commit 621fb3c
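The parenthesis fix described above changes where the "+ 1" lands when deriving the rotary sequence length. A pure-Python stand-in for the two forms (names and list-of-lists inputs are illustrative; the real code operates on tensors):

```python
def rotary_seq_len(kv_seq_len, position_ids=None):
    """Sketch of the fixed computation: "+ 1" applies only to the last
    position id, and position_ids=None falls back to kv_seq_len."""
    if position_ids is None:
        return kv_seq_len
    last_position = max(row[-1] for row in position_ids)
    return max(kv_seq_len, last_position + 1)


def rotary_seq_len_buggy(kv_seq_len, position_ids):
    # The old form added 1 outside max(), inflating the length even
    # when kv_seq_len already covered every position.
    return max(kv_seq_len, max(row[-1] for row in position_ids)) + 1
```

The same pattern was then propagated to starcoder2, phi3, qwen2, and qwen2_moe, per the commit message.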

Commits on Aug 5, 2024

  1. Bump keras from 2.8.0 to 2.13.1 in /examples/research_projects/decision_transformer (huggingface#32393)
    
    Bump keras in /examples/research_projects/decision_transformer
    
    Bumps [keras](https://github.com/keras-team/keras) from 2.8.0 to 2.13.1.
    - [Release notes](https://github.com/keras-team/keras/releases)
    - [Commits](keras-team/keras@v2.8.0...v2.13.1)
    
    ---
    updated-dependencies:
    - dependency-name: keras
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] committed Aug 5, 2024
    Commit 847bb85
  2. fix: SeamlessM4TFeatureExtractor stride remainder (huggingface#32088)

    * fix: SeamlessM4TFeatureExtractor stride remainder
    
    * Added attention mask size test
    
    * Reran ruff for style correction
    TechInterMezzo committed Aug 5, 2024
    Commit 05ae3a3
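The stride-remainder fix above belongs to a common bug class in feature extractors: when the frame count is not a multiple of the stride, the remainder must be padded (and the attention mask sized accordingly) rather than silently dropped. A generic sketch of that arithmetic, with all names being assumptions rather than the SeamlessM4TFeatureExtractor API:

```python
def pad_frames_to_stride(num_frames, stride):
    """Number of padding frames needed so num_frames becomes a
    multiple of stride (sketch of the bug class, not the exact fix)."""
    remainder = num_frames % stride
    return 0 if remainder == 0 else stride - remainder


def attention_mask_length(num_frames, stride):
    # Mask length after padding and downsampling by `stride`.
    return (num_frames + pad_frames_to_stride(num_frames, stride)) // stride
```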
  3. Commit 3bb646a
  4. huggingface#32184 save total_vocab_size (huggingface#32240)

    * save total_vocab_size = vocab_size + user added tokens to speed up operation
    
    * updating length when added_tokens_decoder is set
    
    * add test len(tokenizer)
    itazap committed Aug 5, 2024
    Commit 3d7c2f9
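The speed-up described above comes from caching the combined vocabulary size once, instead of recomputing it on every `len(tokenizer)` call. A minimal sketch (the class and method names are illustrative, not the transformers tokenizer API, though `added_tokens_decoder` mirrors the real attribute):

```python
class VocabSizeCache:
    """Sketch: keep total_vocab_size up to date instead of recomputing
    vocab_size + number of added tokens on every __len__ call."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.added_tokens_decoder = {}
        self._update_total_vocab_size()

    def _update_total_vocab_size(self):
        # Recomputed only when the added-token table changes.
        self.total_vocab_size = self.vocab_size + len(self.added_tokens_decoder)

    def add_tokens(self, tokens):
        for offset, token in enumerate(tokens):
            self.added_tokens_decoder[self.total_vocab_size + offset] = token
        self._update_total_vocab_size()

    def __len__(self):
        return self.total_vocab_size
```

Per the commit message, the cached value is also refreshed whenever `added_tokens_decoder` is set, and a `len(tokenizer)` test guards the behavior.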
  5. add values for neftune (huggingface#32399)

    I always forget what typical values are, and I have to look at the paper everytime. This will be a helpful reminder.
    nbroad1881 committed Aug 5, 2024
    Commit ea5da52
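For context on the values referenced above: NEFTune perturbs input embeddings during fine-tuning with uniform noise scaled by alpha / sqrt(seq_len * hidden_dim), and the NEFTune paper reports alpha values of 5, 10, and 15. A dependency-free sketch of the noise (plain lists stand in for tensors; this is not the Trainer's implementation):

```python
import math
import random


def neftune_noise(embeddings, noise_alpha=5.0):
    """Add NEFTune-style uniform noise to a seq_len x dim embedding
    matrix. noise_alpha is typically 5, 10, or 15 per the paper."""
    seq_len, dim = len(embeddings), len(embeddings[0])
    scale = noise_alpha / math.sqrt(seq_len * dim)
    return [[value + random.uniform(-scale, scale) for value in row]
            for row in embeddings]
```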
  6. Commit f5f1e52
  7. Persist embedding type of BART and mBART models after resize (huggingface#32242)
    
    * fix: persist embedding type of MBartConditonalGeneration after resize
    
    * fix: persist embedding type of BartConditonalGeneration after resize
    AbdiHaryadi committed Aug 5, 2024
    Commit baf7e5c
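The resize fix above is about keeping the model's embedding subclass (mBART uses a scaled word-embedding variant) instead of silently replacing it with a plain embedding when the vocabulary grows. A shape-only sketch with hypothetical class and attribute names:

```python
class ScaledEmbedding:
    """Hypothetical stand-in for a scaled word-embedding subclass."""

    def __init__(self, num_embeddings, embed_scale=1.0):
        self.num_embeddings = num_embeddings
        self.embed_scale = embed_scale


def resize_embedding(old_embedding, new_num_embeddings):
    # The gist of the fix: rebuild with type(old_embedding), carrying
    # over subclass attributes such as embed_scale, rather than
    # constructing a plain base-class embedding.
    return type(old_embedding)(
        new_num_embeddings,
        embed_scale=getattr(old_embedding, "embed_scale", 1.0),
    )
```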
  8. fix: Updated test_embeded_special_tokens for luke and mluke models (huggingface#32413)
    
    Fixed tokenizertests for luke, mluke models.
    Sai-Suraj-27 committed Aug 5, 2024
    Commit 458b0cd
  9. Respect the config's attn_implementation if set (huggingface#32383)

    * Respect the config's attn if set
    
    * Update test - can override in from_config
    
    * Fix
    amyeroberts committed Aug 5, 2024
    Commit 7e5d46d
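The behavior fixed above is a precedence question: an `attn_implementation` passed explicitly at load time should win, otherwise a value already stored on the config should be respected, with a framework default as the fallback. A sketch of that precedence (the function and the default values are assumptions, not the transformers internals):

```python
def resolve_attn_implementation(config_value=None, requested=None,
                                sdpa_available=True):
    """Sketch of the precedence: explicit request > config value >
    framework default."""
    if requested is not None:
        return requested
    if config_value is not None:
        return config_value
    return "sdpa" if sdpa_available else "eager"
```

Per the commit message, the explicit override path (e.g. via `from_config`) is covered by an updated test.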
  10. Commit 13dc6b0

Commits on Aug 6, 2024

  1. Cache: create docs (huggingface#32150)

    * draft
    
    * updates
    
    * works?
    
    * try adding python example in hidden section
    
    * another try
    
    * hwo do i render python
    
    * format as html code?
    
    * Update docs/source/en/kv_cache.md
    
    Co-authored-by: Joao Gante <[email protected]>
    
    * Update docs/source/en/kv_cache.md
    
    Co-authored-by: Joao Gante <[email protected]>
    
    * Update docs/source/en/kv_cache.md
    
    Co-authored-by: Joao Gante <[email protected]>
    
    * Update docs/source/en/kv_cache.md
    
    Co-authored-by: Joao Gante <[email protected]>
    
    * Update docs/source/en/kv_cache.md
    
    Co-authored-by: Joao Gante <[email protected]>
    
    * one more small update
    
    * should render hidden secrtion now
    
    * add outputs
    
    * fix links
    
    * check links
    
    * update all links
    
    * update with offloaded cache
    
    * all cache is importable, so they appear in docs
    
    * fix copies
    
    * docstring...
    
    ---------
    
    Co-authored-by: Joao Gante <[email protected]>
    zucchini-nlp and gante committed Aug 6, 2024
    Commit 37c5ca5
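The cache documentation added above centers on the Cache API's update/get_seq_length contract. A toy, framework-free sketch of a dynamically growing KV cache (class and method names echo the real API, but this is not the transformers implementation):

```python
class ToyDynamicCache:
    """Per-layer key/value lists that grow one decoding step at a time."""

    def __init__(self):
        self.key_cache = []
        self.value_cache = []

    def update(self, key_states, value_states, layer_idx):
        # First call for a layer creates its entry; later calls append.
        if len(self.key_cache) <= layer_idx:
            self.key_cache.append(list(key_states))
            self.value_cache.append(list(value_states))
        else:
            self.key_cache[layer_idx].extend(key_states)
            self.value_cache[layer_idx].extend(value_states)
        return self.key_cache[layer_idx], self.value_cache[layer_idx]

    def get_seq_length(self, layer_idx=0):
        if len(self.key_cache) <= layer_idx:
            return 0
        return len(self.key_cache[layer_idx])
```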
  2. Llava: fix checkpoint_doc (huggingface#32458)

    fix: add new llava like model bug
    RUFFY-369 committed Aug 6, 2024
    Commit 0aa8328
  3. add the missing flash attention test marker (huggingface#32419)

    * add flash attention check
    
    * fix
    
    * fix
    
    * add the missing marker
    
    * bug fix
    
    * add one more
    
    * remove order
    
    * add one more
    faaany committed Aug 6, 2024
    Commit e85d863
  4. Update kwargs validation for preprocess with decorator (huggingface#32024)
    
    * BLIP preprocess
    
    * BIT preprocess
    
    * BRIDGETOWER preprocess
    
    * CHAMELEON preprocess
    
    * CHINESE_CLIP preprocess
    
    * CONVNEXT preprocess
    
    * DEIT preprocess
    
    * DONUT preprocess
    
    * DPT preprocess
    
    * FLAVA preprocess
    
    * EFFICIENTNET preprocess
    
    * FUYU preprocess
    
    * GLPN preprocess
    
    * IMAGEGPT preprocess
    
    * INTRUCTBLIPVIDEO preprocess
    
    * VIVIT preprocess
    
    * ZOEDEPTH preprocess
    
    * VITMATTE preprocess
    
    * VIT preprocess
    
    * VILT preprocess
    
    * VIDEOMAE preprocess
    
    * VIDEOLLAVA
    
    * TVP processing
    
    * TVP fixup
    
    * SWIN2SR preprocess
    
    * SIGLIP preprocess
    
    * SAM preprocess
    
    * RT-DETR preprocess
    
    * PVT preprocess
    
    * POOLFORMER preprocess
    
    * PERCEIVER preprocess
    
    * OWLVIT preprocess
    
    * OWLV2 preprocess
    
    * NOUGAT preprocess
    
    * MOBILEVIT preprocess
    
    * MOBILENETV2 preprocess
    
    * MOBILENETV1 preprocess
    
    * LEVIT preprocess
    
    * LAYOUTLMV2 preprocess
    
    * LAYOUTLMV3 preprocess
    
    * Add test
    
    * Update tests
    qubvel committed Aug 6, 2024
    Commit fb66ef8
  5. Commit 438d06c
  6. Dependencies: fix typo (huggingface#32389)

    deps_2
    gante committed Aug 6, 2024
    Commit 36fd35e
  7. Add Nemotron HF Support (huggingface#31699)

    * Add nemotron support
    
    * fix inference
    
    * add unit test
    
    * add layernorm1p as a class to avoid meta device mismatch
    
    * test fixed
    
    * Add copied_from statements
    
    * remove pretraining_tp args
    
    * remove nemotronlayernorm
    
    * force LN computation done in FP32
    
    * remove nemotrontokenizer and use llamatokenizer
    
    * license update
    
    * add option for kv_channels for minitron8b
    
    * remove assert
    
    * o_proj fixed
    
    * o_proj reshape
    
    * add gated_proj option
    
    * typo
    
    * remove todos
    
    * fix broken test after merging latest main
    
    * remove nezha/nat after meging main
    
    * chnage default config to 15b model
    
    * add nemo conversion script
    
    * rename conversion script
    
    * remove gate_proj option
    
    * pr comment resolved
    
    * fix unit test
    
    * rename kv_channels to head_dim
    
    * resolve PR issue
    
    * add nemotron md
    
    * fix broken tests
    
    * refactor rope for nemotron
    
    * test fix
    
    * remove linearscaling
    
    * whitespace and import
    
    * fix some copied-from
    
    * code style fix
    
    * reformatted
    
    * add position_embedding to nemotronattention
    
    * rope refactor to only use config, copied-from fix
    
    * format
    
    * Run make fix-copies
    
    * nemotron md with autodoc
    
    * doc  fix
    
    * fix order
    
    * pass check_config_docstrings.py
    
    * fix config_attributes
    
    * remove all llama BC related code
    
    * Use PreTrainedTokenizerFast
    
    * ruff check examples
    
    * conversion script update
    
    * add nemotron to toctree
    suiyoubi committed Aug 6, 2024
    Commit 6a03942
  8. Commit 3d8bd11
  9. Add codestral mamba2 (huggingface#32080)

    * add new model like
    
    * draft cuda forward - mismatched keys (sharding on conv1)
    
    * match keys successfully
    
    * fix split
    
    * get generation/forward running (wrong gens, norm?)
    
    * :update
    
    * some refactoring
    
    * fixes
    
    * works up until copy to cache
    
    * fix
    
    * update
    
    * NON WORKING VERSION
    
    * version that work?
    
    * nit
    
    * fix config
    
    * fix conversion script
    
    * working cuda forward
    
    * nit
    
    * update
    
    * simplifcation
    
    * make mamba slow simple work
    
    * no einops
    
    * todo
    
    * fix style
    
    * no einops
    
    * update fix no einsum
    
    * nit
    
    * remove einops
    
    * bug: scan_output differs strongly
    
    * add rms norm option
    
    * fix fast + slow generation with and w/o cache ✔️
    
    * draft integration tests
    
    * remove a big chunk of the einsum
    
    * fix slow, fast generations, without any einsum
    
    * fix copies
    
    * fix structure
    
    * fix up modeling and tests
    
    * fix tests
    
    * clamping is indeed worse
    
    * recover mamba2 cache test
    
    * fix copies
    
    * no cache position (yet)
    
    * fix tf tests
    
    * fix matmul for generate
    
    * fixup
    
    * skip cache tests for now
    
    * [run-slow]mamba2
    
    * tune out hidden states for padding
    
    * test batched generation
    
    * propagate attention mask changes
    
    * fix past length
    
    * fix integration test
    
    * style
    
    * address comments
    
    * update readme
    
    * add mamba2 version check
    
    * fix tests
    
    * [run-slow]mamba2
    
    * skip edge tests
    
    * [run-slow]mamba2
    
    * last fixup
    
    * [run-slow]mamba2
    
    * update README
    
    ---------
    
    Co-authored-by: Arthur Zucker <[email protected]>
    molbap and ArthurZucker committed Aug 6, 2024
    Commit 80b90e7
  10. Migrate import checks not need accelerate, and be more clear on min versions (huggingface#32292)
    
    * Migrate import checks to secondary accelerate calls
    
    * better errs too
    
    * Revert, just keep the import checks + remove accelerate-specific things
    
    * Rm extra'
    
    * Empty commit for ci
    
    * Small nits
    
    * Final
    muellerzr committed Aug 6, 2024
    Commit 194cf1f
  11. Commit 50c3ba8
  12. dev version 4.45.0

    ArthurZucker committed Aug 6, 2024
    Commit 26a9443
  13. is_torchdynamo_compiling -- cast a wide exception net (huggingface#32476)
    
    * cast a wide net
    
    * make fix-copies with a few manual changes
    
    * add copied from
    gante committed Aug 6, 2024
    Commit 4fdc702
  14. Revert "fixes to properly shard FSDP across cpu and meta for cpu_efficient_loading for prequantized 4bit (huggingface#32276)" (huggingface#32477)
    
    * Revert "fixes to properly shard FSDP across cpu and meta for cpu_efficient_loading for prequantized 4bit (huggingface#32276)"
    
    This reverts commit 62c60a3.
    
    We uncovered an issue with this change that caused our training runs to hang.
    
    * `is_torchdynamo_compiling` -- cast a wide exception net (huggingface#32476)
    
    * cast a wide net
    
    * make fix-copies with a few manual changes
    
    * add copied from
    
    ---------
    
    Co-authored-by: Joao Gante <[email protected]>
    matthewdouglas and gante committed Aug 6, 2024
    Commit ac2707e
  15. 🌐 [i18n-KO] Translated mask_generation.md to Korean (huggingface#32257)
    
    * docs: ko: tasks/mask_generation.md
    
    * feat: nmt draft
    
    * fix : toc local
    
    * fix : manual edits
    
    * fix : ko-toctree
    
    * fix: resolve suggestions
    
    Co-authored-by: boyunJang <[email protected]>
    Co-authored-by: Chaewon Song <[email protected]>
    
    * fix: resolve suggestions
    
    Co-authored-by: boyunJang <[email protected]>
    Co-authored-by: Chaewon Song <[email protected]>
    
    * fix: resolve suggestions
    
    * fix: resolve suggestions
    
    * fix: resolve suggestions
    
    ---------
    
    Co-authored-by: boyunJang <[email protected]>
    Co-authored-by: Chaewon Song <[email protected]>
    3 people committed Aug 6, 2024
    Commit 5301b98
  16. 🌐 [i18n-KO] Translated idefics.md to Korean (huggingface#32258)

    * docs: ko: tasks/idefics.md
    
    * feat: nmt draft
    
    * fix: manual edits
    
    * fix: resolve suggestions
    
    Co-authored-by: Chaewon Song <[email protected]>
    Co-authored-by: Harheem Kim <[email protected]>
    Co-authored-by: timdalxx <[email protected]>
    
    ---------
    
    Co-authored-by: Chaewon Song <[email protected]>
    Co-authored-by: Harheem Kim <[email protected]>
    Co-authored-by: timdalxx <[email protected]>
    4 people committed Aug 6, 2024
    Commit 3b193c7
  17. 🌐 [i18n-KO] Translated image_to_image.md to Korean (huggingface#32327)

    * docs: ko: tasks/image_to_image.md
    
    * feat: nmt draft
    
    * fix: manual edits
    
    * fix: resolve suggestions
    
    Co-authored-by: Jihun Lim <[email protected]>
    Co-authored-by: Jiwook Han <[email protected]>
    
    * fix: handle remaining suggestions
    
    Co-authored-by: Jiwook Han <[email protected]>
    
    ---------
    
    Co-authored-by: Jihun Lim <[email protected]>
    Co-authored-by: Jiwook Han <[email protected]>
    3 people committed Aug 6, 2024
    Commit 6af0854

Commits on Aug 7, 2024

  1. Cache: new Cache format in decoder-only models (huggingface#31421)

    * draft bart with new cache
    
    * add cache for decoder-only models
    
    * revert utils
    
    * modify docstring
    
    * revert bart
    
    * minor fixes
    
    * fix copies (not related)
    
    * revert tests
    
    * remove enc-dec related code
    
    * remove bloom
    
    * remove opt (enc-dec)
    
    * update docstring
    
    * git, codegen, gpt_neo, gpt_neox, gpj
    
    * clean up
    
    * copied from statements
    
    * revert
    
    * tmp
    
    * update warning msg
    
    * forgot git
    
    * add more flags
    
    * run-slow git,codegen,gpt_neo,gpt_neox,gpj
    
    * add cache flag to VLMs
    
    * remove files
    
    * style
    
    * video LLMs also need a flag
    
    * style
    
    * llava will go in another PR
    
    * style
    
    * [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics
    
    * Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
    
    Co-authored-by: Arthur <[email protected]>
    
    * copy from
    
    * deprecate until v4.45 and warn if not training
    
    * nit
    
    * fix test
    
    * test static cache
    
    * add more tests and fix models
    
    * fix copies
    
    * return sliding window mask
    
    * run slow tests & fix + codestyle
    
    * one more falcon fix for alibi
    
    ---------
    
    Co-authored-by: Arthur <[email protected]>
    zucchini-nlp and ArthurZucker committed Aug 7, 2024
    Commit a30c865
  2. Gemma2: add cache warning (huggingface#32279)

    * gemma2 fallback to dynamic cache
    
    * Update src/transformers/models/gemma2/modeling_gemma2.py
    
    Co-authored-by: Joao Gante <[email protected]>
    
    * Update src/transformers/models/gemma2/modeling_gemma2.py
    
    Co-authored-by: Arthur <[email protected]>
    
    * raise error and dont fallback to dynamic cache
    
    * prev will break most forward calls/tests
    
    * Update src/transformers/models/gemma2/modeling_gemma2.py
    
    Co-authored-by: Arthur <[email protected]>
    
    * update
    
    * fix copies
    
    ---------
    
    Co-authored-by: Joao Gante <[email protected]>
    Co-authored-by: Arthur <[email protected]>
    3 people committed Aug 7, 2024
    Commit 7ad784a
  3. enable xla fsdp (huggingface#32048)

    * enable xla fsdp
    
    * add acceleration version check for xla fsdp
    hanwen-sun committed Aug 7, 2024
    Commit 46d09af
  4. Commit c54a6f9
  5. Agents use grammar (huggingface#31735)

    * Allow optional use of grammars to constrain generation
    aymeric-roucher committed Aug 7, 2024
    Commit e0d8253
  6. fix broken link in docs (huggingface#32491)

    `https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextGenerationPipeline.__call__`
    
    `generate_kwargs (dict, optional) — Additional keyword arguments to pass along to the generate method of the model (see the generate method corresponding to your framework here).`
    
    link in "here" doesnt work
    jorahn committed Aug 7, 2024
    Commit b640103
  7. Commit b7fb393
  8. 🌐 [i18n-KO] Translated gptq.md to Korean (huggingface#32293)

    * fix: manual edits
    
    * fix: manual edits2
    
    * fix: delete files
    
    * fix: resolve suggestions
    
    Co-authored-by: Sungmin Oh <[email protected]>
    Co-authored-by: SeungYoun Lee <[email protected]>
    Co-authored-by: 김준재 <[email protected]>
    
    * fix: resolve suggestions
    
    Co-authored-by: Steven Liu <[email protected]>
    
    ---------
    
    Co-authored-by: Sungmin Oh <[email protected]>
    Co-authored-by: SeungYoun Lee <[email protected]>
    Co-authored-by: 김준재 <[email protected]>
    Co-authored-by: Steven Liu <[email protected]>
    5 people committed Aug 7, 2024
    Commit 1124d95
  9. 🌐 [i18n-KO] Translated prompting.md to Korean (huggingface#32294)

    * docs: ko: tasks/prompting.md
    
    * feat: nmt-draft
    
    * fix: update translation in prompting.md
    
    * fix: update toctree.yml
    
    * fix: manual edits
    
    * fix: toctree edits
    
    * fix: resolve suggestions
    
    Co-authored-by: boyunJang <[email protected]>
    Co-authored-by: Harheem Kim <[email protected]>
    Co-authored-by: timdalxx <[email protected]>
    
    ---------
    
    Co-authored-by: boyunJang <[email protected]>
    Co-authored-by: Harheem Kim <[email protected]>
    Co-authored-by: timdalxx <[email protected]>
    4 people committed Aug 7, 2024
    Commit fcc4f2a
  10. 🌐 [i18n-KO] Translated quantization/quanto.md to Korean (huggingface#32281)
    
    * docs: ko: quantization/quanto.md
    
    * feat: nmt draft
    
    * fix: resolve suggestions
    
    Co-authored-by: SeungYoun Lee <[email protected]>
    Co-authored-by: Minki Kim <[email protected]>
    Co-authored-by: 김준재 <[email protected]>
    
    * fix: resolve suggestions
    
    Co-authored-by: SeungYoun Lee <[email protected]>
    
    ---------
    
    Co-authored-by: SeungYoun Lee <[email protected]>
    Co-authored-by: Minki Kim <[email protected]>
    Co-authored-by: 김준재 <[email protected]>
    4 people committed Aug 7, 2024
    Commit fa59fd8
  11. 🌐 [i18n-KO] Translated image_feature_extraction.md to Korean (huggingface#32239)
    
    * docs: ko: tasks/images_feature_extraction.md
    
    * feat: nmt draft
    
    * fix: manual edits
    
    * fix: manual edits
    
    * fix: manual edits
    
    * fix: manual edits
    
    * feat: manual edits
    
    * Update docs/source/ko/tasks/image_feature_extraction.md
    
    Co-authored-by: Jihun Lim <[email protected]>
    
    * Update docs/source/ko/tasks/image_feature_extraction.md
    
    Co-authored-by: Jihun Lim <[email protected]>
    
    * fix: manual edits
    
    ---------
    
    Co-authored-by: Jihun Lim <[email protected]>
    mreraser and heuristicwave committed Aug 7, 2024
    Commit cba7bcf
  12. Commit 73a59a2
  13. Docs: Fixed WhisperModel.forward’s docstring link (huggingface#32498)

    Fixed WhisperModel.forward’s docstring link.
    Sai-Suraj-27 committed Aug 7, 2024
    Commit 543df48
  14. 🌐 [i18n-KO] Translated chat_templating.md to Korean (huggingface#32362)
    
    * docs: ko: chat_templating.md
    
    * feat: nmt draft
    
    * fix: manual edits
    
    * Update docs/source/ko/chat_templating.md
    
    Co-authored-by: Sungmin Oh <[email protected]>
    
    * Update docs/source/ko/chat_templating.md
    
    Co-authored-by: Sungmin Oh <[email protected]>
    
    * fix: apply suggestions from code review - anchor
    
    Co-authored-by: Sungmin Oh <[email protected]>
    
    * fix: manual edits
    
    Co-authored-by: SeungYoun Lee <[email protected]>
    Co-authored-by: Minki Kim <[email protected]>
    
    * fix: manual edits
    
    * fix: delete 'default template' section
    
    ---------
    
    Co-authored-by: Sungmin Oh <[email protected]>
    Co-authored-by: SeungYoun Lee <[email protected]>
    Co-authored-by: Minki Kim <[email protected]>
    4 people committed Aug 7, 2024
    Commit 78566db
  15. Commit f5cdbf6

Commits on Aug 8, 2024

  1. Fix typo: depracted -> deprecated (huggingface#32489)

    Hello!
    
    ## Pull Request overview
    * Fix typo
    
    ## Details
    This should speak for itself.
    
    cc @itazap @ArthurZucker 
    
    - Tom Aarsen
    tomaarsen committed Aug 8, 2024
    Commit aefd3e2
  2. Fix issue huggingface#32518: Update llm_tutorial.md (huggingface#32523)

    Update llm_tutorial.md
    
    remove comma re: issue 32518
    
    huggingface#32518
    doomdagadiggiedahdah committed Aug 8, 2024
    Commit 1c944ac
  3. Change Phi3 _supports_sdpa to True (huggingface#32457)

    * Change `_supports_sdpa` to True
    
    * add phi3 to sdpa support list
    pocca2048 committed Aug 8, 2024
    Commit e28784f
  4. Uniformize kwargs for processors - GroundingDINO (huggingface#31964)

    * fix typo
    
    * uniform kwargs
    
    * make style
    
    * add comments
    
    * remove return_tensors
    
    * remove common_kwargs from processor since it propagates
    
    * make style
    
    * return_token_type_ids to True
    
    * revert the default imagekwargs since does not accept any value in the image processro
    
    * revert processing_utils.py
    
    * make style
    
    * add molbap's commit
    
    * fix typo
    
    * fix common processor
    
    * remain
    
    * Revert "add molbap's commit"
    
    This reverts commit a476c6e.
    
    * add unsync PR
    
    * revert
    
    * make CI happy
    
    * nit
    
    * import annotationformat
    SangbumChoi committed Aug 8, 2024
    Commit d3b3551
  5. Fix add-new-model-like (huggingface#31773)

    * handle (processor_class, None) returned by ModelPatterns
    
    * handle (slow, fast) image processors in add model
    
    * handle old image processor case
    molbap committed Aug 8, 2024
    Commit b51d414
  6. Add Qwen2-Audio (huggingface#32137)

    * add qwen2audio
    
    * Update check_repo.py
    
    * fix style
    
    * fix test
    
    * fix style
    
    * add model size
    
    * Qwen2AudioEncoderModel->Qwen2AudioEncoder; add copy info
    
    * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
    
    Co-authored-by: Yoach Lacombe <[email protected]>
    
    * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
    
    Co-authored-by: Yoach Lacombe <[email protected]>
    
    * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
    
    Co-authored-by: Yoach Lacombe <[email protected]>
    
    * switch the attention_mask and the feature_attention_mask
    
    * add to PRIVATE_MODELS in check_repo.py; add to MODEL_NAMES_TO_IGNORE in check_table.py
    
    * fix initialization
    
    * update chat_template
    
    * fix consistency issue after copy
    
    * add docstrings to _merge_input_ids_with_audio_features
    
    * add copied from to prepare_inputs_for_generation
    
    * add more details to docs
    
    * rm comment
    
    * add init_std
    
    * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
    
    Co-authored-by: Yoach Lacombe <[email protected]>
    
    * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
    
    Co-authored-by: Yoach Lacombe <[email protected]>
    
    * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
    
    Co-authored-by: Yoach Lacombe <[email protected]>
    
    * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
    
    Co-authored-by: Yoach Lacombe <[email protected]>
    
    * update
    
    * Update docs/source/en/model_doc/qwen2_audio.md
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * update tests
    
    * rm ignore_index
    
    * update processor
    
    * rm ffmpeg_read
    
    * Update tests/models/qwen2_audio/test_modeling_qwen2_audio.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update docs/source/en/model_doc/qwen2_audio.md
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update docs/source/en/model_doc/qwen2_audio.md
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update docs/source/en/model_doc/qwen2_audio.md
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * update
    
    * typo
    
    * [run_slow] qwen2_audio
    
    * [run_slow] qwen2_audio
    
    * [run_slow] qwen2_audio
    
    * fix quality
    
    * [run_slow] qwen2_audio
    
    * [run_slow] qwen2_audio
    
    * [run_slow] qwen2_audio
    
    * add official model
    
    ---------
    
    Co-authored-by: Yoach Lacombe <[email protected]>
    Co-authored-by: amyeroberts <[email protected]>
    3 people committed Aug 8, 2024
    Commit 16ed064
  7. filter flash_attn optional imports loading remote code (huggingface#30954)
    
    * filter flash_attn optional imports loading remote code
    
    * improve pattern
    
    * fix code style
    
    * Update src/transformers/dynamic_module_utils.py
    
    Co-authored-by: Matt <[email protected]>
    
    ---------
    
    Co-authored-by: Matt <[email protected]>
    eaidova and Rocketknight1 committed Aug 8, 2024
    Commit cc832cb
  8. 🌐 [i18n-KO] Translated ko-llm_tutorial_optimization.md to Korean (huggingface#32372)
    
    * docs: ko: llm_tutorial_optimization.md
    
    * feat: nmt draft
    
    * fix: manual edits
    
    * Update docs/source/ko/llm_tutorial_optimization.md
    
    Co-authored-by: Chaewon Song <[email protected]>
    
    * Update docs/source/ko/llm_tutorial_optimization.md
    
    Co-authored-by: Chaewon Song <[email protected]>
    
    * fix: resolve suggestions - 1
    
    Co-authored-by: Chaewon Song <[email protected]>
    Co-authored-by: timdalxx <[email protected]>
    Co-authored-by: boyunJang <[email protected]>
    
    * fix: resolve suggestions - 2
    
    Co-authored-by: boyunJang <[email protected]>
    Co-authored-by: Chaewon Song <[email protected]>
    Co-authored-by: timdalxx <[email protected]>
    
    ---------
    
    Co-authored-by: Chaewon Song <[email protected]>
    Co-authored-by: timdalxx <[email protected]>
    Co-authored-by: boyunJang <[email protected]>
    4 people committed Aug 8, 2024
    43f3fe8
  9. 🌐 [i18n-KO] Translated trainer.md to Korean (huggingface#32260)

    * docs: ko: ko-trainer
    
    * feat: nmt draft
    
    * fix: manual edits
    
    * fix: manual edits
    
    * fix: glossary
    
    * fix: glossary
    
    * Apply suggestions from code review
    
    Co-authored-by: Jinuk <[email protected]>
    Co-authored-by: SeongWooChoi <[email protected]>
    
    ---------
    
    Co-authored-by: Jinuk <[email protected]>
    Co-authored-by: SeongWooChoi <[email protected]>
    3 people committed Aug 8, 2024
    96ba7f0
  10. 🌐 [i18n-KO] Translated eetq.md to Korean (huggingface#32352)

    * docs: ko: quantization/eetq.md
    
    * feat: nmt draft
    
    * fix docs: ko: quantization/eetq.md
    
    * fix docs: ko: quantization/eetq.md
    
    * fix: resolve suggestions
    
    Co-authored-by: Jiwook Han <[email protected]>
    
    * fix: resolve suggestions
    
    * fix: resolve suggestions
    
    ---------
    
    Co-authored-by: Jiwook Han <[email protected]>
    jun048098 and mreraser committed Aug 8, 2024
    e0396bd
  11. 🌐 [i18n-KO] Translated fsdp.md to Korean (huggingface#32261)

    * docs: ko: fsdp.md
    
    * feat: nmt draft
    
    * fix: manual edits
    
    * Apply suggestions from code review
    
    Co-authored-by: 김준재 <[email protected]>
    Co-authored-by: Minki Kim <[email protected]>
    
    * fix: resolve suggestions
    
    * Update docs/source/ko/fsdp.md
    
    Co-authored-by: 김준재 <[email protected]>
    
    * Update docs/source/ko/fsdp.md
    
    Co-authored-by: Steven Liu <[email protected]>
    
    ---------
    
    Co-authored-by: 김준재 <[email protected]>
    Co-authored-by: Minki Kim <[email protected]>
    Co-authored-by: Steven Liu <[email protected]>
    4 people committed Aug 8, 2024
    496207a
  12. 🌐 [i18n-KO] Translated bitsandbytes.md to Korean (huggingface#32408)

    * docs: ko: quantization/bitsandbytes.md
    
    * feat: nmt draft
    
    * fix: minor typos
    
    * fix: manual edits
    
    * fix: manual edits
    
    * fix: resolve suggestions
    
    Co-authored-by: wony617 <[email protected]>
    Co-authored-by: YONGSANG <[email protected]>
    Co-authored-by: Woojun Jung <[email protected]>
    
    * fix: resolve suggestions
    
    Co-authored-by: Steven Liu <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Steven Liu <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Steven Liu <[email protected]>
    
    ---------
    
    Co-authored-by: wony617 <[email protected]>
    Co-authored-by: YONGSANG <[email protected]>
    Co-authored-by: Woojun Jung <[email protected]>
    Co-authored-by: Steven Liu <[email protected]>
    5 people committed Aug 8, 2024
    b01f9c4
  13. Fix generate with inputs_embeds as input (huggingface#32493)

    * I think inputs_embeds has ndim == 3
    
    * fix sequence length catch
    
    * add generate test
    
    * [run-slow]olmo, persimmon, gemma, gemma2, qwen2, llama
    
    * skip whisper
    
    * fix bart test
    
    * more fixes
    molbap committed Aug 8, 2024
    0442816
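    The fix above lets `generate` infer the prompt length when only `inputs_embeds` is passed: embeddings are `(batch, seq_len, hidden)` (ndim == 3), so the sequence length comes from dim 1 just as with `(batch, seq_len)` token ids. A minimal pure-Python sketch of that logic (function name and nested-list "tensors" are mine, not the actual transformers code):

    ```python
    def infer_sequence_length(input_ids=None, inputs_embeds=None):
        """Return the prompt length whether the caller passed token ids or embeddings.

        input_ids is (batch, seq_len); inputs_embeds is (batch, seq_len, hidden),
        i.e. ndim == 3, so the sequence length is dim 1 in both cases.
        """
        if inputs_embeds is not None:
            return len(inputs_embeds[0])
        if input_ids is not None:
            return len(input_ids[0])
        raise ValueError("Pass either input_ids or inputs_embeds")

    # Nested lists stand in for tensors here.
    ids = [[5, 7, 9]]                          # batch=1, seq_len=3
    embeds = [[[0.1] * 4 for _ in range(3)]]   # batch=1, seq_len=3, hidden=4
    print(infer_sequence_length(input_ids=ids))        # 3
    print(infer_sequence_length(inputs_embeds=embeds))  # 3
    ```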
  14. Fixed test test_static_cache_exportability with torch 2.4.0 (huggin…

    …gface#32516)
    
    Workaround the export issue in torch 2.4
    
    Co-authored-by: Guang Yang <[email protected]>
    guangy10 and Guang Yang committed Aug 8, 2024
    0164560
  15. 54ac39c
  16. [docs] Translation guide (huggingface#32547)

    clarify
    stevhliu committed Aug 8, 2024
    85817d9

Commits on Aug 9, 2024

  1. 838d141
  2. Fix a bug in Qwen2Audio (huggingface#32552)

    fix _update_model_kwargs_for_generation
    faychu committed Aug 9, 2024
    7728b78
  3. fix slow integration gemma2 test (huggingface#32534)

    no empty revision
    ArthurZucker committed Aug 9, 2024
    e4522fe
  4. fix non contiguous tensor value error in save_pretrained (huggingface…

    …#32422)
    
    Signed-off-by: duzhanwei <[email protected]>
    Co-authored-by: duzhanwei <[email protected]>
    congcongke and duzhanwei committed Aug 9, 2024
    e7f4ace
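    The non-contiguous-tensor fix matters because safetensors serialization refuses tensors whose storage is not contiguous, so `save_pretrained` must copy them first. A pure-Python sketch of the idea, using a mock object in place of a torch tensor (all names here are mine, not the actual transformers implementation):

    ```python
    class MockTensor:
        """Minimal stand-in for a torch tensor's contiguity API."""

        def __init__(self, contiguous=True):
            self._contiguous = contiguous

        def is_contiguous(self):
            return self._contiguous

        def contiguous(self):
            # Real torch returns a contiguous copy of the data.
            return MockTensor(contiguous=True)

    def make_state_dict_contiguous(state_dict):
        # safetensors refuses non-contiguous storage, so copy such tensors first.
        return {
            name: t if t.is_contiguous() else t.contiguous()
            for name, t in state_dict.items()
        }

    sd = {"a": MockTensor(True), "b": MockTensor(False)}
    fixed = make_state_dict_contiguous(sd)
    print(all(t.is_contiguous() for t in fixed.values()))  # True
    ```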
  5. 🌐 [i18n-KO] Translated agent.md to Korean (huggingface#32351)

    * docs: ko: main_classes/agent
    
    * feat: chatgpt draft
    
    * fix: manual edits
    
    * fix: resolve suggestions
    
    Co-authored-by: Woojun Jung <[email protected]>
    Co-authored-by: thsamaji <[email protected]>
    Co-authored-by: SeungAhSon <[email protected]>
    
    * fix: resolve suggestions
    
    * fix: resolve code line number
    
    ---------
    
    Co-authored-by: Woojun Jung <[email protected]>
    Co-authored-by: thsamaji <[email protected]>
    Co-authored-by: SeungAhSon <[email protected]>
    4 people committed Aug 9, 2024
    48101cf

Commits on Aug 12, 2024

  1. Add new model (huggingface#32615)

    * v1 - working version
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * rename to correct name
    
    * fix title
    
    * fixup
    
    * rename files
    
    * fix
    
    * add copied from on tests
    
    * rename to `FalconMamba` everywhere and fix bugs
    
    * fix quantization + accelerate
    
    * fix copies
    
    * add `torch.compile` support
    
    * fix tests
    
    * fix tests and add slow tests
    
    * copies on config
    
    * merge the latest changes
    
    * fix tests
    
    * add few lines about instruct
    
    * Apply suggestions from code review
    
    Co-authored-by: Arthur <[email protected]>
    
    * fix
    
    * fix tests
    
    ---------
    
    Co-authored-by: Arthur <[email protected]>
    younesbelkada and ArthurZucker committed Aug 12, 2024
    7c11491
  2. Fix: FA2 with packed training (huggingface#32487)

    * fix check
    
    * add tests
    
    * [run-slow] llama, gemma2
    
    * oops, whisper actually runs but needed some special treatment
    zucchini-nlp committed Aug 12, 2024
    8f2b6d5
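    The packed-training fix hinges on detecting sequence boundaries inside a packed batch: when several sequences share one row, each sequence's position ids restart at 0, and flash attention needs the per-sequence lengths. A small pure-Python sketch of boundary recovery (function name and list representation are my own illustration, not the actual transformers code):

    ```python
    def packed_seq_lengths(position_ids):
        """Recover per-sequence lengths from a packed batch's position ids.

        When several sequences are packed into one row, each sequence's
        positions restart at 0; the restarts mark the boundaries.
        """
        starts = [i for i, p in enumerate(position_ids) if p == 0]
        bounds = starts + [len(position_ids)]
        return [bounds[i + 1] - bounds[i] for i in range(len(starts))]

    # Three sequences of lengths 3, 2, and 4 packed into one row:
    print(packed_seq_lengths([0, 1, 2, 0, 1, 0, 1, 2, 3]))  # [3, 2, 4]
    ```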
  3. Fix sliding window attention used in Gemma2FlashAttention2 (huggingfa…

    …ce#32522)
    
    * fix sliding window attention (flash2) in gemma2 model
    
    * [run-slow] gemma
    
    * fix slicing attention_mask for flash_attn2
    
    * fix slicing attention_mask when flash_attn is used
    
    * add missing comment
    
    * slice the last seq_len tokens in the key, value states
    
    * revert code of slicing key, value states
    brcps12 committed Aug 12, 2024
    342e3f9
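    The sliding-window fix above slices the attention mask so it matches the key/value length: with a sliding window, the cached keys and values cover only the most recent positions, so the mask must keep only its last columns. A pure-Python sketch of the slicing (names and list "tensors" are mine, not the Gemma2 code itself):

    ```python
    def slice_mask_for_window(attention_mask, kv_length):
        """Keep only the last kv_length columns of a 2D attention mask.

        With sliding-window attention, the key/value states cover just the
        most recent kv_length positions, so the mask is sliced to match.
        """
        return [row[-kv_length:] for row in attention_mask]

    mask = [[1, 1, 1, 1, 1, 0]]  # batch=1, seq_len=6
    print(slice_mask_for_window(mask, 4))  # [[1, 1, 1, 0]]
    ```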
  4. fix: Fixed conditional check for encodec model names (huggingface#3…

    …2581)
    
    * Fixed conditional check for encodec model names.
    
    * Reformatted conditional check.
    Sai-Suraj-27 committed Aug 12, 2024
    bd251e4
  5. Fix .push_to_hub(..., create_pr=True, revision="my-branch") when cr…

    …eating PR on not-owned repo (huggingface#32094)
    
    Fix create_pr against existing revision
    Wauplin committed Aug 12, 2024
    e31a7a2
  6. Bump aiohttp from 3.9.4 to 3.10.2 in /examples/research_projects/deci…

    …sion_transformer (huggingface#32569)
    
    Bump aiohttp in /examples/research_projects/decision_transformer
    
    Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.4 to 3.10.2.
    - [Release notes](https://github.com/aio-libs/aiohttp/releases)
    - [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
    - [Commits](aio-libs/aiohttp@v3.9.4...v3.10.2)
    
    ---
    updated-dependencies:
    - dependency-name: aiohttp
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] committed Aug 12, 2024
    50837f2
  7. Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/visual…

    …_bert (huggingface#32220)
    
    Bump torch in /examples/research_projects/visual_bert
    
    Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1 to 2.2.0.
    - [Release notes](https://github.com/pytorch/pytorch/releases)
    - [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
    - [Commits](pytorch/pytorch@v1.13.1...v2.2.0)
    
    ---
    updated-dependencies:
    - dependency-name: torch
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] committed Aug 12, 2024
    8a3c55e
  8. Cleanup tool calling documentation and rename doc (huggingface#32337)

    * Rename "Templates for Chat Models" doc to "Chat Templates"
    
    * Small formatting fix
    
    * Small formatting fix
    
    * Small formatting fix
    
    * Cleanup tool calling docs as well
    
    * Remove unneeded 'revision'
    
    * Move tip to below main code example
    
    * Little bonus section on template editing
    Rocketknight1 committed Aug 12, 2024
    b7ea171
  9. 🌐 [i18n-KO] Translated deepspeed.md to Korean (huggingface#32431)

    * Update _toctree.yml
    
    * docs: ko: deepspeed.md
    
    * Apply suggestions from code review
    
    Co-authored-by: wony617 <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: wony617 <[email protected]>
    
    * Update docs/source/ko/_toctree.yml
    
    Co-authored-by: Steven Liu <[email protected]>
    
    * Update docs/source/ko/deepspeed.md
    
    * Update docs/source/ko/deepspeed.md
    
    Co-authored-by: SeungAhSon <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: wony617 <[email protected]>
    
    * Update docs/source/ko/_toctree.yml
    
    ---------
    
    Co-authored-by: wony617 <[email protected]>
    Co-authored-by: Steven Liu <[email protected]>
    Co-authored-by: SeungAhSon <[email protected]>
    4 people committed Aug 12, 2024
    4996990
  10. 🌐 [i18n-KO] Translated awq.md to Korean (huggingface#32324)

    * fix: manual edits
    
    * Apply suggestions from code review
    
    Co-authored-by: SeongWooChoi <[email protected]>
    Co-authored-by: Chulhwa (Evan) Han <[email protected]>
    
    * fix:manual edits
    
    - Moved the translated file, which had been created in the wrong path
    
    * Delete docs/source/ko/tasks/awq.md
    
    * Update docs/source/ko/_toctree.yml
    
    Co-authored-by: Steven Liu <[email protected]>
    
    ---------
    
    Co-authored-by: SeongWooChoi <[email protected]>
    Co-authored-by: Chulhwa (Evan) Han <[email protected]>
    Co-authored-by: Steven Liu <[email protected]>
    4 people committed Aug 12, 2024
    7f777ab
  11. fix: Fixed failing test_find_base_model_checkpoint (huggingface#32638)

    Fixed failing test_find_base_model_checkpoint.
    Sai-Suraj-27 committed Aug 12, 2024
    ce4b288
  12. Bump tensorflow from 2.11.1 to 2.12.1 in /examples/research_projects/…

    …decision_transformer (huggingface#32341)
    
    Bump tensorflow in /examples/research_projects/decision_transformer
    
    Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.11.1 to 2.12.1.
    - [Release notes](https://github.com/tensorflow/tensorflow/releases)
    - [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
    - [Commits](tensorflow/tensorflow@v2.11.1...v2.12.1)
    
    ---
    updated-dependencies:
    - dependency-name: tensorflow
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] committed Aug 12, 2024
    126cbdb
  13. "to be not" -> "not to be" (huggingface#32636)

    * "to be not" -> "not to be"
    
    * Update sam.md
    
    * Update trainer.py
    
    * Update modeling_utils.py
    
    * Update test_modeling_utils.py
    
    * Update test_modeling_utils.py
    qgallouedec committed Aug 12, 2024
    f1c8542
  14. fix: Updated the is_torch_mps_available() function to include `min_…

    …version` argument (huggingface#32545)
    
    * Fixed wrong argument in is_torch_mps_available() function call.
    
    * Fixed wrong argument in is_torch_mps_available() function call.
    
    * sorted the import.
    
    * Fixed wrong argument in is_torch_mps_available() function call.
    
    * Fixed wrong argument in is_torch_mps_available() function call.
    
    * Update src/transformers/utils/import_utils.py
    
    Co-authored-by: Arthur <[email protected]>
    
    * removed extra space.
    
    * Added type hint for the min_version parameter.
    
    * Added missing import.
    
    ---------
    
    Co-authored-by: Arthur <[email protected]>
    Sai-Suraj-27 and ArthurZucker committed Aug 12, 2024
    2a5a6ad
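    The `min_version` change above is a common availability-check pattern: a backend counts as available only if it is installed and, optionally, meets a minimum version. A self-contained sketch of that gating (names are mine; this is not the `is_torch_mps_available()` source):

    ```python
    def parse_version(v):
        # "2.4.0" -> (2, 4, 0); good enough for plain numeric versions
        return tuple(int(part) for part in v.split("."))

    def backend_available(installed_version, min_version=None):
        """Report availability, optionally requiring a minimum version."""
        if installed_version is None:
            return False  # backend not installed at all
        if min_version is None:
            return True
        return parse_version(installed_version) >= parse_version(min_version)

    print(backend_available("2.3.1"))                       # True
    print(backend_available("2.3.1", min_version="2.4.0"))  # False
    print(backend_available("2.4.0", min_version="2.4.0"))  # True
    ```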

Commits on Aug 13, 2024

  1. Expand inputs in processors for VLMs (huggingface#30962)

    * let it be
    
    * draft
    
    * should not have changed
    
    * add warnings
    
    * fix & add tests
    
    * fix tests
    
    * inputs_embeds cannot be passed with pixels
    
    * more updates
    
    * paligemma ready!
    
    * minor typos
    
    * update blip-2
    
    * fix tests & raise error
    
    * docstring
    
    * add blip2 test
    
    * tmp
    
    * add image seq length to config
    
    * update docstring
    
    * delete
    
    * fix tests
    
    * fix blip
    
    * fix paligemma
    
    * out-of-place scatter
    
    * add llava-next-video
    
    * Update src/transformers/models/blip_2/modeling_blip_2.py
    
    Co-authored-by: Pablo Montalvo <[email protected]>
    
    * remove tmp
    
    * codestyle
    
    * nits
    
    * more nits
    
    * remove overriding in tests
    
    * comprehension when merging video
    
    * fix-copies
    
    * revert changes for embeds test
    
    * fix tests after making comprehension
    
    * Update src/transformers/models/blip_2/processing_blip_2.py
    
    Co-authored-by: Pablo Montalvo <[email protected]>
    
    * Update src/transformers/models/blip_2/processing_blip_2.py
    
    Co-authored-by: Pablo Montalvo <[email protected]>
    
    * more updates
    
    * fix tests
    
    ---------
    
    Co-authored-by: Pablo Montalvo <[email protected]>
    zucchini-nlp and molbap committed Aug 13, 2024
    a29eabd
  2. Automatically add transformers tag to the modelcard (huggingface#32623

    )
    
    * Automatically add `transformers` tag to the modelcard
    
    * Specify library_name and test
    LysandreJik committed Aug 13, 2024
    29c3a0f
  3. Fix tests (huggingface#32649)

    * skip failing tests
    
    * [no-filter]
    
    * [no-filter]
    
    * fix wording catch in FA2 test
    
    * [no-filter]
    
    * trigger normal CI without filtering
    molbap committed Aug 13, 2024
    a5a8291
  4. fix tensors on different devices in WhisperGenerationMixin (hugging…

    …face#32316)
    
    * fix
    
    * enable on xpu
    
    * no manual remove
    
    * move to device
    
    * remove to
    
    * add move to
    faaany committed Aug 13, 2024
    b5016d5
  5. Add support for GrokAdamW optimizer (huggingface#32521)

    * add grokadamw
    
    * reformat
    
    * code review feedback, unit test
    
    * reformat
    
    * reformat
    ehartford committed Aug 13, 2024
    481e156
  6. Add Depth Anything V2 Metric models (huggingface#32126)

    * add checkpoint and repo names
    
    * adapt head to support metric depth estimation
    
    * add max_depth output scaling
    
    * add expected logits
    
    * improve docs
    
    * fix docstring
    
    * add checkpoint and repo names
    
    * adapt head to support metric depth estimation
    
    * add max_depth output scaling
    
    * add expected logits
    
    * improve docs
    
    * fix docstring
    
    * rename depth_estimation to depth_estimation_type
    
    * add integration test
    
    * Refactored tests to include metric depth model inference test
    * Integration test pass when the timm backbone lines are commented (L220-L227)
    
    * address feedback
    
    * replace model path to use organization path
    
    * formatting
    
    * delete deprecated TODO
    
    * address feedback
    
    * [run_slow] depth_anything
    bt2513 committed Aug 13, 2024
    cc25757
  7. Fix: Fixed directory path for utils folder in `test_tokenization_util…

    …s.py` (huggingface#32601)
    
    * Removed un-necessary expressions.
    
    * Fixed directory path for utils folder in test_tokenization_utils.py
    Sai-Suraj-27 committed Aug 13, 2024
    c3cd9d8
  8. Modify ProcessorTesterMixin for better generalization (huggingface#32637

    )
    
    * Add padding="max_length" to tokenizer kwargs and change crop_size to size for image_processor kwargs
    
    * remove crop_size argument in align processor tests to be coherent with base tests
    
    * Add pad_token when loading tokenizer if needed, change test override tokenizer kwargs, remove unnecessary test overwrites in grounding dino
    yonigozlan committed Aug 13, 2024
    5bcbdff
  9. TF_Deberta supporting mixed precision (huggingface#32618)

    * Update modeling_tf_deberta.py
    
    Corrected some codes which do not support mixed precision
    
    * Update modeling_tf_deberta_v2.py
    
    Corrected some codes which do not support mixed precision
    
    * Update modeling_tf_deberta_v2.py
    
    * Update modeling_tf_deberta.py
    
    * Add files via upload
    
    * Add files via upload
    pinesnow72 committed Aug 13, 2024
    9d2ab88
  10. Fix tests recurrent (huggingface#32651)

    * add fix for recurrentgemma
    
    * [no-filter]
    
    * trigger-ci
    
    * [no-filter]
    
    * [no-filter]
    
    * attempt to fix mysterious zip error
    
    * [no-filter]
    
    * fix lookup error
    
    * [no-filter]
    
    * remove summarization hack
    
    * [no-filter]
    molbap committed Aug 13, 2024
    c135783

Commits on Aug 14, 2024

  1. Support MUSA (Moore Threads GPU) backend in transformers (huggingface…

    …#31913)
    
    Add accelerate version check, needs accelerate>=0.33.0
    fmo-mt committed Aug 14, 2024
    a22ff36
  2. fix: Fixed failing tests in tests/utils/test_add_new_model_like.py (h…

    …uggingface#32678)
    
    * Fixed failing tests in tests/utils/test_add_new_model_like.py
    
    * Fixed formatting using ruff.
    
    * Small nit.
    Sai-Suraj-27 committed Aug 14, 2024
    df32347
  3. Update translation docs review (huggingface#32662)

    update list of people to tag
    stevhliu committed Aug 14, 2024
    9485289
  4. Add TorchAOHfQuantizer (huggingface#32306)

    * Add TorchAOHfQuantizer
    
    Summary:
    Enable loading torchao quantized model in huggingface.
    
    Test Plan:
    local test
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    
    * Fix a few issues
    
    * style
    
    * Added tests and addressed some comments about dtype conversion
    
    * fix torch_dtype warning message
    
    * fix tests
    
    * style
    
    * TorchAOConfig -> TorchAoConfig
    
    * enable offload + fix memory with multi-gpu
    
    * update torchao version requirement to 0.4.0
    
    * better comments
    
    * add torch.compile to torchao README, add perf number link
    
    ---------
    
    Co-authored-by: Marc Sun <[email protected]>
    jerryzh168 and SunMarc committed Aug 14, 2024
    78d78cd
  5. Fix JetMoeIntegrationTest (huggingface#32332)

    JetMoeIntegrationTest
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh committed Aug 14, 2024
    20a0449
  6. Update the distributed CPU training on Kubernetes documentation (hugg…

    …ingface#32669)
    
    * Update the Kubernetes CPU training example
    
    * Add namespace arg
    
    Signed-off-by: Dina Suehiro Jones <[email protected]>
    
    ---------
    
    Signed-off-by: Dina Suehiro Jones <[email protected]>
    dmsuehir committed Aug 14, 2024
    6577c77
  7. fix: Fixed unknown pytest config option doctest_glob (huggingface#3…

    …2475)
    
    Fixed unknown config option doctest_glob.
    Sai-Suraj-27 committed Aug 14, 2024
    95a7781
  8. Unpin deepspeed in Docker image/tests (huggingface#32572)

    Unpin deepspeed
    muellerzr committed Aug 14, 2024
    0cea208
  9. Updated workflows to the latest versions (huggingface#32405)

    Updated few workflows to the latest versions.
    Sai-Suraj-27 committed Aug 14, 2024
    8820fe8

Commits on Aug 15, 2024

  1. e840127
  2. fix: Corrected falcon-mamba-7b model checkpoint name (huggingface#…

    …32837)
    
    Corrected the model checkpoint.
    Sai-Suraj-27 committed Aug 15, 2024
    ab7e893
  3. d6751d9

Commits on Aug 16, 2024

  1. VLMs: small clean-up for cache class (huggingface#32417)

    * fix beam search in video llava
    
    * [run-slow] video_llava
    zucchini-nlp committed Aug 16, 2024
    f3c8b18
  2. add back the position ids (huggingface#32554)

    * add back the position ids
    
    * fix failing test
    ArthurZucker committed Aug 16, 2024
    c215523
  3. Use head_dim if in config for RoPE (huggingface#32495)

    * use head_dim if in config for RoPE
    
    * typo
    
    * simplify with getattr
    suiyoubi committed Aug 16, 2024
    5fd7ca7
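    The `head_dim` change above reflects that some models set an explicit `head_dim` in their config that differs from `hidden_size // num_attention_heads`, and RoPE should prefer it when present. A small sketch of the `getattr`-based resolution (the dummy config class is my illustration, not actual transformers code):

    ```python
    class DummyConfig:
        hidden_size = 4096
        num_attention_heads = 32
        # head_dim may or may not be set explicitly

    def resolve_head_dim(config):
        # Prefer an explicit head_dim; otherwise derive it from the hidden size.
        return getattr(config, "head_dim", None) or (
            config.hidden_size // config.num_attention_heads
        )

    cfg = DummyConfig()
    print(resolve_head_dim(cfg))  # 128 (4096 // 32)
    cfg.head_dim = 256            # some models decouple head_dim from hidden_size
    print(resolve_head_dim(cfg))  # 256
    ```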
  4. 70d5df6
  5. [tests] make test_sdpa_equivalence device-agnostic (huggingface#32520)

    * fix on xpu
    
    * [run_all]
    faaany committed Aug 16, 2024
    8f9fa3b
  6. Cache: use batch_size instead of max_batch_size (huggingface#32657)

    * more precise name
    
    * better docstrings
    
    * Update src/transformers/cache_utils.py
    
    Co-authored-by: Arthur <[email protected]>
    
    ---------
    
    Co-authored-by: Arthur <[email protected]>
    gante and ArthurZucker committed Aug 16, 2024
    cf32ee1
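    Renames like `max_batch_size` → `batch_size` are usually rolled out with a deprecation shim so existing callers keep working while being warned. A self-contained sketch of that pattern (the function is my illustration, not the actual `Cache` code):

    ```python
    import warnings

    def resolve_batch_size(batch_size=None, max_batch_size=None):
        """Accept the old max_batch_size kwarg but steer callers to batch_size."""
        if max_batch_size is not None:
            warnings.warn(
                "max_batch_size is deprecated, use batch_size instead",
                FutureWarning,
            )
            batch_size = max_batch_size
        return batch_size

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        print(resolve_batch_size(max_batch_size=8))  # 8, with a FutureWarning
        print(len(caught))                           # 1
    ```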
  7. Fix AutoConfig and AutoModel support for Llava-Next-Video (huggingfac…

    …e#32844)
    
    * Fix: fix all model_type of Llava-Next-Video to llava_next_video
    
    * Fix doc for llava_next_video
    
    * * Fix formatting issues
    * Change llava-next-video.md file name into llava_next_video.md to make it compatible with implementation
    
    * Fix docs TOC for llava-next-video
    TKONIY committed Aug 16, 2024
    a27182b
  8. improve _get_is_as_tensor_fns (huggingface#32596)

    * improve _get_is_as_tensor_fns
    
    * format
    zrr1999 committed Aug 16, 2024
    f20d0e8
  9. 0b066be
  10. 1c36db6
  11. Reduce the error log when using core models that need their weights r…

    …enamed, and provide a step forward (huggingface#32656)
    
    * Fin
    
    * Modify msg
    
    * Finish up nits
    muellerzr committed Aug 16, 2024
    8ec028a
  12. Make beam_constraints.Constraint.advance() docstring more accurate (h…

    …uggingface#32674)
    
    * Fix beam_constraints.Constraint.advance() docstring
    
    * Update src/transformers/generation/beam_constraints.py
    
    Co-authored-by: Steven Liu <[email protected]>
    
    ---------
    
    Co-authored-by: Joao Gante <[email protected]>
    Co-authored-by: Steven Liu <[email protected]>
    3 people committed Aug 16, 2024
    6806d33

Commits on Aug 17, 2024

  1. 52cb403

Commits on Aug 19, 2024

  1. Add Flax Dinov2 (huggingface#31960)

    * tfmsenv restored in main
    
    * installed flax
    
    * forward pass done and all tests passed
    
    * make fix-copies and cleaning the scripts
    
    * fixup attempt 1
    
    * fixup attempt 2
    
    * fixup third attempt
    
    * fixup attempt 4
    
    * fixup attempt 5
    
    * dinov2 doc fixed
    
    * FlaxDinov2Model + ForImageClassification added to OBJECTS_TO_IGNORE
    
    * external pos_encoding layer removed
    
    * fixup attempt 6
    
    * fixed integration test values
    
    * fixup attempt 7
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * comments removed
    
    * comment removed from the test
    
    * fixup
    
    * Update src/transformers/models/dinov2/modeling_flax_dinov2.py
    
    Co-authored-by: Sanchit Gandhi <[email protected]>
    
    * new fixes 1
    
    * interpolate_pos_encoding function removed
    
    * droppath rng fixed, pretrained beit copied-from still not working
    
    * modeling_flax_dinov2.py reformatted
    
    * Update tests/models/dinov2/test_modeling_flax_dinov2.py
    
    Co-authored-by: Sanchit Gandhi <[email protected]>
    
    * added Copied from, to the tests
    
    * copied from statements removed from tests
    
    * fixed copied from statements in the tests
    
    * [run_slow] dinov2
    
    ---------
    
    Co-authored-by: amyeroberts <[email protected]>
    Co-authored-by: Sanchit Gandhi <[email protected]>
    3 people committed Aug 19, 2024
    Commit 843e5e2
  2. Add Descript-Audio-Codec model (huggingface#31494)

    * dac model
    
    * original dac works
    
    * add dac model
    
    * dac can be instantiated
    
    * add forward pass
    
    * load weights
    
    * all weights are used
    
    * convert checkpoint script ready
    
    * test
    
    * add feature extractor
    
    * up
    
    * make style
    
    * apply cookiecutter
    
    * fix tests
    
    * iterate on FeatureExtractor
    
    * nit
    
    * update dac doc
    
    * replace nn.Sequential with nn.ModuleList
    
    * nit
    
    * apply review suggestions 1/2
    
    * Update src/transformers/models/dac/modeling_dac.py
    
    Co-authored-by: Sanchit Gandhi <[email protected]>
    
    * up
    
    * apply review suggestions 2/2
    
    * update padding in FeatureExtractor
    
    * apply review suggestions
    
    * iterate on design and tests
    
    * add integration tests
    
    * feature extractor tests
    
    * make style
    
    * all tests pass
    
    * make style
    
    * fixup
    
    * apply review suggestions
    
    * fix-copies
    
    * apply review suggestions
    
    * apply review suggestions
    
    * Update docs/source/en/model_doc/dac.md
    
    Co-authored-by: Yoach Lacombe <[email protected]>
    
    * Update docs/source/en/model_doc/dac.md
    
    Co-authored-by: Yoach Lacombe <[email protected]>
    
    * anticipate transfer weights to descript
    
    * up
    
    * make style
    
    * apply review suggestions
    
    * update slow test values
    
    * update slow tests
    
    * update test values
    
    * update with CI values
    
    * update with vorace values
    
    * update test with slice
    
    * make style
    
    ---------
    
    Co-authored-by: Sanchit Gandhi <[email protected]>
    Co-authored-by: Yoach Lacombe <[email protected]>
    3 people committed Aug 19, 2024
    Commit 8260cb3
  3. Commit 54b7703
  4. Commit e55b33c
  5. Add __repr__ for Conv1D (huggingface#32425)

    * Add representation for Conv1D, for better output info.
    
    * code format for Conv1D
    
    * We add a __repr__ func for Conv1D, this allows the print (or output) of the model's info has a better description for Conv1D.
    AaronZLT committed Aug 19, 2024
    Commit f1b720e
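The idea behind the `__repr__` commit above can be sketched as follows. This is a minimal stand-in class, not the actual transformers `Conv1D` (which is a linear layer with transposed weights, as used in GPT-2); only the `nf`/`nx` attributes that a size-reporting repr would use are modeled here.

```python
class Conv1D:
    """Minimal stand-in for a GPT-2-style Conv1D layer, keeping only the
    attributes needed to illustrate the __repr__ from the commit above."""

    def __init__(self, nf, nx):
        self.nf = nf  # number of output features
        self.nx = nx  # number of input features

    def __repr__(self):
        # With this, print(model) shows the layer's sizes instead of an
        # uninformative default object repr.
        return f"Conv1D(nf={self.nf}, nx={self.nx})"


print(Conv1D(nf=768, nx=2304))  # prints Conv1D(nf=768, nx=2304)
```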
  6. Support save/load ckpt for XLA FSDP (huggingface#32311)

    * Support save/load ckpt for XLA FSDP
    
    * Fix bug for save
    
    * Fix style
    
    * reserve sharded ckpt and better file naming
    
    * minor fix
    
    Co-authored-by: Zach Mueller <[email protected]>
    
    * add is_fsdp_xla_v1_enabled
    
    ---------
    
    Co-authored-by: Zach Mueller <[email protected]>
    yitongh and muellerzr committed Aug 19, 2024
    Commit 8a4857c
  7. RT-DETR parameterized batchnorm freezing (huggingface#32631)

    * fix: Parameterized norm freezing
    
    For the R18 model, the authors don't freeze norms in the backbone.
    
    * Update src/transformers/models/rt_detr/configuration_rt_detr.py
    
    Co-authored-by: Pavel Iakubovskii <[email protected]>
    
    ---------
    
    Co-authored-by: Pavel Iakubovskii <[email protected]>
    AlanBlanchet and qubvel committed Aug 19, 2024
    Commit 5f6c080
  8. Fix incorrect vocab size retrieval in GGUF config (huggingface#32551)

    * fix gguf config vocab size
    
    * minor fix
    
    * link issue
    Isotr0py committed Aug 19, 2024
    Commit 59e8f19
  9. Mamba / FalconMamba: Fix mamba left padding (huggingface#32677)

    * fix mamba left padding
    
    * Apply suggestions from code review
    
    Co-authored-by: Pablo Montalvo <[email protected]>
    
    * fix copies
    
    * test with `inputs_embeds`
    
    * Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py
    
    Co-authored-by: Arthur <[email protected]>
    
    * copies
    
    * clarify
    
    * fix last comments
    
    * remove
    
    ---------
    
    Co-authored-by: Pablo Montalvo <[email protected]>
    Co-authored-by: Arthur <[email protected]>
    3 people committed Aug 19, 2024
    Commit 93e538a
  10. Fix: Mamba2 generation mismatch between input_ids and inputs_embeds (huggingface#32694)
    
    * fix cache when using input embeddings
    
    * simplify check, we can always add input ids seq len since its 0 in first pass
    vasqu committed Aug 19, 2024
    Commit 61d89c1
  11. Docs: Fixed whisper-large-v2 model link in docs (huggingface#32871)

    Fixed whisper-large-v2 model link in docs.
    Sai-Suraj-27 committed Aug 19, 2024
    Commit 3720484
  12. Commit 85345bb

Commits on Aug 20, 2024

  1. Allow-head-dim (huggingface#32857)

    * support head dim
    
    * fix the doc
    
    * fixup
    
    * add oproj
    
    Co-authored-by: Suhara <[email protected]>
    
    * update
    
    Co-authored-by: bzantium <[email protected]>
    
    * Co-authored-by: suhara <[email protected]>
    
    * Update
    
    Co-authored-by: Yoshi Suhara <[email protected]>
    
    ---------
    
    Co-authored-by: bzantium <[email protected]>
    Co-authored-by: Yoshi Suhara <[email protected]>
    3 people committed Aug 20, 2024
    Commit 13e645b
  2. 🚨🚨🚨 Update min version of accelerate to 0.26.0 (huggingface#32627)

    * Update min version of accelerate to 0.26.0
    
    * dev-ci
    
    * update min version in import
    
    * remove useless check
    
    * dev-ci
    
    * style
    
    * dev-ci
    
    * dev-ci
    SunMarc committed Aug 20, 2024
    Commit fd06ad5
  3. Fix repr for conv (huggingface#32897)

    add nx
    ArthurZucker committed Aug 20, 2024
    Commit 65f4bc9
  4. fix: jamba cache fails to use torch.nn.module (huggingface#32894)

    Co-authored-by: Gal Cohen <[email protected]>
    xgal and Gal Cohen committed Aug 20, 2024
    Commit 01c4fc4
  5. Fix: Mamba2 norm_before_gate usage (huggingface#32686)

    * mamba2 uses norm_before_gate=False
    
    * small nit
    
    * remove norm_before_gate flag and follow False path only
    vasqu committed Aug 20, 2024
    Commit c63a3d0
  6. Bump nltk from 3.7 to 3.9 in /examples/research_projects/decision_transformer (huggingface#32903)
    
    Bump nltk in /examples/research_projects/decision_transformer
    
    Bumps [nltk](https://github.com/nltk/nltk) from 3.7 to 3.9.
    - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
    - [Commits](nltk/nltk@3.7...3.9)
    
    ---
    updated-dependencies:
    - dependency-name: nltk
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] committed Aug 20, 2024
    Commit 9800e6d
  7. Replace tensor.norm() with decomposed version for CLIP executorch export (huggingface#32887)
    
    * Replace .norm() with decomposed version for executorch export
    
    * [run_slow] clip
    qubvel committed Aug 20, 2024
    Commit 078d5a8
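For context on the commit above, a "decomposed version" of a norm means replacing the single fused `norm()` call with the primitive operations it is built from, which tend to lower more reliably during export. A minimal sketch of the arithmetic in plain Python (illustrative only, not the actual CLIP code):

```python
import math


def l2_norm(vec):
    """Decomposed L2 norm: square, sum, then sqrt, using only primitive
    operations instead of a single fused norm() call."""
    return math.sqrt(sum(x * x for x in vec))


# Same value as a fused L2 norm would produce:
print(l2_norm([3.0, 4.0]))  # prints 5.0
```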
  8. link for optimizer names (huggingface#32400)

    * link for optimizer names
    
    Add a note and link to where the user can find more optimizer names easily because there are many more optimizers than are mentioned in the docstring.
    
    * make fixup
    nbroad1881 committed Aug 20, 2024
    Commit 1dde50c
  9. [i18n-ar] add README_ar.md to README.md (huggingface#32583)

    * Update README.md
    
    * Update README.md
    
    * Add README_ar.md to i18n/README_de.md
    
    * Add README_ar.md to i18n/README_es.md
    
    * Add README_ar.md to i18n/README_fr.md
    
    * Add README_ar.md to i18n/README_hd.md
    
    * Add README_ar.md to i18n/README_ja.md
    
    * Add README_ar.md to i18n/README_ko.md
    
    * Add README_ar.md to i18n/README_pt-br.md
    
    * Add README_ar.md to i18n/README_ru.md
    
    * Add README_ar.md to i18n/README_te.md
    
    * Add README_ar.md to i18n/README_vi.md
    
    * Add README_ar.md to i18n/README_vi.md
    
    * Add README_ar.md to i18n/README_zh-hans.md
    
    * Add README_ar.md to i18n/README_zh-hant.md
    
    * Create README_ar.md
    AhmedAlmaghz committed Aug 20, 2024
    Commit 8713466

Commits on Aug 21, 2024

  1. fix: [whisper] don't overwrite GenerationConfig's return_timestamps when `return_timestamps` is not passed to `generate` function (huggingface#31296)
    
    [whisper] don't overwrite return_timestamps when not passed to generate
    hrl committed Aug 21, 2024
    Commit c6d484e
  2. Commit 3bb7b05

Commits on Aug 22, 2024

  1. Jamba: update integration tests (huggingface#32250)

    * try test updates
    
    * a few more changes
    
    * a few more changes
    
    * a few more changes
    
    * [run slow] jamba
    
    * skip logits checks on older gpus
    
    * [run slow] jamba
    
    * oops
    
    * [run slow] jamba
    
    * Update tests/models/jamba/test_modeling_jamba.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update tests/models/jamba/test_modeling_jamba.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    ---------
    
    Co-authored-by: amyeroberts <[email protected]>
    gante and amyeroberts committed Aug 22, 2024
    Commit f6e2586
  2. fix: Added missing huggingface_hub installation to workflows (huggingface#32891)
    
    Added missing huggingface_hub installation to workflows.
    Sai-Suraj-27 committed Aug 22, 2024
    Commit af638c4
  3. fix: no need to dtype A in jamba (huggingface#32924)

    Co-authored-by: Gal Cohen <[email protected]>
    xgal and Gal Cohen committed Aug 22, 2024
    Commit 6baa6f2
  4. FEAT / Trainer: Add adamw 4bit optimizer (huggingface#31865)

    * add 4bit optimizer
    
    * style
    
    * fix msg
    
    * style
    
    * add qgalore
    
    * Revert "add qgalore"
    
    This reverts commit 25278e8.
    
    * style
    
    * version check
    SunMarc committed Aug 22, 2024
    Commit c42d264
  5. CI: separate step to download nltk files (huggingface#32935)

    * separate step to download nltk files
    
    * duplicated
    
    * rm comma
    gante committed Aug 22, 2024
    Commit 8b94d28
  6. FIX / Hub: Also catch for exceptions.ConnectionError (huggingface#31469)
    
    * Update hub.py
    
    * Update errors
    
    * Apply suggestions from code review
    
    Co-authored-by: Lucain <[email protected]>
    
    ---------
    
    Co-authored-by: Amy Roberts <[email protected]>
    Co-authored-by: Lucain <[email protected]>
    3 people committed Aug 22, 2024
    Commit eeea712
  7. Commit 9282413
  8. Fix benchmark script (huggingface#32635)

    * fix
    
    * >= 0.3.0
    
    ---------
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh committed Aug 22, 2024
    Commit bf97d4a
  9. Improve greedy search memory usage (huggingface#32895)

    Do not call torch.repeat_interleave if expand_size is 1
    regisss committed Aug 22, 2024
    Commit 99d67f1
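The optimization in the commit above is an early-out before the expansion copy. A sketch with plain Python lists standing in for tensors (`expand_inputs` is a hypothetical helper name, not the actual `generate` internals):

```python
def expand_inputs(input_ids, expand_size=1):
    """Repeat each row expand_size times along the batch dimension (what
    torch.repeat_interleave does), but skip the copy entirely when
    expand_size == 1 so greedy search allocates nothing extra."""
    if expand_size == 1:
        return input_ids  # no allocation, no copy
    return [row for row in input_ids for _ in range(expand_size)]


print(expand_inputs([[1, 2], [3, 4]], expand_size=2))
# prints [[1, 2], [1, 2], [3, 4], [3, 4]]
```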
  10. Add chat_template for tokenizer extracted from GGUF model (huggingface#32908)
    
    * add chat_template to gguf tokenizer
    
    * add template through tokenizer config
    Isotr0py committed Aug 22, 2024
    Commit ee8c01f
  11. fix: (issue huggingface#32689) AttributeError raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook. (huggingface#32849)
    
    fix: `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook.
    fshp971 committed Aug 22, 2024
    Commit f1d822b
  12. Commit 975b988
  13. Commit 18199b3
  14. Commit 273c0af
  15. 🌐 [i18n-KO] Translated `knowledge_distillation_for_image_classification.md` to Korean (huggingface#32334)
    
    * docs: ko: tasks/knowledge_distillation_for_image_classification.md
    
    * feat: nmt draft
    
    * fix: manual edits
    
    * Apply suggestions from code review
    
    Co-authored-by: Chulhwa (Evan) Han <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Chulhwa (Evan) Han <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Ahnjj_DEV <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Ahnjj_DEV <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Ahnjj_DEV <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Chulhwa (Evan) Han <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Chulhwa (Evan) Han <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Chulhwa (Evan) Han <[email protected]>
    
    * Apply suggestions from code review
    
    * Apply suggestions from code review
    
    * Apply suggestions from code review
    
    * Apply suggestions from code review
    
    ---------
    
    Co-authored-by: Chulhwa (Evan) Han <[email protected]>
    Co-authored-by: Ahnjj_DEV <[email protected]>
    3 people committed Aug 22, 2024
    Commit 09e6579
  16. Commit a26de15
  17. Commit d806fa3

Commits on Aug 23, 2024

  1. Reducing memory usage: removing useless logits computation in generate() (huggingface#31292)
    
    * Add .float() in all generation methods logit outputs
    
    * Switch float-casting of logits to training only for main models
    
    * Add `num_logits_to_keep` in Llama and add it by default in generate
    
    * Apply style
    
    * Add num_logits_to_keep as arg in prepare_input_for_generation
    
    * Add support for Mistral
    
    * Revert models except llama and mistral
    
    * Fix default None value in _supports_num_logits_to_keep()
    
    * Fix dimension of dummy input
    
    * Add exception for prophetnet in _supports_num_logits_to_keep()
    
    * Update _supports_num_logits_to_keep() to use inspect.signature()
    
    * Add deprecation cycle + remove modification with pretraining_tp
    
    * Apply style
    
    * Add most used models
    
    * Apply style
    
    * Make `num_logits_to_keep` an int in all cases to remove if-else clause
    
    * Add compile check for the warning
    
    * Fix torch versions
    
    * style
    
    * Add gemma2
    
    * Update warning version
    
    * Add comment about .float operations in generation utils
    
    * Add tests in GenerationTesterMixin and ModelTesterMixin
    
    * Fix batch size for assisted decoding in tests
    
    * fix small issues in test
    
    * refactor test
    
    * fix slicing removing dim issue
    
    * Add nemotron support (should fix check-copy issue in CIs)
    
    * Trigger new CIs
    
    * Trigger new CIs
    
    * Bump version
    
    * Bump version in TODO
    
    * Trigger CIs
    
    * remove blank space
    
    * Trigger CIs
    Cyrilvallez committed Aug 23, 2024
    Commit 22e6f14
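The memory saving in the commit above comes from projecting only the positions that generation actually needs through the LM head. A toy sketch in plain Python (hypothetical helper name and shapes, not the Llama implementation):

```python
def compute_logits(hidden_states, lm_head_weight, num_logits_to_keep=0):
    """Project only the last num_logits_to_keep positions through the LM head.
    0 keeps all positions (the old behaviour); during decoding only the last
    position is needed, so most of the vocab-sized projection is skipped.
    hidden_states: list of per-position vectors; lm_head_weight: one row per
    vocabulary entry."""
    if num_logits_to_keep:
        hidden_states = hidden_states[-num_logits_to_keep:]
    return [
        [sum(h * w for h, w in zip(pos, row)) for row in lm_head_weight]
        for pos in hidden_states
    ]


hidden = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # 3 positions, hidden dim 2
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # vocab of 3
print(compute_logits(hidden, weight, num_logits_to_keep=1))
# prints [[1.0, 1.0, 2.0]] -- logits for the last position only
```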
  2. Forbid PretrainedConfig from saving generate parameters; Update deprecations in `generate`-related code 🧹 (huggingface#32659)
    
    Co-authored-by: amyeroberts <[email protected]>
    gante and amyeroberts committed Aug 23, 2024
    Commit 970a16e
  3. Integrate Liger (Linkedin GPU Efficient Runtime) Kernel to Trainer (huggingface#32860)
    
    * add liger integration
    
    * fix syntax
    
    * fix import issue
    
    * add trainer.md
    
    * Use _apply_liger_kernel()
    
    * Fixed log message
    
    * Update docs/source/en/trainer.md
    
    Co-authored-by: Marc Sun <[email protected]>
    
    * Update docs/source/en/trainer.md
    
    Co-authored-by: Marc Sun <[email protected]>
    
    * Update src/transformers/training_args.py
    
    Co-authored-by: Byron Hsu <[email protected]>
    
    * Update src/transformers/trainer.py
    
    Co-authored-by: Marc Sun <[email protected]>
    
    * Update src/transformers/training_args.py
    
    Co-authored-by: Byron Hsu <[email protected]>
    
    * Update docs/source/en/trainer.md
    
    Co-authored-by: Byron Hsu <[email protected]>
    
    * Fixed checkstyle and updated readme
    
    * Added test
    
    * Fixed checkstyle
    
    * fix docstring
    
    * rename use_liger to use_liger_kernel
    
    * Trigger Build
    
    * Added test
    
    * add fix-copies
    
    * Fixed copy inconsistencies
    
    ---------
    
    Co-authored-by: shimizust <[email protected]>
    Co-authored-by: Steven Shimizu <[email protected]>
    Co-authored-by: Marc Sun <[email protected]>
    Co-authored-by: Byron Hsu <[email protected]>
    5 people committed Aug 23, 2024
    Commit adb9117
  4. Enable some Jinja extensions and add datetime capabilities (huggingface#32684)
    
    * Add new Jinja features:
    
    - Do extension
    - Break/continue in loops
    - Call strftime to get current datetime in any format
    
    * Add new Jinja features:
    
    - Do extension
    - Break/continue in loops
    - Call strftime to get current datetime in any format
    
    * Fix strftime template
    
    * Add template strip() just to be safe
    
    * Remove the do extension to make porting easier, and also because it's the least useful
    
    * Rename test
    
    * strftime -> strftime_now
    
    * Split test
    
    * Update test to use strftime_now
    
    * Refactor everything out into chat_template_utils
    
    * Refactor everything out into chat_template_utils
    
    * Refactor everything out into chat_template_utils
    
    * Refactor everything out into chat_template_utils
    
    * Refactor everything out into chat_template_utils
    Rocketknight1 committed Aug 23, 2024
    Commit 371b9c1
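The `strftime_now` callable mentioned in the commit above can be pictured as a thin wrapper over `datetime.now().strftime`, exposed inside the chat-template environment. A sketch under that assumption (the name comes from the commit; the body is illustrative):

```python
from datetime import datetime


def strftime_now(fmt):
    """Return the current date/time formatted with strftime, for use as a
    callable inside chat templates, e.g. {{ strftime_now('%d %b %Y') }}."""
    return datetime.now().strftime(fmt)


print(strftime_now("%d %b %Y"))  # e.g. a date like "23 Aug 2024"
```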
  5. DeviceGuard added to use Deformable Attention more safely on multi-GPU (huggingface#32910)
    
    * Update modeling_deformable_detr.py
    
    * Update src/transformers/models/deformable_detr/modeling_deformable_detr.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update ms_deform_attn_cuda.cu
    
    * Update modeling_deformable_detr.py
    
    * Update modeling_deformable_detr.py
    
    * [empty] this is an empty commit
    
    ---------
    
    Co-authored-by: amyeroberts <[email protected]>
    DonggeunYu and amyeroberts committed Aug 23, 2024
    Commit 1dbd9d3
  6. added docstring to SchedulerType class (huggingface#32898)

    * added docstring to SchedulerType class
    
    * Remove trailing whitespace  src/transformers/trainer_utils.py
    
    Co-authored-by: Steven Liu <[email protected]>
    
    * fixup
    
    ---------
    
    Co-authored-by: Steven Liu <[email protected]>
    Arunprakash-A and stevhliu committed Aug 23, 2024
    Commit e3a5f35
  7. Commit 0a7af19

Commits on Aug 24, 2024

  1. Commit 2cdc473
  2. fix linter nit

    leloykun committed Aug 24, 2024
    Commit 43febe0
  3. rm stride default

    leloykun committed Aug 24, 2024
    Commit f59ca5a
  4. Commit dcbfd17
  5. fix tests

    leloykun committed Aug 24, 2024
    Commit 80fb7bb
  6. fix chameleon tests

    leloykun committed Aug 24, 2024
    Commit e57e988
  7. don't hardcode arg names

    leloykun committed Aug 24, 2024
    Commit ed0e8aa
  8. add comment on get_component

    leloykun committed Aug 24, 2024
    Commit ae5d537
  9. Commit b252643

Commits on Aug 25, 2024

  1. add support for image generation and interleaved image-text generation with Chameleon & its finetunes like Anole
    leloykun committed Aug 25, 2024
    Commit dae439c

Commits on Sep 3, 2024

  1. Fix issues in PR huggingface#32013

    YeLuoSuiYou committed Sep 3, 2024
    Commit 7607e4c