forked from huggingface/transformers
ValueError: OLMoForCausalLM does not support Flash Attention 2.0 yet #29145 #1
Merged
Conversation
* add config, modeling, and tokenization * add auto and init * update readme * update readme * update team name * fixup * fixup * update config * update code style * update for fixup * update for fixup * update for fixup * update for testing * update for testing * fix bug for config and tokenization * fix bug for bos token * not doctest * debug tokenizer * not doctest * debug tokenization * debug init for tokenizer * fix style * update init * delete if in token auto * add tokenizer doc * add tokenizer in init * Update dummy_tokenizers_objects.py * update * update * debug * Update tokenization_qwen2.py * debug * Update convert_slow_tokenizer.py * add copies * add copied from and make style * update files map * update test * fix style * fix merge reading and update tests * fix tests * fix tests * fix style * debug a variable in readme * Update src/transformers/models/qwen2/configuration_qwen2.py Co-authored-by: Arthur <[email protected]> * update test and copied from * fix style * update qwen2 tokenization and tests * Update tokenization_qwen2.py * delete the copied from after property * fix style * update tests * update tests * add copied from * fix bugs * update doc * add warning for sliding window attention * update qwen2 tokenization * fix style * Update src/transformers/models/qwen2/modeling_qwen2.py Co-authored-by: Arthur <[email protected]> * fix tokenizer fast --------- Co-authored-by: Ren Xuancheng <[email protected]> Co-authored-by: renxuancheng.rxc <[email protected]> Co-authored-by: Arthur <[email protected]>
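Once the Qwen2 classes from this commit are in place, they should be reachable through the Auto API like any other causal LM. Below is a minimal, hedged usage sketch; the checkpoint id is an example and is not specified by the commit.

```python
# Minimal sketch, assuming a published Qwen2-architecture checkpoint (example id only).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B"  # example id; substitute a real Qwen2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, Qwen2!", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```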
* skip bf16 test if not supported by device * fix * fix bis * use is_torch_bf16_available_on_device * use is_torch_fp16_available_on_device * fix & use public llama * use 1b model * fix flaky test --------- Co-authored-by: Your Name <[email protected]>
I want to train dinov2 with bf16, but I get the following error in https://github.com/huggingface/transformers/blob/bc72b4e2cdcbc80d5f56731f35dbc9c18b4c8de6/src/transformers/models/dinov2/modeling_dinov2.py#L635:

```
RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same
```

Since the input dtype is torch.float32, the parameter dtype is forced to be torch.float32 as well... @LZHgrla and I checked the code of the CLIP vision encoder and found that it performs an automatic dtype transformation (https://github.com/huggingface/transformers/blob/bc72b4e2cdcbc80d5f56731f35dbc9c18b4c8de6/src/transformers/models/clip/modeling_clip.py#L181-L182), so I added a similar automatic dtype transformation to modeling_dinov2.py.
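A rough sketch of the kind of cast described above, modeled on the CLIP snippet linked in the issue; the class and attribute names are illustrative rather than copied from the actual patch.

```python
# Sketch of an input-to-parameter dtype cast in a patch-embedding layer (illustrative names).
import torch
from torch import nn

class PatchEmbeddings(nn.Module):
    def __init__(self, num_channels=3, hidden_size=768, patch_size=14):
        super().__init__()
        self.projection = nn.Conv2d(num_channels, hidden_size, kernel_size=patch_size, stride=patch_size)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # Cast the (possibly fp32) input to the weight dtype so bf16 weights accept it.
        target_dtype = self.projection.weight.dtype
        embeddings = self.projection(pixel_values.to(dtype=target_dtype))
        return embeddings.flatten(2).transpose(1, 2)

# Quick check: fp32 pixels through bf16 weights no longer raise a dtype mismatch.
layer = PatchEmbeddings().to(torch.bfloat16)
out = layer(torch.rand(1, 3, 224, 224))  # fp32 input
print(out.dtype)  # torch.bfloat16
```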
Fix sparse_step = 1: in case sparse_step = 1, the current code will not work.
* save processor * Update tests/models/auto/test_processor_auto.py Co-authored-by: Arthur <[email protected]> * Update tests/test_processing_common.py Co-authored-by: Arthur <[email protected]> * fix --------- Co-authored-by: ydshieh <[email protected]> Co-authored-by: Arthur <[email protected]>
* fix * fix * fix --------- Co-authored-by: ydshieh <[email protected]>
* fix * last attempt * current work * fix forward compatibility * save all special tokens * current state * revert additional changes * updates * remove tokenizer.model * add a test and the fix * nit * revert one more break * fix typefield issue * quality * more tests * fix fields for FC * more nits? * new additional changes * how * some updates * the fix * where do we stand * nits * nits * revert unrelated changes * nits nits nits * styling * don't break llama just yet * revert llama changes * safe arg check * fixup * Add a test for T5 * Necessary changes * Tests passing, added tokens need to not be normalized. If the added tokens are normalized, it will the stripping which seems to be unwanted for a normal functioning * Add even more tests, when normalization is set to True (which does not work 😓 ) * Add even more tests, when normalization is set to True (which does not work 😓 ) * Update to main * nits * fmt * more and more test * comments * revert change as tests are failing * make the test more readble * nits * refactor the test * nit * updates * simplify * style * style * style convert slow * Update src/transformers/convert_slow_tokenizer.py
* first commit * correct default value non causal * update config and modeling code * update converting checkpoint * clean modeling and fix tests * make style * add new config parameters to docstring * fix copied from statements * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <[email protected]> * make position_embeddings_type docstrings clearer * clean converting script * remove function not used * clean modeling file * apply suggestion for test file + add convert script to not_doctested * modify tests according to review - cleaner logic and more tests * Apply nit suggestions from code review Co-authored-by: amyeroberts <[email protected]> * add checker of valid position embeddings type * instantiate new layer norm layer with the right eps * fix freeze_feature_encoder since it can be None in some cases * add test same output in convert script * restore wav2vec2conformer and add new model * create processor and FE + clean * add new model code * fix convert script and set default config parameters * correct model id paths * make style * make fix-copies and cleaning files * fix copied from statements * complete .md and fixe copies * clean convert script argument defaults * fix config parameters docstrings * fix config docstring * add copied from and enrich FE tests * fix copied from and repo-consistency * add autotokenizer * make test input length shorter and change docstring code * fix docstrings and copied from * add add_adapter to ASR training example * make testing of adapters more robust * adapt to multi adapter layers * refactor input_values->input_features and remove w2v2-bert feature extractor * remove pretraining model * remove depreciated features and useless lines * add copied from and ignore statements to modeling tests * remove pretraining model #2 * change import in convert script * change default in convert script * update readme and remove useless line * Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py Co-authored-by: amyeroberts <[email protected]> * refactor BERT to Bert for consistency * remove useless ignore copy statement * add persistent to buffer in rotary * add eps in LayerNorm init and remove copied from * add adapter activation parameters and add copied from statements * Fix copied statements and add unitest.skip reasons * add copied statement in test_processor * refactor processor * make style * replace numpy random by torch rand * remove expected output CTC * improve converting script with processor class * Apply suggestions from code review Co-authored-by: amyeroberts <[email protected]> * remove gumbel class * remove tests related to previously deleted class * Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py Co-authored-by: amyeroberts <[email protected]> * correct typos * remove uused parameters * update processor to takes both text and audio * update checkpoints * update expected output and add ctc expected output * add label_attention_mask * replace pt with np in processor tests * fix typo * revert to behaviour with labels_attention_mask --------- Co-authored-by: Sanchit Gandhi <[email protected]> Co-authored-by: amyeroberts <[email protected]>
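For readers unfamiliar with the new Wav2Vec2-BERT model type, here is a brief, hedged usage sketch; the checkpoint id is an assumption, and feature extraction yields `input_features` (not `input_values`), matching the refactor mentioned above.

```python
# Hedged sketch; the checkpoint id is assumed, not taken from the commit.
import numpy as np
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2BertModel

ckpt = "facebook/w2v-bert-2.0"  # assumed checkpoint id
feature_extractor = AutoFeatureExtractor.from_pretrained(ckpt)
model = Wav2Vec2BertModel.from_pretrained(ckpt)

audio = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz as stand-in audio
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")  # -> input_features
with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state
print(hidden_states.shape)
```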
* inside with LoggingLevel * remove is_flaky --------- Co-authored-by: ydshieh <[email protected]>
* Fix the documentation checkpoint for xlm-roberta-xl * Improve docstring consistency
… init method (#28486) * add image processor arg * super * rm args
* move token ids to cpu * check for torch attr
* fix * tests * fix test
* add w2v2bert compatibility * Update examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Co-authored-by: amyeroberts <[email protected]> --------- Co-authored-by: amyeroberts <[email protected]>
…ute (#28584) * not save if empty * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <[email protected]>
* generalize asr pipeline to fbank models * change w2v2 pipeline output * Update test_pipelines_automatic_speech_recognition.py
* finalize * make fix copies whisper * [Tests] Make sure that we don't run tests mulitple times * Update src/transformers/models/whisper/modeling_whisper.py * [Tests] Make sure that we don't run tests mulitple times * fix more * improve * improve * improve further * improve more * improve * fix more * git commit and git push * fix more * fix more * fix more * New try * Fix more whisper stuff * Improve * correct more * correct more * correct more * Fix some tests * Add more tests * correct more * correct more * correct more * push * correct more * Fix more * Better * without dec mask * correct more * clean * save intermediate * Fix more * Fix VAD for large-v2 * Save new * Correct more * make cleaner * correct tests * correct src * Finish * Fix more * Fix more * finish * Fix edge cases * fix return_dict_in_generate * fix all tests * make style * add docstrings * add docstrings * Fix logit processor * make style * fix pipeline test * fix more style * Apply suggestions from code review * apply feedback Sanchit * correct more * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <[email protected]> * Apply suggestions from code review Co-authored-by: Joao Gante <[email protected]> Co-authored-by: Sanchit Gandhi <[email protected]> * correct more * correct more * correct more * Fix staticmethod * correct more * fix * fix slow tests * make style * fix tokenizer test * fix tokenizer test * Apply suggestions from code review Co-authored-by: Arthur <[email protected]> * finish * finish * revert kwargs change --------- Co-authored-by: Sanchit Gandhi <[email protected]> Co-authored-by: Joao Gante <[email protected]> Co-authored-by: Arthur <[email protected]>
* remove elif xpu * remove redundant code
First draft
* Allow non-special tokens to be added * Add test, fix token adding code * Revert changes to id_to_token and token_to_id * Update the ESM tokenizer to be a bit more standardized * Update src/transformers/models/esm/tokenization_esm.py Co-authored-by: Arthur <[email protected]> --------- Co-authored-by: Arthur <[email protected]>
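A small, hedged illustration of what this change enables: adding ordinary (non-special) tokens to the ESM tokenizer. The checkpoint id is an example only.

```python
# Example of adding a non-special token to an ESM tokenizer (checkpoint id is an example).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
num_added = tokenizer.add_tokens(["<motif>"], special_tokens=False)
print(num_added, tokenizer.convert_tokens_to_ids("<motif>"))
```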
* [DETA] fix freeze/unfreeze function * Update src/transformers/models/deta/modeling_deta.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/models/deta/modeling_deta.py Co-authored-by: Arthur <[email protected]> * add freeze/unfreeze test case in DETA * fix type * fix typo 2 * fix : enable aux and enc loss in training pipeline * Add unsynced variables from original DETA for training * modification for passing CI test * make style * make fix * manual make fix * change deta_modeling_test of configuration 'two_stage' default to TRUE and minor change of dist checking * remove print * divide configuration in DetaModel and DetaForObjectDetection * image smaller size than 224 will give topk error * pred_boxes and logits should be equivalent to two_stage_num_proposals * add missing part in DetaConfig * Update src/transformers/models/deta/modeling_deta.py Co-authored-by: amyeroberts <[email protected]> * add docstring in configure and prettify TO DO part * change distribute related code to accelerate * Update src/transformers/models/deta/configuration_deta.py Co-authored-by: amyeroberts <[email protected]> * Update tests/models/deta/test_modeling_deta.py Co-authored-by: amyeroberts <[email protected]> * protect importing accelerate * change variable name to specific value * wrong import * fix aux_loss in conditional_detr * add test aux_loss * add aux_loss test in deta and table_transformer * fix yolos since it doesn't have auxiliary function * fix maskformer auxiliary_loss related code * make style * change param 'auxiliary_loss' to 'use_auxiliary_loss' * change param 'auxiliary_loss' to 'use_auxiliary_loss' in tests * make style & fix-copies, also revert yolos related parameter * revert variable name 'use_auxiliary_loss' to 'auxiliary_loss' due to DetrConfig * revert variable name in yolos * revert maskformer * add aux_loss test in maskformer * make style * Update src/transformers/models/yolos/configuration_yolos.py Co-authored-by: amyeroberts <[email protected]> --------- Co-authored-by: Arthur <[email protected]> Co-authored-by: amyeroberts <[email protected]>
Fix missing bbox in LayoutLM signature
* FIx trainer test * Update tests/trainer/test_trainer.py Co-authored-by: amyeroberts <[email protected]> --------- Co-authored-by: amyeroberts <[email protected]>
* Add chat support to text generation pipeline * Better handling of single elements * Deprecate ConversationalPipeline * stash commit * Add missing add_special_tokens kwarg * Update chat templating docs to refer to TextGenerationPipeline instead of ConversationalPipeline * Add ✨TF✨ tests * @require_tf * Add type hint * Add specific deprecation version * Remove unnecessary do_sample * Remove todo - the discrepancy has been resolved * Update src/transformers/tokenization_utils_base.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/pipelines/text_generation.py Co-authored-by: amyeroberts <[email protected]> --------- Co-authored-by: amyeroberts <[email protected]>
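A brief, hedged example of the behaviour described in the first bullet: passing a chat (a list of role/content dicts) directly to the text-generation pipeline. The model id is an example of a chat-capable checkpoint, not one named in the commit.

```python
# Hedged example: chat input to the text-generation pipeline (model id is an example).
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
chat = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What replaces ConversationalPipeline?"},
]
result = pipe(chat, max_new_tokens=64)
# "generated_text" holds the conversation, including the newly generated assistant turn.
print(result[0]["generated_text"][-1])
```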
* Add task_summary to es/_toctree.yml * Add task_summary.md to docs/es * Change title of task_summary.md * Translate first paragraphs * Translate middle paragraphs * Translate the rest of the doc * Edit first paragraph
* add peft support for AWQ * Update src/transformers/quantizers/quantizer_awq.py Co-authored-by: amyeroberts <[email protected]> * fix --------- Co-authored-by: amyeroberts <[email protected]>
The link in the evaluation doc was missing a hyphen between post and processing. I fixed this for English only. Someone with the ability to do a global search/replace should fix the other languages (if indeed they have this issue).
* change version * nuke * this doesn't make sense * update some requirements.py * revert + no main * nits * change cache number * more pin * revert --------- Co-authored-by: ydshieh <[email protected]>
* Add resource * Add more resources * Add resources * Apply suggestions from code review Co-authored-by: amyeroberts <[email protected]> * Remove mention * Remove pipeline tags --------- Co-authored-by: amyeroberts <[email protected]>
The output_logits option behaves like output_scores, but returns the raw, unprocessed prediction logits, i.e. the values before they undergo logit processing and/or warping (which happens by default for the regular output scores). Having the unprocessed logits is useful in certain circumstances. For example, with causal LM models they are very useful when one wants to determine the probability of a certain answer, e.g. when asking a question with a yes/no answer: getting the next-token probabilities of both "yes" and "no" (and/or their relative ratio) is of interest for classification. The reason for getting these _before_ logit processing and/or warping is that (a) processing can change the probabilities, or (b) it may reject the tokens of interest / reduce the number of candidate tokens to just 1. For an example use case, see the paper TabLLM: Few-shot Classification of Tabular Data with Large Language Models by Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, and David Sontag. https://arxiv.org/abs/2210.10723

In addition:
- added a dedicated unit test, tests/generation/test_utils/test_return_unprocessed_logit_scores, which tests the return of logits with output_logits=True in generation
- set output_logits=True in all other generation unit tests that also have output_scores=True

Implemented @gante's and @amyeroberts' review feedback Co-authored-by: kx79wq <[email protected]>
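A hedged sketch of the yes/no use case described above, combining the new output_logits flag with return_dict_in_generate; the model id and prompt are illustrative only.

```python
# Hedged sketch: read raw next-token logits via output_logits (model id is illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Question: Is water wet? Answer:", return_tensors="pt")
gen = model.generate(
    **inputs,
    max_new_tokens=1,
    return_dict_in_generate=True,
    output_logits=True,   # raw, unprocessed logits (added by this change)
    output_scores=True,   # processed scores, kept for comparison
)

raw_logits = gen.logits[0]                  # first generation step: (batch, vocab_size)
probs = torch.softmax(raw_logits, dim=-1)
yes_id = tokenizer(" yes", add_special_tokens=False).input_ids[0]
no_id = tokenizer(" no", add_special_tokens=False).input_ids[0]
print("p(yes) =", probs[0, yes_id].item(), "p(no) =", probs[0, no_id].item())
```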
* generated text on A10G * generated text in CI * Apply suggestions from code review add explanatory comments Co-authored-by: Younes Belkada <[email protected]> --------- Co-authored-by: Younes Belkada <[email protected]>
…ers()`'s docstring (#29102) * Update base.py * Fix a typo
* report grad_norm during training * support getting grad_norm from deepspeed
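As a hedged sketch of how the newly reported value can be consumed, a TrainerCallback can read grad_norm from the log dict when present; the key name follows the commit title and should be verified against the released Trainer.

```python
# Hedged sketch: a callback that surfaces the `grad_norm` value reported in Trainer logs.
from transformers import TrainerCallback

class GradNormLogger(TrainerCallback):
    def on_log(self, args, state, control, logs=None, **kwargs):
        # `grad_norm` is assumed to appear in the log dict after this change.
        if logs is not None and "grad_norm" in logs:
            print(f"step {state.global_step}: grad_norm = {logs['grad_norm']}")

# Usage (assuming an existing Trainer instance): trainer.add_callback(GradNormLogger())
```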
* Fixed nll with label_smoothing to nll * Resolved conflict by rebase * Fixed nll with label_smoothing to nll * Resolved conflict by rebase * Added label_smoothing to config file * Fixed nits
* default to use it * style
Move misplaced line, improve code comment
…ner` (#29082) * add RMSProp to Trainer * revert some change * Update src/transformers/trainer.py Co-authored-by: amyeroberts <[email protected]> --------- Co-authored-by: amyeroberts <[email protected]>
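A minimal, hedged sketch of selecting the newly supported optimizer through TrainingArguments; the exact string value follows the commit title and should be checked against the released OptimizerNames enum.

```python
# Hedged sketch: requesting RMSProp through TrainingArguments (string value assumed).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="rmsprop",      # optimizer name added to the Trainer by this change (assumed spelling)
    learning_rate=1e-4,
)
print(args.optim)
```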
* abstract image processor arg checks. * fix signatures and quality * add validate_ method to rescale-prone processors * add more validations * quality * quality * fix formatting Co-authored-by: amyeroberts <[email protected]> * fix formatting Co-authored-by: amyeroberts <[email protected]> * fix formatting Co-authored-by: amyeroberts <[email protected]> * Fix formatting mishap Co-authored-by: amyeroberts <[email protected]> * fix crop_size compatibility * fix default mutable arg * fix segmentation map + image arg validity * remove segmentation check from arg validation * fix quality * fix missing segmap * protect PILImageResampling type * Apply suggestions from code review Co-authored-by: amyeroberts <[email protected]> * add back segmentation maps check --------- Co-authored-by: amyeroberts <[email protected]>
#29122) * forgot to push the changes for 4bit .. * trigger CI
* only compile when needed * fix mra as well * fix yoso as well * update * remove comment * Update src/transformers/models/deformable_detr/modeling_deformable_detr.py * Update src/transformers/models/deformable_detr/modeling_deformable_detr.py * oops * Update src/transformers/models/deta/modeling_deta.py * nit
…els (#29055) * handle peft + compiled models * add tests * fixup * adapt from suggestions * clarify comment
…test issues (#28010) * add add_dummy_prefix_space option to slow * checking kwargs might be better. Should be there for all spm tokenizer IMO * nits * fix copies * more copied * nits * add prefix space * nit * nits * Update src/transformers/convert_slow_tokenizer.py * fix inti * revert wrong styling * fix * nits * style * updates * make sure we use slow tokenizer for conversion instead of looking for the decoder * support llama ast well * update llama tokenizer fast * nits * nits nits nits * update the doc * update * update to fix tests * skip unrelated tailing test * Update src/transformers/convert_slow_tokenizer.py * add proper testing * test decode as well * more testing * format * fix llama test * Apply suggestions from code review
…text example (#29070) * add support for siglip and chinese-clip model training with contrastive-image-text example * codebase fixups
nice job Co-authored-by: ydshieh <[email protected]>
What does this PR do?
Fixes huggingface#29145
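For context, a hedged illustration of the call that triggers the ValueError in the linked issue: requesting Flash Attention 2 for an OLMo checkpoint. The checkpoint id is an assumption; once the upstream fix is merged in, the same call is expected to succeed on supported hardware.

```python
# Hedged illustration of the failing call from huggingface#29145 (checkpoint id assumed).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1B-hf",                      # assumed OLMo checkpoint id
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",   # raised ValueError before the fix
)
```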