Releases · huggingface/optimum

06 Dec 10:34

fxmarty

v1.15.0

8eaf54c

ROCMExecutionProvider support

The Optimum ONNX Runtime integration is extended to officially support ROCMExecutionProvider. See more details in the documentation.

Add AMD GPU support by @mht-sharma in #1546
Update ROCM ORT doc by @mht-sharma in #1564

Extended ONNX export

The Swin2sr, DPT, GLPN, ConvNextv2 are now supported in the ONNX export.

Swin2sr onnx by @baskrahmer in #1492
Add depth-estimation w/ DPT+GLPN by @xenova in #1529
Add convnextv2 onnx export by @xenova in #1560

What's Changed

Add OV export CLI to README by @echarlaix in #1526
Refactor NormalizedConfigs for GQA by @michaelbenayoun in #1539
Fix model patcher ONNX decoder export by @fxmarty in #1547
Add AMD to the documentation main page by @mfuntowicz in #1540
Add Optimum-amd documentation to the PR & release doc by @fxmarty in #1562
Add amd documentation by @echarlaix in #1557
Remove delete_doc_comment workflows by @regisss in #1565
optimum-nvidia by @mfuntowicz in #1566
Update installation instructions in README by @echarlaix in #1568
Update doc for AMD by @mht-sharma in #1570
Add amd extra to setup.py by @echarlaix in #1567

New Contributors

@xenova made their first contribution in #1529

Full Changelog: v1.14.0...v1.15.0

Contributors

mfuntowicz, fxmarty, and 6 other contributors

Assets 2

14 Nov 17:50

echarlaix

v1.14.1

f837dcd

v1.14.1: Patch release

Update optimum-intel required version by @echarlaix in #1521
Swin2sr onnx by @baskrahmer in #1492
Fix Falcon ONNX export with alibi by @fxmarty in #1524
Fix whisper v3 ONNX export by @fxmarty in #1525
Add new fusion argument to fix compatibility with onnxruntime v1.16.2 by @echarlaix in #1535
Add depth-estimation w/ DPT+GLPN by @xenova in #1529

Contributors

fxmarty, baskrahmer, and 2 other contributors

Assets 2

07 Nov 13:54

echarlaix

v1.14.0

076ecce

v1.14.0: LCMs, SpeechT5, Falcon, Mistral, decoder refactorization

ONNX

New architectures

Falcon

Add ONNX and ORT support for Falcon by @fxmarty in #1391

SpeechT5

SpeechT5 ONNX support by @fxmarty in #1404

Mistral

Add Mistral models ONNX export support by @echarlaix in #1425

TrOCR

Enable KV cache support by @fxmarty in #1456

LCMs

Enable LCMs (available in in diffusers since v0.22.0) ONNX export and ORT inference by @echarlaix in #1469

from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

pipe = ORTLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images

Also enable ONNX export using the CLI :

optimum-cli export onnx --model SimianLuo/LCM_Dreamshaper_v7 lcm_onnx/

Decoder refactorization

Add position ids as input during ONNX export by @fxmarty in #1381
Enable the export of only one decoder for decoder-only models by @echarlaix in #1257

GPTQ

Enable possibility to choose exllamav2 kernels for GPTQ models by @SunMarc in #1419
Disable exllamav2 for quantization by @SunMarc in #1482
Default to exllama when exllamav2 is disabled by @SunMarc in #1494
Added cache_block_outputs parameter to handle models with non-regular structure such as ChatGLM by @AlexKoff88 in #1479
Add support for CPU Inference by @vivekkhandelwal1 in #1496
Fix minimum version of auto-gptq by @fxmarty in #1504
switch to exllama_config instead of disabling exllamav2 by @SunMarc in #1505

Other changes and bugfixes

Fix wrong dtype in the ONNX export by @fxmarty in #1369
Add support for loading quantization from config by @aarnphm #1363
Guard multiprocessing set start method by @fxmarty in #1377
Do not output KV cache when not using with-past in the ONNX export by @fxmarty in #1358
Fix provider availability check on ORT 1.16.0 release by @fxmarty in #1403
Fix quantization for onnxruntime v1.16.0 by @echarlaix in #1405
Fix normalized config key for models architecture by @echarlaix in #1408
Fix arg in bettertransformer llama attention by @SunMarc in #1421
Ignore .xml files for Stable Diffusion ORT downloads by @baskrahmer in #1428
Falcon BetterTransformer requires transformers>=4.34 by @fxmarty in #1431
Fix llama ONNX export by @fxmarty in #1432
Update attention.py by @DongHande in #1416
Remove SharedDDP as it was deprecated from Transformers by @AdamLouly in #1443
Fix owlvit task detection by @fxmarty in #1453
Improve ONNX quantization doc by @fxmarty in #1451
Fix perceiver tests and dummy inputs for ONNX by @baskrahmer in #1449
Disable bart onnx export for text-classification and question-answering by @fxmarty in #1457
Fix ONNX exporter library_name by @baskrahmer in #1460
[ORT Training] Some important updates of ONNX Runtime training APIs by @JingyaHuang in #1335
Fix typo in BetterTransformer CLIP by @fxmarty in #1468
Fix custom architecture detection in onnx export by @fxmarty in #1472
Fix whisper export by @mht-sharma in #1503
Update Transformers dependency for Habana extra by @regisss in #1508
Fix argument error by @ranchlai in #1501
Remove attention mask patching by @fxmarty in #1509
Fix generation input by @echarlaix in #1512
Fix tests ORTModel by @fxmarty in #1517
Fix BT on transformers 4.35 release by @fxmarty in #1518

New Contributors

@aarnphm made their first contribution in #1363
@DongHande made their first contribution in #1416
@AlexKoff88 made their first contribution in #1479
@vivekkhandelwal1 made their first contribution in #1496
@ranchlai made their first contribution in #1501

Contributors

ranchlai, fxmarty, and 11 other contributors

Assets 2

03 Nov 19:13

fxmarty

v1.13.3

2e8308e

v1.13.3: Patch release

Patch release for transformers==4.34.1 compatibility. We will do a release next week for transformers==4.35 compatibility and new features. Please bear with us!

Falcon BetterTransformer requires transformers>=4.34 by @fxmarty #1431
Fix arg in bettertransformer llama attention by @SunMarc #1421
Update Transformers dependency for Habana extra by @regisss #1508
temporarily pin to transformers<4.35 by @fxmarty 6169310

Contributors

fxmarty, regisss, and SunMarc

Assets 2

21 Sep 18:33

echarlaix

v1.13.2

f105046

v1.13.2: Patch release

Fix provider availability check on ORT 1.16.0 release by @fxmarty in #1403
Fix ONNX Runtime quantization compatibility for onnxruntime v1.16.0 by @echarlaix in #1405

Contributors

fxmarty and echarlaix

Assets 2

08 Sep 15:57

fxmarty

v1.13.1

1446e53

v1.13.1: Patch release

Fix ONNX fp16 export that broke in 1.13.0.

What's Changed

Fix wrong dtype in the ONNX export by @fxmarty in #1369
Fix tests collection for TFLite export and trigger TFLite tests only when relevant by @fxmarty in #1368
upgrade min compatible optimum-intel version by @echarlaix in #1371
Fix fp16 ONNX export test by @fxmarty in #1373

Contributors

fxmarty and echarlaix

Assets 2

08 Sep 09:30

fxmarty

v1.13.0

aaa07fe

v1.13.0: ONNX weight deduplication, ONNX export and ORT extension

Deduplicate Embedding / LM head weight in the ONNX export

Workaround for a bug in the PyTorch ONNX export that does not deduplicate the Embedding and LM head shared weight: pytorch/pytorch#108342. For small enough models, this results in up to 50% ONNX serialized model size decrease.

Fix PyTorch tied weights being duplicated in the exported ONNX models by @fxmarty in #1326
Fix initializer detection for weight deduplication by @fxmarty in #1333

Extended ONNX Runtime support

ONNX Runtime integration now supports Pix2Struct and MPT architectures. Donut now supports IO Binding. Encoder-Decoder models are now supported as well.

Pix2Struct onnxruntime support by @krathul in #1296
Add MPT onnx and ORT support by @jiqing-feng in #1161
Donut iobinding by @IlyasMoutawwakil in #1209
Add encoder decoder model by @mht-sharma in #851

Extended ONNX export: MPT, TIMM models, Encoder-Decoder

Additionally, the model SAM is now be default exported as a vision_encoder.onnx, and prompt_encoder_mask_decoder.onnx.

Add MPT onnx and ORT support by @jiqing-feng in #1161
Adds ONNX Export Support for Timm Models by @mht-sharma in #965
Add encoder decoder model by @mht-sharma in #851
Fix SAM ONNX export requirements with transformers 4.32, export vision encoder separately by @fxmarty in #1301

BetterTransformer supports Falcon

[BetterTransformer] Add falcon to BetterTransformer by @younesbelkada in #1343

Major bugfix: ability to set GPTQ Exllama kernel maximum length in the transformers integration

The function exllama_set_max_input_length from auto-gptq can now be used with Transformers GPTQ models.

Version bump + add max_input_length to gptq by @SunMarc in #1329

Other changes and bugfixes

Update version to 1.12.1.dev0 following release by @fxmarty in #1312
Add GPTQ prefill benchmark by @fxmarty in #1313
Precise ORTModel documentation by @fxmarty in #1268
Improve BetterTransformer backward compatibility by @fxmarty in #1314
Improve ORTModel documentation by @fxmarty in #1245
Add bitsandbytes benchmark by @fxmarty in #1320
fix typo in log message by @AAnirudh07 in #1322
Support customize dtype for dummy generators by @JingyaHuang in #1307
Fix opset custom onnx export by @mht-sharma in #1331
Replace mpt to ernie custom export by @mht-sharma in #1332
Fix BT benchmark script by @fxmarty in #1344
Add name_or_path for donut generation by @fxmarty in #1345
send both negative prompt embeds to ORT SDXL by @ssube in #1339
add vae image processor by @echarlaix in #1219
add negative prompt test by @echarlaix in #1347
Add GPT BigCode to the BT documentation by @fxmarty in #1356
Add BT dummy objects by @fxmarty in #1355
Add text2text-generation-with-past test for encoder-decoder model by @mht-sharma in #1338
Fix sentence transformer export by @mht-sharma in #1366

New Contributors

@krathul made their first contribution in #1296
@AAnirudh07 made their first contribution in #1322
@jiqing-feng made their first contribution in #1161
@ssube made their first contribution in #1339

Full Changelog: v1.12.0...v1.13.0

Contributors

ssube, fxmarty, and 9 other contributors

Assets 2

23 Aug 12:27

fxmarty

v1.12.0

e00afaa

v1.12.0: AutoGPTQ integration, extended BetterTransformer support

AutoGPTQ integration

Part of AutoGPTQ library has been integrated in Optimum, with utilities to ease the integration in other Hugging Face libraries. Reference: https://huggingface.co/docs/optimum/llm_quantization/usage_guides/quantization

Add GPTQ Quantization by @SunMarc in #1216
Fix GPTQ doc by @regisss in #1267
Add AutoGPTQ benchmark by @fxmarty in #1292
Fix gptq params by @SunMarc in #1284

Extended BetterTransformer support

BetterTransformer now supports BLOOM and GPT-BigCode architectures.

Bt bloom by @baskrahmer in #1221
Support gpt_bigcode in bettertransformer by @fxmarty in #1252
Fix BetterTransformer starcoder init by @fxmarty in #1254
Fix BT starcoder fp16 by @fxmarty in #1255
SDPA dispatches to flash for MQA by @fxmarty in #1259
Check output_attentions is False in BetterTransformer by @fxmarty in #1306

Other changes and bugfixes

Update bug report template by @fxmarty in #1266
Fix ORTModule uses fp32 model issue by @jingyanwangms in #1264
Fix build PR doc workflow by @fxmarty in #1270
Avoid triggering stop job on label by @fxmarty in #1274
Update version following 1.11.1 patch by @fxmarty in #1275
Fix fp16 ONNX detection for decoder models by @fxmarty in #1276
Update version following 1.11.2 patch by @regisss in #1291
Pin tensorflow<=2.12.1 by @fxmarty in #1305
ONNX: disable text-generation models for sequence classification & fixes for transformers 4.32 by @fxmarty in #1308
Fix staging tests following transformers 4.32 release by @fxmarty in #1309
More fixes following transformers 4.32 release by @fxmarty in #1311

New Contributors

@SunMarc made their first contribution in #1216
@jingyanwangms made their first contribution in #1264

Full Changelog: v1.11.2...v1.12.0

Contributors

fxmarty, regisss, and 3 other contributors

Assets 2

17 Aug 11:47

regisss

v1.11.2

068789e

v1.11.2: Patch release

Remove the Transformers version constraint on optimum[habana].

Remove Transformers version constraint on Optimum Habana #1290 by @regisss

Full Changelog: v1.11.1...v1.11.2

Contributors

regisss

Assets 2

11 Aug 12:58

fxmarty

v1.11.1

b4fa986

v1.11.1: Patch release

Minor fix: documentation building for 1.11.

Accelerate as a soft dependency by @fxmarty

Full Changelog: v1.11.0...v1.11.1

Contributors

fxmarty

Assets 2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ROCMExecutionProvider support

Extended ONNX export

What's Changed

New Contributors

Contributors

Contributors

ONNX

New architectures

Falcon

SpeechT5

Mistral

TrOCR

LCMs

Decoder refactorization

GPTQ

Other changes and bugfixes

New Contributors

Contributors

Contributors

Contributors

What's Changed

Contributors

Deduplicate Embedding / LM head weight in the ONNX export

Extended ONNX Runtime support

Extended ONNX export: MPT, TIMM models, Encoder-Decoder

BetterTransformer supports Falcon

Major bugfix: ability to set GPTQ Exllama kernel maximum length in the transformers integration

Other changes and bugfixes

New Contributors

Contributors

AutoGPTQ integration

Extended BetterTransformer support

Other changes and bugfixes

New Contributors

Contributors

Contributors

Contributors

Releases: huggingface/optimum

v1.15.0: ROCMExecutionProvider support

ROCMExecutionProvider support

Extended ONNX export

What's Changed

New Contributors

Contributors

v1.14.1: Patch release

Contributors

v1.14.0: LCMs, SpeechT5, Falcon, Mistral, decoder refactorization

ONNX

New architectures

Falcon

SpeechT5

Mistral

TrOCR

LCMs

Decoder refactorization

GPTQ

Other changes and bugfixes

New Contributors

Contributors

v1.13.3: Patch release

Contributors

v1.13.2: Patch release

Contributors

v1.13.1: Patch release

What's Changed

Contributors

v1.13.0: ONNX weight deduplication, ONNX export and ORT extension

Deduplicate Embedding / LM head weight in the ONNX export

Extended ONNX Runtime support

Extended ONNX export: MPT, TIMM models, Encoder-Decoder

BetterTransformer supports Falcon

Major bugfix: ability to set GPTQ Exllama kernel maximum length in the transformers integration

Other changes and bugfixes

New Contributors

Contributors

v1.12.0: AutoGPTQ integration, extended BetterTransformer support

AutoGPTQ integration

Extended BetterTransformer support

Other changes and bugfixes

New Contributors

Contributors

v1.11.2: Patch release

Contributors

v1.11.1: Patch release

Contributors