chore(deps): update dependency microsoft/onnxruntime to v1.19.0 #116

Merged: 1 commit into main from renovate/microsoft-onnxruntime-1.x on Aug 17, 2024

renovate[bot]
Contributor

@renovate renovate bot commented Aug 17, 2024

This PR contains the following updates:

Package Update Change
microsoft/onnxruntime minor 1.18.1 -> 1.19.0
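
The "minor" label in the table above follows from comparing the two version strings component by component. A minimal sketch of that classification, assuming plain MAJOR.MINOR.PATCH versions; `classify_update` is a hypothetical helper written for illustration and is not Renovate's own implementation:

```python
def classify_update(current: str, new: str) -> str:
    """Return 'major', 'minor', or 'patch' for a MAJOR.MINOR.PATCH bump.

    Hypothetical helper: assumes three numeric components and no
    pre-release or build suffixes.
    """
    cur = tuple(int(part) for part in current.lstrip("v").split("."))
    nxt = tuple(int(part) for part in new.lstrip("v").split("."))
    if len(cur) != 3 or len(nxt) != 3:
        raise ValueError("expected MAJOR.MINOR.PATCH")
    if nxt <= cur:
        raise ValueError("new version must be greater than current version")
    if nxt[0] != cur[0]:
        return "major"
    if nxt[1] != cur[1]:
        return "minor"
    return "patch"


print(classify_update("1.18.1", "1.19.0"))  # minor
```

The second component changes (18 to 19) while the first stays at 1, so the bump counts as minor.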

Release Notes

microsoft/onnxruntime (microsoft/onnxruntime)

v1.19.0: ONNX Runtime v1.19

Compare Source

Announcements

  • Note that Java (maven) and training (pypi) packages are delayed from package manager release due to some publishing errors. Feel free to contact @​maanavd if you need release candidates for some workflows ASAP. In the meantime, binaries are attached to this post. This message will be deleted once this ceases to be the case. Thanks for your understanding :)

Build System & Packages

  • Added support for NumPy 2.x
  • Qualcomm SDK has been upgraded to 2.25
  • ONNX has been upgraded from 1.16 → 1.16.1
  • Default GPU packages use CUDA 12.x and cuDNN 8.x (previously CUDA 11.x/cuDNN 8.x). CUDA 11.x/cuDNN 8.x packages have moved to the aiinfra VS feed.
  • TensorRT 10.2 support added
  • Introduced Java CUDA 12 packages on Maven.
  • Discontinued support for Xamarin. (Xamarin reached EOL on May 1, 2024)
  • Discontinued support for macOS 11 and increased the minimum supported macOS version to 12. (macOS 11 reached EOL in September 2023)
  • Discontinued support for iOS 12 and increased the minimum supported iOS version to 13.

Core

  • Implemented DeformConv

Performance

  • Added QDQ support for INT4 quantization in CPU and CUDA Execution Providers
  • Implemented FlashAttention on CPU to improve performance for GenAI prompt cases
  • Improved INT4 performance on CPU (X64, ARM64) and NVIDIA GPUs
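
For readers unfamiliar with the QDQ (quantize/dequantize) format mentioned above, the round trip can be sketched in a few lines. This is an illustration of the arithmetic only, not ONNX Runtime's kernel code; the per-tensor symmetric scale and the helper names `quantize_int4` and `dequantize_int4` are assumptions made for the example:

```python
def quantize_int4(values, scale, zero_point=0):
    """Quantize floats to signed INT4: round(x / scale) + zero_point,
    saturated to the 4-bit range [-8, 7]."""
    return [max(-8, min(7, round(v / scale) + zero_point)) for v in values]


def dequantize_int4(quantized, scale, zero_point=0):
    """Map INT4 values back to floats: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, -0.26, 0.0, 0.4, 0.9]
# Assumed scheme: one symmetric scale for the whole tensor, chosen so
# that the largest magnitude maps to 7.
scale = max(abs(w) for w in weights) / 7

q = quantize_int4(weights, scale)
restored = dequantize_int4(q, scale)
print(q)  # [-7, -2, 0, 3, 6]
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the precision traded away for 4-bit storage.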

Execution Providers

  • TensorRT

    • Updated to support TensorRT 10.2
    • Removed calls to deprecated APIs
    • Enable refittable embedded engine when ONNX model provided as byte stream
  • CUDA

    • Added support for building with CUDA 12.5.
    • Upgraded cutlass to 3.5.0 for performance improvement of memory efficient attention.
    • Updated MultiHeadAttention and Attention operators to be thread-safe.
    • Added sdpa_kernel provider option to choose kernel for Scaled Dot-Product Attention.
    • Expanded op support - Tile (bf16)
  • CPU

    • Expanded op support - GroupQueryAttention, SparseAttention (for Phi-3 small)
  • QNN

    • Updated to support QNN SDK 2.25
    • Expanded op support - HardSigmoid, ConvTranspose 3D, Clip (int32 data), MatMul (int4 weights), Conv (int4 weights), PRelu (fp16)
    • Expanded fusion support – Conv + Clip/Relu fusion
  • OpenVINO

    • Added support for OpenVINO 2024.3
    • Support for enabling EpContext using session options
  • DirectML

    • Updated DirectML from 1.14.1 → 1.15
    • Updated ONNX opset from 17 → 20
    • Opset 19 and Opset 20 are supported with known caveats:
      • GridSample 20: 5D input not supported
      • DeformConv not supported
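
Since the default GPU packages and several execution providers change in this release, applications that select providers explicitly may want a fallback order. A minimal sketch, assuming the preference order below suits the application (the order and the helper `choose_providers` are illustrative assumptions; the provider names are the identifiers ONNX Runtime reports):

```python
# Assumed preference order, most specialized first. Adjust per application.
PREFERRED_PROVIDERS = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "DmlExecutionProvider",
    "OpenVINOExecutionProvider",
    "QNNExecutionProvider",
    "CPUExecutionProvider",
]


def choose_providers(available, preferred=PREFERRED_PROVIDERS):
    """Return the preferred providers that are actually available,
    in preference order, always ending with the CPU fallback."""
    chosen = [name for name in preferred if name in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

With onnxruntime installed, `onnxruntime.get_available_providers()` can supply the `available` list, and the result can be passed as the `providers` argument of `onnxruntime.InferenceSession`.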

Mobile

Web

  • Updated JavaScript packaging to align with best practices; this introduces slight incompatibilities for apps that bundle onnxruntime-web
  • Improved CPU operators coverage for WebNN (now supported by Chrome)

Training

  • No specific updates

GenAI

  • Support for new models: Qwen, Llama 3.1, Gemma 2, Phi-3 small
  • Support for building quantized models with the AWQ and GPTQ methods
  • Performance improvements for Intel and Arm CPU
  • Packaging and language bindings
    • Added Java bindings (build from source)
    • Separate OnnxRuntime.dll and directml.dll out of GenAI package to improve usability
    • Publish packages for Win Arm
    • Support for Android (build from source)
  • Bug fixes, such as the long-prompt correctness issue for Phi-3.

Extensions

  • Added C APIs for language, vision and audio processors including new FeatureExtractor for Whisper
  • Support for Phi-3 Small Tokenizer and new OpenAI tiktoken format for fast loading of BPE tokenizers
  • Added new CUDA custom operators such as MulSigmoid, Transpose2DCast, ReplaceZero, AddSharedInput and MulSharedInput
  • Enhanced Custom Op Lite API on GPU and fused kernels for DORT
  • Bug fixes, including a null bos_token for the Qwen2 tokenizer and a SentencePiece-converted FastTokenizer issue with non-ASCII characters, as well as necessary updates for MSVC 19.40 and the numpy 2.0 release

Contributors

Changming Sun, Baiju Meswani, Scott McKay, Edward Chen, Jian Chen, Wanming Lin, Tianlei Wu, Adrian Lizarraga, Chester Liu, Yi Zhang, Yulong Wang, Hector Li, kunal-vaishnavi, pengwa, aciddelgado, Yifan Li, Xu Xing, Yufeng Li, Patrice Vignola, Yueqing Zhang, Jing Fang, Chi Lo, Dmitri Smirnov, mingyueliuh, cloudhan, Yi-Hong Lyu, Ye Wang, Ted Themistokleous, Guenther Schmuelling, George Wu, mindest, liqun Fu, Preetha Veeramalai, Justin Chu, Xiang Zhang, zz002, vraspar, kailums, guyang3532, Satya Kumar Jandhyala, Rachel Guo, Prathik Rao, Maximilian Müller, Sophie Schoenmeyer, zhijiang, maggie1059, ivberg, glen-amd, aamajumder, Xavier Dupré, Vincent Wang, Suryaprakash Shanmugam, Sheil Kumar, Ranjit Ranjan, Peishen Yan, Frank Dong, Chen Feiyue, Caroline Zhu, Adam Louly, Ștefan Talpalaru, zkep, winskuo-quic, wejoncy, vividsnow, vivianw-amd, moyo1997, mcollinswisc, jingyanwangms, Yang Gu, Tom McDonald, Sunghoon, Shubham Bhokare, RuomeiMS, Qingnan Duan, PeixuanZuo, Pavan Goyal, Nikolai Svakhin, KnightYao, Jon Campbell, Johan MEJIA, Jake Mathern, Hans, Hann Wang, Enrico Galli, Dwayne Robinson, Clément Péron, Chip Kerchner, Chen Fu, Carson M, Adam Reeve, Adam Pocock.

Big thank you to everyone who contributed to this release!

Full Changelog: microsoft/onnxruntime@v1.18.1...v1.19.0


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Enabled.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot added the bump/minor label Aug 17, 2024
@HeavenVolkoff HeavenVolkoff merged commit 71f070c into main Aug 17, 2024
11 checks passed
@HeavenVolkoff HeavenVolkoff deleted the renovate/microsoft-onnxruntime-1.x branch August 17, 2024 03:06