ONNX Runtime v1.4.0
Key Updates
- Performance optimizations for Transformer models
  - GPT2 - Enable optimizations for Attention with Past State and Attention Mask
  - BERT - Improve EmbedLayerNormalization fusion coverage
- Quantization updates
  - Added new quantization operators: QLinearAdd, QAttention
  - Improved quantization performance for transformer-based models on CPU
    - More graph fusion
    - Further optimization in MLAS kernel
    - Introduced pre-packing for constant Matrix B of DynamicQuantizeMatMul and QAttention
- New Python IOBinding APIs (bind_cpu_input, bind_output, copy_outputs_to_cpu) allow easier benchmarking
  - Users no longer need to allocate inputs and outputs on non-CPU devices using third-party allocators.
  - Users no longer need to copy inputs to non-CPU devices; ORT handles the copy.
  - Users can now use copy_outputs_to_cpu to copy outputs from non-CPU devices to CPU for verification.
- CUDA support for Einsum (opset12)
- ONNX Runtime Training updates
  - Opset 12 support
  - New sample for a training experiment using Hugging Face GPT-2
  - Upgraded Docker image, built from the latest PyTorch release
- Telemetry is now enabled by default for Python packages and GitHub release zip files (C API); see more details on what telemetry is collected in ORT and how
- [Coming soon] Python package for ONNX Runtime 1.4 on JetPack 4.4
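
The IOBinding workflow listed above (bind_cpu_input, bind_output, copy_outputs_to_cpu) can be sketched roughly as follows. This is a minimal illustration, not the official sample: the helper name `run_with_io_binding` and its parameters are hypothetical, and it assumes the session object exposes `io_binding()` and `run_with_iobinding()` as in the Python `InferenceSession` API.

```python
def run_with_io_binding(session, cpu_inputs, output_names):
    """Run `session` once through IOBinding and return the outputs on CPU.

    `session` is assumed to be an onnxruntime.InferenceSession.
    `cpu_inputs` maps input names to arrays that live in CPU memory.
    `output_names` lists the model outputs to bind.
    (Helper name and parameters are hypothetical, for illustration only.)
    """
    binding = session.io_binding()

    # Inputs stay on CPU; ORT copies them to the device the session runs on.
    for name, array in cpu_inputs.items():
        binding.bind_cpu_input(name, array)

    # No pre-allocated buffer is passed, so ORT allocates the outputs itself.
    for name in output_names:
        binding.bind_output(name)

    session.run_with_iobinding(binding)

    # Bring the results back to CPU, for example to verify them.
    return binding.copy_outputs_to_cpu()
```

With a real model, `session` would typically be created as `onnxruntime.InferenceSession("model.onnx")` (the path is a placeholder), and `cpu_inputs` would hold NumPy arrays keyed by the model's input names.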
Execution Providers
New Execution Providers available for preview:
- [Preview] AMD MIGraphX
- [Preview] ARM NN
Contributions
Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
snnn, tianleiwu, edgchen1, hariharans29, skottmckay, tracysh, yufenglee, fs-eire, codemzs, tiagoshibata, yuslepukhin, gwang-msft, wschin, smk2007, prabhat00155, liuziyue, liqunfu, ytaous, iK1D, BowenBao, askhade, pranavsharma, faxu, jywu-msft, ryanlai2, xzhu1900, KeDengMS, tlh20, smkarlap, weixingzhang, jeffbloo, RyanUnderhill, mrry, jgbradley1, stevenlix, zhanghuanrong, suffiank, Andrews548, pengwa, SherlockNoMad, orilevari, duli2012, yangchen-MS, yan12125, jornt-xilinx, ashbhandare, neginraoof, Tixxx, thiagocrepaldi, Craigacp, mayeut, chilo-ms, prasanthpul, martinb35, manashgoswami, zhangxiang1993, suryasidd, wangyems, kit1980, RandySheriffH, fdwr