Make operator TreeEnsemble 5x faster for batches of size 100.000 #5965
Conversation
Do we need a mutex here, since each thread has its own block? Can we make the code take advantage of that and get rid of the mutex? #Resolved Refers to: onnxruntime/core/providers/cpu/ml/tree_ensemble_common.h:384 in ad600af. [](commit_id = ad600af, deletion_comment = False)
Removed.
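For context, here is a minimal sketch of the pattern this exchange describes, assuming plain `std::thread` rather than onnxruntime's internal thread pool; `score_blocks` and the per-row body are illustrative stand-ins, not the actual kernel code. Each worker writes only to its own disjoint block of the output buffer, so no mutex is needed beyond the final join.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Each thread owns output[begin, end): the writes never overlap, so the
// workers share no mutable state and no mutex is required.
void score_blocks(const float* input, float* output, std::size_t n_rows,
                  std::size_t n_threads) {
  if (n_threads == 0) n_threads = 1;
  std::vector<std::thread> workers;
  const std::size_t block = (n_rows + n_threads - 1) / n_threads;
  for (std::size_t t = 0; t < n_threads; ++t) {
    const std::size_t begin = t * block;
    const std::size_t end = std::min(n_rows, begin + block);
    if (begin >= end) break;
    workers.emplace_back([=] {
      for (std::size_t i = begin; i < end; ++i)
        output[i] = input[i];  // placeholder for the per-row tree scoring
    });
  }
  for (auto& w : workers) w.join();
}
```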
* improves processing time by a factor of 10
* extends unit test coverage
* better implementation for the multi-regression case
* better comments; keeps parallelization by trees when there are not enough trees
* Fix PR #5550, reverted in #5911 (performance improvement for operator Transpose) (#5916)
  * Improves the implementation of the Transpose operator
  * Fix the issue mentioned in #5911
  * Add a unit test for function DoTransposeImpl
* Make operator TreeEnsemble 5x faster for batches of size 100.000 (#5965)
  * improves processing time by a factor of 10
  * extends unit test coverage
  * better implementation for the multi-regression case
  * better comments; keeps parallelization by trees when there are not enough trees
* Initialize a structure in operator ReduceSum (#6005)
  * fix an initialization issue
* Fuse MatMulIntegerToFloat only when scales are scalar (#6008)
  MatMulIntegerToFloat fusion fuses per-row and per-column MatMulInteger, which the MatMulIntegerToFloat kernel does not support yet. Limit the fusion to per-matrix only until per-channel is fully supported.
* Disable Python 3.9 for the training Python packaging build (#6012)
  Python 3.9 is not supported by the PyTorch dependency.
* Fix two bugs: 1) Calibrator should check model inputs; 2) quantize_inputs forgot to use parameter initializer_use_weight_qtyp (#6017)
* Bump highlight.js from 10.2.1 to 10.4.1 in /nodejs
  Bumps [highlight.js](https://github.com/highlightjs/highlight.js) from 10.2.1 to 10.4.1.
  - [Release notes](https://github.com/highlightjs/highlight.js/releases)
  - [Changelog](https://github.com/highlightjs/highlight.js/blob/master/CHANGES.md)
  - [Commits](highlightjs/highlight.js@10.2.1...10.4.1)
  Signed-off-by: dependabot[bot] <[email protected]>
* Work around the build break on macOS (#6069)
  * Fix the build break in the macOS release
  * Revert the Android change
* Bump up the API version for the 1.6 release (#6076)
* Update version to 1.6.0 (#6041)
  * Add v1.5.3 info
  * Update the WindowsAI and ONNX versions
  Co-authored-by: Du Li <duli@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Revert "Fuse MatMulIntegerToFloat only when scales are scalar (#6008)"
  This reverts commit beb950e.

Co-authored-by: Xavier Dupré <[email protected]>
Co-authored-by: Yufeng Li <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Zhang Lei <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pranav Sharma <[email protected]>
Co-authored-by: Du Li <duli@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Description:
The fix only changes the operator's implementation. It parallelizes over trees instead of over observations (see the sketch below).
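As a rough illustration of that strategy, here is a sketch assuming plain `std::thread`; `Tree`, `ScoreTree`, and `predict_by_trees` are hypothetical names, not onnxruntime internals. For a single input row, each thread scores a disjoint subset of the trees into its own accumulator slot, and the per-thread partial sums are reduced after the join.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

struct Tree { float value; };  // stand-in for a real decision tree

// Hypothetical scorer: the contribution of one tree for one input row.
float ScoreTree(const Tree& t, const float* /*row*/) { return t.value; }

float predict_by_trees(const std::vector<Tree>& trees, const float* row,
                       std::size_t n_threads) {
  if (n_threads == 0) n_threads = 1;
  std::vector<float> partial(n_threads, 0.f);
  std::vector<std::thread> workers;
  const std::size_t block = (trees.size() + n_threads - 1) / n_threads;
  for (std::size_t t = 0; t < n_threads; ++t) {
    const std::size_t begin = t * block;
    const std::size_t end = std::min(trees.size(), begin + block);
    if (begin >= end) break;
    workers.emplace_back([&, t, begin, end] {
      float acc = 0.f;
      for (std::size_t i = begin; i < end; ++i)
        acc += ScoreTree(trees[i], row);
      partial[t] = acc;  // one writer per slot: no mutex needed
    });
  }
  for (auto& w : workers) w.join();
  // Reduce the per-thread partial sums into the final prediction.
  return std::accumulate(partial.begin(), partial.end(), 0.f);
}
```

Writing into per-thread slots and reducing afterwards avoids both a mutex and atomic adds on a shared accumulator; whether splitting by trees beats splitting by rows depends on the ratio of batch size to tree count.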
[Benchmark plots (before/after) for the "1 target or 1 class" case and for the "multiregression or multiclass" case.]
Motivation and Context
This change makes onnxruntime as fast as scikit-learn on large batches.