
Make operator TreeEnsemble 5x faster for batches of size 100.000 #5965

Merged
merged 11 commits into from
Dec 3, 2020

Conversation

xadupre
Member

@xadupre xadupre commented Nov 28, 2020

Description:

The change only affects the TreeEnsemble operator: it parallelizes over trees instead of over observations.
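A minimal sketch of the idea (not the actual onnxruntime code; `Stump`, `ScoreByTrees`, and the toy single-split trees are illustrative): each thread scores a disjoint subset of the trees over all rows, accumulating into a thread-local buffer that is summed afterwards.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Toy decision stump: score(x) = value if x[feature] <= threshold, else -value.
struct Stump {
  size_t feature;
  float threshold;
  float value;
  float Score(const std::vector<float>& row) const {
    return row[feature] <= threshold ? value : -value;
  }
};

// Parallelize over trees: each thread handles a contiguous block of trees,
// writing into its own private buffer; buffers are merged once at the end.
std::vector<float> ScoreByTrees(const std::vector<Stump>& trees,
                                const std::vector<std::vector<float>>& rows,
                                size_t num_threads) {
  std::vector<std::vector<float>> partial(
      num_threads, std::vector<float>(rows.size(), 0.f));
  std::vector<std::thread> workers;
  size_t per_thread = (trees.size() + num_threads - 1) / num_threads;
  for (size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      size_t begin = t * per_thread;
      size_t end = std::min(begin + per_thread, trees.size());
      for (size_t i = begin; i < end; ++i)        // this thread's trees only
        for (size_t r = 0; r < rows.size(); ++r)  // all observations
          partial[t][r] += trees[i].Score(rows[r]);
    });
  }
  for (auto& w : workers) w.join();
  // Merge the per-thread buffers; each thread wrote only to partial[t].
  std::vector<float> out(rows.size(), 0.f);
  for (const auto& p : partial)
    for (size_t r = 0; r < rows.size(); ++r) out[r] += p[r];
  return out;
}
```

Parallelizing by trees keeps each row's feature vector hot in cache across a whole block of trees, which is why it pays off on large batches where splitting the rows across threads leaves each thread touching every tree.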

1 target or 1 class

BEFORE / AFTER: benchmark plots (images not captured)

multi-regression or multi-class

BEFORE / AFTER: benchmark plots (images not captured)

Motivation and Context
This change makes onnxruntime as fast as scikit-learn on big batches.

@xadupre xadupre requested a review from a team as a code owner November 28, 2020 02:20
@xadupre xadupre changed the title [WIP] Make operator TreeEnsemble 5x faster for batches of size 100.000 Make operator TreeEnsemble 5x faster for batches of size 100.000 Nov 30, 2020
@yuslepukhin
Member

yuslepukhin commented Dec 1, 2020

          std::lock_guard<OrtMutex> lock(merge_mutex);

Do we need a mutex here since each thread has its own block? Can we make the code take advantage of that and get rid of the mutex? #Resolved


Refers to: onnxruntime/core/providers/cpu/ml/tree_ensemble_common.h:384 in ad600af. [](commit_id = ad600af, deletion_comment = False)

@xadupre
Member Author

xadupre commented Dec 2, 2020

          std::lock_guard<OrtMutex> lock(merge_mutex);

Do we need a mutex here since each thread has its own block? Can we make the code take advantage of that and get rid of the mutex?


Removed.
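The point of the review exchange above can be sketched as follows (a hypothetical `AccumulateDisjoint`, not the onnxruntime implementation): when each worker owns a disjoint slice of the shared output, its writes never overlap with another worker's, so the `std::lock_guard<OrtMutex>` that a shared accumulator would require can simply be removed.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Each worker owns rows [begin, end) of the shared output. Writes never
// overlap, so no mutex is needed around the accumulation.
void AccumulateDisjoint(std::vector<double>& out, size_t num_threads) {
  std::vector<std::thread> workers;
  size_t chunk = (out.size() + num_threads - 1) / num_threads;
  for (size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&out, t, chunk] {
      size_t begin = t * chunk;
      size_t end = std::min(begin + chunk, out.size());
      for (size_t i = begin; i < end; ++i)
        out[i] += static_cast<double>(i);  // lock-free: this slice is private
    });
  }
  for (auto& w : workers) w.join();
}
```

Beyond correctness, dropping the lock removes contention on the merge path, which matters at the batch sizes this PR targets.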

Member

@yuslepukhin yuslepukhin left a comment


:shipit:

@xadupre xadupre merged commit 0acc383 into microsoft:master Dec 3, 2020
duli2012 pushed a commit that referenced this pull request Dec 8, 2020
* improves processing time by a factor of 10
* extends unit test coverage
* better implementation for the multi-regression case
* better comment; keep parallelization by trees when there are not enough trees
duli2012 added a commit that referenced this pull request Dec 9, 2020
* Fix PR #5550 reverted in #5911 (performance improvement for operator Transpose) (#5916)

* Improves implementation of transpose operator
* Fix issue mentioned in #5911
* adding unit test for function DoTransposeImpl

* Make operator TreeEnsemble 5x faster for batches of size 100.000 (#5965)

* improves processing time by a factor of 10
* extends unit test coverage
* better implementation for the multi-regression case
* better comment; keep parallelization by trees when there are not enough trees

* Initialize a structure in operator ReduceSum (#6005)

* fix initialisation issue

* Fuse MatMulIntegerToFloat only when scales are scalar (#6008)

MatMulIntegerToFloat fusion fuses per-row and per-column MatMulInteger, which the MatMulIntegerToFloat kernel does not currently support. Limit the fusion to per-matrix only until per-channel is fully supported.

* Disable Python 3.9 for training Python packaging build. (#6012)

Disable Python 3.9 for training Python packaging build. Python 3.9 is not supported by the PyTorch dependency.

* Fix bugs for 1: Calibrator should check model inputs; 2: (#6017)

quantize_inupts forgot to use parameter initializer_use_weight_qtyp.

* Bump highlight.js from 10.2.1 to 10.4.1 in /nodejs

Bumps [highlight.js](https://github.com/highlightjs/highlight.js) from 10.2.1 to 10.4.1.
- [Release notes](https://github.com/highlightjs/highlight.js/releases)
- [Changelog](https://github.com/highlightjs/highlight.js/blob/master/CHANGES.md)
- [Commits](highlightjs/highlight.js@10.2.1...10.4.1)

Signed-off-by: dependabot[bot] <[email protected]>

* Work around the build break on Mac (#6069)

* Fix the build break in the macOS release

* revert android change

* Bump up API version for 1.6 release (#6076)

* Update version to 1.6.0 (#6041)

* Update version to 1.6.0

* Add v 1.5.3 info

* Updating WindowsAI and ONNX version

Co-authored-by: Du Li <duli@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Revert "Fuse MatMulIntegerToFloat only when scales are scalar (#6008)"

This reverts commit beb950e.

Co-authored-by: Xavier Dupré <[email protected]>
Co-authored-by: Yufeng Li <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Zhang Lei <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pranav Sharma <[email protected]>
Co-authored-by: Du Li <duli@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
@xadupre xadupre deleted the optrf branch September 28, 2021 22:00