Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice #15833

Merged
merged 12 commits into from
Jun 20, 2023

Conversation

jslhcl
Copy link
Contributor

@jslhcl jslhcl commented May 6, 2023

Description

This PR is to refactor ExecutionProvider API for memory management, which is to move allocators from EP level to SessionState level and indexed by OrtDevice

Motivation and Context

This PR is to refactor ExecutionProvider API for memory management, which is to move allocators from EP level to SessionState level and indexed by OrtDevice. By this change, EP level will shift the burden of maintaining allocators, which will be user friendly for EP developers

@jslhcl jslhcl marked this pull request as ready for review June 7, 2023 16:29
@jslhcl jslhcl requested review from adithyaselv, jywu-msft and pranavsharma and removed request for adithyaselv June 7, 2023 21:20
include/onnxruntime/core/framework/op_kernel_info.h Outdated Show resolved Hide resolved
onnxruntime/core/framework/session_state.h Outdated Show resolved Hide resolved
onnxruntime/core/framework/session_state.h Outdated Show resolved Hide resolved
onnxruntime/core/framework/device_stream_collection.h Outdated Show resolved Hide resolved
onnxruntime/core/session/inference_session.h Outdated Show resolved Hide resolved
@jslhcl jslhcl requested review from askhade and msancolman June 9, 2023 18:28
Copy link
Contributor

@pranavsharma pranavsharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. A few minor comments.

onnxruntime/core/framework/session_state.h Outdated Show resolved Hide resolved
onnxruntime/core/framework/session_state.h Show resolved Hide resolved
Copy link
Contributor

@pranavsharma pranavsharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you modify this error message to say that you can call the new API for other EPs?

onnxruntime/core/session/environment.cc Outdated Show resolved Hide resolved
pranavsharma
pranavsharma previously approved these changes Jun 16, 2023
@jslhcl
Copy link
Contributor Author

jslhcl commented Jun 16, 2023

/azp run Android CI Pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

askhade
askhade previously approved these changes Jun 16, 2023
Copy link
Contributor

@askhade askhade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jslhcl jslhcl dismissed stale reviews from askhade and pranavsharma via b3f508c June 18, 2023 17:27
@souptc souptc merged commit dd72192 into main Jun 20, 2023
@souptc souptc deleted the leca/epAllocator2session branch June 20, 2023 00:44
@jywu-msft
Copy link
Member

@jslhcl FYI, it looks like this PR broke the CANN EP build. see https://github.com/Ascend/onnxruntime/actions/runs/5322372781/jobs/9638730400#step:3:3026

/root/Git.d/onnxruntime/onnxruntime/core/providers/cann/cann_execution_provider.cc:1450:7: error: ‘cann_device’ was not declared in this scope
1450 | cann_device.Id(),
| ^~~~~~~~~~~
/root/Git.d/onnxruntime/onnxruntime/core/providers/cann/cann_execution_provider.cc:1466:7: error: ‘pinned_device’ was not declared in this scope
1466 | pinned_device.Id());
| ^~~~~~~~~~~~~
gmake[2]: *** [CMakeFiles/onnxruntime_providers_cann.dir/build.make:132: CMakeFiles/onnxruntime_providers_cann.dir/root/Git.d/onnxruntime/onnxruntime/core/providers/cann/cann_execution_provider.cc.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:1947: CMakeFiles/onnxruntime_providers_cann.dir/all] Error 2

@jywu-msft
Copy link
Member

jywu-msft commented Jun 21, 2023

@jslhcl FYI, it looks like this PR broke the CANN EP build. see https://github.com/Ascend/onnxruntime/actions/runs/5322372781/jobs/9638730400#step:3:3026

/root/Git.d/onnxruntime/onnxruntime/core/providers/cann/cann_execution_provider.cc:1450:7: error: ‘cann_device’ was not declared in this scope 1450 | cann_device.Id(), | ^~~~~~~~~~~ /root/Git.d/onnxruntime/onnxruntime/core/providers/cann/cann_execution_provider.cc:1466:7: error: ‘pinned_device’ was not declared in this scope 1466 | pinned_device.Id()); | ^~~~~~~~~~~~~ gmake[2]: *** [CMakeFiles/onnxruntime_providers_cann.dir/build.make:132: CMakeFiles/onnxruntime_providers_cann.dir/root/Git.d/onnxruntime/onnxruntime/core/providers/cann/cann_execution_provider.cc.o] Error 1 gmake[1]: *** [CMakeFiles/Makefile2:1947: CMakeFiles/onnxruntime_providers_cann.dir/all] Error 2

+@FFFrog ,@zhangsibo1129 FYI, this change broke the CANN EP build. can you advise/help @jslhcl fix it since he doesn't have an environment to build/test?

@zhangsibo1129
Copy link
Contributor

I fixed the CANN EP build for this #15833 (comment), @jywu-msft would you take a look at this PR. Thanks a lot!

@jywu-msft
Copy link
Member

I fixed the CANN EP build for this #15833 (comment), @jywu-msft would you take a look at this PR. Thanks a lot!

thank you, @zhangsibo1129!
@jslhcl can you help take a look?

jslhcl added a commit that referenced this pull request Jun 28, 2023
### Description
Remove AllocatorManager class


### Motivation and Context
After the refactor PR #15833 is in, AllocatorManager class is not
referenced anymore.
chilo-ms added a commit that referenced this pull request Jul 5, 2023
Fix memory leak issue which comes from TRT EP's allocator object not
being released upon destruction.
Following is the log from valgrind:
```
==1911860== 100,272 (56 direct, 100,216 indirect) bytes in 1 blocks are definitely lost in loss record 1,751 of 1,832
==1911860==    at 0x483CFA3: operator new(unsigned long) (vg_replace_malloc.c:472)
==1911860==    by 0x315DC2: std::_MakeUniq<onnxruntime::OrtAllocatorImplWrappingIAllocator>::__single_object std::make_unique<onnxruntime::OrtAllocatorImplWrappingIAllocator, std::shared_ptr<onnxruntime::IAllocator> >(std::shared_ptr<onnxruntime::IAllocator>&&) (unique_ptr.h:857)
==1911860==    by 0x30EE7B: OrtApis::KernelContext_GetAllocator(OrtKernelContext const*, OrtMemoryInfo const*, OrtAllocator**) (custom_ops.cc:121)
==1911860==    by 0x660D115: onnxruntime::TensorrtExecutionProvider::Compile(std::vector<onnxruntime::IExecutionProvider::FusedNodeAndGraph, std::allocator<onnxruntime::IExecutionProvider::FusedNodeAndGraph> > const&, std::vector<onnxruntime::NodeComputeInfo, std::allocator<onnxruntime::NodeComputeInfo> >&)::{lambda(void*, OrtApi const*, OrtKernelContext*)#3}::operator()(void*, OrtApi const*, OrtKernelContext*) const (tensorrt_execution_provider.cc:2223)
```
This issue happens after this [EP allocator
refactor](#15833)
jywu-msft pushed a commit that referenced this pull request Jul 6, 2023
)

Modify CANN EP `CANNExecutionProvider::CreatePreferredAllocators`,
`CANNExecutionProvider::CreateCannAllocator`
to align with the EP API refactor and fix CANN CI for
#15833 (comment)
in this [PR](#15833)
jslhcl added a commit that referenced this pull request Jul 7, 2023
### Description
clean unused parameter in ORT_UNUSED_PARAMETER


### Motivation and Context
clean unused parameters in ORT_UNUSED_PARAMETER which are introduced
from #15833
jeffbloo added a commit that referenced this pull request Aug 11, 2023
…t allocation in ORT API (#17030)

This addresses a DML performance regression from the following PR
resulting in allocations not being rounded and pooled in the DML
execution provider.

#15833

This also fixes a pre-existing limitation that allocations during
session initialization (primarily large weights and persistent
resources) only bypassed rounding and pooling while using the Winml API.
The allocator now also respects a caller's rounding mode parameter when
provided.
jchen351 pushed a commit that referenced this pull request Aug 12, 2023
…t allocation in ORT API (#17030)

This addresses a DML performance regression from the following PR
resulting in allocations not being rounded and pooled in the DML
execution provider.

#15833

This also fixes a pre-existing limitation that allocations during
session initialization (primarily large weights and persistent
resources) only bypassed rounding and pooling while using the Winml API.
The allocator now also respects a caller's rounding mode parameter when
provided.
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024
…t allocation in ORT API (microsoft#17030)

This addresses a DML performance regression from the following PR
resulting in allocations not being rounded and pooled in the DML
execution provider.

microsoft#15833

This also fixes a pre-existing limitation that allocations during
session initialization (primarily large weights and persistent
resources) only bypassed rounding and pooling while using the Winml API.
The allocator now also respects a caller's rounding mode parameter when
provided.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants