Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EM] Support mmap backed ellpack. #10602

Merged
merged 2 commits into from
Jul 18, 2024

Conversation

trivialfis
Copy link
Member

@trivialfis trivialfis commented Jul 17, 2024

This PR changes ellpack to use the resource view as the backing storage similar to CPU counterparts. This will help us enable more options for GPU external memory including mmap, normal malloc, device malloc, pinned memory, and managed memory. For now, only mmap is used. The PR supports both HMM and non-HMM systems.

In addition, some refactoring to split up wrappers of the cuda runtime API to make them usable with a host compiler.

I haven't done any profiling yet as I need another branch to run real batch-based workflow, the madvise and prefetching are mostly placeholders. Will experiment with various backends after this PR.

  • Support resource view in ellpack.
  • Define the CUDA version of MMAP resource.
  • Define the CUDA version of malloc resource.
  • Refactor cuda runtime API wrappers, and add memory access related wrappers.
  • gather windows macros into a single header.

To-do

  • Dispatch from ellpack source.

Comments.

Use context.

Work on cuda mmap.

Close stream.

Split up helpers.

dispatch.

Test flag.

Cleanup.

CPU.

Dispatch.

Windows.

Leak.

Cleanup.

Fix.

Windows.

lint.

windows.

cleanup.
@trivialfis trivialfis changed the title [WIP][EM] Support mmap backed ellpack. [EM] Support mmap backed ellpack. Jul 17, 2024
@trivialfis trivialfis marked this pull request as ready for review July 17, 2024 21:04
@trivialfis
Copy link
Member Author

cc @rongou

dh::safe_cuda(
cudaMemAdvise(handle_->base_ptr, handle_->base_size, cudaMemAdviseSetAccessedBy, device));
dh::safe_cuda(
cudaMemPrefetchAsync(handle_->base_ptr, handle_->base_size, device, dh::DefaultStream()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this always happen after the data is initialized on the cpu?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for now. We might optionally only prefetch the gradient index part and leave other scalars on host.

@trivialfis trivialfis merged commit 292bb67 into dmlc:master Jul 18, 2024
31 of 32 checks passed
@trivialfis trivialfis deleted the ext-ellpack-resource branch July 18, 2024 00:20
trivialfis added a commit that referenced this pull request Jul 22, 2024
* [coll] Allow using local host for testing. (#10526)

- Don't try to retrieve the IP address if a host is specified.
- Fix compiler deprecation warning.

* Fix boolean array for arrow-backed DF. (#10527)

* [EM] Move prefetch in reset into the end of the iteration. (#10529)

* Enhance the threadpool implementation. (#10531)



- Accept an initialization function.
- Support void return tasks.

* [doc] Update link to release notes. [skip ci] (#10533)

* [doc] Fix learning to rank tutorial. [skip ci] (#10539)

* Cache GPU histogram kernel configuration. (#10538)

* [sycl] Reorder if-else statements to allow using of cpu branches for sycl-devices (#10543)

* reoder if-else statements for sycl compatibility

* trigger check

---------

Co-authored-by: Dmitry Razdoburdin <>

* [EM] Basic distributed test for external memory. (#10492)

* [sycl] Improve build configuration. (#10548)

Co-authored-by: Dmitry Razdoburdin <>

* [R] Update roxygen. (#10556)

* [doc] Add more detailed explanations for advanced objectives (#10283)



---------

Co-authored-by: Jiaming Yuan <[email protected]>

* [doc] Add `build_info` to autodoc. [skip ci] (#10551)

* [doc] Add notes about RMM and device ordinal. [skip ci] (#10562)

- Remove the experimental tag, we have been running it for a long time now.
- Add notes about avoiding set CUDA device.
- Add link in parameter.

* Fix empty partition. (#10559)

* Avoid the use of size_t in the partitioner. (#10541)

- Avoid the use of size_t in the partitioner.
- Use `Span` instead of `Elem` where `node_id` is not needed.
- Remove the `const_cast`.
- Make sure the constness is not removed in the `Elem` by making it reference only.

size_t is implementation-defined, which causes issue when we want to pass pointer or span.

* [EM] Handle base idx in GPU histogram. (#10549)

* [fed] Split up federated test CMake file. (#10566)

- Collect all federated test files into the same directory.
- Independently list the files.

* Avoid thrust vector initialization. (#10544)

* Avoid thrust vector initialization.

- Add a wrapper for rmm device uvector.
- Split up the `Resize` method for HDV.

* Fix column split race condition. (#10572)

* Small cleanup for CMake scripts. (#10573)

- Remove rabit.

* replace channel for sycl dependencies (#10576)

Co-authored-by: Dmitry Razdoburdin <>

* Bump org.apache.maven.plugins:maven-project-info-reports-plugin (#10497)

Bumps [org.apache.maven.plugins:maven-project-info-reports-plugin](https://github.com/apache/maven-project-info-reports-plugin) from 3.5.0 to 3.6.1.
- [Commits](apache/maven-project-info-reports-plugin@maven-project-info-reports-plugin-3.5.0...maven-project-info-reports-plugin-3.6.1)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-project-info-reports-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump org.apache.flink:flink-clients in /jvm-packages (#10517)

Bumps [org.apache.flink:flink-clients](https://github.com/apache/flink) from 1.19.0 to 1.19.1.
- [Commits](apache/flink@release-1.19.0...release-1.19.1)

---
updated-dependencies:
- dependency-name: org.apache.flink:flink-clients
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump org.apache.maven.plugins:maven-surefire-plugin (#10429)

Bumps [org.apache.maven.plugins:maven-surefire-plugin](https://github.com/apache/maven-surefire) from 3.2.5 to 3.3.0.
- [Release notes](https://github.com/apache/maven-surefire/releases)
- [Commits](apache/maven-surefire@surefire-3.2.5...surefire-3.3.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-surefire-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump commons-logging:commons-logging in /jvm-packages/xgboost4j-spark (#10547)

Bumps commons-logging:commons-logging from 1.3.2 to 1.3.3.

---
updated-dependencies:
- dependency-name: commons-logging:commons-logging
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jiaming Yuan <[email protected]>

* Bump org.apache.maven.plugins:maven-jar-plugin (#10458)

Bumps [org.apache.maven.plugins:maven-jar-plugin](https://github.com/apache/maven-jar-plugin) from 3.4.1 to 3.4.2.
- [Release notes](https://github.com/apache/maven-jar-plugin/releases)
- [Commits](apache/maven-jar-plugin@maven-jar-plugin-3.4.1...maven-jar-plugin-3.4.2)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-jar-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump org.apache.maven.plugins:maven-project-info-reports-plugin (#10585)

Bumps [org.apache.maven.plugins:maven-project-info-reports-plugin](https://github.com/apache/maven-project-info-reports-plugin) from 3.6.1 to 3.6.2.
- [Commits](apache/maven-project-info-reports-plugin@maven-project-info-reports-plugin-3.6.1...maven-project-info-reports-plugin-3.6.2)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-project-info-reports-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump org.apache.maven.plugins:maven-release-plugin (#10586)

Bumps [org.apache.maven.plugins:maven-release-plugin](https://github.com/apache/maven-release) from 3.0.1 to 3.1.1.
- [Release notes](https://github.com/apache/maven-release/releases)
- [Commits](apache/maven-release@maven-release-3.0.1...maven-release-3.1.1)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-release-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump net.alchim31.maven:scala-maven-plugin in /jvm-packages/xgboost4j (#10536)

Bumps net.alchim31.maven:scala-maven-plugin from 4.9.1 to 4.9.2.

---
updated-dependencies:
- dependency-name: net.alchim31.maven:scala-maven-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump org.apache.maven.plugins:maven-checkstyle-plugin in /jvm-packages (#10518)

Bumps [org.apache.maven.plugins:maven-checkstyle-plugin](https://github.com/apache/maven-checkstyle-plugin) from 3.3.1 to 3.4.0.
- [Commits](apache/maven-checkstyle-plugin@maven-checkstyle-plugin-3.3.1...maven-checkstyle-plugin-3.4.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-checkstyle-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [R] Redesigned `xgboost()` interface skeleton (#10456)


---------

Co-authored-by: Michael Mayer <[email protected]>

* [jvm-packages] Bump rapids version. (#10588)

* Bump scalatest.version from 3.2.18 to 3.2.19 in /jvm-packages/xgboost4j (#10535)

Bumps `scalatest.version` from 3.2.18 to 3.2.19.

Updates `org.scalatest:scalatest_2.12` from 3.2.18 to 3.2.19
- [Release notes](https://github.com/scalatest/scalatest/releases)
- [Commits](scalatest/scalatest@release-3.2.18...release-3.2.19)

Updates `org.scalactic:scalactic_2.12` from 3.2.18 to 3.2.19
- [Release notes](https://github.com/scalatest/scalatest/releases)
- [Commits](scalatest/scalatest@release-3.2.18...release-3.2.19)

---
updated-dependencies:
- dependency-name: org.scalatest:scalatest_2.12
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: org.scalactic:scalactic_2.12
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Doc] Fix CRAN badge in README [skip ci] (#10587)

* Change http to https in Badges

* Change all http to https

* Partial fix for CTK 12.5 (#10574)

* Merge approx tests. (#10583)

* [CI] Reduce the frequency of dependabot PRs (#10593)

* Bump actions/setup-python from 5.1.0 to 5.1.1 (#10599)

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.1.0 to 5.1.1.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@82c7e63...39cd149)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump actions/upload-artifact from 4.3.3 to 4.3.4 (#10600)

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.3 to 4.3.4.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@6546280...0b2256b)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump com.fasterxml.jackson.core:jackson-databind (#10590)

Bumps [com.fasterxml.jackson.core:jackson-databind](https://github.com/FasterXML/jackson) from 2.15.2 to 2.17.2.
- [Commits](https://github.com/FasterXML/jackson/commits)

---
updated-dependencies:
- dependency-name: com.fasterxml.jackson.core:jackson-databind
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Refactor `DeviceUVector`. (#10595)

Create a wrapper instead of using inheritance to avoid inconsistent interface of the class.

* [EM] Support mmap backed ellpack. (#10602)


- Support resource view in ellpack.
- Define the CUDA version of MMAP resource.
- Define the CUDA version of malloc resource.
- Refactor cuda runtime API wrappers, and add memory access related wrappers.
- gather windows macros into a single header.

* [CI] Fix test environment. (#10609)

* [CI] Fix test environment.

* Remove shell.

* Remove.

* Update Dockerfile.i386

* [CI] Build a CPU-only wheel under name `xgboost-cpu` (#10603)

* Drop support for CUDA legacy stream. (#10607)

* Optionally skip cupy on windows. (#10611)

* [EM] Prevent init with CUDA malloc resource. (#10606)

* Move device histogram storage into `histogram.cuh`. (#10608)

* Fix.

* Fix.

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Dmitry Razdoburdin <[email protected]>
Co-authored-by: david-cortes <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael Mayer <[email protected]>
Co-authored-by: RektPunk <[email protected]>
Co-authored-by: Philip Hyunsu Cho <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants