Use ptr from mmap for GHistIndexMatrix and ColumnMatrix. #9315
- Define a resource for holding various types of memory pointers.
- Define ref vector for holding resources.
- Swap the underlying resources for GHist and ColumnM.
- Add documentation for current status.

cpu compile. lint. lint. fix.

Squashed commit of the following (all authored by Jiaming Yuan):

- 6552242, Fri Jun 16 05:17:52 2023 +0800: Don't blame.
- de4f71c, Thu Jun 15 09:23:59 2023 +0800: fix.
- f3e39ac, Thu Jun 15 09:11:10 2023 +0800: read-only.
- 914a186, Thu Jun 15 09:06:59 2023 +0800: read-only.
- 6169fdc, Thu Jun 15 08:59:18 2023 +0800: cleanup.
- 8b993ff, Thu Jun 15 06:08:23 2023 +0800: Forbid pointer to bool cast.
- 076a788, Thu Jun 15 02:55:33 2023 +0800: fix.
- b5b57a0, Thu Jun 15 01:57:32 2023 +0800: Improve test.
- c8726c3, Thu Jun 15 00:56:56 2023 +0800: polishing.
- 3d76acc (Merge: 22ae3f6 8cdbb87), Thu Jun 15 00:54:19 2023 +0800: Merge remote-tracking branch 'jiamingy/ext-mmap' into ext-mmap
- 22ae3f6, Thu Jun 15 00:54:05 2023 +0800: reduce page number.
- 8cdbb87, Thu Jun 15 00:38:52 2023 +0800: mingw
- a4e11d3, Thu Jun 15 00:19:00 2023 +0800: fix win leak
- 94b8a0d, Wed Jun 14 18:31:34 2023 +0800: Timer.
- 9dd5812, Wed Jun 14 05:47:23 2023 +0800: improve the tests.
- 6a02601, Wed Jun 14 05:42:22 2023 +0800: log time.
- e88f561, Wed Jun 14 05:41:15 2023 +0800: lint.
- 788f2b6, Wed Jun 14 03:10:03 2023 +0800: GPU compilation.
- d3987e8, Wed Jun 14 02:57:42 2023 +0800: cleanup.
- 9bbecf5, Wed Jun 14 01:47:55 2023 +0800: Cleanup.
- c195db1, Wed Jun 14 01:25:40 2023 +0800: Cleanup.
- bcf4cdb, Wed Jun 14 01:14:03 2023 +0800: Fix.
- 925245c, Tue Jun 13 22:11:48 2023 +0800: debug.
- 5baf5ca, Tue Jun 13 20:34:13 2023 +0800: Cleanup.
- 58c0d99, Tue Jun 13 20:29:27 2023 +0800: Avoid padding the data.
- 2660c66, Tue Jun 13 19:06:59 2023 +0800: Pad the file for windows.
- 341c8fb, Tue Jun 13 17:26:16 2023 +0800: compile
- 9068cc8 (Merge: a61a079 4989269), Tue Jun 13 17:10:52 2023 +0800: Merge branch 'ext-mmap-win' into ext-mmap
- 4989269, Tue Jun 13 16:36:47 2023 +0800: windows mmap
- a61a079, Tue Jun 13 05:20:16 2023 +0800: doc.
- a736125, Tue Jun 13 04:03:12 2023 +0800: lint.
- 39ed218, Tue Jun 13 03:40:09 2023 +0800: comment.
- 4521f04, Tue Jun 13 03:35:22 2023 +0800: use ctx.
- 4b5d38f, Tue Jun 13 02:33:18 2023 +0800: GPU initialization.
- 68b838d, Tue Jun 13 02:15:29 2023 +0800: remove in no sampling.
- f383f76, Tue Jun 13 02:13:33 2023 +0800: remove page in uniform sampling.
- 1b0dab2, Tue Jun 13 02:09:17 2023 +0800: Remove page in grad-based sampling.
- 9b5c686, Mon Jun 12 23:19:54 2023 +0800: cleanup.
- ed635d3, Mon Jun 12 23:17:17 2023 +0800: rename.
- 05ce49b, Mon Jun 12 23:12:30 2023 +0800: Add test.
- da00b6d, Mon Jun 12 21:59:21 2023 +0800: lint.
- 9ee1643, Mon Jun 12 21:56:55 2023 +0800: Skip python test.
- 117fb97, Mon Jun 12 21:45:27 2023 +0800: Cleanup.
- a6202d0, Mon Jun 12 02:46:18 2023 +0800: cleanup.
- ba358af, Mon Jun 12 02:45:50 2023 +0800: Fix.
- 1e0405e, Mon Jun 12 02:29:48 2023 +0800: debug.
- a04dc39, Sat Jun 10 00:30:18 2023 +0800: macos.
- 18c3544, Sat Jun 10 00:08:43 2023 +0800: Cleanup.
- 3832fd5, Fri Jun 9 23:02:47 2023 +0800: cleanup.
- 9ebd4ef, Fri Jun 9 22:51:24 2023 +0800: abstract into a dmlc stream.
- fa5d460, Fri Jun 9 12:42:47 2023 +0800: cleanup.
- 23a89b0, Fri Jun 9 12:25:08 2023 +0800: reduce size.
- b1573d4, Fri Jun 9 11:51:35 2023 +0800: use mmap for external memory.

Win. cleanup. debug. Revert "debug." This reverts commit ab26114. Fix resize. Doc. Optional grow. lint. Fixes. test file. Remove doc. nvcc. histogram. exception. allow exception. Define streams. aligned write. Aligned write for tests. Windows build.
/**
 * @param Alignment for resource read stream and aligned write stream.
 */
constexpr std::size_t IOAlignment() {
We might want to make this a compile-time configurable parameter in the future. Currently, 8-byte alignment is sufficient for all types used by the external memory.
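As an illustration of what a fixed I/O alignment implies for the writer, here is a minimal sketch of rounding a payload size up to an 8-byte boundary. The names `kAlignment`, `AlignUp`, `Padding`, and `IsAligned` are hypothetical and are not taken from the PR; only the value 8 comes from the comment above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the alignment returned by IOAlignment() in the diff.
// The value 8 is taken from the review comment; everything else is an assumption.
constexpr std::size_t kAlignment = 8;

// Round n up to the next multiple of kAlignment.
constexpr std::size_t AlignUp(std::size_t n) {
  return (n + kAlignment - 1) / kAlignment * kAlignment;
}

// Number of padding bytes an aligned writer would append after n payload bytes.
constexpr std::size_t Padding(std::size_t n) { return AlignUp(n) - n; }

// Check at run time that a pointer handed out by a reader is suitably aligned.
inline bool IsAligned(void const* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kAlignment == 0;
}

static_assert(AlignUp(0) == 0, "zero bytes need no padding");
static_assert(AlignUp(13) == 16, "13 bytes are padded to 16");
```

If every write is padded this way, a reader that starts from an aligned base address can hand out pointers into the mapped region without copying, provided no stored type needs more than 8-byte alignment.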
                T const& init)
    : RefResourceView{ptr, n, mem} {
  if (n != 0) {
    std::fill_n(ptr_, n, init);
This needs more thought for GPU integration.
fi->Read(&impl->is_dense);
fi->Read(&impl->row_stride);
fi->Read(&impl->gidx_buffer.HostVector());
if (!fi->Read(&impl->n_rows)) {
The read method is non-throwing, following the convention set by dmlc-core. We therefore always check the return value and use it to indicate failure. This might not be a good fit for XGBoost.
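The convention can be sketched with a toy in-memory stream. `BufferStream`, `Header`, and `ReadHeader` are hypothetical names for illustration, not dmlc-core or XGBoost classes, and the field types are assumptions; only the field names `n_rows` and `row_stride` appear in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical in-memory stream that follows the non-throwing convention.
class BufferStream {
 public:
  explicit BufferStream(std::vector<std::uint8_t> data) : data_{std::move(data)} {}

  // Returns false, rather than throwing, when fewer than sizeof(T) bytes remain.
  template <typename T>
  bool Read(T* out) {
    static_assert(std::is_trivially_copyable<T>::value, "raw byte copy only");
    if (data_.size() - pos_ < sizeof(T)) {
      return false;
    }
    std::memcpy(out, data_.data() + pos_, sizeof(T));
    pos_ += sizeof(T);
    return true;
  }

 private:
  std::vector<std::uint8_t> data_;
  std::size_t pos_{0};
};

// Field names mirror the diff; the types are assumptions for this sketch.
struct Header {
  std::uint64_t n_rows;
  std::uint32_t row_stride;
};

// Every Read is checked, and the first failure aborts the whole load.
bool ReadHeader(BufferStream* fi, Header* out) {
  if (!fi->Read(&out->n_rows)) {
    return false;
  }
  if (!fi->Read(&out->row_stride)) {
    return false;
  }
  return true;
}
```

The cost of this style is that an unchecked `Read` fails silently, which is one reason an exception-based interface might suit XGBoost better.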
- bool Read(GHistIndexMatrix* page, dmlc::SeekStream* fi) override {
+ bool Read(GHistIndexMatrix* page, common::ResourceReadStream* fi) override {
    CHECK(fi);

    if (!ReadHistogramCuts(&page->cut, fi)) {
This doesn't use `mmap`, as the cuts are stored in the host device vector. We need to change it in the future. We might drop the requirement of storing the cuts.
The basic idea is to define a reference counted view on opaque data. Either host memory or mmap can back the data. One problem with this approach is that the underlying resource might be read-only, but we cannot reflect this on the view. We can use `std::variant` to hold a const and a non-const version of the vectors, but the code can get messy with many accessors. Currently, the pages returned by `GetBatches` are const references, providing some safety.

Stacked on top of #9282.
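The reference-counted view described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `Resource`, `MallocResource`, `RefView`, and `MakeView` are hypothetical names, and only a heap-backed resource is shown. An mmap-backed resource would implement the same interface and unmap in its destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>

// Hypothetical interface: something that owns a block of bytes.
class Resource {
 public:
  virtual ~Resource() = default;
  virtual void* Data() = 0;
  virtual std::size_t Size() const = 0;
};

// Heap-backed resource, zero-initialized by calloc.
class MallocResource : public Resource {
 public:
  explicit MallocResource(std::size_t n_bytes)
      : ptr_{std::calloc(n_bytes, 1)}, n_{n_bytes} {}
  ~MallocResource() override { std::free(ptr_); }
  MallocResource(MallocResource const&) = delete;
  MallocResource& operator=(MallocResource const&) = delete;

  void* Data() override { return ptr_; }
  std::size_t Size() const override { return n_; }

 private:
  void* ptr_;
  std::size_t n_;
};

// Typed window into a resource. The shared_ptr keeps the backing memory alive
// for as long as any copy of the view exists.
template <typename T>
class RefView {
 public:
  RefView(T* ptr, std::size_t n, std::shared_ptr<Resource> mem)
      : ptr_{ptr}, n_{n}, mem_{std::move(mem)} {}

  T* data() const { return ptr_; }
  std::size_t size() const { return n_; }
  // Like a span, constness of the view does not make the elements const, so a
  // read-only backing resource is not reflected in the type.
  T& operator[](std::size_t i) const { return ptr_[i]; }
  long use_count() const { return mem_.use_count(); }

 private:
  T* ptr_;
  std::size_t n_;
  std::shared_ptr<Resource> mem_;
};

// Allocate n zero-initialized elements and wrap them in a view.
template <typename T>
RefView<T> MakeView(std::size_t n) {
  auto mem = std::make_shared<MallocResource>(n * sizeof(T));
  T* ptr = static_cast<T*>(mem->Data());
  return RefView<T>(ptr, n, std::move(mem));
}
```

Copying a `RefView` copies only the pointer, the length, and the shared handle, so two copies alias the same memory. The `operator[]` signature shows the limitation mentioned in the description: nothing in the type stops a write through a view whose backing resource is read-only.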