-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use DMatrix Proxy for implementing data callback. #5629
Conversation
d405ed8
to
6c33130
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the example in data_iterator.py
. The custom data iterator will be quite useful for users who want to integrate a custom data source.
See my questions, especially about data erasure.
next_callback, | ||
ctypes.c_float(missing), | ||
ctypes.c_int(nthread), | ||
ctypes.c_int(256), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we okay with hardcoding 256 for max_bin
here?
''' | ||
def __init__(self, data, label=None, weight=None, base_margin=None, | ||
label_lower_bound=None, label_upper_bound=None): | ||
'''Generate some random data for demostration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this comment make sense here? This is not a demo.
src/data/iterative_device_dmatrix.cu
Outdated
#define DISPATCH_MEM(__Proxy, __Fn) \ | ||
[](DMatrixProxy const* proxy) -> decltype( \ | ||
(dmlc::get<CupyAdapterBatch>(proxy->Value())).__Fn()) { \ | ||
if (proxy->Value().type() == typeid(CupyAdapterBatch)) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct to say that the proxy matrix performs type erasure for the batch, and now we're trying to recover the type of the batch dynamically?
From the way I see it, we have an implicit list of allowable types, namely CupyAdapterBatch
and CudfAdapterBatch
. It may be better to add a type ID field in the DMatrix proxy. Take a look at PackedFunc from TVM project, which uses type erasure to expose a generic function type. The PackedFunc uses type ID to distinguish between underlying types.
Also, is it possible to avoid macro here? Why use macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look into the pack function today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I felt that PackedFunc may give us hint for simplifying the proxy.
size_t row_stride); | ||
|
||
template <typename AdapterBatch> | ||
explicit EllpackPageImpl(AdapterBatch batch, float missing, int device, bool is_dense, int nthread, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's your rationale for adding a second interface for EllpackPageImpl()
? Is it to specify a custom cuts
?
src/data/ellpack_page.cu
Outdated
CopyDataRowMajor(batch, this, device, missing); | ||
} else { | ||
// CopyDataColumnMajor(adapter, batch, this, missing); | ||
LOG(FATAL) << "Not implemented"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What makes the second EllpackPageImpl
different from the first EllpackPageImpl
, such that column major is not supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving the general idea.
6c33130
to
155be2b
Compare
* Add new iterative DMatrix. * Add new proxy DMatrix. * Add dask interface.
87975fe
to
c1d2c27
Compare
One last piece would be the dask interface then the big series will be over. |
@RAMitchell @hcho3 The second prototype.
Closes #5583, #5571
Please ignore the code in
quantile
, it's copied and pasted fromhist_util.cu
.Depends on #5623 . First prototype is in #5630 .