AWS S3 IO through KvikIO #16499
base: branch-24.12
cpp/src/io/utilities/datasource.cpp
 */
static bool is_supported_remote_url(std::string const& url)
{
  // Regular expression to match "<s3|http|https>://"
Wondering about long-term: If we're given an http or https URI, how would we tell whether that's on amazon s3 vs some other cloud service?
E.g. if users rely on https automatically working for s3, and we later need to change how https works, their code could break.
Good point, I have limited the support to s3:// URLs for now.
KvikIO also supports regular http servers but let's address that in a later PR.
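To make the narrowed check concrete, here is a minimal sketch of a scheme test that accepts only s3:// URLs. The function name `is_s3_url` is hypothetical and this is not necessarily the code that landed in the PR; it only illustrates why anchoring the match at the start of the string keeps http and https URLs out.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for the helper under review, not the PR's final code.
// Returns true only when the URL begins with the "s3://" scheme.
bool is_s3_url(std::string const& url)
{
  // "^" anchors the match at the start of the string, so a URL that merely
  // contains "s3://" somewhere in its path is rejected. The scheme is matched
  // case-insensitively.
  static std::regex const pattern{R"(^s3://)", std::regex_constants::icase};
  return std::regex_search(url, pattern);
}
```

With this shape, an https URL that happens to point at an S3 endpoint is treated as unsupported rather than silently routed to the S3 backend, which is the behavior the thread settles on for now.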
great stuff!
few small comments
Co-authored-by: Vukasin Milovanovic <[email protected]>
Overall looks good to me!
CUDF_EXPECTS(supports_device_read(), "Device reads are not supported for this file.");

auto const read_size = std::min(size, this->size() - offset);
return _kvikio_file.pread(dst, read_size, offset);
Question: If insufficient memory is allocated in dst, does pread catch the exception thrown?
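For context on the question, the clamp in the snippet bounds the read by the bytes remaining in the file, not by the capacity of dst. A minimal sketch of that arithmetic follows, using a hypothetical free function `clamped_read_size` (not part of cuDF or KvikIO). It also guards the case where offset lies past the end of the file, since subtracting unsigned sizes would otherwise wrap around. Whether pread itself validates the destination buffer is not stated in this thread, so the sketch assumes the caller is responsible for sizing dst.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: the number of bytes a read may transfer without running
// past the end of the file. It knows nothing about the destination buffer, so
// the caller must still ensure dst can hold the returned number of bytes.
std::size_t clamped_read_size(std::size_t requested, std::size_t file_size, std::size_t offset)
{
  // Guard first: file_size - offset would wrap around for unsigned types
  // when offset is larger than file_size.
  if (offset >= file_size) { return 0; }
  return std::min(requested, file_size - offset);
}
```

For example, a 100-byte request at offset 950 of a 1000-byte file is trimmed to the 50 bytes that remain.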
Implement remote IO read using KvikIO's S3 backend.
For now, this is an experimental feature for parquet read only. Enable by defining CUDF_KVIKIO_REMOTE_IO=ON.
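As an illustration of how such an opt-in switch might be checked, the sketch below assumes CUDF_KVIKIO_REMOTE_IO is read from the process environment and that only the exact value ON enables the feature. The description only says "defining", so both points are assumptions, and `remote_io_enabled` is a hypothetical name rather than a cuDF function.

```cpp
#include <cstdlib>
#include <string>

// Assumption: the flag is an environment variable and "ON" is the only
// enabling value. The PR description does not spell out either detail.
bool remote_io_enabled()
{
  char const* value = std::getenv("CUDF_KVIKIO_REMOTE_IO");
  // An unset variable leaves the experimental path disabled.
  return value != nullptr && std::string{value} == "ON";
}
```

Keeping the experimental path behind an explicit opt-in like this means existing parquet reads are unaffected unless the user asks for the remote backend.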