feat(python): gpu based ivf partition training #1361
python/python/lance/vector.py
Outdated
else:
    samples = dataset.sample(k * sample_rate)[column]

if accelerator in ["gpu", "cuda"]:
nit: `gpu` could be MPS or AMD as well, I don't think we should map `gpu` to `cuda`.
Yeah, I was thinking of having `gpu` call `preferred_device()` later, to auto-detect the GPU on the machine:
def preferred_device(device: Optional[str] = None):
But it is fair that we don't need to do that now, since MPS performance is not good at the moment.
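For reference, a minimal sketch of what the auto-detection could look like. Only the signature `preferred_device(device: Optional[str] = None)` comes from this thread; the body, the probing order (CUDA, then MPS, then CPU), and the CPU fallback when `torch` is missing are assumptions for illustration, not the actual implementation.

```python
from typing import Optional


def preferred_device(device: Optional[str] = None) -> str:
    """Return an explicit device if given, otherwise probe for one.

    Assumed behaviour: the probing order and the "cpu" fallback are
    illustrative, not taken from the real lance code.
    """
    if device is not None:
        # The caller asked for a specific device (e.g. "cuda:0"); respect it.
        return device
    try:
        import torch
    except ImportError:
        # Without torch there is nothing to probe.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer torch releases, so guard it.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

With something like this, `accelerator="gpu"` would not need to be hard-wired to `cuda`.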
    column: str,
    k: int,
    metric_type: str,
    accelerator: str,
nit: allow a device id here, like `cuda:0`.
Fixed.
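One way the device id handling could be sketched, assuming the accelerator string follows the usual `kind` or `kind:index` form. The helper name `parse_accelerator`, the accepted kinds, and the choice to reject a bare `gpu` as ambiguous are hypothetical and not what the PR necessarily does.

```python
import re
from typing import Optional, Tuple

# Assumed set of accepted device kinds; "gpu" is left out on purpose
# because it does not say which backend is meant.
_DEVICE_PATTERN = re.compile(r"^(cuda|mps|cpu)(?::(\d+))?$")


def parse_accelerator(accelerator: str) -> Tuple[str, Optional[int]]:
    """Split "cuda:0" into ("cuda", 0); a bare "cuda" gives ("cuda", None)."""
    match = _DEVICE_PATTERN.match(accelerator.strip().lower())
    if match is None:
        raise ValueError(f"unsupported accelerator: {accelerator!r}")
    kind, index = match.groups()
    return kind, (int(index) if index is not None else None)
```

A check such as `parse_accelerator(accelerator)[0] == "cuda"` would then accept both `cuda` and `cuda:0`.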
just two device handling nits
Minor comment tweaks but this looks good
    *,
    sample_rate: int = 256,
) -> np.ndarray:
    """Use accelerator (GPU or MPS) to train kmeans."""
Is MPS actually supported currently?
Yes, we can run on MPS today; it is just not as fast as we would like.
python/python/lance/vector.py
Outdated
    sample_rate: int = 256,
) -> np.ndarray:
    """Use accelerator (GPU or MPS) to train kmeans."""
nit: should we check that `torch` is installed before trying to do all the sampling?
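A rough sketch of the ordering suggested here: do the cheap import check first so a missing dependency fails before any sampling work. `require_module` and `train_on_accelerator` are hypothetical names, and `sample` stands in for whatever does the dataset sampling; none of this is the PR's actual code.

```python
import importlib.util
from typing import Callable, Sequence


def require_module(name: str) -> None:
    """Raise ImportError early if the top-level module `name` is not installed."""
    if importlib.util.find_spec(name) is None:
        raise ImportError(
            f"{name} is required for accelerator-based training; "
            "please install it first"
        )


def train_on_accelerator(
    sample: Callable[[], Sequence], module: str = "torch"
) -> Sequence:
    # Cheap check first: fail fast if the dependency is missing.
    require_module(module)
    # The expensive sampling only runs once the check has passed.
    return sample()
```

`importlib.util.find_spec` only looks the module up, so the check itself does not pay the cost of importing `torch`.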
Co-authored-by: Weston Pace <[email protected]>
Use pytorch to train IVF partitions on GPU
ds.create_index(..., accelerator="cuda")