fix: prevent OOM when IVF centroids are provided #1653

wjones127 · 2023-11-22T02:49:04Z

This is a small collection of fixes I needed in order to run a partitioning benchmark with CUDA acceleration.

Previously, we loaded enough vectors to train IVF regardless of whether IVF was actually going to be trained. This causes OOM on large datasets, so I've changed it to skip loading those vectors when the IVF centroids have already been passed in. GPU training can handle larger-than-memory datasets thanks to its data loader, so this makes large-scale training with GPUs possible.
Added some casts to int, since it's easy to accidentally end up with floats in some cases.
Handled older pyarrow versions in the GPU training API.
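The diff for the int casts isn't included in this thread, so the following is only one defensive pattern, not the PR's code. It assumes integer parameters such as a partition count may arrive as floats (for example, because `/` always returns a float in Python 3); `to_int_param` is a hypothetical helper.

```python
from numbers import Integral, Real


def to_int_param(value: Real, name: str) -> int:
    """Coerce a numeric parameter to a Python int.

    Integral floats such as 256.0 are accepted; fractional values raise
    instead of being silently truncated the way int(256.7) would be.
    """
    if isinstance(value, bool):
        # bool is a subclass of int; treat it as a caller mistake.
        raise TypeError(f"{name} must be a number, got {value!r}")
    if isinstance(value, Integral):
        return int(value)
    if isinstance(value, Real) and float(value).is_integer():
        return int(value)
    raise ValueError(f"{name} must be a whole number, got {value!r}")
```

With this sketch, `to_int_param(65536 / 256, "num_partitions")` returns the int `256`, while `to_int_param(1000 / 3, "num_partitions")` raises `ValueError`. A bare `int(...)` cast is simpler but truncates fractional values without complaint.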

wjones127 · 2023-11-22T02:50:05Z

python/python/lance/torch/data.py

-    for col in batch.column_names:
+    for col in batch.schema.names:


This is to support older versions of pyarrow. pyarrow 12.0.0 doesn't have RecordBatch.column_names. (For some reason that's the version conda solved for when I needed to install pytorch.)
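For reference, here is a minimal sketch of walking a batch's columns without `RecordBatch.column_names`. It relies only on `RecordBatch.schema.names` and `RecordBatch.columns`, which older pyarrow releases also provide; `batch_columns` is a hypothetical helper, not the code in `data.py`.

```python
from typing import Any, Dict


def batch_columns(batch: Any) -> Dict[str, Any]:
    """Map column name -> column array for a pyarrow.RecordBatch.

    Reads names from batch.schema.names instead of batch.column_names,
    which pyarrow 12.0.0's RecordBatch lacks.
    """
    # schema.names and columns are both in column order, so zip pairs them up.
    return {name: col for name, col in zip(batch.schema.names, batch.columns)}
```

With pyarrow installed, `batch_columns(pa.record_batch([pa.array([1, 2])], names=["id"]))` returns a one-entry dict keyed by `"id"`. The helper touches only those two attributes, so it doesn't need to import pyarrow itself.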

chebbyChefNEQ · 2023-11-22T03:17:24Z

rust/lance/src/index/vector/ivf.rs

+    let mut training_data = if ivf_params.centroids.is_none() {
+        let start = std::time::Instant::now();
+        log::info!(
+            "Loading training data for IVF. Sample size: {}",
+            sample_size_hint
+        );
+        let data = Some(maybe_sample_training_data(dataset, column, sample_size_hint).await?);
+        log::info!(
+            "Finished loading training data in {:02} seconds",
+            start.elapsed().as_secs_f32()
+        );
+        data


nit: maybe just add the timer in maybe_sample_training_data?

With the narrative logs, I like that they are all in 1 function. 🤷
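The Rust excerpt only samples training data when `ivf_params.centroids.is_none()`, and keeps both timing log lines at the call site. As a hedged sketch of the same control flow transposed to Python (not Lance's implementation), `load_training_data_if_needed` and `sample_fn` below are hypothetical stand-ins for the real sampler, `maybe_sample_training_data`:

```python
import logging
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def load_training_data_if_needed(
    centroids: Optional[Sequence],
    sample_fn: Callable[[int], T],
    sample_size_hint: int,
) -> Optional[T]:
    """Sample IVF training vectors only when centroids still have to be trained."""
    if centroids is not None:
        # Centroids were passed in, so IVF training is skipped and no
        # vectors are pulled into memory; this is what avoids the OOM.
        return None
    start = time.perf_counter()
    log.info("Loading training data for IVF. Sample size: %d", sample_size_hint)
    data = sample_fn(sample_size_hint)
    log.info(
        "Finished loading training data in %.2f seconds",
        time.perf_counter() - start,
    )
    return data
```

Returning `None` plays the role of the Rust `Option`, and keeping the timer here rather than inside the sampler matches the preference in the reply above to keep the narrative logs in one function.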

fix: prevent OOM when IVF centroids are provided

4b5935c

wjones127 commented Nov 22, 2023


wjones127 marked this pull request as ready for review November 22, 2023 03:09

wjones127 requested review from eddyxu and chebbyChefNEQ November 22, 2023 03:09

chebbyChefNEQ reviewed Nov 22, 2023


chebbyChefNEQ approved these changes Nov 22, 2023


wjones127 merged commit 8fc78d7 into main Nov 22, 2023
17 checks passed

wjones127 deleted the wjones127/training-fixes branch November 22, 2023 03:24
