
perf: make sure we use multiple threads when scanning #1705

Merged · 2 commits into lancedb:main · Dec 13, 2023

Conversation

wjones127
Contributor

No description provided.

Comment on lines +399 to +402
// If contiguous, continue
if indices[i + 1] == indices[i] {
continue;
}
Contributor Author

This fixes a performance bug for large vectors. If we have vectors larger than the block size (1536-dim f32 vectors are 6,144 bytes while the default block size on local fs is 4K), then without this line we make a range request for every vector we take. 😱
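The effect of that `continue` can be illustrated with a small stand-alone sketch (hypothetical names, not the Lance code): when walking sorted block indices, contiguous or duplicate blocks are merged into a single range, so a vector that spans two blocks produces one read request instead of two.

```rust
/// Merge sorted block indices into contiguous half-open ranges so that
/// each range becomes a single I/O request. (Illustrative sketch only.)
fn coalesce(indices: &[u64]) -> Vec<(u64, u64)> {
    let mut ranges: Vec<(u64, u64)> = Vec::new();
    for &idx in indices {
        match ranges.last_mut() {
            // Extend the current range if this block is contiguous with it,
            // or a duplicate (as when one large vector covers several blocks).
            Some((_, end)) if idx <= *end => *end = idx + 1,
            _ => ranges.push((idx, idx + 1)),
        }
    }
    ranges
}

fn main() {
    // Blocks 3, 4, 5 are contiguous: one request instead of three.
    assert_eq!(coalesce(&[0, 3, 4, 5, 9]), vec![(0, 1), (3, 6), (9, 10)]);
    // Duplicate indices collapse entirely, mirroring the `continue` above.
    assert_eq!(coalesce(&[2, 2, 2]), vec![(2, 3)]);
    println!("ok");
}
```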

@wjones127 wjones127 marked this pull request as ready for review December 13, 2023 01:03
Contributor

@westonpace westonpace left a comment


Just the one question

@@ -294,7 +294,12 @@ impl FragmentScanner {
         let stream = futures::stream::iter(simplified_predicates.into_iter().enumerate()).map(
             move |(batch_id, predicate)| {
                 let scanner_ref = scanner.clone();
-                async move { scanner_ref.read_batch(batch_id, predicate).await }
+                tokio::task::spawn(async move { scanner_ref.read_batch(batch_id, predicate).await })
Contributor

Would `buffered` / `buffer_unordered` not work here?

Contributor Author

We do call it below; it's just not sufficient. Each of the `read_batch()` tasks has some CPU-bound work, which can block the IO of other tasks if they run on the same thread. By spawning, we distribute the tasks amongst threads, ensuring we get concurrency.
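The difference can be sketched with plain `std::thread` standing in for the tokio worker pool (an analogy, not the Lance code): once each task is spawned onto its own worker, CPU-bound work that would otherwise serialize on a single thread runs in parallel, leaving that thread free to drive IO.

```rust
use std::thread;

// Simulated CPU-bound part of a batch read (e.g. decoding).
// XOR of 0..2_000_000 is 0, so each "batch" just returns its own id.
fn read_batch(batch_id: usize) -> usize {
    (0..2_000_000usize).fold(batch_id, |acc, x| acc ^ x)
}

fn main() {
    // Spawn each read onto its own OS thread, mirroring how
    // tokio::task::spawn lets the runtime place each read_batch
    // future on a separate worker instead of pinning them all to
    // the thread polling the stream.
    let handles: Vec<_> = (0..4usize)
        .map(|id| thread::spawn(move || read_batch(id)))
        .collect();
    let results: Vec<usize> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    assert_eq!(results, vec![0, 1, 2, 3]);
    println!("ok");
}
```

With only `buffer_unordered`, the futures are polled concurrently but still on one thread, so the CPU-bound fold above would run back-to-back rather than in parallel.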

@wjones127 wjones127 merged commit 48a4b07 into lancedb:main Dec 13, 2023
17 checks passed