Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Examples/Move] BigVector #16721

Closed
wants to merge 4 commits into from
Closed

[Examples/Move] BigVector #16721

wants to merge 4 commits into from

Conversation

amnn
Copy link
Member

@amnn amnn commented Mar 18, 2024

Description

An implementation of a B+ Tree based data structure in Move.

Test Plan

New unit tests:

big_vector$ sui move test

Copy link

vercel bot commented Mar 18, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
explorer ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 17, 2024 3:43pm
mysten-ui ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 17, 2024 3:43pm
sui-core ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 17, 2024 3:43pm
3 Ignored Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Apr 17, 2024 3:43pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Apr 17, 2024 3:43pm
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Apr 17, 2024 3:43pm

Copy link
Member Author

@amnn amnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this will currently fail to pass tests because it requires indexing support for vector to land in Stdlib, which hasn't happened yet (coming in #16466), but I'm just putting this up as a PR, so that we have some place to discuss the code, and for @0xaslan and @leecchh to play with it as a possible replacement for the CritbitTree + LinkedTable combo we use currently for Deepbook.

_keys: vector<u128>,
_vals: vector<E>,
) {
abort 0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left unimplemented for now, because it's a little fiddly/complicated, and an implementation without this should still be an improvement on what we have today, but @gdanezis pointed out that having the ability to insert and remove in batches would be a helpful primitive to expose to efficiently handle workloads due to market makers who are pushing in orders in bulk.

Once we get an idea of whether or not this implementation is useful, we can add these extra bells and whistles.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: to be useful it would have to implement batch_op (where you can do in one go both insert, update, or delete), return an idea of mutations, and not abort (so you can mix many different parties operations). So yes, lets wait and see where it might be useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, so I was imagining this being used by a market maker who was deleting their old orders and putting in a number of new orders, but it sounds like you have something else in mind, perhaps where you gather updates to the table across multiple transactions into a journal and then flush them in a batch, is that right?

(I have gone slightly off my idea of using a $B^\epsilon$ tree for this, because it would really complicate iteration, otherwise it would work well, but maybe we can get the best of both worlds by literally storing a journal at a single level?)

/// hitting object size limits.
const MAX_SLICE_SIZE: u64 = 256 * 1024;

// === Removal fix-up strategies ===
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These constants could be an enum (although ideally they would be a private enum).

_lo: u128,
_hi: u128,
) {
abort 0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, out of scope for prototype (useful for market maker workloads).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, I do not think separate batch insert / delete with aborts is as useful.

_self: &mut BigVector<E>,
_keys: vector<u128>,
): vector<E> {
abort 0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, out of scope for prototype (useful for deleting filled orders).

// === Receiver function aliases ===

public use fun slice_is_null as SliceRef.is_null;
public use fun slice_is_leaf as Slice.is_leaf;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reminder that we need to update GitHub's highlighting for Move, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admitting defeat by adding this utils module... Initial temptation was to suggest adding these to Stdlib but I think the right move is to have a more feature rich "standard library" separate from the framework (but depending on it), and functions like this might go in that.

Comment on lines +562 to +588
*(&mut bv[7]) = 8;
*(&mut bv[9]) = 10;
*(&mut bv[15]) = 16;
*(&mut bv[19]) = 20;
*(&mut bv[22]) = 23;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...a bit funky -- will we have lvalue index syntax before the feature goes out?

@amnn amnn marked this pull request as ready for review March 18, 2024 11:45
last_id: _,
} = self;

assert!(length == 0, ENotEmpty);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should we do the assert before destructuring it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend not to worry about the ordering of these things because the assertion failure is game over -- nothing that happened before or after it will have any effect.

Comment on lines +184 to +191
let BigVector {
id,

depth: _,
length,
max_slice_size: _,
max_fan_out: _,

root_id: _,
last_id: _,
} = self;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work after the enum syntax diff lands:

Suggested change
let BigVector {
id,
depth: _,
length,
max_slice_size: _,
max_fan_out: _,
root_id: _,
last_id: _,
} = self;
let BigVector { id, length , .. } = self;


let (ix, leaf, off) = self.find_leaf(key);
if (off >= leaf.keys.length()) {
(leaf.next(), 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a case where leaf.next() doesn't exist? for example if the key is greater than the highest key present in the tree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this can return (SliceRef { ix: NO_SLICE }, 0), this is the case mentioned in the doc comment here (additional emphasis added):

Returns the reference to the slice and the local offset within the slice if it exists, or (NO_SLICE, 0), if there is no matching key-value pair.

Copy link
Contributor

@leecchh leecchh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gdanezis
Copy link
Collaborator

Overall question: is BigVector a good name for this? The defining property of a B-Tree is that it is (a) a key value data structure and (b) that the keys are stored in order (and sequential ops are cheap). The term "vector" does not really capture these.

Copy link
Collaborator

@gdanezis gdanezis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one!

/// including gaps in the vector.
length: u64,

/// Max size of leaf nodes (counted in number of elements, `E`).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add some guidance here in doc about how to set max_slice_size and max_fan_out? I think they affect efficiency of different operations, but I have to reach for my data structures book to remember how exactly.

_keys: vector<u128>,
_vals: vector<E>,
) {
abort 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: to be useful it would have to implement batch_op (where you can do in one go both insert, update, or delete), return an idea of mutations, and not abort (so you can mix many different parties operations). So yes, lets wait and see where it might be useful.

_lo: u128,
_hi: u128,
) {
abort 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, I do not think separate batch insert / delete with aborts is as useful.

@amnn
Copy link
Member Author

amnn commented Mar 20, 2024

Overall question: is BigVector a good name for this?

I don't have a strong opinion on this -- happy to take suggestions for names. The original reasons it was called this are that

  • it was derived from the needs/interface outlined in SIP13: SIP-13: BigVector Implementation sui-foundation/sips#13 (comment),
  • it's similar to Android's SparseIntArray, which plays a similar trick but with a sorted array instead of a B+ tree, and
  • I felt bad calling it just a B+ Tree, because I couldn't parameterise the key, so its API looked more like a vector where the index type was a u128, so a very big vector.

But yes, the performance characteristics for insert and remove are off.

@gdanezis
Copy link
Collaborator

I don't have a strong opinion on this -- happy to take suggestions for names. The original reasons it was called this are that

Yeah, BigSortedVector is not as catchy :)

@amnn
Copy link
Member Author

amnn commented Mar 21, 2024

... BTreeVector?

An implementation of a B+ Tree based data structure in Move.

Test Plan:

New unit tests:

```
big_vector$ sui move test
```
@jericlong
Copy link

Overall question: is BigVector a good name for this?

I don't have a strong opinion on this -- happy to take suggestions for names. The original reasons it was called this are that

But yes, the performance characteristics for insert and remove are off.

The original purpose for creating BigVector is the need of massive data storability and mutability in a single transaction. For instance when we settle an option vault, we need to calculate user shares one by one since there are active shares for settlement and warmup shares for the next round. User A may have 10 active + 5 warmup = total 15 for the next round, User B may have 20 active + 3 warmup = total 23 for the next round, since the ratio between active and warmup is not the same, the calculation is necessary. From this point of view I think the B+ Tree based data structure implementation is quiet different with sips#13

@amnn
Copy link
Member Author

amnn commented Apr 25, 2024

@jericlong, thanks for sharing! Could you share a bit more on how the data is stored in BigVector and how these user interactions translate to operations on that data structure?

@amnn
Copy link
Member Author

amnn commented Oct 30, 2024

Seeing as this has now gone out with Deepbook v3, I'll close this PR!

@amnn amnn closed this Oct 30, 2024
@amnn amnn deleted the amnn/btree-example branch November 11, 2024 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants