-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Examples/Move] BigVector #16721
[Examples/Move] BigVector #16721
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
3 Ignored Deployments
|
60e8bb1
to
f0ac0a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this will currently fail to pass tests because it requires indexing support for vector to land in Stdlib
, which hasn't happened yet (coming in #16466), but I'm just putting this up as a PR, so that we have some place to discuss the code, and for @0xaslan and @leecchh to play with it as a possible replacement for the CritbitTree
+ LinkedTable
combo we use currently for Deepbook.
_keys: vector<u128>, | ||
_vals: vector<E>, | ||
) { | ||
abort 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left unimplemented for now, because it's a little fiddly/complicated, and an implementation without this should still be an improvement on what we have today, but @gdanezis pointed out that having the ability to insert and remove in batches would be a helpful primitive to expose to efficiently handle workloads due to market makers who are pushing in orders in bulk.
Once we get an idea of whether or not this implementation is useful, we can add these extra bells and whistles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: to be useful it would have to implement batch_op (where you can do in one go both insert, update, or delete), return an idea of mutations, and not abort (so you can mix many different parties operations). So yes, lets wait and see where it might be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, so I was imagining this being used by a market maker who was deleting their old orders and putting in a number of new orders, but it sounds like you have something else in mind, perhaps where you gather updates to the table across multiple transactions into a journal and then flush them in a batch, is that right?
(I have gone slightly off my idea of using a
/// hitting object size limits. | ||
const MAX_SLICE_SIZE: u64 = 256 * 1024; | ||
|
||
// === Removal fix-up strategies === |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These constants could be an enum
(although ideally they would be a private enum
).
_lo: u128, | ||
_hi: u128, | ||
) { | ||
abort 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, out of scope for prototype (useful for market maker workloads).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, I do not think separate batch insert / delete with aborts is as useful.
_self: &mut BigVector<E>, | ||
_keys: vector<u128>, | ||
): vector<E> { | ||
abort 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, out of scope for prototype (useful for deleting filled orders).
// === Receiver function aliases === | ||
|
||
public use fun slice_is_null as SliceRef.is_null; | ||
public use fun slice_is_leaf as Slice.is_leaf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a reminder that we need to update GitHub's highlighting for Move, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Admitting defeat by adding this utils
module... Initial temptation was to suggest adding these to Stdlib
but I think the right move is to have a more feature rich "standard library" separate from the framework (but depending on it), and functions like this might go in that.
*(&mut bv[7]) = 8; | ||
*(&mut bv[9]) = 10; | ||
*(&mut bv[15]) = 16; | ||
*(&mut bv[19]) = 20; | ||
*(&mut bv[22]) = 23; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...a bit funky -- will we have lvalue index syntax before the feature goes out?
last_id: _, | ||
} = self; | ||
|
||
assert!(length == 0, ENotEmpty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should we do the assert before destructuring it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend not to worry about the ordering of these things because the assertion failure is game over -- nothing that happened before or after it will have any effect.
let BigVector { | ||
id, | ||
|
||
depth: _, | ||
length, | ||
max_slice_size: _, | ||
max_fan_out: _, | ||
|
||
root_id: _, | ||
last_id: _, | ||
} = self; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work after the enum syntax diff lands:
let BigVector { | |
id, | |
depth: _, | |
length, | |
max_slice_size: _, | |
max_fan_out: _, | |
root_id: _, | |
last_id: _, | |
} = self; | |
let BigVector { id, length , .. } = self; |
|
||
let (ix, leaf, off) = self.find_leaf(key); | ||
if (off >= leaf.keys.length()) { | ||
(leaf.next(), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a case where leaf.next() doesn't exist? for example if the key is greater than the highest key present in the tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this can return (SliceRef { ix: NO_SLICE }, 0)
, this is the case mentioned in the doc comment here (additional emphasis added):
Returns the reference to the slice and the local offset within the slice if it exists, or (NO_SLICE, 0), if there is no matching key-value pair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Overall question: is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one!
/// including gaps in the vector. | ||
length: u64, | ||
|
||
/// Max size of leaf nodes (counted in number of elements, `E`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add some guidance here in doc about how to set max_slice_size
and max_fan_out
? I think they affect efficiency of different operations, but I have to reach for my data structures book to remember how exactly.
_keys: vector<u128>, | ||
_vals: vector<E>, | ||
) { | ||
abort 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: to be useful it would have to implement batch_op (where you can do in one go both insert, update, or delete), return an idea of mutations, and not abort (so you can mix many different parties operations). So yes, lets wait and see where it might be useful.
_lo: u128, | ||
_hi: u128, | ||
) { | ||
abort 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, I do not think separate batch insert / delete with aborts is as useful.
I don't have a strong opinion on this -- happy to take suggestions for names. The original reasons it was called this are that
But yes, the performance characteristics for insert and remove are off. |
Yeah, |
... |
An implementation of a B+ Tree based data structure in Move. Test Plan: New unit tests: ``` big_vector$ sui move test ```
f3a9d82
to
95a73c3
Compare
The original purpose for creating BigVector is the need of massive data storability and mutability in a single transaction. For instance when we settle an option vault, we need to calculate user shares one by one since there are active shares for settlement and warmup shares for the next round. User A may have 10 active + 5 warmup = total 15 for the next round, User B may have 20 active + 3 warmup = total 23 for the next round, since the ratio between active and warmup is not the same, the calculation is necessary. From this point of view I think the B+ Tree based data structure implementation is quiet different with sips#13 |
@jericlong, thanks for sharing! Could you share a bit more on how the data is stored in |
Seeing as this has now gone out with Deepbook v3, I'll close this PR! |
Description
An implementation of a B+ Tree based data structure in Move.
Test Plan
New unit tests: