[GraphQL] Object DataLoader #17332
@@ -819,7 +819,24 @@ Checkpoint created: 7
task 33 'run-graphql'. lines 445-495:
Response: {
"data": {
"parent_version_4_outside_consistent_range": null,
"parent_version_4_outside_consistent_range": {
The change in this test demonstrates how we can now query historical objects, but not their dynamic fields (because the dynamic field query still relies on the consistent object fetching logic).
@@ -192,7 +192,22 @@ Response: {
}
}
},
"object_outside_available_range": null,
"object_not_in_snapshot": null
"object_outside_available_range": { |
Now returning a response for historical queries.
@@ -129,7 +130,7 @@ pub(crate) struct ObjectKey {

/// The object's owner type: Immutable, Shared, Parent, or Address.
#[derive(Union, Clone)]
pub enum ObjectOwner {
pub(crate) enum ObjectOwner { |
These `pub(crate)`s are not strictly part of the data loading PR, but I noticed them here; I can pull them out into their own PR if it's helpful.
@@ -1192,36 +1077,6 @@ impl Checkpointed for Cursor {
}
}

impl Paginated<Cursor> for StoredObject { |
This change also deletes some code that we no longer use, because we no longer fetch directly from the `objects` table.
PTAL at the comments regarding using `View::Consistent` for the `LatestAtKey` dataloader!
build_objects_query(
    View::Consistent,
    range,
    &Page::bounded(ids.len() as u64),
    |q| apply_parent_bound(filter.apply(q)),
    apply_parent_bound,
)
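`apply_parent_bound` above restricts which versions of a child object are eligible. As a rough in-memory illustration only, assuming "bounded by the parent version" means a child (for example a dynamic field object) is read at its newest version that does not exceed the parent's version, the sketch below picks that version. The function name `version_at_parent_bound` and its slice-of-versions input are hypothetical; the real code expresses this as a SQL filter.

```rust
/// Hypothetical helper: given all known versions of a child object, return the
/// newest one that does not exceed the parent's version, or `None` if every
/// known version is newer than the parent.
pub fn version_at_parent_bound(child_versions: &[u64], parent_version: u64) -> Option<u64> {
    child_versions
        .iter()
        .copied()
        // Discard versions written after the parent version being viewed.
        .filter(|&v| v <= parent_version)
        // Of the remaining candidates, the newest one is the one to return.
        .max()
}
```

For example, with child versions `[1, 3, 7, 9]` and a parent at version 8, version 7 is selected.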
I think you can just use `View::Historical` here. Given the changes in this PR it's perhaps a bit of a misnomer, but since we aren't filtering by `object_version` or some other malleable criteria, and instead solely on `object_id`, we can avoid the bulky `LEFT JOIN`s. Similarly, in the paginated objects query (https://github.com/MystenLabs/sui/blob/main/crates/sui-graphql-rpc/src/types/object.rs#L1330), I believe the check should instead have been whether the filter sets anything other than `object_ids`.
Alternatively, we can also do a single fetch for all `object_ids` against `objects_snapshot`, parallel fetches against `objects_history` per checkpoint bound, and then filter in the app for the latest version per group from either the snapshot or history.
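The alternative above (a snapshot fetch plus history fetches, merged in the application) reduces to "keep the newest visible version per object". A minimal sketch, assuming both tables yield rows carrying an object ID, version, and checkpoint; `ObjectRow` and `latest_per_object` are hypothetical names, object IDs are shrunk to `u64` for brevity, and a single checkpoint bound is used where a real loader may carry one bound per key.

```rust
use std::collections::HashMap;

/// Hypothetical row shape shared by `objects_snapshot` and `objects_history` results.
#[derive(Clone, Debug, PartialEq)]
pub struct ObjectRow {
    pub object_id: u64, // stand-in for a 32-byte object ID
    pub version: u64,
    pub checkpoint: u64,
}

/// For each object, keep the highest version whose checkpoint is within
/// `checkpoint_bound`, considering rows from both sources.
pub fn latest_per_object(
    snapshot: &[ObjectRow],
    history: &[ObjectRow],
    checkpoint_bound: u64,
) -> HashMap<u64, ObjectRow> {
    let mut latest: HashMap<u64, ObjectRow> = HashMap::new();
    for row in snapshot.iter().chain(history.iter()) {
        // Rows committed after the bound are not visible at this checkpoint.
        if row.checkpoint > checkpoint_bound {
            continue;
        }
        // Replace the current entry only if this row is strictly newer.
        let is_newer = latest
            .get(&row.object_id)
            .map_or(true, |existing| row.version > existing.version);
        if is_newer {
            latest.insert(row.object_id, row.clone());
        }
    }
    latest
}
```

The trade-off is more rows transferred (every candidate version up to the bound) in exchange for simpler SQL without the anti-join.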
In fact, if we're at a cursor for an object that has a yet newer version, using `View::Consistent` will erroneously filter out the object because a newer version exists, although what we really want is the latest version of the object bounded by the checkpoint.
TL;DR: I think `View::Historical` is pretty broken, so I won't use it here (I'll add a TODO in the codebase and create a task for fixing it).
I see what you're saying (although I had to write out both the historical and consistent queries to double check 😅); I think you're right that I can use the `Historical` query kind here, but I'm in two minds about it (see below).
> but since we aren't filtering by `object_version` or some other malleable criteria,
Note that I am filtering by `object_version`, but that is what enables use of the historical mode IIRC.
> I believe the check should've been instead for whether the filter has anything that isn't empty or `object_ids`.
I don't think this is right. If you only supply `object_ids`, then you are implicitly filtering the live object set at whatever checkpoint you're querying at, so historical queries do not work: you need the behaviour of filtering out outdated candidates (objects that match the filters but aren't the latest at the checkpoint). It works for `object_keys` because you are fixing the version of the objects you're querying for.
> using `View::Consistent` will erroneously filter out the object because a newer version exists
I don't think this will happen, because the query to find newer versions is also bounded by the available range, and by the `apply_parent_bound` modifier that I pass as the last argument here.
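The disagreement above hinges on whether the "does a newer version exist?" check is itself bounded. A minimal in-memory sketch, assuming each version records the checkpoint it was committed in (the `Version` struct and both functions are hypothetical): an unbounded check rejects a candidate as soon as any newer version exists, while a check bounded by the checkpoint only counts newer versions visible at that checkpoint.

```rust
/// Hypothetical record of one version of an object.
#[derive(Clone, Copy, Debug)]
pub struct Version {
    pub version: u64,
    pub checkpoint: u64,
}

/// Unbounded check: rejects the candidate if any newer version exists at all.
pub fn is_latest_unbounded(candidate: Version, all: &[Version]) -> bool {
    !all.iter().any(|v| v.version > candidate.version)
}

/// Bounded check: the candidate must itself be visible at `checkpoint`, and only
/// newer versions committed at or before `checkpoint` can disqualify it.
pub fn is_latest_at(candidate: Version, all: &[Version], checkpoint: u64) -> bool {
    candidate.checkpoint <= checkpoint
        && !all
            .iter()
            .any(|v| v.version > candidate.version && v.checkpoint <= checkpoint)
}
```

With version 1 at checkpoint 10 and version 2 at checkpoint 30, a query at checkpoint 20 keeps version 1 under the bounded check but drops it under the unbounded one.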
Finally, while `View::Historical` avoids the left joins:
- It doesn't bound the query on the historical side by the available range (so it's going to look into every partition), while still redundantly querying the snapshot, which is a summary of some prefix of object history snapshots.
- In the long run, this loader will be split into two: one that handles object lookups by version, and one that handles latest objects. The former will be implemented using the `object_versions` table directly, so it should hopefully be nice and cheap, and the latter is implemented most efficiently as a form of consistent lookup (i.e. avoiding looking into all object history partitions).
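The by-version loader mentioned above relies on a side table mapping an object's ID and version to its checkpoint. As an in-memory sketch only, assuming such a table exists, a `BTreeMap` keyed by `(object_id, version)` supports both exact by-version lookups and "newest version at or below V" lookups through a range scan. `VersionIndex` and its methods are hypothetical and do not reflect the real `object_versions` schema.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory model of a side table: (object_id, version) -> checkpoint.
#[derive(Default)]
pub struct VersionIndex {
    entries: BTreeMap<(u64, u64), u64>,
}

impl VersionIndex {
    pub fn insert(&mut self, object_id: u64, version: u64, checkpoint: u64) {
        self.entries.insert((object_id, version), checkpoint);
    }

    /// Exact lookup: which checkpoint contains this object version?
    pub fn checkpoint_of(&self, object_id: u64, version: u64) -> Option<u64> {
        self.entries.get(&(object_id, version)).copied()
    }

    /// Newest version of `object_id` not exceeding `max_version`, with its checkpoint.
    pub fn latest_at_or_below(&self, object_id: u64, max_version: u64) -> Option<(u64, u64)> {
        self.entries
            // Tuples sort by object ID first, so this range covers one object only.
            .range((object_id, 0)..=(object_id, max_version))
            .next_back()
            .map(|(&(_, version), &checkpoint)| (version, checkpoint))
    }
}
```

Because keys sort by object ID first, each lookup touches only one object's entries, which is what makes by-version lookups cheap compared with scanning history partitions.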
This actually makes me think that the `Historical` mode might be generally broken. Imagine a case where you have an `ObjectFilter` that contains both `object_ids` and `object_keys`: because it has `object_keys`, it will be treated as a historical query, but it is also fetching objects by ID, so it's going to fetch them from all partitions in `objects_history` and then take the latest version without applying a bound by available range, which is going to break consistency.
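To make the concern concrete: the failure mode is taking the historical path because `object_keys` is present, even though `object_ids` still needs "latest at this checkpoint" semantics. The hedged sketch below shows a stricter rule in which only filters whose objects are all pinned to a version take the historical path. `ObjectFilterSketch` and `QueryView` are hypothetical simplifications (other filter criteria are ignored), not the real `ObjectFilter` or `View` types, and this is not necessarily the fix adopted in the codebase.

```rust
/// Hypothetical simplification of an object filter; other criteria are omitted.
#[derive(Default)]
pub struct ObjectFilterSketch {
    pub object_ids: Option<Vec<u64>>,
    /// (object_id, version) pairs that pin an exact version.
    pub object_keys: Option<Vec<(u64, u64)>>,
}

#[derive(Debug, PartialEq)]
pub enum QueryView {
    /// Every requested object is pinned to a version.
    Historical,
    /// Candidates that are not latest at the checkpoint must be filtered out.
    Consistent,
}

pub fn choose_view(filter: &ObjectFilterSketch) -> QueryView {
    let has_keys = filter.object_keys.as_ref().map_or(false, |k| !k.is_empty());
    let has_ids = filter.object_ids.as_ref().map_or(false, |i| !i.is_empty());
    // Any unpinned ID forces the consistent view, even if keys are also present.
    if has_keys && !has_ids {
        QueryView::Historical
    } else {
        QueryView::Consistent
    }
}
```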
## Description

Implement data loaders for fetching historical object versions, objects bounded by their parent versions, and the latest versions of an object at a given checkpoint.

By implementing an Object DataLoader, we also implicitly get support for data-loading all derived types (`MoveObject`, `MovePackage`, `MoveModule`, `DynamicField`, `Coin`, etc.).

These implementations (particularly historical queries and queries where the version can be bounded by a parent version) can be made even more efficient with an index/side table that maps an object's ID and version to the checkpoint it is part of. This change has not been included in this PR, but we will follow up on it as part of Object query benchmarking.

As part of this change, I enabled queries for historical objects outside the available range. Later (with the use of an `obj_version` index) it will also be possible to enable dynamic field look-ups on historical objects.

## Test Plan

```
sui$ cargo nextest run -p sui-graphql-rpc
sui$ cargo nextest run -p sui-graphql-e2e-tests --features pg_integration
```

Run the following query; after this change, it takes about 100ms to complete on the server:

```graphql
query {
  owner(
    address: "0x029170bfa0a1677054263424fe4f9960c7cf05d359f6241333994c8830772bdb"
  ) {
    dynamicFields(first: 50) {
      pageInfo { hasNextPage endCursor }
      nodes {
        name { type { repr } json }
        value {
          ... on MoveValue { type { repr } json }
          ... on MoveObject { contents { json type { repr } } }
        }
      }
    }
  }
}
```
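The core idea behind a data loader is batching: rather than issuing one query per object requested by GraphQL resolvers, the loader collects all keys requested in a round, deduplicates them, and resolves them with a single fetch. The std-only sketch below illustrates that shape. `HistoricalKey`, `load_batch`, and the `fetch` closure are hypothetical stand-ins, not the `async-graphql` `Loader` trait or the types introduced in this PR.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical key: fetch an object at an exact version.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct HistoricalKey {
    pub object_id: u64,
    pub version: u64,
}

/// Resolve many keys with one call to `fetch`, which stands in for a single
/// batched database query. Duplicate keys are fetched only once.
pub fn load_batch<V, F>(keys: &[HistoricalKey], fetch: F) -> HashMap<HistoricalKey, V>
where
    F: FnOnce(&[HistoricalKey]) -> Vec<(HistoricalKey, V)>,
{
    // Deduplicate (and order) the requested keys before hitting storage.
    let unique: Vec<HistoricalKey> = keys
        .iter()
        .copied()
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect();
    // One round trip for the whole batch; keys with no result are simply absent.
    fetch(&unique).into_iter().collect()
}
```

Derived types such as `MoveObject` or `Coin` benefit for free in this design because they resolve through the same underlying object keys.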
🚢
## Description

Implement data loaders for fetching historical object versions, objects bounded by their parent versions, and the latest versions of an object at a given checkpoint.

By implementing an Object DataLoader, we also implicitly get support for data-loading all derived types (`MoveObject`, `MovePackage`, `MoveModule`, `DynamicField`, `Coin`, etc.).

These implementations (particularly historical queries and queries where the version can be bounded by a parent version) can be made even more efficient with an index/side table that maps an object's ID and version to the checkpoint it is part of. This change has not been included in this PR, but we will follow up on it as part of Object query benchmarking.

As part of this change, I enabled queries for historical objects outside the available range. Later (with the use of an `obj_version` index) it will also be possible to enable dynamic field look-ups on historical objects.

## Test Plan

```
sui$ cargo nextest run -p sui-graphql-rpc
sui$ cargo nextest run -p sui-graphql-e2e-tests --features pg_integration
```

Run the following query; after this change, it takes about 8s to complete on the server, fetching about 80 objects, while previously it would either time out or squeak in *just* under the 40s timeout. I expect this number to improve further once we have an efficient way to map object IDs and versions to a checkpoint sequence number.

```graphql
query {
  transactionBlocks(last: 5) {
    nodes {
      effects {
        objectChanges(first: 50) {
          pageInfo { hasNextPage }
          nodes {
            idCreated
            idDeleted
            inputState { asMoveObject { contents { json } } }
            outputState { asMoveObject { contents { json } } }
          }
        }
      }
    }
  }
}
```

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

- [ ] Protocol:
- [ ] Nodes (Validators and Full nodes):
- [ ] Indexer:
- [ ] JSON-RPC:
- [x] GraphQL: Queries for historical versions of objects will now return data even if that version of the object is outside the available range.
- [ ] CLI:
- [ ] Rust SDK:
Co-authored-by: Ashok Menon <[email protected]>