Errored subgraphs sometimes resume correctly after restart #3236

Closed
aasseman opened this issue Feb 9, 2022 · 11 comments · Fixed by #3316

aasseman commented Feb 9, 2022

Do you want to request a feature or report a bug?
Bug

What is the current behavior?
Graph-node v0.25.0.

Some subgraphs' indexing would hang, declaring a deterministic error, but after one (or more) restarts of graph-node they would resume with seemingly no issues. Observed on QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg (Enzyme) and Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr (Graph Network).

If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem.
Index QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg or Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr.

What is the expected behavior?
An error reported as deterministic should actually be deterministic. Here it seems that, due to unknown factors, a subgraph can get stuck and then unstuck.

Logs for QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg:

Feb 08 03:02:22.060 DEBG Trying again after eth_call RPC call failed (attempt #1) with result Err(Web3Error(Transport("Error(IncompleteMessage)"))), block_hash: 0xc3b82bc6897a2f490c757dec90fa5ac07043b64e7e442822f8427b240ba8641e, block_number: 14162904, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 09 08:02:03.161 ERRO Handler skipped due to execution failure, error: Critical error logged in mapping wasm backtrace: 0: 0x22bc - <unknown>!src/utils/logCritical/logCritical 1: 0x238b - <unknown>!src/entities/Release/createRelease 2: 0x2787 - <unknown>!src/entities/Release/ensureRelease 3: 0x2aeb - <unknown>!src/mappings/Dispatcher/handleCurrentFundDeployerSet , handler: handleCurrentFundDeployerSet, block_hash: 0x2d759b06d445f158b8b44d8f12d7803a2a3b3653e47f651cbbcbf60ab6e81a02, block_number: 14170674, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 09 08:02:03.197 ERRO Subgraph error 1/1, code: SubgraphSyncingFailure, error: Critical error logged in mapping wasm backtrace: 0: 0x22bc - <unknown>!src/utils/logCritical/logCritical 1: 0x238b - <unknown>!src/entities/Release/createRelease 2: 0x2787 - <unknown>!src/entities/Release/ensureRelease 3: 0x2aeb - <unknown>!src/mappings/Dispatcher/handleCurrentFundDeployerSet in handler `handleCurrentFundDeployerSet` at block #14170674 (2d759b06d445f158b8b44d8f12d7803a2a3b3653e47f651cbbcbf60ab6e81a02), block_hash: 0x2d759b06d445f158b8b44d8f12d7803a2a3b3653e47f651cbbcbf60ab6e81a02, block_number: 14170674, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 09 08:02:03.356 ERRO Subgraph instance failed to run: Critical error logged in mapping wasm backtrace: 0: 0x22bc - <unknown>!src/utils/logCritical/logCritical 1: 0x238b - <unknown>!src/entities/Release/createRelease 2: 0x2787 - <unknown>!src/entities/Release/ensureRelease 3: 0x2aeb - <unknown>!src/mappings/Dispatcher/handleCurrentFundDeployerSet in handler `handleCurrentFundDeployerSet` at block #14170674 (2d759b06d445f158b8b44d8f12d7803a2a3b3653e47f651cbbcbf60ab6e81a02), code: SubgraphSyncingFailure, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
... index-node restart ...
Feb 09 19:06:55.128 INFO Reverting errored block, to_block_hash: 0x6427d445b10086a920ab36ff061e032dfbec9bc1fc83a77bd570bca5172f6026, to_block_number: 14170673, from_block_hash: 0x2d759b06d445f158b8b44d8f12d7803a2a3b3653e47f651cbbcbf60ab6e81a02, from_block_number: 14170674, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, shard: primary, component: Store 
...
Feb 09 19:07:56.034 DEBG Requesting 0 block(s), sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: BlockStream 
Feb 09 19:07:56.035 DEBG Processing 6 triggers, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: BlockStream 
Feb 09 19:07:56.038 DEBG 1 candidate trigger in this block, block_hash: 0x44142e5015a24d401615c1f75a523b97d4aa574828965ff5972511968cd9c8b5, block_number: 14170703, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 09 19:07:56.091 INFO Done processing trigger, data_source: ChainlinkAggregatorDataSource, handler: handleAnswerUpdated, total_ms: 51, address: 0x37bc…a9e6, signature: AnswerUpdated(indexed int256,indexed uint256,uint256), block_hash: 0x44142e5015a24d401615c1f75a523b97d4aa574828965ff5972511968cd9c8b5, block_number: 14170703, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 09 19:07:56.094 INFO Applying 6 entity operation(s), block_hash: 0x44142e5015a24d401615c1f75a523b97d4aa574828965ff5972511968cd9c8b5, block_number: 14170703, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 09 19:07:56.418 DEBG 2 candidate triggers in this block, block_hash: 0x98a0a6323fe06e5486235853de27b8a10c0c5953e75bdb0a46d520a5174e66b0, block_number: 14170715, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
... Indexing all the way to chain head without errors ...

Logs for Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr:

Feb 05 22:12:55.147 ERRO Subgraph error 1/1, code: SubgraphSyncingFailure, error: Mapping aborted at src/types/schema.ts, line 1757, column 4, with message: Cannot save SubgraphDeployment entity without an ID wasm backtrace: 0: 0x28f2 - <unknown>!src/types/schema/SubgraphDeployment#save 1: 0x29cb - <unknown>!src/mappings/gns/handleSubgraphMetadataUpdated in handler `handleSubgraphMetadataUpdated` at block #13842506 (24cce9d54fb7d9ab6c41d83773fe5fb16e674a4ad28a36ce42c171d6e8216592), block_hash: 0x24cce9d54fb7d9ab6c41d83773fe5fb16e674a4ad28a36ce42c171d6e8216592, block_number: 13842506, sgd: 8, subgraph_id: Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr, component: SubgraphInstanceManager 
Feb 05 22:12:55.256 ERRO Subgraph instance failed to run: Mapping aborted at src/types/schema.ts, line 1757, column 4, with message: Cannot save SubgraphDeployment entity without an ID wasm backtrace: 0: 0x28f2 - <unknown>!src/types/schema/SubgraphDeployment#save 1: 0x29cb - <unknown>!src/mappings/gns/handleSubgraphMetadataUpdated in handler `handleSubgraphMetadataUpdated` at block #13842506 (24cce9d54fb7d9ab6c41d83773fe5fb16e674a4ad28a36ce42c171d6e8216592), code: SubgraphSyncingFailure, sgd: 8, subgraph_id: Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr, component: SubgraphInstanceManager 
... restart index-node ...
Feb 09 19:06:52.054 INFO Reverting errored block, to_block_hash: 0x26a7dab59ab3ea10145c99c1af169c320310299a46d74f0cf81f74d9f8077217, to_block_number: 13842505, from_block_hash: 0x24cce9d54fb7d9ab6c41d83773fe5fb16e674a4ad28a36ce42c171d6e8216592, from_block_number: 13842506, subgraph_id: Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr, shard: primary, component: Store 
... indexing all the way to chain head without errors ...
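
To make the failure/restart cycle explicit, here is a minimal TypeScript sketch of the resume logic the logs above suggest. graph-node itself is written in Rust, so the names here (`FailedBlock`, `resumeFrom`) are illustrative assumptions, not the real implementation:

```typescript
// Model of the restart behavior visible in the logs; illustrative only.
interface FailedBlock {
  number: number; // block at which the handler failed, e.g. 14170674
  deterministic: boolean; // how graph-node classified the failure
}

// The "Reverting errored block" lines show the subgraph being rewound from
// the failed block N to N-1 on startup, after which block N is re-fetched
// and its handlers re-run.
function resumeFrom(head: number, failure: FailedBlock | null): number {
  if (failure != null) {
    // Expected: a deterministic failure should leave the subgraph failed.
    // Observed: the errored block is reverted and retried after every
    // restart, and the retry sometimes succeeds, i.e. the error was not
    // deterministic after all.
    return failure.number - 1;
  }
  return head; // no failure recorded: continue from the last good block
}
```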
@leoyvens (Collaborator)

This is mysterious to me. Do you have the source code for the subgraphs that exhibited these issues?

@aasseman (Author)

> This is mysterious to me. Do you have the source code for the subgraphs that exhibited these issues?

Not for QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, but it looks like the same phenomenon happens on Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr (the Graph Network Subgraph), though less often.

Indeed I had the issue again on QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg:

ERRO Subgraph instance failed to run: Critical error logged in mapping wasm backtrace: 0: 0x22bc - <unknown>!src/utils/logCritical/logCritical 1: 0x238b - <unknown>!src/entities/Release/createRelease 2: 0x2787 - <unknown>!src/entities/Release/ensureRelease 3: 0x32a7 - <unknown>!src/entities/Migration/ensureMigration 4: 0x3377 - <unknown>!src/mappings/Dispatcher/handleMigrationSignaled in handler `handleMigrationSignaled` at block #14176609 (78b02be061ad6d0cf55061901d57472c0553f13c29b9109c1d885ae86fd0cc87), code: SubgraphSyncingFailure, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
... index-node restart ...
Feb 10 18:21:09.302 INFO Reverting errored block, to_block_hash: 0xfc5b12d88ea811429f01ea64dac103370d090f50934f895b7b09b23829b2f9d8, to_block_number: 14176608, from_block_hash: 0x78b02be061ad6d0cf55061901d57472c0553f13c29b9109c1d885ae86fd0cc87, from_block_number: 14176609, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg
... index to chain head ...
Feb 10 18:25:59.798 ERRO Subgraph error 1/1, code: SubgraphSyncingFailure, error: Critical error logged in mapping wasm backtrace: 0: 0x22bc - <unknown>!src/utils/logCritical/logCritical 1: 0x238b - <unknown>!src/entities/Release/createRelease 2: 0x2787 - <unknown>!src/entities/Release/ensureRelease 3: 0x32a7 - <unknown>!src/entities/Migration/ensureMigration 4: 0x3377 - <unknown>!src/mappings/Dispatcher/handleMigrationSignaled in handler `handleMigrationSignaled` at block #14178043 (54ef502d32fd895e584a9de2558797e3152d559b2ad5eb489e76599c2632c0f9), block_hash: 0x54ef502d32fd895e584a9de2558797e3152d559b2ad5eb489e76599c2632c0f9, block_number: 14178043, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
... index-node restart ...
Feb 10 18:28:33.268 INFO Reverting errored block, to_block_hash: 0x83aaceb144cefd07c9030fa8da4335c24ab205bc60deb50489888fa3d2deba7a, to_block_number: 14178042, from_block_hash: 0x54ef502d32fd895e584a9de2558797e3152d559b2ad5eb489e76599c2632c0f9, from_block_number: 14178043, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, shard: primary, component: Store 
... indexing for a bit ...
Feb 10 18:29:23.364 ERRO Handler skipped due to execution failure, error: Critical error logged in mapping wasm backtrace: 0: 0x22bc - <unknown>!src/utils/logCritical/logCritical 1: 0x238b - <unknown>!src/entities/Release/createRelease 2: 0x2787 - <unknown>!src/entities/Release/ensureRelease 3: 0x32a7 - <unknown>!src/entities/Migration/ensureMigration 4: 0x3377 - <unknown>!src/mappings/Dispatcher/handleMigrationSignaled , handler: handleMigrationSignaled, block_hash: 0x7e5c09bf47cf661681d31d8eba792f53afd24e1e393f4f45adbb197abdb30e0e, block_number: 14178128, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 

leoyvens commented Feb 10, 2022

It says "Critical error logged" but it doesn't actually log any CRIT messages?

@aasseman (Author)

Ah, I had filtered for ERRO only. Here are the CRIT messages for QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg:

Feb 09 08:02:03.150 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0x2d759b06d445f158b8b44d8f12d7803a2a3b3653e47f651cbbcbf60ab6e81a02, block_number: 14170674, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 10 06:00:52.609 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0x78b02be061ad6d0cf55061901d57472c0553f13c29b9109c1d885ae86fd0cc87, block_number: 14176609, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 10 18:25:59.794 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0x54ef502d32fd895e584a9de2558797e3152d559b2ad5eb489e76599c2632c0f9, block_number: 14178043, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 10 18:29:23.363 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0x7e5c09bf47cf661681d31d8eba792f53afd24e1e393f4f45adbb197abdb30e0e, block_number: 14178128, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 10 18:31:51.973 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0x4c49da33fad448e75127543352347f220c74af4885b21f98d2d6a559da59a376, block_number: 14178168, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 10 18:49:49.460 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0xdce407ae6411712ae53242726d552fa693061b69aa68974ffd0266ab446777aa, block_number: 14178184, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 10 19:52:51.583 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0x3299f51399fd69770e90fd709010700e3d17db8ff022b1b62f3dccb29bc2bc97, block_number: 14178264, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 

Between each of those entries, there is a graph-node restart and a little more indexing.

@cryptovestor21

Seeing the same behaviour. I had to close an Enzyme allocation with the last known good PoI, as restarting graph-node resulted in less and less time passing before the subgraph would fail again.

azf20 commented Feb 15, 2022

Thanks @aasseman and @cryptovestor21. @otaviopace, does any of the above ring a bell, given the recent changes in non-deterministic error handling?

@evaporei (Contributor)

I'll try to take a look this week 🙂

evaporei self-assigned this on Feb 18, 2022
@evaporei (Contributor)

@aasseman I finally took a look at those.

In Enzyme's case, the error happens here, when the address doesn't match either of the addresses A and B.

I'm not sure how this can happen in a non-deterministic fashion. I'll try to run this version locally and see if I can index the same block multiple times until it happens.
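
For readers without the link: judging from the backtrace (ensureRelease → createRelease → logCritical) and the CRIT message, the failing pattern looks roughly like the AssemblyScript below. This is a hedged reconstruction; the release addresses and exact structure are assumptions, not the actual Enzyme source.

```typescript
import { Address, log } from '@graphprotocol/graph-ts';

// Placeholder addresses standing in for the two known releases ("A" and "B").
const RELEASE_A = '0x0000000000000000000000000000000000000001';
const RELEASE_B = '0x0000000000000000000000000000000000000002';

export function createRelease(fundDeployer: Address): void {
  let deployer = fundDeployer.toHex();
  if (deployer != RELEASE_A && deployer != RELEASE_B) {
    // log.critical aborts the handler; graph-node surfaces this as
    // "Critical error logged in mapping" and treats it as deterministic.
    log.critical('Release with FundDeployer {} not found.', [deployer]);
  }
  // ... otherwise create and save the Release entity ...
}
```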

And for the graph-network-subgraph, the error is here, because the ID being passed is somehow null.

Again, I'm not sure how this can happen in a non-deterministic fashion.
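
The "Cannot save SubgraphDeployment entity without an ID" message comes from the entity code that graph-cli generates in src/types/schema.ts. Its shape is roughly as follows (a sketch based on the error text, not the exact generated file):

```typescript
import { store, Entity } from '@graphprotocol/graph-ts';

export class SubgraphDeployment extends Entity {
  save(): void {
    let id = this.get('id');
    // assert() aborts the wasm module with this message when id is null,
    // which graph-node reports as "Mapping aborted at src/types/schema.ts".
    assert(id != null, 'Cannot save SubgraphDeployment entity without an ID');
    store.set('SubgraphDeployment', id!.toString(), this);
  }
}
```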

One thing I'm curious about: what RPC provider do you use? Also, it might help if I could get access to those IPFS files to deploy locally and debug. I can use the commits I've linked above, but I might be wrong about which version I'm using compared to what you're running.

@aasseman (Author)

In the case of Enzyme, I used Chainstack (geth, AFAIK).

For the graph-network-subgraph, I saw the same issue on both Chainstack and Erigon (two different indexers).

@evaporei (Contributor)

@aasseman I've been able to reproduce a similar error in a more recent block for the Enzyme subgraph, using the PR linked above ☝️

I've reproduced it with both Alchemy and Infura providers, so it's probably a bug on our end (the graph-node side).

evaporei commented Mar 3, 2022

@aasseman I've been able to fix the issue 🙏

This PR (#3316) will do it; we'll plan today how it will be released in the next version 😊
