Errored subgraphs sometimes resume correctly after restart #3236

Closed
aasseman opened this issue Feb 9, 2022 · 11 comments · Fixed by #3316

aasseman commented Feb 9, 2022

Do you want to request a feature or report a bug?
Bug

What is the current behavior?
Graph-node v0.25.0.

Some subgraphs' indexing would hang, declaring a deterministic error, but after one (or more) restarts of graph-node they would resume with seemingly no issues. Observed on QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg (Enzyme) and Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr (Graph Network).

If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem.
Index QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg or Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr.

What is the expected behavior?
An error reported as deterministic should actually be deterministic. Here it seems that, due to unknown factors, a subgraph can get stuck and then unstuck.

Logs for QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg:

Feb 08 03:02:22.060 DEBG Trying again after eth_call RPC call failed (attempt #1) with result Err(Web3Error(Transport("Error(IncompleteMessage)"))), block_hash: 0xc3b82bc6897a2f490c757dec90fa5ac07043b64e7e442822f8427b240ba8641e, block_number: 14162904, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 09 08:02:03.161 ERRO Handler skipped due to execution failure, error: Critical error logged in mapping wasm backtrace: 0: 0x22bc - <unknown>!src/utils/logCritical/logCritical 1: 0x238b - <unknown>!src/entities/Release/createRelease 2: 0x2787 - <unknown>!src/entities/Release/ensureRelease 3: 0x2aeb - <unknown>!src/mappings/Dispatcher/handleCurrentFundDeployerSet , handler: handleCurrentFundDeployerSet, block_hash: 0x2d759b06d445f158b8b44d8f12d7803a2a3b3653e47f651cbbcbf60ab6e81a02, block_number: 14170674, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 09 08:02:03.197 ERRO Subgraph error 1/1, code: SubgraphSyncingFailure, error: Critical error logged in mapping wasm backtrace: 0: 0x22bc - <unknown>!src/utils/logCritical/logCritical 1: 0x238b - <unknown>!src/entities/Release/createRelease 2: 0x2787 - <unknown>!src/entities/Release/ensureRelease 3: 0x2aeb - <unknown>!src/mappings/Dispatcher/handleCurrentFundDeployerSet in handler `handleCurrentFundDeployerSet` at block #14170674 (2d759b06d445f158b8b44d8f12d7803a2a3b3653e47f651cbbcbf60ab6e81a02), block_hash: 0x2d759b06d445f158b8b44d8f12d7803a2a3b3653e47f651cbbcbf60ab6e81a02, block_number: 14170674, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 09 08:02:03.356 ERRO Subgraph instance failed to run: Critical error logged in mapping wasm backtrace: 0: 0x22bc - <unknown>!src/utils/logCritical/logCritical 1: 0x238b - <unknown>!src/entities/Release/createRelease 2: 0x2787 - <unknown>!src/entities/Release/ensureRelease 3: 0x2aeb - <unknown>!src/mappings/Dispatcher/handleCurrentFundDeployerSet in handler `handleCurrentFundDeployerSet` at block #14170674 (2d759b06d445f158b8b44d8f12d7803a2a3b3653e47f651cbbcbf60ab6e81a02), code: SubgraphSyncingFailure, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
... index-node restart ...
Feb 09 19:06:55.128 INFO Reverting errored block, to_block_hash: 0x6427d445b10086a920ab36ff061e032dfbec9bc1fc83a77bd570bca5172f6026, to_block_number: 14170673, from_block_hash: 0x2d759b06d445f158b8b44d8f12d7803a2a3b3653e47f651cbbcbf60ab6e81a02, from_block_number: 14170674, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, shard: primary, component: Store 
...
Feb 09 19:07:56.034 DEBG Requesting 0 block(s), sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: BlockStream 
Feb 09 19:07:56.035 DEBG Processing 6 triggers, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: BlockStream 
Feb 09 19:07:56.038 DEBG 1 candidate trigger in this block, block_hash: 0x44142e5015a24d401615c1f75a523b97d4aa574828965ff5972511968cd9c8b5, block_number: 14170703, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 09 19:07:56.091 INFO Done processing trigger, data_source: ChainlinkAggregatorDataSource, handler: handleAnswerUpdated, total_ms: 51, address: 0x37bc…a9e6, signature: AnswerUpdated(indexed int256,indexed uint256,uint256), block_hash: 0x44142e5015a24d401615c1f75a523b97d4aa574828965ff5972511968cd9c8b5, block_number: 14170703, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 09 19:07:56.094 INFO Applying 6 entity operation(s), block_hash: 0x44142e5015a24d401615c1f75a523b97d4aa574828965ff5972511968cd9c8b5, block_number: 14170703, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 09 19:07:56.418 DEBG 2 candidate triggers in this block, block_hash: 0x98a0a6323fe06e5486235853de27b8a10c0c5953e75bdb0a46d520a5174e66b0, block_number: 14170715, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
... Indexing all the way to chain head without errors ...

Logs for Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr:

Feb 05 22:12:55.147 ERRO Subgraph error 1/1, code: SubgraphSyncingFailure, error: Mapping aborted at src/types/schema.ts, line 1757, column 4, with message: Cannot save SubgraphDeployment entity without an ID wasm backtrace: 0: 0x28f2 - <unknown>!src/types/schema/SubgraphDeployment#save 1: 0x29cb - <unknown>!src/mappings/gns/handleSubgraphMetadataUpdated in handler `handleSubgraphMetadataUpdated` at block #13842506 (24cce9d54fb7d9ab6c41d83773fe5fb16e674a4ad28a36ce42c171d6e8216592), block_hash: 0x24cce9d54fb7d9ab6c41d83773fe5fb16e674a4ad28a36ce42c171d6e8216592, block_number: 13842506, sgd: 8, subgraph_id: Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr, component: SubgraphInstanceManager 
Feb 05 22:12:55.256 ERRO Subgraph instance failed to run: Mapping aborted at src/types/schema.ts, line 1757, column 4, with message: Cannot save SubgraphDeployment entity without an ID wasm backtrace: 0: 0x28f2 - <unknown>!src/types/schema/SubgraphDeployment#save 1: 0x29cb - <unknown>!src/mappings/gns/handleSubgraphMetadataUpdated in handler `handleSubgraphMetadataUpdated` at block #13842506 (24cce9d54fb7d9ab6c41d83773fe5fb16e674a4ad28a36ce42c171d6e8216592), code: SubgraphSyncingFailure, sgd: 8, subgraph_id: Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr, component: SubgraphInstanceManager 
... restart index-node ...
Feb 09 19:06:52.054 INFO Reverting errored block, to_block_hash: 0x26a7dab59ab3ea10145c99c1af169c320310299a46d74f0cf81f74d9f8077217, to_block_number: 13842505, from_block_hash: 0x24cce9d54fb7d9ab6c41d83773fe5fb16e674a4ad28a36ce42c171d6e8216592, from_block_number: 13842506, subgraph_id: Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr, shard: primary, component: Store 
... indexing all the way to chain head without errors ...
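
To make the failure/restart cycle explicit, here is a minimal TypeScript sketch of the resume logic the logs above suggest. graph-node itself is written in Rust, so the names here (`FailedBlock`, `resumeFrom`) are illustrative assumptions, not the real implementation:

```typescript
// Model of the restart behavior visible in the logs; illustrative only.
interface FailedBlock {
  number: number; // block at which the handler failed, e.g. 14170674
  deterministic: boolean; // how graph-node classified the failure
}

// The "Reverting errored block" lines show the subgraph being rewound from
// the failed block N to N-1 on startup, after which block N is re-fetched
// and its handlers re-run.
function resumeFrom(head: number, failure: FailedBlock | null): number {
  if (failure != null) {
    // Expected: a deterministic failure should leave the subgraph failed.
    // Observed: the errored block is reverted and retried after every
    // restart, and the retry sometimes succeeds, i.e. the error was not
    // deterministic after all.
    return failure.number - 1;
  }
  return head; // no failure recorded: continue from the last good block
}
```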
@leoyvens (Collaborator)

This is mysterious to me. Do you have the source code for the subgraphs that exhibited these issues?

@aasseman (Author)

> This is mysterious to me. Do you have the source code for the subgraphs that exhibited these issues?

Not for QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, but it looks like the same phenomenon happens on Qmf5XXWA8zhHbdvWqtPcR3jFkmb5FLR4MAefEYx8E3pHfr (the Graph Network Subgraph), though less often.

Indeed I had the issue again on QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg:

ERRO Subgraph instance failed to run: Critical error logged in mapping wasm backtrace: 0: 0x22bc - <unknown>!src/utils/logCritical/logCritical 1: 0x238b - <unknown>!src/entities/Release/createRelease 2: 0x2787 - <unknown>!src/entities/Release/ensureRelease 3: 0x32a7 - <unknown>!src/entities/Migration/ensureMigration 4: 0x3377 - <unknown>!src/mappings/Dispatcher/handleMigrationSignaled in handler `handleMigrationSignaled` at block #14176609 (78b02be061ad6d0cf55061901d57472c0553f13c29b9109c1d885ae86fd0cc87), code: SubgraphSyncingFailure, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
... index-node restart ...
Feb 10 18:21:09.302 INFO Reverting errored block, to_block_hash: 0xfc5b12d88ea811429f01ea64dac103370d090f50934f895b7b09b23829b2f9d8, to_block_number: 14176608, from_block_hash: 0x78b02be061ad6d0cf55061901d57472c0553f13c29b9109c1d885ae86fd0cc87, from_block_number: 14176609, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg
... index to chain head ...
Feb 10 18:25:59.798 ERRO Subgraph error 1/1, code: SubgraphSyncingFailure, error: Critical error logged in mapping wasm backtrace: 0: 0x22bc - <unknown>!src/utils/logCritical/logCritical 1: 0x238b - <unknown>!src/entities/Release/createRelease 2: 0x2787 - <unknown>!src/entities/Release/ensureRelease 3: 0x32a7 - <unknown>!src/entities/Migration/ensureMigration 4: 0x3377 - <unknown>!src/mappings/Dispatcher/handleMigrationSignaled in handler `handleMigrationSignaled` at block #14178043 (54ef502d32fd895e584a9de2558797e3152d559b2ad5eb489e76599c2632c0f9), block_hash: 0x54ef502d32fd895e584a9de2558797e3152d559b2ad5eb489e76599c2632c0f9, block_number: 14178043, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
... index-node restart ...
Feb 10 18:28:33.268 INFO Reverting errored block, to_block_hash: 0x83aaceb144cefd07c9030fa8da4335c24ab205bc60deb50489888fa3d2deba7a, to_block_number: 14178042, from_block_hash: 0x54ef502d32fd895e584a9de2558797e3152d559b2ad5eb489e76599c2632c0f9, from_block_number: 14178043, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, shard: primary, component: Store 
... indexing for a bit ...
Feb 10 18:29:23.364 ERRO Handler skipped due to execution failure, error: Critical error logged in mapping wasm backtrace: 0: 0x22bc - <unknown>!src/utils/logCritical/logCritical 1: 0x238b - <unknown>!src/entities/Release/createRelease 2: 0x2787 - <unknown>!src/entities/Release/ensureRelease 3: 0x32a7 - <unknown>!src/entities/Migration/ensureMigration 4: 0x3377 - <unknown>!src/mappings/Dispatcher/handleMigrationSignaled , handler: handleMigrationSignaled, block_hash: 0x7e5c09bf47cf661681d31d8eba792f53afd24e1e393f4f45adbb197abdb30e0e, block_number: 14178128, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 

leoyvens commented Feb 10, 2022

It says "Critical error logged" but it doesn't actually log any CRIT messages?

@aasseman (Author)

Ah, I had filtered for ERRO only. Here are the CRIT messages for QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg:

Feb 09 08:02:03.150 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0x2d759b06d445f158b8b44d8f12d7803a2a3b3653e47f651cbbcbf60ab6e81a02, block_number: 14170674, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 10 06:00:52.609 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0x78b02be061ad6d0cf55061901d57472c0553f13c29b9109c1d885ae86fd0cc87, block_number: 14176609, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 10 18:25:59.794 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0x54ef502d32fd895e584a9de2558797e3152d559b2ad5eb489e76599c2632c0f9, block_number: 14178043, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 10 18:29:23.363 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0x7e5c09bf47cf661681d31d8eba792f53afd24e1e393f4f45adbb197abdb30e0e, block_number: 14178128, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 10 18:31:51.973 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0x4c49da33fad448e75127543352347f220c74af4885b21f98d2d6a559da59a376, block_number: 14178168, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 10 18:49:49.460 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0xdce407ae6411712ae53242726d552fa693061b69aa68974ffd0266ab446777aa, block_number: 14178184, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 
Feb 10 19:52:51.583 CRIT Release with FundDeployer 0x4f1c53f096533c04d8157efb6bca3eb22ddc6360 not found., data_source: DispatcherDataSource, block_hash: 0x3299f51399fd69770e90fd709010700e3d17db8ff022b1b62f3dccb29bc2bc97, block_number: 14178264, sgd: 10, subgraph_id: QmTBxvMF6YnbT1eYeRx9XQpH4WvxTV53vdptCCZFiZSprg, component: SubgraphInstanceManager 

Between each of those entries, there is a graph-node restart and a little more indexing.

@cryptovestor21

Seeing the same behaviour. I had to close an Enzyme allocation with the last known good PoI, as restarting graph-node resulted in less and less time passing before the subgraph would fail again.

azf20 commented Feb 15, 2022

Thanks @aasseman and @cryptovestor21. @otaviopace, does any of the above ring a bell, given the recent changes in non-deterministic error handling?

@evaporei (Contributor)

I'll try to take a look this week 🙂

evaporei self-assigned this on Feb 18, 2022
@evaporei (Contributor)

@aasseman I finally took a look at those.

In Enzyme's case, the error happens here, when the address doesn't match either of the addresses A and B.

I'm not sure how this can happen in a non-deterministic fashion. I'll try to run this version locally and see if I can index the same block multiple times until it happens.
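
For readers without the link: judging from the backtrace (ensureRelease → createRelease → logCritical) and the CRIT message, the failing pattern looks roughly like the AssemblyScript below. This is a hedged reconstruction; the release addresses and exact structure are assumptions, not the actual Enzyme source.

```typescript
import { Address, log } from '@graphprotocol/graph-ts';

// Placeholder addresses standing in for the two known releases ("A" and "B").
const RELEASE_A = '0x0000000000000000000000000000000000000001';
const RELEASE_B = '0x0000000000000000000000000000000000000002';

export function createRelease(fundDeployer: Address): void {
  let deployer = fundDeployer.toHex();
  if (deployer != RELEASE_A && deployer != RELEASE_B) {
    // log.critical aborts the handler; graph-node surfaces this as
    // "Critical error logged in mapping" and treats it as deterministic.
    log.critical('Release with FundDeployer {} not found.', [deployer]);
  }
  // ... otherwise create and save the Release entity ...
}
```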

And for the graph-network-subgraph, the error is here, because the ID being passed is somehow null.

Again, I'm not sure how this can happen in a non-deterministic fashion.
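
The "Cannot save SubgraphDeployment entity without an ID" message comes from the entity code that graph-cli generates in src/types/schema.ts. Its shape is roughly as follows (a sketch based on the error text, not the exact generated file):

```typescript
import { store, Entity } from '@graphprotocol/graph-ts';

export class SubgraphDeployment extends Entity {
  save(): void {
    let id = this.get('id');
    // assert() aborts the wasm module with this message when id is null,
    // which graph-node reports as "Mapping aborted at src/types/schema.ts".
    assert(id != null, 'Cannot save SubgraphDeployment entity without an ID');
    store.set('SubgraphDeployment', id!.toString(), this);
  }
}
```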

One thing I'm curious about: what RPC provider do you use? Also, it might help if I could get access to those IPFS files to deploy locally and debug. I can use the commits I've linked above, but I might be wrong about which version I'm using compared to what you're running.

@aasseman (Author)

In the case of Enzyme, I used Chainstack (geth, AFAIK).

For the graph-network-subgraph, I saw the same issue on both Chainstack and Erigon (two different indexers).

@evaporei (Contributor)

@aasseman I've been able to reproduce a similar error in a more recent block for the Enzyme subgraph, using the PR linked above ☝️

I've reproduced it with both Alchemy and Infura providers, so it's probably a bug on our end (the graph-node side).

evaporei commented Mar 3, 2022

@aasseman I've been able to fix the issue 🙏

This PR (#3316) will do it; we'll plan today how it will be released in the next version 😊
