Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Asset Hub collator crashing when starting from genesis #1788

Merged
merged 21 commits into from
Oct 9, 2023

Conversation

georgepisaltu
Copy link
Contributor

@georgepisaltu georgepisaltu commented Oct 4, 2023

Fixes #1566

Description

When running polkadot-parachain --chain asset-hub-* with no DB, the node would crash because it would try to access Aura API which wasn't available in the genesis shell runtime that Asset Hub started with. More details on the issue here.

This PR makes the node start with a relay chain driven consensus which doesn't call into AuraApi::slot_duration, then after the API becomes available (after the runtime is upgraded to the proper Asset Hub), it uses the Aura consensus.

Checklist

  • My PR includes a detailed description as outlined in the "Description" section above
  • My PR follows the labeling requirements of this project (at minimum one label for T
    required)
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works (if applicable)

@georgepisaltu georgepisaltu self-assigned this Oct 4, 2023
Base automatically changed from brad-ab-glutton to master October 4, 2023 19:28
@slumber slumber added the T9-cumulus This PR/Issue is related to cumulus. label Oct 5, 2023
Copy link
Member

@bkchr bkchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looking good, but some of the details need to improve.

Comment on lines 1466 to 1471
let parachain_inherent = parachain_inherent.ok_or_else(|| {
Box::<dyn std::error::Error + Send + Sync>::from(
"Failed to create parachain inherent",
)
})?;
Ok(parachain_inherent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let parachain_inherent = parachain_inherent.ok_or_else(|| {
Box::<dyn std::error::Error + Send + Sync>::from(
"Failed to create parachain inherent",
)
})?;
Ok(parachain_inherent)
parachain_inherent.ok_or_else(|| {
Box::<dyn std::error::Error + Send + Sync>::from(
"Failed to create parachain inherent",
)
})

Comment on lines 1506 to 1537
let collation = {
let last_head_hash = match collator
.header_hash(request.persisted_validation_data().clone())
{
Some(header_hash) => header_hash,
None => {
request.complete(None);
continue
},
};
// Check if we have upgraded to an Aura compatible runtime and transition if
// necessary.
if client
.runtime_api()
.has_api::<dyn AuraApi<Block, AuraId>>(last_head_hash)
.unwrap_or(false)
{
// Respond to this request before transitioning to Aura.
request.complete(None);
break
}

collator
.clone()
.produce_candidate(
*request.relay_parent(),
request.persisted_validation_data().clone(),
)
.await
};

request.complete(collation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let collation = {
let last_head_hash = match collator
.header_hash(request.persisted_validation_data().clone())
{
Some(header_hash) => header_hash,
None => {
request.complete(None);
continue
},
};
// Check if we have upgraded to an Aura compatible runtime and transition if
// necessary.
if client
.runtime_api()
.has_api::<dyn AuraApi<Block, AuraId>>(last_head_hash)
.unwrap_or(false)
{
// Respond to this request before transitioning to Aura.
request.complete(None);
break
}
collator
.clone()
.produce_candidate(
*request.relay_parent(),
request.persisted_validation_data().clone(),
)
.await
};
request.complete(collation);
let last_head_hash = match collator
.header_hash(request.persisted_validation_data().clone())
{
Some(header_hash) => header_hash,
None => {
request.complete(None);
continue
},
};
// Check if we have upgraded to an Aura compatible runtime and transition if
// necessary.
if client
.runtime_api()
.has_api::<dyn AuraApi<Block, AuraId>>(last_head_hash)
.unwrap_or(false)
{
// Respond to this request before transitioning to Aura.
request.complete(None);
break
}
request.complete(None);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to try to produce any blocks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually don't need all of this code. Just decode the header manually and then check if the runtime api is there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -69,14 +69,14 @@ where
RA::Api: CollectCollationInfo<Block>,
{
/// Create a new instance.
fn new(
pub fn new(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually revert all the changes in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@bkchr bkchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the last issues and then you can merge it! Ty!

Comment on lines 1483 to 1487

if !collator_service.check_block_status(last_head_hash, &last_header) {
request.complete(None);
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !collator_service.check_block_status(last_head_hash, &last_header) {
request.complete(None);
continue
}

No need to do this.

}

// Move to Aura consensus.
let slot_duration = cumulus_client_consensus_aura::slot_duration(&*client).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No unwrap. Please print the error and return.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I forgot about this.

});

let spawner = task_manager.spawn_handle();
spawner.spawn("cumulus-asset-hub-collator", None, collation_future);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
spawner.spawn("cumulus-asset-hub-collator", None, collation_future);
spawner.spawn_essential("cumulus-asset-hub-collator", None, collation_future);

.await
});

let spawner = task_manager.spawn_handle();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let spawner = task_manager.spawn_handle();
let spawner = task_manager.spawn_essential_handle();

@georgepisaltu georgepisaltu merged commit 98286ad into master Oct 9, 2023
109 checks passed
@georgepisaltu georgepisaltu deleted the george/asset-hub-shell-runtime-transition branch October 9, 2023 20:39
ordian added a commit that referenced this pull request Oct 12, 2023
* master: (33 commits)
  ci: set CI_IMAGE back to (now updated) .ci-unified (#1854)
  ci: bump ci image to rust 1.73.0 (#1830)
  Refactor Identity to benchmark v2 (#1838)
  PVF worker: bump landlock, update ABI docs (#1850)
  Xcm emulator nits (#1649)
  Fixes path issue in derive-impl (#1823)
  upgrade to macro_magic 0.4.3 (#1832)
  Use safe math when pruning statuses (#1835)
  remote-ext: fix state download stall on slow connections and reduce memory usage (#1295)
  Update testnet bootnode dns name (#1712)
  [FRAME] Warn on unchecked weight witness (#1818)
  [xcm] Use `Weight::MAX` for `reserve_asset_deposited`, `receive_teleported_asset` benchmarks (#1726)
  Update bridges subtree (#1803)
  Check for parent of first ready block being on chain (#1812)
  Make CheckNonce refuse transactions signed by accounts with no providers (#1578)
  Fix Asset Hub collator crashing when starting from genesis (#1788)
  Mixnet integration (#1346)
  [xcm-emulator] Decouple the `AccountId` type from `AccountId32` (#1458)
  Treasury spends various asset kinds (#1333)
  chore: bump zombienter version (#1806)
  ...
bkchr pushed a commit that referenced this pull request Oct 19, 2023
)

When launching our [small
network](https://github.com/paritytech/polkadot-sdk/blob/master/cumulus/zombienet/examples/small_network.toml)
for testing the node was crashing here shortly after launch:

https://github.com/paritytech/polkadot-sdk/blob/5cdd819ed295645958afd9d937d989978fd0c84e/polkadot/node/collation-generation/src/lib.rs#L140

After changes in #1788 for the asset hub collator we are waiting for
blocks of the shell runtime to pass before we initialize aura. However,
this means that we attempted to initialize the collation related relay
chain subsystems twice, leading to the error.

I modified Aura to let it optionally take an already initialized stream
of collation requests.
skunert added a commit to skunert/polkadot-sdk that referenced this pull request Oct 19, 2023
Use prebuilt try-runtime binary in CI (paritytech#1898)

`cargo install` takes a long time in CI. We want to run it relatively
frequently without chewing through so much compute (see
paritytech/ci_cd#771) so I added automatic
binary releases to the try-runtime-cli repo.

A small added benefit is we can use it in our existing
`on-runtime-upgrade` checks, which should cut their execution time by
about half.

Cumulus: Allow aura to use initialized collation request receiver (paritytech#1911)

When launching our [small
network](https://github.com/paritytech/polkadot-sdk/blob/master/cumulus/zombienet/examples/small_network.toml)
for testing the node was crashing here shortly after launch:

https://github.com/paritytech/polkadot-sdk/blob/5cdd819ed295645958afd9d937d989978fd0c84e/polkadot/node/collation-generation/src/lib.rs#L140

After changes in paritytech#1788 for the asset hub collator we are waiting for
blocks of the shell runtime to pass before we initialize aura. However,
this means that we attempted to initialize the collation related relay
chain subsystems twice, leading to the error.

I modified Aura to let it optionally take an already initialized stream
of collation requests.

Remove comments
tdimitrov pushed a commit that referenced this pull request Oct 23, 2023
)

When launching our [small
network](https://github.com/paritytech/polkadot-sdk/blob/master/cumulus/zombienet/examples/small_network.toml)
for testing the node was crashing here shortly after launch:

https://github.com/paritytech/polkadot-sdk/blob/5cdd819ed295645958afd9d937d989978fd0c84e/polkadot/node/collation-generation/src/lib.rs#L140

After changes in #1788 for the asset hub collator we are waiting for
blocks of the shell runtime to pass before we initialize aura. However,
this means that we attempted to initialize the collation related relay
chain subsystems twice, leading to the error.

I modified Aura to let it optionally take an already initialized stream
of collation requests.
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Mar 26, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Mar 27, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 8, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 8, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 8, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 8, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 8, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 9, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 9, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 9, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 9, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 9, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 9, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 10, 2024
serban300 added a commit to serban300/polkadot-sdk that referenced this pull request Apr 10, 2024
bkchr pushed a commit that referenced this pull request Apr 10, 2024
We need this for the integrity tests in cumulus.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T9-cumulus This PR/Issue is related to cumulus.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error after upgrading polkadot-parachain to v1.1.0-polkadot on Asset Hub
4 participants