Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce grace ticks, and ignore grace ticks for missing leaders #7764

Merged
merged 3 commits into from
Jan 13, 2020

Conversation

pgarg66
Copy link
Contributor

@pgarg66 pgarg66 commented Jan 11, 2020

Problem

A downed node appears to have too large of an impact on the blocks produced by the next leader in line.

Summary of Changes

The new leader was burning grace ticks waiting for the previous leader to finish its slot. The grace ticks at minimum are 2 slots, and can go up to 3 slots. Also, if the previous leader node is missing, the new leader will burn all grace ticks, before assuming leader role.

Check if previous leader slots are empty. If so, don't burn grace ticks. Also reduced max grace ticks to 2 slots worth of ticks in all scenarios.

Fixes #7588

@pgarg66 pgarg66 requested a review from mvines January 11, 2020 23:34
@codecov
Copy link

codecov bot commented Jan 12, 2020

Codecov Report

Merging #7764 into master will increase coverage by <.1%.
The diff coverage is 98.2%.

@@           Coverage Diff            @@
##           master   #7764     +/-   ##
========================================
+ Coverage    81.8%   81.8%   +<.1%     
========================================
  Files         238     241      +3     
  Lines       50993   51075     +82     
========================================
+ Hits        41730   41817     +87     
+ Misses       9263    9258      -5

@pgarg66 pgarg66 requested a review from carllin January 12, 2020 11:15
@mvines mvines added the v0.22 label Jan 13, 2020
core/src/poh_recorder.rs Outdated Show resolved Hide resolved
@pgarg66
Copy link
Contributor Author

pgarg66 commented Jan 13, 2020

@mvines LMK if any other changes are needed.

mvines
mvines previously approved these changes Jan 13, 2020
// we've approached target_tick_height OR poh was reset to run immediately
// Or, previous leader didn't transmit in any of its leader slots, so ignore grace ticks
self.tick_height >= target_tick_height
|| self.start_tick_height + self.grace_ticks == leader_first_tick_height
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick clarification (I know this was in the original code as well)

If you have N consecutive slots, won't self.start_tick_height + self.grace_ticks == leader_first_tick_height return true for every slot after the first one? (I think self.start_tick_height is set on reset which is the beginning of the first slot).

Will this function then return true every single time ReplayStage checks reached_leader_slot after the first slot? Is this ok b/c this check in ReplayStage will make sure another bank for the same slot isn't created: https://github.com/solana-labs/solana/blob/master/core/src/replay_stage.rs#L482

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be ok due to this check https://github.com/solana-labs/solana/blob/master/core/src/replay_stage.rs#L377

If the leader already has bank, it's not calling maybe_start_leader which in turn calls this function.
WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pgarg66 I think it's possible for tick() -> flush() here: https://github.com/solana-labs/solana/blob/master/core/src/replay_stage.rs#L377 to clear the bank, before ReplayStage has reset to a different bank, and then that check won't prevent a call to maybe_start_leader. But then this check should catch that case: https://github.com/solana-labs/solana/blob/master/core/src/replay_stage.rs#L482.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool. so it's all good? anyway, I'll merge this PR. If this is still an issue, we can track it as a separate issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, looks like it, thanks!

@mergify mergify bot dismissed mvines’s stale review January 13, 2020 22:03

Pull request has been modified.

@pgarg66 pgarg66 merged commit 156292e into solana-labs:master Jan 13, 2020
@pgarg66 pgarg66 deleted the issue7588 branch January 13, 2020 23:55
mergify bot pushed a commit that referenced this pull request Jan 13, 2020
* Reduce grace ticks, and ignore grace ticks for missing leaders

* address review comments

* blockstore related renames

(cherry picked from commit 156292e)

# Conflicts:
#	core/src/poh_recorder.rs
#	ledger/src/lib.rs
pgarg66 added a commit that referenced this pull request Jan 17, 2020
* Reduce grace ticks, and ignore grace ticks for missing leaders

* address review comments

* blockstore related renames

(cherry picked from commit 156292e)

# Conflicts:
#	core/src/poh_recorder.rs
#	ledger/src/lib.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A downed node appears to have too large of an impact on the blocks produced by the next leader in line
3 participants