Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: extract miner sector state changes #1032

Merged
merged 5 commits into from
Aug 23, 2022
Merged

fix: extract miner sector state changes #1032

merged 5 commits into from
Aug 23, 2022

Conversation

frrist
Copy link
Member

@frrist frrist commented Aug 13, 2022

This PR fixes #1031

The previous implementation diffed previous and current partitions directly. This was incorrect since miners
may compact their partitions resulting in sectors moving to different partitions, which the previous code proceeded to record as invalid events. Instead, we now collect all sectors from all miner partitions and diff the sets directly.

- previous implementation diffed previous and current partitions, this was incorrect since miners
may compact their partitions resulting in sectors moving to different partitions. Instead we
collect all sectors from all miner partitions and diff the sets directly.
@frrist frrist self-assigned this Aug 13, 2022
@codecov-commenter
Copy link

codecov-commenter commented Aug 13, 2022

Codecov Report

Merging #1032 (5881258) into master (2a5df7c) will decrease coverage by 1.1%.
The diff coverage is 31.6%.

@@           Coverage Diff            @@
##           master   #1032     +/-   ##
========================================
- Coverage    35.6%   34.4%   -1.2%     
========================================
  Files          44      44             
  Lines        2881    2925     +44     
========================================
- Hits         1027    1008     -19     
- Misses       1750    1821     +71     
+ Partials      104      96      -8     

Copy link
Contributor

@placer14 placer14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good to me.

A thought: It would probably be easy to write a query which proves the invariant isn't in our exports which we could periodically run over our exported data.

Can you outline/include how this data bug was detected originally? I can see how we can run tests like these in CI. (Ideas @kasteph?)

Edit: Would the check be as simple as taking all Termination events and making sure that these sectors never have additional events after the Termination occurs?

faulted := bitfield.New()
recovered := bitfield.New()
recovering := bitfield.New()
// SectorStates contains a set of bitfields for active, live, fault, and recovering sectors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to mention how this bitfield represents each type of sector represented here. (a packed binary list of sector IDs?)

return out, nil

// previous faulty sectors minus current active sectors are sectors recovered this epoch.
recovered, err := bitfield.IntersectBitField(previous.Faulty, current.Active)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment and code don't match. I assume we only want to see which previous faults overlap with current actives... this makes sense as being "Recovered". Let's update this comment.

// previous faulty sectors which match (intersect) active sectors are sectors recovered this epoch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, bad copy paste

@frrist
Copy link
Member Author

frrist commented Aug 16, 2022

A thought: It would probably be easy to write a query which proves the invariant isn't in our exports which we could periodically run over our exported data.

@davidgasquez is working on this as a part of the dbt workflow.

Edit: Would the check be as simple as taking all Termination events and making sure that these sectors never have additional events after the Termination occurs?

That would be part of the check, but there are more cases, for example - sectors cannot recover before becoming faulted, sectors cannot terminate before being added, faulted sectors cannot become faulty, etc.

A sector lifecycle is a state machine, so we'll want to validate all sector events correspond to valid states.

@placer14
Copy link
Contributor

My approval is sticky. Good catch on accidental context sharing. 🙇

@frrist frrist merged commit 45f15be into master Aug 23, 2022
@frrist frrist deleted the frrist/fix-1031 branch August 23, 2022 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fault/recover event still happen after a termination event
3 participants