Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Baseline bug fix (FIP-0081) #1557

Merged
merged 27 commits into from
Sep 30, 2024
Merged

Baseline bug fix (FIP-0081) #1557

merged 27 commits into from
Sep 30, 2024

Conversation

kkarrancsu
Copy link
Contributor

No description provided.

@jennijuju jennijuju requested a review from lemmih June 24, 2024 15:24
@jennijuju
Copy link
Member

note for the maintainers: this FIP has not yet been accepted, so the PR should live in an integration/feat until then

Copy link
Member

@anorth anorth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not intending to be the primary reviewer here, but just skimming the architecture.

Comment on lines 319 to 325
fn ramp_duration(rt: &impl Runtime) -> Result<u64, ActorError> {
rt.validate_immediate_caller_accept_any()?;
let st: State = rt.state()?;

Ok(st.ramp_duration_epochs)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ramp configuration should be rolled into the return value from the existing request for total power. All the call sites above that called this function also invoked the power actor for the total power just a few lines earlier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - that makes sense and reduces the number of overall changes, I've updated accordingly.

@@ -68,6 +70,8 @@ pub struct State {
/// Number of miners having proven the minimum consensus power.
pub miner_above_min_power_count: i64,

pub epochs_since_ramp_start: u64,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall our conversation correctly @lemmih - we said that this would be hardcoded to a particular value for each network?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please ignore this b/c it is incorrect, at least in the way I was thinking about it, epochs_since_ramp_start needs to be able to be positive or negative, so I've updated this to be i64

@@ -68,6 +70,8 @@ pub struct State {
/// Number of miners having proven the minimum consensus power.
pub miner_above_min_power_count: i64,

pub epochs_since_ramp_start: i64,
Copy link
Contributor Author

@kkarrancsu kkarrancsu Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall our convo correctly @lemmih - this will be hardcoded for each network at the next upgrade?

I was probably not thinking about everything when we discussed this, but this is a value that will be updated every epoch. My thinking was that this value would be negative before the ramp was scheduled to be initiated, but update every epoch, kind of like a countdown timer. Once it is greater than 0, the code to update pledge according to the ramp will be activated. See updated line 276 of monies.rs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the main thing remaining is - where does the epochs_since_ramp_start parameter get updated? It seems there needs to be some way to access the current epoch of the network as well as a hardcoded value of the epoch at which the upgrade will take place.

@@ -267,10 +270,30 @@ pub fn initial_pledge_for_power(
let lock_target_denom = LOCK_TARGET_FACTOR_DENOM;
let pledge_share_num = qa_power;
let network_qa_power = network_qa_power_estimate.estimate();
let pledge_share_denom = cmp::max(cmp::max(&network_qa_power, baseline_power), qa_power);

// Compute gamma based on current_epoch
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where can we write a unittest for the updated initial_pledge_for_power so that we can be more certain that the computation is what we expect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just add a new file in actors/miner/tests/, you'll see it already being tested for a particular case in actors/miner/tests/precommit_deposit_and_initial_pledge_positive_test.rs so you can copy that pattern

actors/miner/src/ext.rs Outdated Show resolved Hide resolved
actors/power/src/state.rs Outdated Show resolved Hide resolved
…ration, and refactored other variables to set the correct inputs for the pledge function
actors/miner/src/lib.rs Outdated Show resolved Hide resolved
@@ -1981,11 +1985,14 @@ impl Actor {
activate_sectors_deals(rt, &data_activations, compute_commd)?;
let successful_activations = batch_return.successes(&precommited_sectors);

// TODO: total_power is not requested here, so how can we get ramp_start_epoch & ramp_duration_epochs ?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lemmih - flag here, do we need to make a call to get_total_power(), or is this function a special startup state where the inputs are noops?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You ran into some technical debt here. This whole method is legacy and never invoked on chain. It should be removed. It's used only for creating genesis state (e.g. for testnets). Just set both values to zero.

cc @ZenGround0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slated for removal #1556 so don't spend too much time on this. It's only ever called at genesis in trusted testing environments.

@rvagg
Copy link
Member

rvagg commented Jun 26, 2024

@kkarrancsu btw running make check will get you past the linting problems that get reported in CI here before you push

@kkarrancsu
Copy link
Contributor Author

@kkarrancsu btw running make check will get you past the linting problems that get reported in CI here before you push

Sounds good, I will do that before any further pushes.

@kkarrancsu
Copy link
Contributor Author

Hi @jennijuju, @anorth - zooming out a little bit - we initially wanted to prototype this for the purpose of making the FIP complete to be considered by core-devs and community. Are we at that stage, or do we need to get this to the finish line, with unit tests, etc?

@jennijuju
Copy link
Member

Hi @jennijuju, @anorth - zooming out a little bit - we initially wanted to prototype this for the purpose of making the FIP complete to be considered by core-devs and community. Are we at that stage, or do we need to get this to the finish line, with unit tests, etc?

I think this is good enough for actor spec. For the FIP, you also need to specify where the ramp start epoch is set and when (IIUC values will be set at upgrade migration - @lemmih could you please confirm?

@anorth
Copy link
Member

anorth commented Jul 1, 2024

I agree this is probably enough to inform a spec in the FIP now, though failing tests might suggest that something is missed.

@kkarrancsu
Copy link
Contributor Author

I agree this is probably enough to inform a spec in the FIP now, though failing tests might suggest that something is missed.

The remaining error results from updating the return value to return the additional two elements needed to compute the pledge based on how far along we've progressed in the ramped activation

pub epochs_since_ramp_start: i64,
pub ramp_duration_epochs: u64,

I'm looking at the pre_commit_requires_commd_test function in integration_tests/src/tests/batch_onboarding_deals_test.rs, but it isn't immediately clear how to fix this test error. If someone can point me in the right direction here, I can resolve it.

@lemmih
Copy link
Contributor

lemmih commented Jul 4, 2024

I agree this is probably enough to inform a spec in the FIP now, though failing tests might suggest that something is missed.

The remaining error results from updating the return value to return the additional two elements needed to compute the pledge based on how far along we've progressed in the ramped activation

pub epochs_since_ramp_start: i64,
pub ramp_duration_epochs: u64,

I'm looking at the pre_commit_requires_commd_test function in integration_tests/src/tests/batch_onboarding_deals_test.rs, but it isn't immediately clear how to fix this test error. If someone can point me in the right direction here, I can resolve it.

Update CurrentTotalPowerReturnParams in actors/market/src/ext.rs to include the new fields:

diff --git a/actors/market/src/ext.rs b/actors/market/src/ext.rs
index 987d5068..0be8f9d8 100644
--- a/actors/market/src/ext.rs
+++ b/actors/market/src/ext.rs
@@ -162,5 +162,7 @@ pub mod power {
         pub quality_adj_power: StoragePower,
         pub pledge_collateral: TokenAmount,
         pub quality_adj_power_smoothed: FilterEstimate,
+        pub ramp_start_epoch: i64,
+        pub ramp_duration_epochs: u64,
     }
 }

@lemmih
Copy link
Contributor

lemmih commented Sep 19, 2024

@anorth @rvagg This implementation proposal is now complete and is ready to be reviewed.

@anorth anorth changed the title initial updates to integrate baseline bug fix Baseline bug fix (FIP-0081) Sep 19, 2024
@@ -1150,6 +1150,9 @@ pub fn expect_query_network_info(rt: &MockRuntime) {
quality_adj_power: power.clone(),
pledge_collateral: TokenAmount::default(),
quality_adj_power_smoothed: FilterEstimate::new(reward.atto().clone(), BigInt::zero()),
// TODO: flagging that this was set to zero, but this wouldn't be the actual return from the function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this TODO. Zero is a reasonable value for networks other than mainnet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

actors/power/src/state.rs Show resolved Hide resolved
Comment on lines 53 to 57
&qa_sector_power(),
&StoragePower::from(1u64 << 37),
&reward_estimate(),
&power_estimate(),
&TokenAmount::from_whole(1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found these a bit hard to mentally verify. Could you please expand them, e.g. by using named vals for the parameters, and spell out in words the expected result. I suggest not using the consts defined for the "clamping" test – mixing and matching those with inline values couples the tests together and makes it hard to trace what the values are intended for. You can probably combine all these new cases into a single test for re-use.

E.g. "With a fixed reward of zero and circulating supply of 1, a sector contributing 1/2 the current network QAP requires pledge of 1/2 * 30% of circulating supply => 15% of one token."

Please add tests for:

  • Another point part-way through the ramp, to demonstrate it's really linear
  • Before the ramp start, i.e. epochs_since_ramp_start is negative
  • Ramp start and duration of zero (i.e. no ramp for all new networks)
  • Ramp duration of zero, and the epochs -1, 0, and 1 around that ramp (i.e. step change)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only have a rudimentary understanding of the reasons behind pledges and FIP0081. My understanding is limited to this: We previously used equation A to calculate initial pledges, and now we're using 70% A + 30% B. If you want a more detailed explanation of the rules, we'll have to pull in one of the authors of FIP0081.

Another point part-way through the ramp, to demonstrate it's really linear

Done.

Before the ramp start, i.e. epochs_since_ramp_start is negative

Done.

Ramp start and duration of zero (i.e. no ramp for all new networks)

Already there.

Ramp duration of zero, and the epochs -1, 0, and 1 around that ramp (i.e. step change)

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ramp start and duration of zero (i.e. no ramp for all new networks)

The existing test is not exercising the pledge calculation properly, it's just testing a boundary condition where there is zero reward and supply. Please add a test for (ramp, duration) = (0, 0) and with reward/supply inputs similar to the other tests you added.

Ramp duration of zero, and the epochs -1, 0, and 1 around that ramp (i.e. step change)

I don't see this test added. To be concrete, I'm looking to exercise where epochs_since_ramp start differs by zero and +/- 1 from the ramp duration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ramp start and duration of zero (i.e. no ramp for all new networks)

The existing test is not exercising the pledge calculation properly, it's just testing a boundary condition where there is zero reward and supply. Please add a test for (ramp, duration) = (0, 0) and with reward/supply inputs similar to the other tests you added.

Done.

Ramp duration of zero, and the epochs -1, 0, and 1 around that ramp (i.e. step change)

I don't see this test added. To be concrete, I'm looking to exercise where epochs_since_ramp start differs by zero and +/- 1 from the ramp duration.

Added a test with epochs_since_ramp={-1,0,1}.

actors/miner/src/monies.rs Show resolved Hide resolved
// gamma = 1000 - 300 * (epochs_since_ramp_start / ramp_duration_epochs).max(0).min(1)
let skew = ((epochs_since_ramp_start * FIP_0081_ACTIVATION_PERMILLE)
/ ramp_duration_epochs.max(1) as i64)
.max(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uses of max and min are clever, but maybe too much so? I'm finding it too hard to mentally verify the expected behaviour around cases like negative epochs_since_start and zero ramp duration. Tests will help with confidence here, but also an if/match construction could take out some of those edge cases clearly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way is definitely easier to follow thanks

actors/market/tests/harness.rs Outdated Show resolved Hide resolved
@@ -75,6 +75,8 @@ pub const TERMINATION_LIFETIME_CAP: ChainEpoch = 140;
// Multiplier of whole per-winner rewards for a consensus fault penalty.
const CONSENSUS_FAULT_FACTOR: u64 = 5;

const FIXED_POINT_FACTOR: u64 = 1000; // 3 decimal places
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there's lots of other fixed point calculations floating around actors code so change this to something more precise either ONE_THOUSAND or GAMMA_FIXED_POINT_FACTOR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

actors/miner/src/monies.rs Outdated Show resolved Hide resolved
actors/miner/src/monies.rs Outdated Show resolved Hide resolved
actors/miner/src/monies.rs Outdated Show resolved Hide resolved
Copy link
Member

@anorth anorth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements

Comment on lines 53 to 57
&qa_sector_power(),
&StoragePower::from(1u64 << 37),
&reward_estimate(),
&power_estimate(),
&TokenAmount::from_whole(1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ramp start and duration of zero (i.e. no ramp for all new networks)

The existing test is not exercising the pledge calculation properly, it's just testing a boundary condition where there is zero reward and supply. Please add a test for (ramp, duration) = (0, 0) and with reward/supply inputs similar to the other tests you added.

Ramp duration of zero, and the epochs -1, 0, and 1 around that ramp (i.e. step change)

I don't see this test added. To be concrete, I'm looking to exercise where epochs_since_ramp start differs by zero and +/- 1 from the ramp duration.

);
}

// On-ramp where 'baseline power' (85%).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is wrong, this is past ramp

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

@ZenGround0 ZenGround0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this done it looks great.

initial_pledge_on_ramp
);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it would be nice to confirm that when we are less than 1/1000th of the way through the ramp that gamma == 1. ramp duration == 2000, epochs since ramp == 1, pledge should be 1500

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're also missing a case for "at ramp", e.g. 10, 10, giving us 1950, and similar to @ZenGround0's case, 999/1000, what does that give us? 1950 or one step before that?

@ZenGround0 ZenGround0 added this pull request to the merge queue Sep 30, 2024
Merged via the queue into filecoin-project:master with commit 01a05fc Sep 30, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ☑️ Done (Archive)
Development

Successfully merging this pull request may close these issues.

6 participants