
Performance impact of BOYD #4160

Closed
mhofman opened this issue Dec 7, 2021 · 7 comments · Fixed by #5884
Assignees
Labels
bug Something isn't working performance Performance related issues SwingSet package: SwingSet
Milestone

Comments

@mhofman
Member

mhofman commented Dec 7, 2021

Describe the bug

The bring-out-your-dead PR (#3801) caused a jump in swingset time. There also seems to be more time spent in swingset during a block over the life of the chain, which is usually indicative of a memory leak.

To Reproduce

Steps to reproduce the behavior:

  1. Follow instructions to set up loadgen
  2. Start a loadgen test with the following parameters: --stage.duration=10 --stage.loadgen.vault.interval=12 --stage.loadgen.amm.interval=12 --stage.loadgen.amm.wait=6 --stage.loadgen.vault.limit=10 --stage.loadgen.amm.limit=10
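The reproduction steps above can be sketched as a shell invocation. The stage flags are the ones from this report; the runner name itself is a placeholder (follow the loadgen repo's instructions for the actual entry point):

```shell
# Stage flags taken verbatim from this report.
FLAGS=(
  --stage.duration=10
  --stage.loadgen.vault.interval=12
  --stage.loadgen.amm.interval=12
  --stage.loadgen.amm.wait=6
  --stage.loadgen.vault.limit=10
  --stage.loadgen.amm.limit=10
)
# "loadgen-runner" is a hypothetical name; substitute the documented
# loadgen command from the setup instructions.
echo "loadgen-runner ${FLAGS[*]}"
```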

Expected behavior

Not such a significant increase in swingset time, and no further increase over the life of the chain.

Platform Environment

  • Loadgen docker image
  • is there anything special/unusual about your platform? WSL2
  • what version of the Agoric-SDK are you using? 9a47516a349185409d3c320b585312178c8a08b8

Additional context

The PR doesn't change the GC schedule, so the impact likely comes from the different accounting logic that BOYD introduces.

Screenshots

```csv
"sdkRevision","sdkCommitTime","success","walletDeployDuration","loadgenDeployDuration","cycleSuccessRate","cycleAvgDuration","stage1_cycleSuccessRate","stage1_cycleAvgDuration","stage1_avgSwingsetBlockTime","stage1_avgSwingsetPercentage","stage2_cycleSuccessRate","stage2_cycleAvgDuration","stage2_avgSwingsetBlockTime","stage2_avgSwingsetPercentage","stage3_cycleSuccessRate","stage3_cycleAvgDuration","stage3_avgSwingsetBlockTime","stage3_avgSwingsetPercentage","stage4_cycleSuccessRate","stage4_cycleAvgDuration","stage4_avgSwingsetBlockTime","stage4_avgSwingsetPercentage"
"da7c924aa","2021-11-01 23:00:26",true,238.067808,94.07785,100,219.902715,100,214.911984,1.254112,24.89,100,218.745497,1.391519,27.61,100,226.627465,1.402919,27.84,100,220.151802,1.425838,28.14
"9a47516a3","2021-11-03 01:10:09",false,238.950638,98.910524,100,234.284053,100,235.138661,1.713511,34.03,100,228.431566,1.821268,36.14,100,239.761833,2.357044,47.26,,,,
```

Swingset percentage over time

NB: Stage 4 is missing due to an unrelated chain stall on restart

@mhofman mhofman added the bug Something isn't working label Dec 7, 2021
@FUDCo FUDCo changed the title Performance impact of BYOD Performance impact of BOYD Dec 7, 2021
@dckc dckc added the performance Performance related issues label Dec 7, 2021
@mhofman mhofman added the SwingSet package: SwingSet label Jan 20, 2022
@Tartuffo Tartuffo added MN-1 and removed MN-1 labels Feb 2, 2022
@warner
Member

warner commented Feb 25, 2022

Maybe a good ticket for our new hire.

@Tartuffo
Contributor

Tartuffo commented Mar 7, 2022

FYI @arirubinstein

@Tartuffo Tartuffo added this to the Mainnet 1 milestone Mar 23, 2022
@Tartuffo
Contributor

We could probably get a lot of benefit from some tuning of parameters; it's probably worth doing that for MN-1, with further fine-tuning controls for MN-1.1.

@warner
Member

warner commented Jun 24, 2022

@FUDCo and I figure we should set this to roughly 1 BOYD per 1000 deliveries for now.

The risk is allowing one vat (which doesn't get constant traffic) to hold onto objects that keep other vats from dropping things. Chip points out that this is only really a problem if it impedes operations. We think we could build a follower node that periodically makes a separate copy of the kernel state, detaches that copy from the chain, sends a BOYD to all vats, and measures the change in the kernel object table size. This would let us track how much garbage is "floating" on the chain waiting for a BOYD, which might tell us either (1) it's not that much and we don't need to worry about it, or (2) we should do something at the next convenient point of upgrade.
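The measurement idea above can be sketched in miniature. This is a toy model, not the real SwingSet kernel API: the object table is represented as a plain `Map`, and "send a BOYD to all vats" is simulated by dropping every entry the vats no longer hold reachable, then diffing the table size:

```javascript
// Toy model of the floating-garbage measurement: take the object table
// of a detached kernel-state copy, sweep it as a BOYD would, and report
// how many entries were only being kept alive pending collection.
function measureFloatingGarbage(objectTable, isStillReachable) {
  const before = objectTable.size;
  // Simulated BOYD sweep: drop entries no vat still holds reachable.
  // (Deleting during for...of over a Map is safe in JS.)
  for (const [kref, entry] of objectTable) {
    if (!isStillReachable(kref, entry)) {
      objectTable.delete(kref);
    }
  }
  const after = objectTable.size;
  return { before, after, floating: before - after };
}

// Example: three kernel objects; one is garbage awaiting a BOYD.
// The kref names ('ko21', ...) are illustrative.
const table = new Map([
  ['ko21', { reachable: 1 }],
  ['ko22', { reachable: 0 }], // dropped by its vat, not yet collected
  ['ko23', { reachable: 2 }],
]);
const report = measureFloatingGarbage(table, (_kref, e) => e.reachable > 0);
console.log(report); // { before: 3, after: 2, floating: 1 }
```

Because the sweep runs on a detached copy of the state, the measurement never perturbs the chain itself, which is what makes the follower-node approach attractive.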

warner added a commit that referenced this issue Jun 30, 2022
While we were debugging performance problems last year, we introduced
`managerType: 'xs-worker-no-gc'`, which is a variant of `xs-worker`
that disables the forced GC we were doing at the end of every
delivery.

Since then, we've improved performance in several ways:

* XS bugs were retaining objects, which have since been fixed
* we only do a forced GC during special `dispatch.bringOutYourDead`
cranks
* we deliver BOYD less often (according to a configurable schedule,
see #4160 for our decisions, but we're thinking once every 100 to 1000
deliveries)

The `xs-worker-no-gc` manager type controlled a flag named
`gcEveryCrank`, which replaced the GC primitive available to liveslots
with a dummy version.

This commit removes both `xs-worker-no-gc` and `gcEveryCrank`.

closes #5600
warner added a commit that referenced this issue Jul 3, 2022
warner added a commit that referenced this issue Jul 6, 2022
warner added a commit that referenced this issue Jul 7, 2022
warner added a commit that referenced this issue Jul 7, 2022
warner added a commit that referenced this issue Jul 9, 2022
turadg pushed a commit that referenced this issue Jul 14, 2022
@Tartuffo
Contributor

We still need to choose the number and test it out.

@Tartuffo
Contributor

@arirubinstein has done some benchmarking and has a proposed number based on that benchmarking.

There are two places it can get set: the default value in the code itself (currently set to 1), and the config file for the swingset you are actually running.

We need to decide where to make the change. My preference is to change the default.
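The policy being tuned here is "deliver a BOYD to a vat once every N ordinary deliveries", with the default currently 1. A minimal sketch of that scheduling logic, assuming a config key along the lines of `defaultReapInterval` (treat the key name as illustrative, not the verbatim SwingSet option):

```javascript
// Sketch of the BOYD scheduling policy: count ordinary deliveries and
// signal when a dispatch.bringOutYourDead is due. The interval comes
// from config, falling back to the code default of 1.
function makeBOYDScheduler(config = {}) {
  const interval = config.defaultReapInterval ?? 1; // current default: 1
  let deliveriesSinceBOYD = 0;
  return {
    // Call after each ordinary delivery; returns true when a BOYD
    // should be scheduled for the vat.
    countDelivery() {
      deliveriesSinceBOYD += 1;
      if (deliveriesSinceBOYD >= interval) {
        deliveriesSinceBOYD = 0;
        return true;
      }
      return false;
    },
  };
}

// With an interval of 1000 (the kind of number discussed above), only
// every 1000th delivery triggers a BOYD instead of every delivery.
const sched = makeBOYDScheduler({ defaultReapInterval: 1000 });
let boyds = 0;
for (let i = 0; i < 3000; i += 1) {
  if (sched.countDelivery()) boyds += 1;
}
console.log(boyds); // 3
```

Changing the code default applies the new interval everywhere at once, while a config-file override scopes the change to one deployment; that trade-off is what the comment above is weighing.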

@Tartuffo
Contributor

Tartuffo commented Aug 2, 2022

@FUDCo, Brian and I just discussed this; can you please start working on it?
