Summary of Diego AZ Balancing Spike

I spiked on implementing the AZ balancing feature. Herein lies the summary of the spike. I'm putting this in a repo instead of a gist so it can have collaborators.

Main Goal
Completed Tasks
Other Wins
Remaining Work
Non-Goals

Main Goal

Start and Stop Auctions should take AZ (or Cluster) into account for multi-instance apps. Instances should be well-balanced across AZs so that the High Availability/Fault Tolerance a user gets by having multiple instances is less brittle in the face of an AZ going down. In that regard, this is to improve our SLA.

Completed Tasks

Refactor simulation so it's easy to add more tests
Auction
Give Reps knowledge of what "AZ" they're on, and add statistics about AZ balancing before implementing AZ balancing, so we can see the before/after improvement
Rep | Auction
App-Manager knows how many AZs Reps/Executors are distributed across, and communicates this number to the Auctioneer via LRPStartAuction and LRPStopAuction
App-Manager | Auctioneer | Runtime-Schema
Auction combines NumberOfAZs from App-Manager and AZNumber from Rep to take AZ balancing into account when computing score/bid
Auction
Update Inigo, see it pass
Inigo
Bump Deps
App-Manager | Auctioneer | Inigo | Rep
Update Diego-Release to pass numAZs to App-Manager and AZNumber to each rep
Diego-Release
BOSH-Lite-AWS deploy cf-release/acceptance-deployed and diego-release/az_balance and see CATS pass. I've done this, and have an EC2 jumpbox you can go on and run CATS (and Inigo).
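
To illustrate how NumberOfAZs (from App-Manager) and AZNumber (from the Rep) might be combined in a start-auction score, here is a minimal sketch. It is not the spike's actual scoring code: the type names, the `InstancesInAZ` and `TotalInstances` inputs, the penalty formula, and `azBalanceCoefficient` are all assumptions made for illustration. It assumes a lower score is a better bid.

```go
package main

import "fmt"

// RepState is a hypothetical snapshot of what a Rep knows when it bids.
type RepState struct {
	AZNumber      int     // which AZ this Rep is in (0-based)
	ResourceScore float64 // assumed baseline score from resource usage; lower is better
	InstancesInAZ int     // assumed input: instances of the app already in this Rep's AZ
}

// StartAuctionInfo is a hypothetical stand-in for the AZ-related data
// carried by an LRPStartAuction.
type StartAuctionInfo struct {
	NumberOfAZs    int // how many AZs the Reps are spread across
	TotalInstances int // assumed input: instances of the app already running anywhere
}

// azBalanceCoefficient is a made-up weight for the AZ penalty.
const azBalanceCoefficient = 1.0

// StartScore adds a penalty to the baseline score when placing the new
// instance on this Rep would push its AZ above an even share.
func StartScore(rep RepState, info StartAuctionInfo) float64 {
	if info.NumberOfAZs <= 1 {
		return rep.ResourceScore // nothing to balance
	}
	fairShare := float64(info.TotalInstances+1) / float64(info.NumberOfAZs)
	excess := float64(rep.InstancesInAZ+1) - fairShare
	if excess < 0 {
		excess = 0 // under-filled AZs are not penalized
	}
	return rep.ResourceScore + azBalanceCoefficient*excess
}

// Describe formats a Rep's bid for printing.
func Describe(rep RepState, info StartAuctionInfo) string {
	return fmt.Sprintf("AZ %d bids %.2f", rep.AZNumber, StartScore(rep, info))
}

func main() {
	// Two instances already run in AZ 0 and none in AZ 1.
	info := StartAuctionInfo{NumberOfAZs: 2, TotalInstances: 2}
	fmt.Println(Describe(RepState{AZNumber: 0, ResourceScore: 0.5, InstancesInAZ: 2}, info))
	fmt.Println(Describe(RepState{AZNumber: 1, ResourceScore: 0.5, InstancesInAZ: 0}, info))
}
```

With equal baseline scores, the Rep in the emptier AZ produces the lower (winning) bid, which is the behaviour the AZ balancing work is after.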

Other Wins

Add useful auction simulations (including some simulations that are not AZ-related)
Add some sorely-needed variance statistics for existing variables in the simulation report
Add statistics for some important variables not previously covered in the simulation report (including many variables that are not AZ-related)
Make simulation report generation code more sane
Make simulation setup code more sane
Bring a little more consistency to the treatment of StopAuctions vis-à-vis StartAuctions where it improves sanity
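
As a rough idea of what an AZ balancing statistic in the simulation report could look like, the sketch below computes the mean and population variance of instance counts per AZ. The function names and the input shape (one AZ number per placed instance) are assumptions for illustration, not the simulation's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// InstancesPerAZ counts how many instances landed in each AZ.
// azOfInstance holds one AZ number (0-based) per placed instance.
func InstancesPerAZ(azOfInstance []int, numAZs int) []int {
	counts := make([]int, numAZs)
	for _, az := range azOfInstance {
		if az >= 0 && az < numAZs {
			counts[az]++ // out-of-range AZ numbers are ignored
		}
	}
	return counts
}

// MeanAndVariance returns the mean and population variance of counts.
// A variance of 0 means the instances are perfectly balanced.
func MeanAndVariance(counts []int) (mean, variance float64) {
	if len(counts) == 0 {
		return 0, 0
	}
	for _, c := range counts {
		mean += float64(c)
	}
	mean /= float64(len(counts))
	for _, c := range counts {
		d := float64(c) - mean
		variance += d * d
	}
	variance /= float64(len(counts))
	return mean, variance
}

// Summary formats the statistics as a single report line.
func Summary(counts []int) string {
	mean, variance := MeanAndVariance(counts)
	return fmt.Sprintf("per-AZ counts %v: mean %.2f, variance %.2f, stddev %.2f",
		counts, mean, variance, math.Sqrt(variance))
}

func main() {
	balanced := InstancesPerAZ([]int{0, 1, 2, 0, 1, 2}, 3)
	lopsided := InstancesPerAZ([]int{0, 0, 0, 0}, 2)
	fmt.Println(Summary(balanced))
	fmt.Println(Summary(lopsided))
}
```

Comparing this variance before and after enabling AZ balancing is one way to see the improvement in numbers.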

Remaining Work

Right now only in-process simulation reps are balanced across multiple "AZs". The corresponding work needs to be done when communicationMode is something other than "in-process", e.g. NATS.
SVG (browser) reports for simulation have not been touched. All the improved statistics are in the CLI report. SVG reports need some love.
Merge work onto master (or develop in the case of diego-release).

Non-Goals

Pre-filter Reps out of the auction according to whether they belong to / don't belong to a given Placement Pool / Tag
BUT: this will be easy to add on, and it's entirely separate since it's pre-filtering, not part of the main run of the auction
Ensure that we can always handle apps with large memory or disk requirements (i.e. ensure that we don't distribute the small granular "sand" apps so well that we have no room for the "boulders")
BUT: a simulation has been added to try to capture this
Fine-tune auction parameters to fully optimize app placement behaviour
BUT: it's totally straightforward to tweak the coefficients for Start and Stop Auctions
Make auction parameters configurable via the BOSH deployment manifest
BUT: this would be very easy to do if desired
Speed up time between a user pushing an app and it being started, e.g. by taking into account whether a Rep already has the droplet for the app cached locally
BUT: performance is just as good as before after this spike, and these performance enhancements can be added totally separately later
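
To show why pre-filtering by Placement Pool / Tag is separate from the main run of the auction, here is a minimal sketch of such a filter. It was not part of the spike; the `Rep` type, its `Tags` field, and `FilterByTag` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Rep is a hypothetical, stripped-down view of an auction participant.
type Rep struct {
	Guid string
	Tags []string // assumed: placement tags this Rep belongs to
}

// FilterByTag returns only the Reps carrying requiredTag.
// An empty requiredTag means no filtering is requested.
func FilterByTag(reps []Rep, requiredTag string) []Rep {
	if requiredTag == "" {
		return reps
	}
	var eligible []Rep
	for _, rep := range reps {
		for _, tag := range rep.Tags {
			if tag == requiredTag {
				eligible = append(eligible, rep)
				break
			}
		}
	}
	return eligible
}

// Summary lists the Guids of the Reps that remain after filtering.
func Summary(reps []Rep) string {
	guids := make([]string, 0, len(reps))
	for _, rep := range reps {
		guids = append(guids, rep.Guid)
	}
	return fmt.Sprintf("%d eligible reps: %v", len(reps), guids)
}

func main() {
	reps := []Rep{
		{Guid: "rep-a", Tags: []string{"general"}},
		{Guid: "rep-b", Tags: []string{"general", "gpu"}},
	}
	// The auction itself would then run over the filtered slice only.
	fmt.Println(Summary(FilterByTag(reps, "gpu")))
}
```

Because the filter runs before any scoring, it could be added without touching the AZ balancing logic.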

Amit-PivotalLabs/diego-az-balance-spike-gist
