Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MGS/sp-sim: Teach simulated SPs about their interface to the management network #932

Merged
merged 6 commits into from
Apr 21, 2022

Conversation

jgallagher
Copy link
Contributor

This PR is the first of three that builds toward a preliminary implementation of RFD 250-style SP and location discovery.

Changes in this PR:

  • Each simulated SP now listens on two UDP ports instead of one (corresponding to the two ports of each real SP behind the KSZ switch).
  • Each simulated SP is given an IPv6 multicast group, and the sockets on both UDP ports join that group.

There are two quirks present related to the multicast group:

  1. With real hardware, all SPs will use the same multicast address - the setup of the management network switch provides isolation. For local dev/test we have no such isolation, so we can give each simulated SP a different multicast address, which means a sender will only receive responses from a single SP. This affects the "real" config but in a way that should be easy to change later.
  2. The code that actually joins the multicast group guards it with an "if address is actually a multicast address" - not all our CI runners allow joining IPv6 multicast groups, so for those runners we can fake it with loopback addresses. This is only in the simulator.

Copy link
Contributor

@ahl ahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rad

recv: Result<(&[u8], SocketAddr)>,
server: &'a mut SpServer,
responsiveness: Responsiveness,
port_num: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned this in another review, but I think this should be assert!(port_num == 1 || portnum == 2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good call. There's one spot where I end up converting this into an enum with two cases; I think I should just do that for all of them and ditch usize entirely.

recv: Result<(&[u8], SocketAddr)>,
server: &'a mut SpServer,
responsiveness: Responsiveness,
port_num: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above (do we want to share this code or is it different enough that that would be more trouble than it's worth?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took a tiny bit of work to make the two handle_request()s not-different, but most of what it needed was worth doing for its own sake (threading the sending address and receiving port through handler.dispatch()). Thanks for the nudge

@jgallagher jgallagher enabled auto-merge (squash) April 21, 2022 16:48
@jgallagher jgallagher merged commit 8a9a581 into main Apr 21, 2022
@jgallagher jgallagher deleted the mgs-sp-discovery-prep1 branch April 21, 2022 16:55
leftwo pushed a commit that referenced this pull request Oct 4, 2023
Crucible updates
    all Crucible connections should set TCP_NODELAY (#983)
    Use a fixed size for tag and nonce (#957)
    Log crucible opts on start, order crutest options (#974)
    Lock the Downstairs less (#966)
    Cache dirty flag locally, reducing SQLite operations (#970)
    Make stats mutex synchronous (#961)
    Optimize requeue during flow control conditions (#962)
    Update Rust crate base64 to 0.21.4 (#950)
    Do less in control (#949)
    Fix --flush-per-blocks (#959)
    Fast dependency checking (#916)
    Update actions/checkout action to v4 (#960)
    Use `cargo hakari` for better workspace deps (#956)
    Update actions/checkout digest to 8ade135 (#939)
    Cache block size in Guest (#947)
    Update Rust crate ringbuffer to 0.15.0 (#954)
    Update Rust crate toml to 0.8 (#955)
    Update Rust crate reedline to 0.24.0 (#953)
    Update Rust crate libc to 0.2.148 (#952)
    Update Rust crate indicatif to 0.17.7 (#951)
    Remove unused async (#943)
    Use a synchronous mutex for bw/iop_tokens (#946)
    Make flush ID non-locking (#945)
    Use `oneshot` channels instead of `mpsc` for notification (#918)
    Use a strong type for upstairs negotiation (#941)
    Add a "dynamometer" option to crucible-downstairs (#931)
    Get new work and active count in one lock (#938)
    A bunch of misc test cleanup stuff (#937)
    Wait for a snapshot to finish on all downstairs (#920)
    dsc and clippy cleanup. (#935)
    No need to sort ackable_work (#934)
    Use a strong type for repair ID (#928)
    Keep new jobs sorted (#929)
    Remove state_count function on Downstairs (#927)
    Small cleanup to IOStateCount (#932)
    let cmon and IOStateCount use ClientId (#930)
    Fast return for zero length IOs (#926)
    Use a strong type for client ID (#925)
    A few Crucible Agent fixes (#922)
    Use a newtype for `JobId` (#919)
    Don't pass MutexGuard into functions (#917)
    Crutest updates, rename tests, new options (#911)

Propolis updates
    Update tungstenite crates to 0.20
    Use `strum` crate for enum-related utilities
    Wire up bits for CPUID customization
    PHD: improve artifact store (#529)
    Revert abort-on-panic in 'dev' cargo profile
leftwo added a commit that referenced this pull request Oct 5, 2023
Crucible updates
    all Crucible connections should set TCP_NODELAY (#983)
    Use a fixed size for tag and nonce (#957)
    Log crucible opts on start, order crutest options (#974)
    Lock the Downstairs less (#966)
    Cache dirty flag locally, reducing SQLite operations (#970)
    Make stats mutex synchronous (#961)
    Optimize requeue during flow control conditions (#962)
    Update Rust crate base64 to 0.21.4 (#950)
    Do less in control (#949)
    Fix --flush-per-blocks (#959)
    Fast dependency checking (#916)
    Update actions/checkout action to v4 (#960)
    Use `cargo hakari` for better workspace deps (#956)
    Update actions/checkout digest to 8ade135 (#939)
    Cache block size in Guest (#947)
    Update Rust crate ringbuffer to 0.15.0 (#954)
    Update Rust crate toml to 0.8 (#955)
    Update Rust crate reedline to 0.24.0 (#953)
    Update Rust crate libc to 0.2.148 (#952)
    Update Rust crate indicatif to 0.17.7 (#951)
    Remove unused async (#943)
    Use a synchronous mutex for bw/iop_tokens (#946)
    Make flush ID non-locking (#945)
    Use `oneshot` channels instead of `mpsc` for notification (#918)
    Use a strong type for upstairs negotiation (#941)
    Add a "dynamometer" option to crucible-downstairs (#931)
    Get new work and active count in one lock (#938)
    A bunch of misc test cleanup stuff (#937)
    Wait for a snapshot to finish on all downstairs (#920)
    dsc and clippy cleanup. (#935)
    No need to sort ackable_work (#934)
    Use a strong type for repair ID (#928)
    Keep new jobs sorted (#929)
    Remove state_count function on Downstairs (#927)
    Small cleanup to IOStateCount (#932)
    let cmon and IOStateCount use ClientId (#930)
    Fast return for zero length IOs (#926)
    Use a strong type for client ID (#925)
    A few Crucible Agent fixes (#922)
    Use a newtype for `JobId` (#919)
    Don't pass MutexGuard into functions (#917)
    Crutest updates, rename tests, new options (#911)

Propolis updates
    Update tungstenite crates to 0.20
    Use `strum` crate for enum-related utilities
    Wire up bits for CPUID customization
    PHD: improve artifact store (#529)
    Revert abort-on-panic in 'dev' cargo profile

---------

Co-authored-by: Alan Hanson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants