
Report physical disks through the inventory system #5145

Merged: 17 commits merged into main from disk-in-inventory on Mar 13, 2024

Conversation

@smklein (Collaborator) commented Feb 27, 2024:

  • In Sled Agent, send info about physical storage up to Nexus
  • In Nexus, process physical disks and add them to inventory in a new inv_physical_disk table (a rough sketch of the row shape follows below)
  • This PR also adjusts inventory tests to account for this new "physical disk" information being added alongside sleds, and adds some disks to the "representative inventory data"

Fixes #5150
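For orientation, here is a minimal sketch of the kind of row inv_physical_disk holds per collection. The struct and field names are illustrative assumptions, not the PR's actual nexus-db-model definitions:

    use uuid::Uuid;

    // Illustrative shape of a row in `inv_physical_disk`; the real model in
    // nexus-db-model may differ in field names and types.
    pub struct InvPhysicalDiskRow {
        // Which inventory collection this observation belongs to.
        pub inv_collection_id: Uuid,
        // The sled that reported the disk.
        pub sled_id: Uuid,
        // Physical slot on the sled (added during review; also used for paging).
        pub slot: i64,
        // Identity as reported by the sled agent.
        pub vendor: String,
        pub serial: String,
        pub model: String,
    }

    fn main() {
        let row = InvPhysicalDiskRow {
            inv_collection_id: Uuid::nil(),
            sled_id: Uuid::nil(),
            slot: 0,
            vendor: "fake-vendor".to_string(),
            serial: "fake-serial".to_string(),
            model: "fake-model".to_string(),
        };
        println!("disk {} in slot {}", row.serial, row.slot);
    }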

@smklein smklein marked this pull request as ready for review February 28, 2024 00:04
@jgallagher (Contributor) left a comment:

LGTM; just two tiny nits picking at memory usage that probably doesn't matter - feel free to ignore either or both.

nexus/db-queries/src/db/datastore/inventory.rs (two resolved review threads, now outdated)
@@ -0,0 +1,13 @@
CREATE TABLE IF NOT EXISTS omicron.public.inv_physical_disk (
Collaborator:

Can/should we include the slot number here? (if so, can/should we use that for pagination/uniqueness?)

@smklein (Collaborator Author):

Sure, we'll probably want that out of inventory anyway. Also, I agree, this seems like a better PK ("sled id" + "slot") than the reported values of serial/model/vendor.

Updated to plumb the slot through from the sled agent, and it's now used as the pagination key in this PR.
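As an aside, a rough sketch of why ("sled id", "slot") makes a convenient pagination key: it is unique and totally ordered per sled. This is a generic in-memory illustration with made-up types, not the datastore's actual Diesel pagination helpers:

    use uuid::Uuid;

    // Cursor for paging through physical disks: (sled id, slot) is unique and
    // totally ordered, unlike the reported vendor/model/serial strings.
    #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
    struct DiskCursor {
        sled_id: Uuid,
        slot: i64,
    }

    // Illustrative page fetch: rows strictly after `after` in cursor order,
    // capped at `page_size`. The real datastore does this in SQL.
    fn next_page(
        rows: &[(DiskCursor, String)],
        after: Option<DiskCursor>,
        page_size: usize,
    ) -> Vec<(DiskCursor, String)> {
        let mut page: Vec<_> = rows
            .iter()
            .filter(|(cursor, _)| after.map_or(true, |a| *cursor > a))
            .cloned()
            .collect();
        page.sort_by_key(|(cursor, _)| *cursor);
        page.truncate(page_size);
        page
    }

    fn main() {
        let rows = vec![
            (DiskCursor { sled_id: Uuid::nil(), slot: 1 }, "disk-b".to_string()),
            (DiskCursor { sled_id: Uuid::nil(), slot: 0 }, "disk-a".to_string()),
        ];
        println!("{:?}", next_page(&rows, None, 1));
    }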

@@ -750,6 +752,17 @@ impl SledAgent {
self.config.hardware.reservoir_ram,
)
.context("reservoir_size")?,
disks: self
.storage
.lock()
Collaborator:

Are there any situations where this lock might be held for a long time (thinking tens of seconds or minutes)?

@smklein (Collaborator Author):

I don't believe so, especially not since this is the simulated sled agent. At a cursory glance, I think it's mostly just getting locked to populate / grab contents from a group of hashmaps.
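Roughly the access pattern being described, as a sketch with invented types rather than the simulated sled agent's real ones: the lock is taken only long enough to clone the current contents out.

    use std::collections::HashMap;
    use std::sync::Mutex;

    // Stand-in for the simulated sled agent's storage state: a handful of
    // maps guarded by one lock.
    struct SimStorage {
        disks: HashMap<String, String>, // identity -> simulated disk info
    }

    // The lock is held only while cloning the current contents; no I/O or
    // other long-running work happens under it.
    fn snapshot_disks(storage: &Mutex<SimStorage>) -> Vec<(String, String)> {
        let inner = storage.lock().unwrap();
        inner.disks.iter().map(|(k, v)| (k.clone(), v.clone())).collect()
    }

    fn main() {
        let storage = Mutex::new(SimStorage { disks: HashMap::new() });
        println!("{} disks", snapshot_disks(&storage).len());
    }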

use db::schema::inv_physical_disk::dsl;

let mut iter =
disks.chunks(SQL_BATCH_SIZE.get().try_into().unwrap());
Collaborator:

In general I like the idea of inserting in batches. But this is all going into one transaction so I'm not sure it's any better in that case.

@smklein (Collaborator Author):

The batch size is still fairly large (1,000), so it's unlikely we'll be looping much on a single rack or two. But I could see this blowing past the size of a "single SQL statement" if we consider larger multi-rack cases?
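For illustration, the chunking shape under discussion, sketched without the Diesel specifics (insert_batch here is a hypothetical stand-in): each chunk becomes one INSERT statement, while all chunks still run inside the same transaction.

    // Hypothetical stand-in for one multi-row INSERT; in the real code each
    // chunk becomes a Diesel insert inside the transaction.
    fn insert_batch(chunk: &[u64]) {
        println!("inserting {} rows in one statement", chunk.len());
    }

    fn insert_all(disk_rows: &[u64]) {
        // SQL_BATCH_SIZE is 1,000 in the datastore, so one or two racks
        // rarely loop more than once, but chunking keeps any single
        // statement bounded even for much larger fleets.
        const BATCH_SIZE: usize = 1000;
        for chunk in disk_rows.chunks(BATCH_SIZE) {
            insert_batch(chunk);
        }
    }

    fn main() {
        let rows: Vec<u64> = (0..2500).collect();
        insert_all(&rows); // three statements: 1000 + 1000 + 500 rows
    }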

@smklein (Collaborator Author) commented Mar 12, 2024:

Thank you @jgallagher and @davepacheco for the feedback! Apologies for the delay in responding; I've been building off this PR in #5172 and trying to make sure it'll work -- seems like yes.

I believe I've responded to all feedback -- I know y'all gave me the +1 to merge, but I'll wait another ~half day in case there are any other comments.

@smklein (Collaborator Author) commented Mar 12, 2024:

reconfigurator-cli::test_basic test_complex looks like it's failing, and I think updating it requires a particular lab environment, so I guess this PR is blocked for the moment. This seems like a bit of a hard bottleneck (it can only be safely edited from a single, specific deployment)?

InvPhysicalDisk::new(collection_id, *sled_id, disk.clone())
})
})
.collect();
Contributor:

Tiny nit: I think you can drop this collect() (and the corresponding : Vec<_> type hint), make this mut, and then remove the disks.into_iter() call on line 659.

@smklein (Collaborator Author):

This causes the following compilation error:

error: implementation of `FnOnce` is not general enough
  --> nexus/src/app/background/inventory_collection.rs:58:9
   |
58 | /         async {
59 | |             match inventory_activate(
60 | |                 opctx,
61 | |                 &self.datastore,
...  |
88 | |         }
89 | |         .boxed()
   | |________________^ implementation of `FnOnce` is not general enough
   |
   = note: closure with signature `fn(&'0 nexus_types::inventory::PhysicalDisk) -> InvPhysicalDisk` must implement `FnOnce<(&'1 nexus_types::inventory::PhysicalDisk,)>`, for any two lifetimes `'0` and `'1`...
   = note: ...but it actually implements `FnOnce<(&nexus_types::inventory::PhysicalDisk,)>`

I believe disks.into_iter() is called within a transaction, which is where the FnOnce comes from?

I think I'm going to keep this pattern of "collect it into a Vec, pass that Vec to the FnOnce" for the moment, because the lifetime complexity doesn't seem worth trying to optimize right now.
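A stripped-down sketch of the pattern being kept: collect into an owned Vec first, then move it into the FnOnce closure. run_transaction below is a hypothetical stand-in for the real async Diesel transaction wrapper, not its actual API.

    // Generic stand-in for the datastore's transaction wrapper, which takes
    // an `FnOnce` closure (the real one is async and Diesel-based).
    fn run_transaction<T>(body: impl FnOnce() -> T) -> T {
        body()
    }

    fn insert_disks(disk_serials: &[&str]) -> usize {
        // Materialize the rows first so the closure captures an owned Vec
        // rather than borrowing from an iterator chain; moving owned data
        // into the `FnOnce` sidesteps the higher-ranked lifetime error
        // shown above.
        let rows: Vec<String> =
            disk_serials.iter().map(|serial| serial.to_string()).collect();
        run_transaction(move || rows.len())
    }

    fn main() {
        println!("inserted {}", insert_disks(&["serial-0", "serial-1"]));
    }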

Contributor:

Grrrr, this is very annoying. I checked that diff against nexus-db-queries and it was fine, but didn't check whether it caused compilation errors down the line. That feels like a bug - if my function signature hasn't changed, why can implementation changes cause client breakage?

@jgallagher (Contributor) commented Mar 12, 2024:

Thanks to @cbiffle, who found we're hitting rust-lang/rust#99492 here; this ought to work. I found a couple of weird-looking workarounds using hints from that issue:

@@ -127,15 +127,10 @@ impl DataStore {
             })
             .collect::<Result<Vec<_>, Error>>()?;
         // Pull disks out of all sled agents
-        let disks: Vec<_> = collection
-            .sled_agents
-            .iter()
-            .flat_map(|(sled_id, sled_agent)| {
-                sled_agent.disks.iter().map(|disk| {
-                    InvPhysicalDisk::new(collection_id, *sled_id, disk.clone())
-                })
-            })
-            .collect();
+        let mut disks =
+            collection.sled_agents.iter().flat_map(|(sled_id, sled_agent)| {
+                std::iter::repeat(sled_id).zip(sled_agent.disks.iter())
+            });

         // Partition the sled agents into those with an associated baseboard id
         // and those without one.  We handle these pretty differently.
@@ -656,10 +651,18 @@ impl DataStore {
                 use db::schema::inv_physical_disk::dsl;

                 let batch_size = SQL_BATCH_SIZE.get().try_into().unwrap();
-                let mut disks = disks.into_iter();
                 loop {
-                    let some_disks =
-                        disks.by_ref().take(batch_size).collect::<Vec<_>>();
+                    let some_disks = disks
+                        .by_ref()
+                        .take(batch_size)
+                        .map(|(&sled_id, disk)| {
+                            InvPhysicalDisk::new(
+                                collection_id,
+                                sled_id,
+                                disk.clone(),
+                            )
+                        })
+                        .collect::<Vec<_>>();
                     if some_disks.is_empty() {
                         break;
                     }

or if we don't like collection_id floating around, it can go in the repeat too:

@@ -127,15 +127,11 @@ impl DataStore {
             })
             .collect::<Result<Vec<_>, Error>>()?;
         // Pull disks out of all sled agents
-        let disks: Vec<_> = collection
-            .sled_agents
-            .iter()
-            .flat_map(|(sled_id, sled_agent)| {
-                sled_agent.disks.iter().map(|disk| {
-                    InvPhysicalDisk::new(collection_id, *sled_id, disk.clone())
-                })
-            })
-            .collect();
+        let mut disks =
+            collection.sled_agents.iter().flat_map(|(sled_id, sled_agent)| {
+                std::iter::repeat((collection_id, *sled_id))
+                    .zip(sled_agent.disks.iter())
+            });

         // Partition the sled agents into those with an associated baseboard id
         // and those without one.  We handle these pretty differently.
@@ -656,10 +652,18 @@ impl DataStore {
                 use db::schema::inv_physical_disk::dsl;

                 let batch_size = SQL_BATCH_SIZE.get().try_into().unwrap();
-                let mut disks = disks.into_iter();
                 loop {
-                    let some_disks =
-                        disks.by_ref().take(batch_size).collect::<Vec<_>>();
+                    let some_disks = disks
+                        .by_ref()
+                        .take(batch_size)
+                        .map(|((collection_id, sled_id), disk)| {
+                            InvPhysicalDisk::new(
+                                collection_id,
+                                sled_id,
+                                disk.clone(),
+                            )
+                        })
+                        .collect::<Vec<_>>();
                     if some_disks.is_empty() {
                         break;
                     }

I don't know that either of these is worth it just to save the intermediate .collect(), but at least we (kind of) understand it.

Contributor:

Yeah I ran into this yesterday. It's a pretty bad bug in rustc.

@@ -0,0 +1,828 @@
{
Contributor:

Is this file used? I recently ran into something where I considered updating the RSS on-disk plan format, and found a way to not do that to avoid the hassle.

@smklein (Collaborator Author) commented Mar 12, 2024:

It is used, and is stored, but it's not used particularly frequently.

  • We always try to "load a plan if one already exists" before creating a new one:

        let service_plan = if let Some(plan) =
            ServicePlan::load(&self.log, storage_manager).await?
        {
            plan
        } else {
            ServicePlan::create(
                &self.log,
                &config,
                &storage_manager,
                &plan.sleds,
            )
            .await?
        };

  • This tries to load the plan with the latest version:

        if let Some(ledger) = ledger {
            info!(log, "RSS plan already created, loading from file");
            Ok(Some(ledger.data().clone()))
        } else if Self::has_v1(storage_manager).await.map_err(|err| {
            PlanError::Io {
                message: String::from("looking for v1 RSS plan"),
                err,
            }
        })? {
            // If we found no current-version service plan, but we _do_ find
            // a v1 plan present, bail out. We do not expect to ever see this
            // in practice because that would indicate that:
            //
            // - We ran RSS previously on this same system using an older
            //   version of the software that generates v1 service plans and it
            //   got far enough through RSS to have written the v1 service plan.
            // - That means it must have finished initializing all sled agents,
            //   including itself, causing it to record a
            //   `StartSledAgentRequest`s in its ledger -- while still running
            //   the older RSS.
            // - But we're currently running software that knows about v2
            //   service plans. Thus, this process started some time after that
            //   ledger was written.
            // - But the bootstrap agent refuses to execute RSS if it has a
            //   local `StartSledAgentRequest` ledgered. So we shouldn't get
            //   here if all of the above happened.
            //
            // This sounds like a complicated set of assumptions. If we got
            // this wrong, we'll fail spuriously here and we'll have to figure
            // out what happened. But the alternative is doing extra work to
            // support a condition that we do not believe can ever happen in any
            // system.
            Err(PlanError::FoundV1)
        } else {
            Ok(None)
        }

    In other cases, it at least tries to report a better error.

In the limit: I think we may get more mileage out of "wiping everything and forcing RSS to restart" in the case where we restart, which would make these intermediate files removable.

Contributor:

But those bits of code are still referencing the "rss-service-plan-v2.json" path, right? Should we bump that to v3? (I don't see anywhere where the schema json is checked, either? Sorry if I'm just missing it.)

@smklein (Collaborator Author):

Ugh, thank you, good catch. I'm using this in #5172, but not here yet. I'll pull this into the appropriate PR.

@davepacheco (Collaborator):

> reconfigurator-cli::test_basic test_complex looks like it's failing, and I think updating it requires a particular lab environment, so I guess this PR is blocked for the moment. This seems like a bit of a hard bottleneck (it can only be safely edited from a single, specific deployment)?

The purpose of that file is to exercise the CLI with data saved from a real system. That seemed useful because part of the goal was to be able to import data from real systems. But it could be more trouble than it's worth while we're still evolving the blueprint format.

Also, it doesn't have to be any particular system. The test will dynamically pick an inventory collection and blueprints that it finds.

@smklein (Collaborator Author) commented Mar 12, 2024:

> > reconfigurator-cli::test_basic test_complex looks like it's failing, and I think updating it requires a particular lab environment, so I guess this PR is blocked for the moment. This seems like a bit of a hard bottleneck (it can only be safely edited from a single, specific deployment)?
>
> The purpose of that file is to exercise the CLI with data saved from a real system. That seemed useful because part of the goal was to be able to import data from real systems. But it could be more trouble than it's worth while we're still evolving the blueprint format.
>
> Also, it doesn't have to be any particular system. The test will dynamically pick an inventory collection and blueprints that it finds.

I logged onto a machine set up by a4x2, configured my ssh keys to be able to copy files into and out of that machine, used zlogin to access the switch zone, and ran omdb db reconfigurator-save as the test recommended, but it is still failing because I didn't generate enough blueprints.

I can spin up this environment and take another lap - to clarify, do I need to run omdb nexus blueprints regenerate twice? I'm not sure if I'm being dumb and there are some docs I should be following for "how to do this the right way".

@smklein (Collaborator Author) commented Mar 12, 2024:

Okay, I ran:

  • omdb nexus background-tasks show, and parsed that output to find the last collection id, which I used to run
  • omdb -w nexus blueprints generate-from-collection. I then waited for inventory to be collected again, and re-ran those commands, before running
  • omdb db reconfigurator-save complex.json
  • I then copied that file from my switch zone to my global zone, added my ssh keys from my "hosting" illumos box to ~/.ssh/authorized_keys on the "target" illumos box via the serial console, then copied complex.json out to my host machine, and re-ran the complex test
  • That failed, but re-running with EXPECTORATE=overwrite caused it to succeed

@smklein merged commit cf422c4 into main on Mar 13, 2024 (21 checks passed).
@smklein deleted the disk-in-inventory branch on March 13, 2024 at 21:05.
Successfully merging this pull request may close these issues: Add physical disk information to the Nexus inventory subsystem.

4 participants