Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/vcenter] Add vCenter Host metrics (dropped packet rate + capacity) #33646

Merged

Conversation

BominRahmani
Copy link
Contributor

@BominRahmani BominRahmani commented Jun 19, 2024

Description:

The following PR adds the following metrics

vcenter.host.network.packet.drop.rate
vcenter.host.cpu.capacity
vcenter.host.cpu.reserve.capacity

These metrics can be found in the following links respectively:
errorTx and errorRx
reservedCapacity and totalCapacity

Link to tracking Issue: #33607

Testing:
Tested against a live environment to scrape added metrics, and updated golden test files.

Documentation:
Updated documentation through mdatagen.

Copy link
Contributor

@StefanKurek StefanKurek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small things. Also needs the scraper test updated with new results.

receiver/vcenterreceiver/metadata.yaml Outdated Show resolved Hide resolved
receiver/vcenterreceiver/metadata.yaml Outdated Show resolved Hide resolved
receiver/vcenterreceiver/metadata.yaml Outdated Show resolved Hide resolved
// disk metrics
"disk.totalReadLatency.average",
"disk.totalWriteLatency.average",
"disk.maxTotalLatency.latest",
"disk.read.average",
"disk.write.average",
// cpu metrics
"cpu.reservedCapacity.average",
"cpu.totalCapacity.average",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how different this number ends up (if at all) from numCpuCores * cpuMhz?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, so I've been trying to figure this out. From what I understand totalCapacity would be the more accurate metric to measure here. In most cases the numbers should be very close together (usually a bit lower). This is because numCpuCores might get artificially inflated by any logical cores caused by hyper threading. Beyond that the numbers should be very similar if not the same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After talking with Stefan a bit more on this topic, we uncovered that what we think of as totalCapacity as shown on these performance metrics is a bit different than what someone would normally think (numCpuCores * cpuMhz) as shown on the vSphere client.
The difference being the calculated total capacity using the cpuCores and cpuMhz would be talking about how much capacity the host has for it
image
The performance metric totalCapacity is referring to the Total reservation capacity
image
and the performance metric reservedCapacity is referring "used reservation" of the total reservation capacity.
image

image

In the end, we think both the performance metric and the quickstat metric are both very useful depending on the user use case and setup, so we will be including them both.

receiver/vcenterreceiver/scraper_test.go Show resolved Hide resolved
@github-actions github-actions bot added the cmd/otelcontribcol otelcontribcol command label Jun 25, 2024
Copy link
Contributor

@StefanKurek StefanKurek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good to move from DRAFT. I still would like to know the difference between the total CPU performance metric and one you could calculate though.

@BominRahmani BominRahmani marked this pull request as ready for review June 26, 2024 18:17
@BominRahmani BominRahmani requested a review from a team June 26, 2024 18:17
if err == nil {
return vApps, nil
// ResourcePoolInventoryListObjects returns the ResourcePools (with populated InventoryLists) of the vSphere SDK
func (vc *vcenterClient) ResourcePoolInventoryListObjects(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'd say this PR is blocked until this PR is merged in and you can rebase (because this function for example isn't actually quite right ATM)

if_enabled_not_set: "this metric will be enabled by default starting in release v0.105.0"
vcenter.host.cpu.reserve.capacity:
enabled: false
description: Total CPU capacity that is available for reserve or reserved by virtual machines.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the description and name of this one has some room for improvement (still sounds a bit confusing).

What do you think about something like vcenter.host.cpu.reserved for the name? I know the performance metric names have "capacity" in them, but the UI equivalents do not. "Capacity" seems to only make sense for "total" and not "used" to me.

Then the description could then be something like The CPU of the host reserved for use by virtual machines.

Attribute could be changed to cpu_reservation_type with values of total and used. Description could be The type of CPU reservation for the host.

These are all just suggestions, but what do you think @BominRahmani ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually teetering back and forth between vcenter.host.cpu.reserved and vcenter.host.cpu.reserve.capacity I only decided the latter since it felt a bit more verbose. I was also thinking about switching the cpu_reservation_type to total and used to match the UI a bit more originally, So I am totally ok with these changes.

@BominRahmani BominRahmani force-pushed the feat/add-vcenter-host-metrics branch 3 times, most recently from a06edcb to fdd24e7 Compare June 30, 2024 23:11
Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just need checks fixed

@BominRahmani BominRahmani force-pushed the feat/add-vcenter-host-metrics branch from fdd24e7 to e44f552 Compare July 1, 2024 14:52
@BominRahmani BominRahmani force-pushed the feat/add-vcenter-host-metrics branch from 800febe to b51853d Compare July 1, 2024 15:56
@djaglowski djaglowski merged commit 2604193 into open-telemetry:main Jul 1, 2024
154 checks passed
@github-actions github-actions bot added this to the next release milestone Jul 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants