Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/hostmetrics] Use gopsutil total seconds to calculate utilization #8857

Merged
merged 3 commits into from
Mar 31, 2022
Merged

[receiver/hostmetrics] Use gopsutil total seconds to calculate utilization #8857

merged 3 commits into from
Mar 31, 2022

Conversation

rubenruizdegauna
Copy link
Contributor

@rubenruizdegauna rubenruizdegauna commented Mar 28, 2022

Description:
system.cpu.utilization delta time is calculated using timestamps read in each scrape. These timestamp reads are not directly related to the cpu times reads so the used delta time to calculate utilization is not aligned with the delta time from times values. This makes system.cpu.utilization inacurate.

As cpu utilization relies on cpu times, the delta time should be calculated from the cpu times value (gopsutil already provides a Total() function that returns the total number of seconds). This way we can avoid timestamp read and persistence between scrapes simplifying it.

Now:

utilization.user = cpu.user / (time diff from system time)
utilization.idle = cpu.idle / (time diff from system time)
...
Proposal:

utilization.user = cpu.user / (elapsed seconds from times)
utilization.idle = cpu.idle / (elapsed seconds from times)
...

Link to tracking Issue:
#8856

Testing:
Tests have been refactored.

Documentation:
No additional documentation has been added.

@dmitryax
Copy link
Member

I don't think that utilization.user = cpu.user / (cpu total seconds) is what utilization is supposed to be. If we do this, we are getting overall utilization right from the boot time. Once significant amount of time passes this value will come to some amount that won't be changing anymore.

For example:

Let's assume utilization.user at t1 is 50000000/100000000 = 0.5. Then CPU was idle next 10 seconds from t1=100000000 to t2=100000010, so utilization.user at t2 is still very close 0.5 50000000/100000010 = 0.49999995.

So if we do this change the metrics becomes useless.

@dmitryax
Copy link
Member

dmitryax commented Mar 28, 2022

Oh, after looking at the code, seems like utilization.user = cpu.user / (cpu total seconds) is just a misleading description. We still have utilization.user = cpu.user / (time diff) just calculated differently :)

@rubenruizdegauna
Copy link
Contributor Author

Oh, after looking at the code, seems like utilization.user = cpu.user / (cpu total seconds) is just a misleading description. We still have utilization.user = cpu.user / (time diff) just calculated differently :)

my bad 😅 I have updated the description and the variable name in 0e0c90e

@rubenruizdegauna
Copy link
Contributor Author

Hi @dmitryax do you think this can be merged to main? Is there anything else missing from my side? Thanks!

@jpkrohling jpkrohling merged commit 6da3bd2 into open-telemetry:main Mar 31, 2022
@rubenruizdegauna rubenruizdegauna deleted the host_metrics_receiver-cpu_utilization-using-times branch March 31, 2022 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants