Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Intel PowerStat input plugin #8488

Merged
merged 4 commits into from
Dec 10, 2020

Conversation

MaciejMis
Copy link
Contributor

Required for all PRs:

  • Signed CCLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

CCLA completed and signed by Intel Corporation.

Feature Request

We have created a Telegraf PowerStat input plugin which enables monitoring of platform metrics (power, TDP) and per-CPU metrics like temperature, power and utilization.

Intel PowerStat plugin supports Intel based platforms and assumes presence of Linux based OS.

Proposal:

The plugin we have created is able to provide following metrics:

Platform metrics:

  • thermal_design_power (Maximum Thermal Design Power (TDP) available for processor package [Watts])
  • current_power_consumption (Current power consumption of processor package [Watts])
  • current_dram_power_consumption (Current power consumption of processor package DRAM subsystem [Watts])

Per-CPU metrics:

  • cpu_frequency (Current operational frequency of CPU Core [MHz])
  • cpu_temperature (Current temperature of CPU Core [Celsius degrees])
  • cpu_busy_cycles (CPU Core Busy cycles as a ratio of Cycles spent in C0 state residency to all cycles executed by CPU Core [%])
  • cpu_c6_state_residency (Percentage of time that CPU Core spent in C6 Core residency state [%])
  • cpu_c1_state_residency (Percentage of time that CPU Core spent in C1 Core residency state [%])
  • cpu_busy_frequency (CPU Core Busy Frequency measured as frequency adjusted to CPU Core busy cycles [MHz])

Use case:

Key source of platform telemetry is power domain that is beneficial for MANO/Monitoring&Analytics systems to take preventive/corrective actions based on platform busyness, CPU temperature, actual CPU utilization and power statistics.
Main use cases are power saving and workload migration.

Closes: #8485

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @MaciejMis, cool stuff. I've some comment nevertheless...

  1. I think the metric output format is costly on the query side. If you e.g. want the "cpu_frequency" you need to constrain the field value if "name" to "cpu_frequency". That means you need to touch all rows of the DB. Making it a field i.e. field: "cpu_frequency_mhz"=value would be much more efficient.
  2. The whole "services" thing feels like a standalone golang package. Can maybe Intel host that as a package (just the "services", "data", "mocks" thing) as a golang package that we can use here? I'm asking because that's a lot of code not solely bound with telegraf (e.g. someone could use it in a different golang project)...
  3. The skipfirst feels strange. While I couldn't come up with an ad-hoc replacement it feels like circumventing an Init() case...

@p-zak
Copy link
Collaborator

p-zak commented Dec 1, 2020

Hey @MaciejMis, cool stuff. I've some comment nevertheless...

  1. I think the metric output format is costly on the query side. If you e.g. want the "cpu_frequency" you need to constrain the field value if "name" to "cpu_frequency". That means you need to touch all rows of the DB. Making it a field i.e. field: "cpu_frequency_mhz"=value would be much more efficient.
  2. The whole "services" thing feels like a standalone golang package. Can maybe Intel host that as a package (just the "services", "data", "mocks" thing) as a golang package that we can use here? I'm asking because that's a lot of code not solely bound with telegraf (e.g. someone could use it in a different golang project)...
  3. The skipfirst feels strange. While I couldn't come up with an ad-hoc replacement it feels like circumventing an Init() case...
  1. Good point!
  2. Hosting "services" thing somewhere else is not an option right now. There are just helper functions extracted from main file to read and calculate values and crafted to be used by this plugin.
  3. skipfirst is just to omit few metrics which are calculated deltas. In first iteration they doesn't make any sense (need at least two iterations of Gather to calculate something).

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice code, but I have the feeling it has too many indirections. Please see my comments.

@srebhan srebhan self-assigned this Dec 3, 2020
Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I really enjoy reading the code!

One thing worth mentioning is the atomic organization of the metrics. There are multiple metrics emitted with the same name but different fields. One could argue to collect those fields before emitting, but on the other hand the current structure keeps the code clean.

@srebhan srebhan added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Dec 7, 2020
@reimda reimda merged commit 9166a16 into influxdata:master Dec 10, 2020
ssoroka pushed a commit that referenced this pull request Dec 16, 2020
arstercz pushed a commit to arstercz/telegraf that referenced this pull request Mar 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new plugin ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New Intel PowerStat input plugin
5 participants