Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vSphere][host] Add support for new metrics in host metricset #40429

Merged

Conversation

kush-elastic
Copy link
Collaborator

@kush-elastic kush-elastic commented Aug 5, 2024

Description

Here are the following metrics to be added for the host data stream in the vSphere metricbeat module. Here we added a new performance API to get more detailed information from the vSphere.

Metrics Type Metrics API Field Mappings
Host Disk Performance      
  disk.deviceLatency.average (ms) Performance disk.devicelatency.average.ms
  disk.maxTotalLatency.latest (ms) Performance disk.latency.total.ms
  disk.usage.average (KBps) Performance disk.total.bytes
  disk.read.average (KBps) Performance disk.read.bytes
  disk.write.average (KBps) Performance disk.write.bytes
  disk.capacity.usage.average (KB) Performance disk.capacity.usage.bytes
       
Host uptime/status      
  host.Summary.OverallStatus Summary status
  host.Summary.QuickStats.Uptime Summary uptime
  host.Datastore Extra datastore.names
      datastore.count
  host.Vm Extra vm.names
      vm.count
       
Host Network Stats host.Netowork Extra network.names
      network.count
  net.transmitted.average (KBps) Performance network.bandwidth.transmitted.bytes
  net.received.average (KBps) Performance network.bandwidth.received.bytes
  net.usage.average (KBps) Performance network.bandwidth.total.bytes
  net.packetsTx.summation (num) Performance network.packets.transmitted.count
  net.packetsRx.summation (num) Performance network.packets.received.count
  net.errorsRx.summation (num) Performance network.packets.errors.received.count
  net.errorsTx.summation (num) Performance network.packets.errors.transmitted.count
      network.packets.errors.total.count
  net.multicastTx.summation (num) Performance network.packets.multicast.transmitted.count
  net.multicastRx.summation (num) Performance network.packets.multicast.received.count
      network.packets.multicast.total.count
  net.droppedTx.summation (num) Performance network.packets.dropped.transmitted.count
  net.droppedRx.summation (num) Performance network.packets.dropped.received.count
      network.packets.dropped.total.count

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

@kush-elastic kush-elastic requested a review from a team as a code owner August 5, 2024 05:40
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Aug 5, 2024
@kush-elastic kush-elastic self-assigned this Aug 5, 2024
@kush-elastic kush-elastic added enhancement Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team labels Aug 5, 2024
Copy link
Contributor

mergify bot commented Aug 5, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @kush-elastic? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Aug 5, 2024
@kush-elastic kush-elastic added Metricbeat Metricbeat Module:beat The beat Metricbeat module labels Aug 5, 2024
@kush-elastic kush-elastic requested a review from a team as a code owner August 5, 2024 06:49
Copy link
Contributor

@harnish-elastic harnish-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@shmsr
Copy link
Member

shmsr commented Aug 20, 2024

@kush-elastic Here are some more changes that I'd recommend:

diff --git a/metricbeat/module/vsphere/host/host.go b/metricbeat/module/vsphere/host/host.go
index 22e4756661..1a2e3a1a7c 100644
--- a/metricbeat/module/vsphere/host/host.go
+++ b/metricbeat/module/vsphere/host/host.go
@@ -66,24 +66,24 @@ type assetNames struct {
 }
 
 // Define metrics to be collected
-var metricNames = []string{
-	"disk.capacity.usage.average",
-	"disk.deviceLatency.average",
-	"disk.maxTotalLatency.latest",
-	"disk.usage.average",
-	"disk.read.average",
-	"disk.write.average",
-	"net.transmitted.average",
-	"net.received.average",
-	"net.usage.average",
-	"net.packetsTx.summation",
-	"net.packetsRx.summation",
-	"net.errorsTx.summation",
-	"net.errorsRx.summation",
-	"net.multicastTx.summation",
-	"net.multicastRx.summation",
-	"net.droppedTx.summation",
-	"net.droppedRx.summation",
+var metricSet = map[string]struct{}{
+	"disk.capacity.usage.average": {},
+	"disk.deviceLatency.average":  {},
+	"disk.maxTotalLatency.latest": {},
+	"disk.usage.average":          {},
+	"disk.read.average":           {},
+	"disk.write.average":          {},
+	"net.transmitted.average":     {},
+	"net.received.average":        {},
+	"net.usage.average":           {},
+	"net.packetsTx.summation":     {},
+	"net.packetsRx.summation":     {},
+	"net.errorsTx.summation":      {},
+	"net.errorsRx.summation":      {},
+	"net.multicastTx.summation":   {},
+	"net.multicastRx.summation":   {},
+	"net.droppedTx.summation":     {},
+	"net.droppedRx.summation":     {},
 }
 
 // Fetch methods implements the data gathering and data conversion to the right
@@ -130,77 +130,73 @@ func (m *MetricSet) Fetch(ctx context.Context, reporter mb.ReporterV2) error {
 	// Create a performance manager
 	perfManager := performance.NewManager(c)
 
-	// Retrieve metric IDs for the specified metric names
+	// Retrieve all available metrics
 	metrics, err := perfManager.CounterInfoByName(ctx)
 	if err != nil {
 		return fmt.Errorf("failed to retrieve metrics: %w", err)
 	}
 
-	// Retrieve only the required metrics
-	requiredMetrics := make(map[string]*types.PerfCounterInfo)
-
-	for _, name := range metricNames {
-		metric, exists := metrics[name]
-		if !exists {
-			m.Logger().Warnf("Metric %s not found", name)
-			continue
+	// Filter for required metrics
+	var metricIds []types.PerfMetricId
+	for metricName := range metricSet {
+		if metric, ok := metrics[metricName]; ok {
+			metricIds = append(metricIds, types.PerfMetricId{CounterId: metric.Key})
+		} else {
+			m.Logger().Warnf("Metric %s not found", metricName)
 		}
-		requiredMetrics[name] = metric
-	}
-
-	metricIDs := make([]types.PerfMetricId, 0, len(requiredMetrics))
-	for _, metric := range requiredMetrics {
-		metricIDs = append(metricIDs, types.PerfMetricId{
-			CounterId: metric.Key,
-		})
 	}
 
 	pc := property.DefaultCollector(c)
 	for i := range hst {
-		assetNames, err := getAssetNames(ctx, pc, &hst[i])
-		if err != nil {
-			m.Logger().Errorf("Failed to retrieve object from host %s: %w", hst[i].Name, err)
-		}
+		select {
+		case <-ctx.Done():
+			return ctx.Err()
+		default:
+			assetNames, err := getAssetNames(ctx, pc, &hst[i])
+			if err != nil {
+				m.Logger().Errorf("Failed to retrieve object from host %s: %w", hst[i].Name, err)
+			}
 
-		spec := types.PerfQuerySpec{
-			Entity:     hst[i].Reference(),
-			MetricId:   metricIDs,
-			MaxSample:  1,
-			IntervalId: 20, // right now we are only grabbing real time metrics from the performance manager
-		}
+			spec := types.PerfQuerySpec{
+				Entity:     hst[i].Reference(),
+				MetricId:   metricIds,
+				MaxSample:  1,
+				IntervalId: 20, // right now we are only grabbing real time metrics from the performance manager
+			}
 
-		// Query performance data
-		samples, err := perfManager.Query(ctx, []types.PerfQuerySpec{spec})
-		if err != nil {
-			m.Logger().Errorf("Failed to query performance data from host %s: %v", hst[i].Name, err)
-			continue
-		}
+			// Query performance data
+			samples, err := perfManager.Query(ctx, []types.PerfQuerySpec{spec})
+			if err != nil {
+				m.Logger().Errorf("Failed to query performance data from host %s: %v", hst[i].Name, err)
+				continue
+			}
 
-		if len(samples) == 0 {
-			m.Logger().Debug("No samples returned from performance manager")
-			continue
-		}
+			if len(samples) == 0 {
+				m.Logger().Debug("No samples returned from performance manager")
+				continue
+			}
 
-		results, err := perfManager.ToMetricSeries(ctx, samples)
-		if err != nil {
-			m.Logger().Errorf("Failed to convert performance data to metric series for host %s: %v", hst[i].Name, err)
-		}
+			results, err := perfManager.ToMetricSeries(ctx, samples)
+			if err != nil {
+				m.Logger().Errorf("Failed to convert performance data to metric series for host %s: %v", hst[i].Name, err)
+			}
 
-		metricMap := make(map[string]interface{})
-		for _, result := range results[0].Value {
-			if len(result.Value) > 0 {
-				metricMap[result.Name] = result.Value[0]
-				continue
+			metricMap := make(map[string]interface{})
+			for _, result := range results[0].Value {
+				if len(result.Value) > 0 {
+					metricMap[result.Name] = result.Value[0]
+					continue
+				}
+				m.Logger().Debugf("For host %s,Metric %v: No result found", hst[i].Name, result.Name)
 			}
-			m.Logger().Debugf("For host %s,Metric %v: No result found", hst[i].Name, result.Name)
-		}
 
-		reporter.Event(mb.Event{
-			MetricSetFields: m.eventMapping(hst[i], &metricData{
-				perfMetrics: metricMap,
-				assetsName:  assetNames,
-			}),
-		})
+			reporter.Event(mb.Event{
+				MetricSetFields: m.eventMapping(hst[i], &metricData{
+					perfMetrics: metricMap,
+					assetsName:  assetNames,
+				}),
+			})
+		}
 	}
 
 	return nil

Copy link
Member

@shmsr shmsr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the CI failures, the rest LGTM!

Copy link
Contributor

@niraj-elastic niraj-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provided some minor nits.

CHANGELOG.next.asciidoc Outdated Show resolved Hide resolved
metricbeat/module/vsphere/host/_meta/fields.yml Outdated Show resolved Hide resolved
@kush-elastic kush-elastic changed the title [vSphere][host] Implement the beats changes [vSphere][host] Add support for new metrics in host metricset Aug 21, 2024
@shmsr
Copy link
Member

shmsr commented Aug 21, 2024

Some more changes:

diff --git a/metricbeat/module/vsphere/host/_meta/fields.yml b/metricbeat/module/vsphere/host/_meta/fields.yml
index 6ca8380c21..65ff1e110b 100644
--- a/metricbeat/module/vsphere/host/_meta/fields.yml
+++ b/metricbeat/module/vsphere/host/_meta/fields.yml
@@ -1,21 +1,21 @@
 - name: host
   type: group
   description: >
-    host
+    Host information from vSphere environment.
   release: ga
   fields:
     - name: cpu.used.mhz
       type: long
       description: >
-        Used CPU in Mhz.
+        Used CPU in MHz.
     - name: cpu.total.mhz
       type: long
       description: >
-        Total CPU in Mhz.
+        Total CPU in MHz.
     - name: cpu.free.mhz
       type: long
       description: >
-        Free CPU in Mhz.
+        Free CPU in MHz.
     - name: datastore.names
       type: keyword
       description: >
@@ -50,7 +50,7 @@
     - name: disk.total.bytes
       type: long
       description: >
-        Aggregated disk Input/Output rate.
+        Sum of disk read and write rates each second in bytes.
       format: bytes
     - name: memory.free.bytes
       type: long
@@ -71,10 +71,6 @@
       type: keyword
       description: >
         Host name.
-    - name: network_names
-      type: keyword
-      description: >
-        Network names.
     - name: network.names
       type: keyword
       description: >
@@ -96,7 +92,7 @@
     - name: network.bandwidth.total.bytes
       type: long
       description: >
-        Network utilization (combined transmit-rates and receive-rates).
+        Sum of network transmitted and received rates in bytes during the interval.
       format: bytes
     - name: network.packets.transmitted.count
       type: long

network_names is there twice? also fixed some descriptions.

@kush-elastic
Copy link
Collaborator Author

network_names is there twice? also fixed some descriptions.

We need to retain the original network_names field that existed before my changes. However, with the introduction of new fields, we wanted to adopt the ECS (Elastic Common Schema) convention, which groups fields using dots (.). As a result, we temporarily added duplicate fields. The older fields will be deprecated later.

@ishleenk17
Copy link
Contributor

@kush-elastic Can you add all the newly added metrics as part of the associations between host/network/datastore/vm to the metric list in the issue here and in our excel.
I see not all metrics are listed there.

Copy link
Contributor

@ishleenk17 ishleenk17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Please address this. #40429 (comment)

@kush-elastic kush-elastic merged commit e09627f into elastic:main Aug 22, 2024
123 checks passed
@ishleenk17 ishleenk17 added the backport-8.15 Automated backport to the 8.15 branch with mergify label Sep 5, 2024
mergify bot pushed a commit that referenced this pull request Sep 5, 2024
* [vSphere][host] Data collection and Fields mappings

* add changelog entry

* update github.com/vmware/govmomi version

* update go.sum

* make check for beats

* golint-ci

* update the field mappings

* mage update

* add more UTs

* metricbeat: mage update

* Resolve review comments

* Minor changes
- Update interval

* add vm and datastore names with count per host

* fix and update mapping methods

* remove fmt.Errorf

(cherry picked from commit e09627f)
ishleenk17 added a commit that referenced this pull request Sep 13, 2024
#40697)

* [vSphere][host] Data collection and Fields mappings

* add changelog entry

* update github.com/vmware/govmomi version

* update go.sum

* make check for beats

* golint-ci

* update the field mappings

* mage update

* add more UTs

* metricbeat: mage update

* Resolve review comments

* Minor changes
- Update interval

* add vm and datastore names with count per host

* fix and update mapping methods

* remove fmt.Errorf

(cherry picked from commit e09627f)

Co-authored-by: Kush Rana <[email protected]>
Co-authored-by: Ishleen Kaur <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.15 Automated backport to the 8.15 branch with mergify enhancement Metricbeat Metricbeat Module:beat The beat Metricbeat module Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants