[receiver/sqlserver] Enable more perf counter metrics #33420

Merged 4 commits on Jun 19, 2024
33 changes: 33 additions & 0 deletions .chloggen/sqlserver_pc_metrics.yaml
@@ -0,0 +1,33 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement
Review comment (Member, PR author):
I'm on the fence as to whether this should be considered an enhancement or a breaking change. These metrics are all documented as enabled by default, but in practice they are only received when running on Windows.

In practice, this change does increase the number of metrics received when connecting directly to the SQL Server instance, so let me know if we should mark this as breaking instead.

Reply (Member):
Seems reasonable to call it an enhancement


# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: sqlserverreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Enable more perf counter metrics when directly connecting to SQL Server

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [33420]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
  This enables the following metrics by default on non-Windows-based systems:
  `sqlserver.batch.request.rate`
  `sqlserver.batch.sql_compilation.rate`
  `sqlserver.batch.sql_recompilation.rate`
  `sqlserver.page.buffer_cache.hit_ratio`
  `sqlserver.user.connection.count`

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
10 changes: 0 additions & 10 deletions receiver/sqlserverreceiver/documentation.md
@@ -16,8 +16,6 @@ metrics:

Number of batch requests received by SQL Server.

This metric is only available when running on Windows.

| Unit | Metric Type | Value Type |
| ---- | ----------- | ---------- |
| {requests}/s | Gauge | Double |
@@ -26,8 +24,6 @@ This metric is only available when running on Windows.

Number of SQL compilations needed.

This metric is only available when running on Windows.

| Unit | Metric Type | Value Type |
| ---- | ----------- | ---------- |
| {compilations}/s | Gauge | Double |
@@ -36,8 +32,6 @@ This metric is only available when running on Windows.

Number of SQL recompilations needed.

This metric is only available when running on Windows.

| Unit | Metric Type | Value Type |
| ---- | ----------- | ---------- |
| {compilations}/s | Gauge | Double |
@@ -64,8 +58,6 @@ This metric is only available when running on Windows.

Pages found in the buffer pool without having to read from disk.

This metric is only available when running on Windows.

| Unit | Metric Type | Value Type |
| ---- | ----------- | ---------- |
| % | Gauge | Double |
@@ -210,8 +202,6 @@ This metric is only available when running on Windows.

Number of users connected to the SQL Server.

This metric is only available when running on Windows.

| Unit | Metric Type | Value Type |
| ---- | ----------- | ---------- |
| {connections} | Gauge | Int |
5 changes: 0 additions & 5 deletions receiver/sqlserverreceiver/metadata.yaml
@@ -55,7 +55,6 @@ metrics:
    unit: "{connections}"
    gauge:
      value_type: int
    extended_documentation: This metric is only available when running on Windows.
  sqlserver.lock.wait_time.avg:
    enabled: true
    description: Average wait time for all lock requests that had to wait.
@@ -75,28 +74,24 @@ metrics:
    unit: "{requests}/s"
    gauge:
      value_type: double
    extended_documentation: This metric is only available when running on Windows.
  sqlserver.batch.sql_compilation.rate:
    enabled: true
    description: Number of SQL compilations needed.
    unit: "{compilations}/s"
    gauge:
      value_type: double
    extended_documentation: This metric is only available when running on Windows.
  sqlserver.batch.sql_recompilation.rate:
    enabled: true
    description: Number of SQL recompilations needed.
    unit: "{compilations}/s"
    gauge:
      value_type: double
    extended_documentation: This metric is only available when running on Windows.
  sqlserver.page.buffer_cache.hit_ratio:
    enabled: true
    description: Pages found in the buffer pool without having to read from disk.
    unit: "%"
    gauge:
      value_type: double
    extended_documentation: This metric is only available when running on Windows.
  sqlserver.page.life_expectancy:
    enabled: true
    description: Time a page will stay in the buffer pool.
45 changes: 45 additions & 0 deletions receiver/sqlserverreceiver/scraper.go
@@ -172,10 +172,15 @@ func (s *sqlServerScraperHelper) recordDatabasePerfCounterMetrics(ctx context.Co
	const counterKey = "counter"
	const valueKey = "value"
	// Constants are the columns for metrics from query
	const batchRequestRate = "Batch Requests/sec"
	const bufferCacheHitRatio = "Buffer cache hit ratio"
	const diskReadIOThrottled = "Disk Read IO Throttled/sec"
	const diskWriteIOThrottled = "Disk Write IO Throttled/sec"
	const lockWaits = "Lock Waits/sec"
	const processesBlocked = "Processes blocked"
	const sqlCompilationRate = "SQL Compilations/sec"
	const sqlReCompilationsRate = "SQL Re-Compilations/sec"
	const userConnCount = "User Connections"

	rows, err := s.client.QueryRows(ctx)
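
For orientation, the five counter names added above pair with the five metrics listed in the changelog entry. The following is a minimal sketch of that pairing, not code from this PR: the `counterToMetric` map and the `metricFor` helper are hypothetical names introduced only for illustration, while the counter strings and metric names are taken from the diff.

```go
package main

import "fmt"

// counterToMetric pairs each SQL Server performance counter name added in
// this change with the metric it feeds. The map is a hypothetical
// illustration; the receiver itself dispatches with a switch statement.
var counterToMetric = map[string]string{
	"Batch Requests/sec":      "sqlserver.batch.request.rate",
	"Buffer cache hit ratio":  "sqlserver.page.buffer_cache.hit_ratio",
	"SQL Compilations/sec":    "sqlserver.batch.sql_compilation.rate",
	"SQL Re-Compilations/sec": "sqlserver.batch.sql_recompilation.rate",
	"User Connections":        "sqlserver.user.connection.count",
}

// metricFor returns the metric name for a counter and reports whether the
// counter is one of the five covered by this sketch.
func metricFor(counter string) (string, bool) {
	name, ok := counterToMetric[counter]
	return name, ok
}

func main() {
	for _, c := range []string{"Batch Requests/sec", "Unknown counter"} {
		if name, ok := metricFor(c); ok {
			fmt.Printf("%q -> %s\n", c, name)
		} else {
			fmt.Printf("%q is not handled\n", c)
		}
	}
}
```

Counters that were already handled before this change, such as `Lock Waits/sec`, are deliberately left out of the sketch.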

@@ -195,6 +200,22 @@ func (s *sqlServerScraperHelper) recordDatabasePerfCounterMetrics(ctx context.Co
		}

		switch row[counterKey] {
		case batchRequestRate:
			val, err := strconv.ParseFloat(row[valueKey], 64)
			if err != nil {
				err = fmt.Errorf("row %d: %w", i, err)
				errs = append(errs, err)
			} else {
				s.mb.RecordSqlserverBatchRequestRateDataPoint(now, val)
			}
		case bufferCacheHitRatio:
			val, err := strconv.ParseFloat(row[valueKey], 64)
			if err != nil {
				err = fmt.Errorf("row %d: %w", i, err)
				errs = append(errs, err)
			} else {
				s.mb.RecordSqlserverPageBufferCacheHitRatioDataPoint(now, val)
			}
		case diskReadIOThrottled:
			errs = append(errs, s.mb.RecordSqlserverResourcePoolDiskThrottledReadRateDataPoint(now, row[valueKey]))
		case diskWriteIOThrottled:
@@ -209,6 +230,30 @@ func (s *sqlServerScraperHelper) recordDatabasePerfCounterMetrics(ctx context.Co
			}
		case processesBlocked:
			errs = append(errs, s.mb.RecordSqlserverProcessesBlockedDataPoint(now, row[valueKey]))
		case sqlCompilationRate:
			val, err := strconv.ParseFloat(row[valueKey], 64)
			if err != nil {
				err = fmt.Errorf("row %d: %w", i, err)
				errs = append(errs, err)
			} else {
				s.mb.RecordSqlserverBatchSQLCompilationRateDataPoint(now, val)
			}
		case sqlReCompilationsRate:
			val, err := strconv.ParseFloat(row[valueKey], 64)
			if err != nil {
				err = fmt.Errorf("row %d: %w", i, err)
				errs = append(errs, err)
			} else {
				s.mb.RecordSqlserverBatchSQLRecompilationRateDataPoint(now, val)
			}
		case userConnCount:
			val, err := strconv.ParseInt(row[valueKey], 10, 64)
			if err != nil {
				err = fmt.Errorf("row %d: %w", i, err)
				errs = append(errs, err)
			} else {
				s.mb.RecordSqlserverUserConnectionCountDataPoint(now, val)
			}
		}
	}

40 changes: 40 additions & 0 deletions receiver/sqlserverreceiver/testdata/expectedPerfCounters.yaml
@@ -6,6 +6,30 @@ resourceMetrics:
            stringValue: 8cac97ac9b8f
    scopeMetrics:
      - metrics:
          - description: Number of batch requests received by SQL Server.
            gauge:
              dataPoints:
                - asDouble: 3375
                  startTimeUnixNano: "1000000"
                  timeUnixNano: "2000000"
            name: sqlserver.batch.request.rate
            unit: '{requests}/s'
          - description: Number of SQL compilations needed.
            gauge:
              dataPoints:
                - asDouble: 413
                  startTimeUnixNano: "1000000"
                  timeUnixNano: "2000000"
            name: sqlserver.batch.sql_compilation.rate
            unit: '{compilations}/s'
          - description: Number of SQL recompilations needed.
            gauge:
              dataPoints:
                - asDouble: 63
                  startTimeUnixNano: "1000000"
                  timeUnixNano: "2000000"
            name: sqlserver.batch.sql_recompilation.rate
            unit: '{compilations}/s'
          - description: Number of lock requests resulting in a wait.
            gauge:
              dataPoints:
@@ -14,6 +38,14 @@ resourceMetrics:
                  timeUnixNano: "2000000"
            name: sqlserver.lock.wait.rate
            unit: '{requests}/s'
          - description: Pages found in the buffer pool without having to read from disk.
            gauge:
              dataPoints:
                - asDouble: 100
                  startTimeUnixNano: "1000000"
                  timeUnixNano: "2000000"
            name: sqlserver.page.buffer_cache.hit_ratio
            unit: '%'
          - description: The number of processes that are currently blocked
            gauge:
              dataPoints:
@@ -44,6 +76,14 @@ resourceMetrics:
                  timeUnixNano: "2000000"
            name: sqlserver.resource_pool.disk.throttled.write.rate
            unit: '{writes}/s'
          - description: Number of users connected to the SQL Server.
            gauge:
              dataPoints:
                - asInt: "3"
                  startTimeUnixNano: "1000000"
                  timeUnixNano: "2000000"
            name: sqlserver.user.connection.count
            unit: '{connections}'
        scope:
          name: otelcol/sqlserverreceiver
          version: latest