services/horizon: Add new metrics counters for db connection close events #5225

Merged: 13 commits merged on Mar 5, 2024


@sreuland (Contributor) commented Feb 28, 2024

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Added new DB-layer metrics:
  • client_closed_session_total
  • server_timeout_closed_session_total
  • statement_timeout_closed_session_total

Why

To obtain insight into which types of events underlie DB session timeouts.
Closes #5217

Known limitations

@tamirms (Contributor) commented Feb 28, 2024

Instead of having three separate Prometheus counters for these specific errors, I think it would be better to have one Prometheus counter for all possible Postgres errors returned by the driver. This could be implemented by adding a label to the Prometheus counter that represents the Postgres error code.

Then you can wrap the session functions with some code that checks whether the returned error is a Postgres server error ( https://stackoverflow.com/questions/37560534/does-the-error-returned-by-db-exec-have-a-code ) and, in that case, increments the metric with the appropriate error-code label.

@sreuland (Contributor, Author) replied:

Quoting @tamirms: "I think it would be better to have one prometheus counter for all possible postgres errors returned by the driver. This could be implemented by having a label in the prometheus counter which represents the postgres error code."

I reworked it per the suggestion to use a single metric with labels. The new metrics-gathering routine attempts to get the Postgres server error code if the lib/pq driver provides it.

Review threads on support/db/metrics.go and support/db/session.go (outdated, resolved).
@sreuland requested a review from @tamirms on March 1, 2024 01:45
@sreuland requested a review from @tamirms on March 5, 2024 05:56

@tamirms (Contributor) left a comment:

Looks great! I just left one comment about capturing the case where the context error is nil: #5225 (comment)

@sreuland merged commit e21bc43 into stellar:master on Mar 5, 2024 (29 checks passed)
Linked issue (may be closed by merging this pull request): services/horizon: emit new metrics counters for timeouts
2 participants