Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Traceql metrics] New (unsafe) query hints #3396

Merged
merged 16 commits into from
Mar 5, 2024

Conversation

mdisibio
Copy link
Contributor

@mdisibio mdisibio commented Feb 14, 2024

What this PR does:
Adds several new query hints which expose traditionally server-side config options. These are useful for operators (like us) to try out and quickly optimize tuneables like job size, sharding interval, and concurrency etc. But because these are unsafe to expose to end-users they are protected behind a per-tenant override flag. For example a job size of 1 byte would quickly take out the read path.

Safe hint:
The only safe hint is the existing sample=<float>.

Unsafe hints:

  • dedupe=<bool>
    • Use dedupe=false to disable deduping spans. I am doing this mainly for profiling analysis, but a use case is to check compactor performance (how well is your data deduped?).
  • job_size=<int>
    • Used in the query-frontend, override for target_bytes_per_request. Default = configured in yaml
  • job_interval=<duration>
    • How to divide the time range (5m, 1h, etc), override for interval. Default = configured in yaml
  • concurrent_blocks=<int>
    • How many blocks to inspect concurrently. Works for both querier backend and local blocks processor. Default=configured in yaml. This is a new yaml setting with defaults: querier = 2, local blocks processor = 10.
  • time_overlap_cutoff=<float>

Requires #3388

Which issue(s) this PR fixes:
Fixes n/a

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Member

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

About dedup, it might be safe but do we want users passing this hint to Tempo? Is there a use case other than for operators to tune compaction?

@mdisibio
Copy link
Contributor Author

LGTM.

About dedup, it might be safe but do we want users passing this hint to Tempo? Is there a use case other than for operators to tune compaction?

Right, there isn't really a use-case for it besides tuning compaction. Currently it's faster to not dedupe, so you could trade accuracy for speed, but that should go away after we change dedupe to loser-tree.

  1. Do you want to put it in the unsafe bucket? We can always make it "safe" again later if the use case is found.
  2. Also what do you think about the location of all the hints? Naturally we want to centralize them together in hints.go, but it also makes sense to keep the hints close to the caller. For example target_bytes_per_request is only relevant to the query-frontend.

pkg/traceql/engine_metrics.go Outdated Show resolved Hide resolved
modules/querier/querier_query_range.go Outdated Show resolved Hide resolved
@mdisibio
Copy link
Contributor Author

mdisibio commented Mar 4, 2024

@mapno This is ready for re-review. Centralized all of the hint definitions and safety to the traceql package, exposed config options for concurrent_blocks and time_overlap_cutoff, and moved dedupe to be unsafe.

Copy link
Member

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mdisibio mdisibio merged commit 8079874 into grafana:main Mar 5, 2024
14 checks passed
stoewer added a commit that referenced this pull request Apr 12, 2024
stoewer added a commit to stoewer/tempo that referenced this pull request Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants