Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query frontend stats logs to tempo #3342

Closed
jmichalek132 opened this issue Jan 27, 2024 · 3 comments · Fixed by #3383
Closed

Add query frontend stats logs to tempo #3342

jmichalek132 opened this issue Jan 27, 2024 · 3 comments · Fixed by #3383

Comments

@jmichalek132
Copy link
Contributor

Is your feature request related to a problem? Please describe.

I would like to see the queries being executed against Loki, and additional information in them so I can use Loki to track query performance, track down users whose queries failed, etc.

Describe the solution you'd like

Implementing similar log lines as the query stats already present in Loki and Mimir query frontend.

Describe alternatives you've considered

n/a

Additional context

This was discussed in the Tempo Community Call. It was suggested something like that is present already, Since I couldn't find any log lines like this I opened this issue. I would be also willing to work on implementing this.

@kvrhdn
Copy link
Member

kvrhdn commented Jan 29, 2024

We already report some stats similar to this here:

level.Info(f.logger).Log(
"tenant", orgID,
"method", r.Method,
"traceID", traceID,
"url", r.URL.RequestURI(),
"duration", elapsed.String(),
"response_size", contentLength,
"status", statusCode,
)

and here

level.Info(s.logger).Log(
"msg", "sharded search query request stats and SearchMetrics",
"tenant", tenantID,
"query", query,
"duration_seconds", reqTime,
"request_throughput", throughput,
"total_requests", totalJobs,
"started_requests", startedReqs,
"cancelled_requests", cancelledReqs,
"finished_requests", overallResponse.finishedRequests,
"totalBlocks", overallResponse.response.Metrics.TotalBlocks,
"inspectedBytes", overallResponse.response.Metrics.InspectedBytes,
"inspectedTraces", overallResponse.response.Metrics.InspectedTraces,
"totalBlockBytes", overallResponse.response.Metrics.TotalBlockBytes)

I assume this is what people were referring to in the community call.
Would this be the information you are looking for or is it missing something?

Example logs this creates:

level=info ts=2024-01-29T15:04:49.267959765Z caller=handler.go:134 tenant=vulture-tenant method=GET traceID=39b93258b7e54a70 url="/tempo/api/search?end=1706104010&q=%7B.vulture-process-1+%3D+%22CDFcUEYHqBhFzScTYOt%22%7D&start=1706100410" duration=145.026206ms response_size=1461 status=200
2024-01-29 15:04:49.267	
level=info ts=2024-01-29T15:04:49.267453897Z caller=searchsharding.go:247 msg="sharded search query request stats and SearchMetrics" tenant=vulture-tenant query="end=1706104010&q={.vulture-process-1+=+\"CDFcUEYHqBhFzScTYOt\"}&start=1706100410" duration_seconds=144.688742ms request_throughput=6.02823680642686e+06 total_requests=16 started_requests=16 cancelled_requests=0 finished_requests=16 totalBlocks=16 inspectedBytes=872218 inspectedTraces=0 totalBlockBytes=6888100
2024-01-29 15:04:49.048	
level=info ts=2024-01-29T15:04:49.04868224Z caller=handler.go:134 tenant=vulture-tenant method=GET traceID=48e941ce3db435ad url="/tempo/api/search?end=1706104010&start=1706100410&tags=vulture-process-1%3DCDFcUEYHqBhFzScTYOt" duration=20.354985ms response_size=280 status=200
2024-01-29 15:04:49.048	
level=info ts=2024-01-29T15:04:49.048534173Z caller=searchsharding.go:247 msg="sharded search query request stats and SearchMetrics" tenant=vulture-tenant query="end=1706104010&start=1706100410&tags=vulture-process-1=CDFcUEYHqBhFzScTYOt" duration_seconds=20.155161ms request_throughput=1.667235503601286e+07 total_requests=16 started_requests=16 cancelled_requests=0 finished_requests=16 totalBlocks=16 inspectedBytes=336034 inspectedTraces=0 totalBlockBytes=6888100

@jmichalek132
Copy link
Contributor Author

We already report some stats similar to this here:

level.Info(f.logger).Log(
"tenant", orgID,
"method", r.Method,
"traceID", traceID,
"url", r.URL.RequestURI(),
"duration", elapsed.String(),
"response_size", contentLength,
"status", statusCode,
)

and here

level.Info(s.logger).Log(
"msg", "sharded search query request stats and SearchMetrics",
"tenant", tenantID,
"query", query,
"duration_seconds", reqTime,
"request_throughput", throughput,
"total_requests", totalJobs,
"started_requests", startedReqs,
"cancelled_requests", cancelledReqs,
"finished_requests", overallResponse.finishedRequests,
"totalBlocks", overallResponse.response.Metrics.TotalBlocks,
"inspectedBytes", overallResponse.response.Metrics.InspectedBytes,
"inspectedTraces", overallResponse.response.Metrics.InspectedTraces,
"totalBlockBytes", overallResponse.response.Metrics.TotalBlockBytes)

I assume this is what people were referring to in the community call. Would this be the information you are looking for or is it missing something?

Example logs this creates:

level=info ts=2024-01-29T15:04:49.267959765Z caller=handler.go:134 tenant=vulture-tenant method=GET traceID=39b93258b7e54a70 url="/tempo/api/search?end=1706104010&q=%7B.vulture-process-1+%3D+%22CDFcUEYHqBhFzScTYOt%22%7D&start=1706100410" duration=145.026206ms response_size=1461 status=200
2024-01-29 15:04:49.267	
level=info ts=2024-01-29T15:04:49.267453897Z caller=searchsharding.go:247 msg="sharded search query request stats and SearchMetrics" tenant=vulture-tenant query="end=1706104010&q={.vulture-process-1+=+\"CDFcUEYHqBhFzScTYOt\"}&start=1706100410" duration_seconds=144.688742ms request_throughput=6.02823680642686e+06 total_requests=16 started_requests=16 cancelled_requests=0 finished_requests=16 totalBlocks=16 inspectedBytes=872218 inspectedTraces=0 totalBlockBytes=6888100
2024-01-29 15:04:49.048	
level=info ts=2024-01-29T15:04:49.04868224Z caller=handler.go:134 tenant=vulture-tenant method=GET traceID=48e941ce3db435ad url="/tempo/api/search?end=1706104010&start=1706100410&tags=vulture-process-1%3DCDFcUEYHqBhFzScTYOt" duration=20.354985ms response_size=280 status=200
2024-01-29 15:04:49.048	
level=info ts=2024-01-29T15:04:49.048534173Z caller=searchsharding.go:247 msg="sharded search query request stats and SearchMetrics" tenant=vulture-tenant query="end=1706104010&start=1706100410&tags=vulture-process-1=CDFcUEYHqBhFzScTYOt" duration_seconds=20.155161ms request_throughput=1.667235503601286e+07 total_requests=16 started_requests=16 cancelled_requests=0 finished_requests=16 totalBlocks=16 inspectedBytes=336034 inspectedTraces=0 totalBlockBytes=6888100

Ah sorry for the late reply I missed those. Would be great than If I could add this feature to them from loki / mimir
grafana/loki#11422
So I can see i.e. which user agent sent the query / which user from Grafana.
If I raised a PR for it would it be welcomed?

@mapno
Copy link
Member

mapno commented Feb 7, 2024

If I raised a PR for it would it be welcomed?

Yes! I'll assign you the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants