From cacdae9d706a7174c2c364efad74ac7b94eba826 Mon Sep 17 00:00:00 2001
From: Mario Rodriguez
Date: Wed, 24 Aug 2022 16:40:03 +0200
Subject: [PATCH 1/4] Increase default max size and add troubleshotting page

---
 CHANGELOG.md                                  |  1 +
 cmd/tempo/app/config.go                       |  4 ++
 docs/tempo/website/troubleshooting/_index.md  |  1 +
 .../troubleshooting/response-too-large.md     | 55 +++++++++++++++++++
 4 files changed, 61 insertions(+)
 create mode 100644 docs/tempo/website/troubleshooting/response-too-large.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d3fce2de1e7..bdc9ce8aef8 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,7 @@
 ## main / unreleased
 
 * [CHANGE] tempo: check configuration returns now a list of warnings [#1663](https://github.com/grafana/tempo/pull/1663) (@frzifus)
+* [CHANGE] Increase default max message size from 4MB to 16MB [#1662]() (@mapno)
 * [ENHANCEMENT] metrics-generator: expose span size as a metric [#1662](https://github.com/grafana/tempo/pull/1662) (@ie-pham)
 * [ENHANCEMENT] Set Max Idle connections to 100 for Azure, should reduce DNS errors in Azure [#1632](https://github.com/grafana/tempo/pull/1632) (@electron0zero)
 
diff --git a/cmd/tempo/app/config.go b/cmd/tempo/app/config.go
index a7c8719f8ea..580c38adabc 100644
--- a/cmd/tempo/app/config.go
+++ b/cmd/tempo/app/config.go
@@ -71,6 +71,10 @@ func (c *Config) RegisterFlagsAndApplyDefaults(prefix string, f *flag.FlagSet) {
 	flagext.DefaultValues(&c.Server)
 	c.Server.LogLevel.RegisterFlags(f)
 
+	// Increase max message size to 16MB
+	c.Server.GPRCServerMaxRecvMsgSize = 16 * 1024 * 1024
+	c.Server.GRPCServerMaxSendMsgSize = 16 * 1024 * 1024
+
 	// The following GRPC server settings are added to address this issue - https://github.com/grafana/tempo/issues/493
 	// The settings prevent the grpc server from sending a GOAWAY message if a client sends heartbeat messages
 	// too frequently (due to lack of real traffic).
diff --git a/docs/tempo/website/troubleshooting/_index.md b/docs/tempo/website/troubleshooting/_index.md
index 71ea782f1f3..2437529ba3c 100644
--- a/docs/tempo/website/troubleshooting/_index.md
+++ b/docs/tempo/website/troubleshooting/_index.md
@@ -21,3 +21,4 @@ In addition, the [Tempo runbook](https://github.com/grafana/tempo/blob/main/oper
 - [Error message "Too many jobs in the queue"]({{< relref "too-many-jobs-in-queue/" >}})
 - [Queries fail with 500 and "error using pageFinder"]({{< relref "bad-blocks/" >}})
 - [I can search traces, but there are no service name or span name values available]({{< relref "search-tag" >}})
+- [Error message "response larger than the max ( vs )]({{< relref "response-too-large/" >}})
diff --git a/docs/tempo/website/troubleshooting/response-too-large.md b/docs/tempo/website/troubleshooting/response-too-large.md
new file mode 100644
index 00000000000..a7029ceb16c
--- /dev/null
+++ b/docs/tempo/website/troubleshooting/response-too-large.md
@@ -0,0 +1,55 @@
+---
+title: Response larger than the max
+weight: 477
+---
+
+# Response too large
+
+The error message will take a similar form to the following:
+
+```
+500 Internal Server Error Body: response larger than the max ( vs )
+```
+
+This error indicates that the response receiver or sent is too large.
+This can happen in multiple places, but it's most commonly seen in the query path,
+with messages between the querier and the query frontend.
+
+## Solutions
+
+### Tempo server (general)
+
+Tempo components communicate with each other via gRPC requests.
+In order to increase the maximum message size, you can increase the gRPC message size limit in the server block.
+
+```yaml
+server:
+  grpc_server_max_recv_msg_size:
+  grpc_server_max_send_msg_size:
+```
+
+Note that the server config block is not synchronized across components.
+Most likely, you will need to increase the message size limit in multiple components.
+
+### Querier
+
+Additionally, querier workers can be configured to use a larger message size limit as well.
+
+```yaml
+querier:
+  frontend_worker:
+    grpc_client_config:
+      max_send_msg_size:
+```
+
+### Ingestion
+
+Lastly, message size is also limited in ingestion and can be modified in the distributor block.
+
+```yaml
+distributor:
+  receivers:
+    otlp:
+      grpc:
+        max_recv_msg_size_mib:
+```
\ No newline at end of file

From 099b40afe0bf0153018962acfd89afba177d5278 Mon Sep 17 00:00:00 2001
From: Mario Rodriguez
Date: Wed, 24 Aug 2022 16:41:15 +0200
Subject: [PATCH 2/4] Changelog

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index bdc9ce8aef8..c2c8bdff240 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,7 +1,7 @@
 ## main / unreleased
 
 * [CHANGE] tempo: check configuration returns now a list of warnings [#1663](https://github.com/grafana/tempo/pull/1663) (@frzifus)
-* [CHANGE] Increase default max message size from 4MB to 16MB [#1662]() (@mapno)
+* [CHANGE] Increase default max message size from 4MB to 16MB [#1688](https://github.com/grafana/tempo/pull/1688) (@mapno)
 * [ENHANCEMENT] metrics-generator: expose span size as a metric [#1662](https://github.com/grafana/tempo/pull/1662) (@ie-pham)
 * [ENHANCEMENT] Set Max Idle connections to 100 for Azure, should reduce DNS errors in Azure [#1632](https://github.com/grafana/tempo/pull/1632) (@electron0zero)
 

From 7660e52f8e21ac42fd79794acc2260d5eab4a1e0 Mon Sep 17 00:00:00 2001
From: Mario Rodriguez
Date: Wed, 24 Aug 2022 16:42:28 +0200
Subject: [PATCH 3/4] Changelog

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index c2c8bdff240..96a5e0f4830 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,7 +1,7 @@
 ## main / unreleased
 
 * [CHANGE] tempo: check configuration returns now a list of warnings [#1663](https://github.com/grafana/tempo/pull/1663) (@frzifus)
-* [CHANGE] Increase default max message size from 4MB to 16MB [#1688](https://github.com/grafana/tempo/pull/1688) (@mapno)
+* [CHANGE] Increase default values for `server.grpc_server_max_recv_msg_size` and `server.grpc_server_max_send_msg_size` from 4MB to 16MB [#1688](https://github.com/grafana/tempo/pull/1688) (@mapno)
 * [ENHANCEMENT] metrics-generator: expose span size as a metric [#1662](https://github.com/grafana/tempo/pull/1662) (@ie-pham)
 * [ENHANCEMENT] Set Max Idle connections to 100 for Azure, should reduce DNS errors in Azure [#1632](https://github.com/grafana/tempo/pull/1632) (@electron0zero)
 

From c542234eba1830129523f7449fffa1273c411ed3 Mon Sep 17 00:00:00 2001
From: Mario
Date: Thu, 25 Aug 2022 13:19:27 +0200
Subject: [PATCH 4/4] Apply suggestions from code review

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>
---
 .../website/troubleshooting/response-too-large.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/tempo/website/troubleshooting/response-too-large.md b/docs/tempo/website/troubleshooting/response-too-large.md
index a7029ceb16c..9fda1b35fa9 100644
--- a/docs/tempo/website/troubleshooting/response-too-large.md
+++ b/docs/tempo/website/troubleshooting/response-too-large.md
@@ -11,7 +11,7 @@ The error message will take a similar form to the following:
 500 Internal Server Error Body: response larger than the max ( vs )
 ```
 
-This error indicates that the response receiver or sent is too large.
+This error indicates that the response received or sent is too large.
 This can happen in multiple places, but it's most commonly seen in the query path,
 with messages between the querier and the query frontend.
 
@@ -20,7 +20,7 @@ with messages between the querier and the query frontend.
 ### Tempo server (general)
 
 Tempo components communicate with each other via gRPC requests.
-In order to increase the maximum message size, you can increase the gRPC message size limit in the server block.
+To increase the maximum message size, you can increase the gRPC message size limit in the server block.
 
 ```yaml
 server:
@@ -28,12 +28,12 @@ server:
   grpc_server_max_send_msg_size:
 ```
 
-Note that the server config block is not synchronized across components.
-Most likely, you will need to increase the message size limit in multiple components.
+The server config block is not synchronized across components.
+Most likely you will need to increase the message size limit in multiple components.
 
 ### Querier
 
-Additionally, querier workers can be configured to use a larger message size limit as well.
+Additionally, querier workers can be configured to use a larger message size limit.
 
 ```yaml
 querier: