
[question] how to control the "maximum number of batches stored to disk" independently of in-memory queue size while using persistent queue in "exporterhelper" (ClickHouseExporter) #29006

Closed
ceevaaa opened this issue Nov 7, 2023 · 6 comments
Labels
exporter/clickhouse question Further information is requested

Comments


ceevaaa commented Nov 7, 2023

Component(s)

exporter/clickhouse, extension/storage, extension/storage/filestorage

What happened?

Design

At a macro level I am sending the telemetry data like this.
local Otel collector (otlp exporter) -> central Otel collector (clickhouse exporter) -> ClickHouse DB

Description

AIM - To control the number of batches of telemetry data that can be stored in

  1. in-memory
  2. persistent storage

"independently", as and when exporting fails for various reasons.

E.g., I would want to keep very little data in memory (say 1,000 batches), but I am okay with having 100,000 batches written to my persistent storage.

I am confused about which exact parameter does that, as the docs say:
The maximum number of batches stored to disk can be controlled using sending_queue.queue_size parameter (which, similarly as for in-memory buffering, defaults to 1000 batches).
But I can only see one parameter in the exporter helper that can change this (which I think only changes the in-memory queue size).
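For reference, this is the parameter I mean, sketched as I understand it from the exporterhelper README (the exporter name here is a placeholder, and the values are examples only, not recommendations):

```yaml
exporters:
  someexporter:          # placeholder exporter name
    sending_queue:
      enabled: true
      num_consumers: 10          # consumers draining the queue
      queue_size: 1000           # the only size knob I can find
      storage: file_storage/otlp # optional; points at a storage extension
```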

So my two questions are -

  1. Can I control these two parameters independently? Say, increase storage and decrease in-memory?
  2. How can I use the same in the ClickHouse exporter?

Regards,
Shiva Pundir

Collector version

v0.88.0

Environment information

No response

OpenTelemetry Collector configuration

# Local Otel Collector

extensions:
  file_storage/otlp:
    directory: /var/lib/otelcol/file_storage
    timeout: 1s

exporters:
  otlp:
    endpoint: ${env:CENTRAL_OTEL_COLLECTOR_ENDPOINT}
    tls:
      insecure: true
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 3600s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 10000
      storage: file_storage/otlp
    timeout: 5s


# Central Otel Collector

exporters:
  clickhouse/realtime:
    endpoint: ${env:CLICKHOUSE_REALTIME_ENDPOINT}
    username: ${env:CLICKHOUSE_REALTIME_USERNAME}
    database: otel
    ttl_days: 90
    logs_table_name: otel_logs
    traces_table_name: otel_traces
    metrics_table_name: otel_metrics
    timeout: 5s
    sending_queue: 
      queue_size: 10000
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s

Log output

No response

Additional context

No response

@ceevaaa ceevaaa added bug Something isn't working needs triage New item requiring triage labels Nov 7, 2023

github-actions bot commented Nov 7, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.


ceevaaa commented Nov 7, 2023

/label -bug

@crobert-1 crobert-1 added question Further information is requested and removed bug Something isn't working labels Nov 7, 2023

crobert-1 commented Nov 7, 2023

Hello @ceevaaa, just a clarification first: Your config has two exporters, but no receivers. This is just a simplification since you're running multiple collectors, right?

I did some reading and I believe I may have an answer. First of all, to enable the persistent queue for the ClickHouse exporter, you simply set the sending_queue.storage config option, just like you've done for the OTLP exporter. (Your current config is not using persistent storage for the ClickHouse exporter.)

Once you've enabled the persistent queue for a component (the ClickHouse exporter in this example), there's no in-memory queue. This was made clear to me from the code here. The persistent queue works by adding everything it receives to persistent storage, and then when it's time to send data it will load a single batch directly from the persistent queue. This means you only have a single batch in memory at a given time, and all data goes to the persistent queue before actually being exported. (My logic here for in-memory batches is for a single consumer, but you may have more in-memory batches if multiple consumers are de-queueing from the persistent queue at the same time).
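Concretely, wiring this into your central collector would look roughly like the sketch below. This reuses the `file_storage` extension pattern from your local collector config; the extension name and directory here are assumptions, so adjust them for your environment:

```yaml
extensions:
  file_storage/clickhouse:
    directory: /var/lib/otelcol/file_storage  # assumed path, adjust as needed
    timeout: 1s

exporters:
  clickhouse/realtime:
    endpoint: ${env:CLICKHOUSE_REALTIME_ENDPOINT}
    sending_queue:
      queue_size: 10000
      storage: file_storage/clickhouse  # this is what enables the persistent queue

service:
  extensions: [file_storage/clickhouse]  # the extension must also be enabled here
```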

One more thing, the ClickHouse exporter's README also explicitly states to use the batch processor. You can use the send_batch_size option in the batch processor to determine how large your queue_size should be. (I mention this because in your shared configuration there is no batch processor defined, so I just want to make sure we're not missing anything here.)
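For completeness, a minimal batch processor sketch with the option I mean; the values are illustrative, not recommendations:

```yaml
processors:
  batch:
    send_batch_size: 8192  # batch size, which informs how large queue_size should be
    timeout: 5s
```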

Does that all generally make sense? Feel free to let us know if you have any other questions. I'm not very familiar with these components, so someone else may correct me here or add more information as well.


ceevaaa commented Nov 7, 2023

Apologies from my side for not mentioning the other details. The actual configs are too big, but nonetheless, I should have stated these things clearly.

So,
"This is just a simplification since you're running multiple collectors, right?" - Yes
"I mention this because in your shared configuration there is no batch processor defined" - I am running the batch processor (but didn't mention it because of the long config, apologies again)

"Once you've enabled the persistent queue for a component (the ClickHouse exporter in this example), there's no in-memory queue. "
Oh 😲 , this explains everything.

"The persistent queue works by adding everything it receives to persistent storage, and then when it's time to send data it will load a single batch directly from the persistent queue. "
I see. I had the wrong idea earlier. I thought it was a mix of both and only worked when an exporter failed to export the telemetry data.

"Does that all generally make sense?"
Oh god yes. 100%. Thank you very much @crobert-1 .

So in conclusion: if you enable the persistent queue option, all the telemetry data is written to persistent storage first, then picked up by the exporter's consumers (there can be more than one), loaded into memory (a single batch per consumer), and exported.
Correct ?

Regards,
Shiva Pundir

@crobert-1

Yes, I believe your conclusion is correct. Let us know if you have any other questions, otherwise feel free to close the issue!


ceevaaa commented Nov 7, 2023

Thanks for the help.

@ceevaaa ceevaaa closed this as completed Nov 7, 2023
mx-psi pushed a commit to open-telemetry/opentelemetry-collector that referenced this issue Nov 8, 2023
**Description:**
Minor clarifications in the README. 

1. The [batch processor's
option](https://github.com/open-telemetry/opentelemetry-collector/blob/8bea0d372c9965a99dd88d1f5c4c4b7acee9db40/processor/batchprocessor/config.go#L24)
is named `send_batch_size`, not `batch_send_size`.
2. It may be obvious but I didn't instantly realize the sending queue is
one or the other (in-memory or persistent). This adds a comment to make
it clear.

For reference, I'm adding this as a result of investigation for this
issue:
open-telemetry/opentelemetry-collector-contrib#29006