
[pkg/stanza] Make batching behavior configurable #21184

Conversation

@dmitryax (Member) commented Apr 26, 2023

Expose configuration options to control the batching behavior of the stanza receivers:

  • `batch.max_batch_size`: The maximum number of log records to batch together.
  • `batch.timeout`: The maximum amount of time to wait before sending a batch.

Providing configuration options for batching in the receivers makes it possible to replace the batch processor and prevent data loss when an exporter returns a retryable error.

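As a sketch of how the proposed options could look in a collector configuration (the `filelog` receiver and the `include` path are illustrative; these option names come from this PR and are not released):

```yaml
receivers:
  filelog:
    include: [ /var/log/app/*.log ]
    # Proposed batching options from this PR (hypothetical):
    batch:
      max_batch_size: 500   # flush once 500 log records are buffered
      timeout: 200ms        # or after 200ms, whichever comes first
```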
@dmitryax force-pushed the stanza-make-batching-configurable branch from 50cdb15 to da09176 on April 26, 2023 05:58
@djaglowski (Member)

These settings were removed prior to declaring the filelog receiver beta, because it was unclear whether we should be batching at all. (context)

I think before we re-add these parameters, we should resolve the question of whether pkg/stanza should be batching at all. It's not clear to me that there are any real benefits, but it certainly complicates the codebase quite a bit.

@djaglowski (Member) commented May 10, 2023

One more reason we may want to remove the complicated conversion logic altogether: #21740 (comment)

@djaglowski (Member) left a review

I want to make sure we do not expose configuration for batching until we determine whether or not we are keeping the behavior. See #21184 (comment).

I will open an issue to propose that we remove batching from the stanza adapter. If that is rejected, then we will have to add configuration.

| Field | Default | Description |
| --- | --- | --- |
| `encoding` | `utf-8` | The encoding of the file being read. See the list of supported encodings below for available options |
| `operators` | [] | An array of [operators](../../pkg/stanza/docs/operators/README.md#what-operators-are-available). See below for more details |
| `batch.max_batch_size` | `100` | Maximum number of log records per emitted batch. |
| `batch.timeout` | `100 milliseconds` | Maximum duration to wait before sending out the log records batch. |
Review comment (Member):

Suggested change:
- | `batch.timeout` | `100 milliseconds` | Maximum duration to wait before sending out the log records batch. |
+ | `batch.timeout` | `100ms` | Maximum duration to wait before sending out the log records batch. |

@dmitryax (Member, Author)

My motivation for this was to establish a pipeline that cannot drop any logs. After #20511, the only part of a typical pipeline dropping data is the batch processor. Removing it drops the batch size to 100 (hardcoded in stanza receivers), which significantly slows down the whole pipeline.

This change would bring me a short-term solution. The long-term one would be open-telemetry/opentelemetry-collector#4646. I'm good with rejecting this PR in favor of the long-term solution.

@djaglowski (Member)

@dmitryax, I've done some preliminary benchmarking in #21929 to evaluate the impact of the batch size. While I tentatively agree we will need to add batching controls, my intuition is that the adapter codebase can be simplified and that doing so may have performance benefits or impacts on the eventual configuration. To that end, I think it would be reasonable to add the new parameters behind a feature gate for now, but I'd like to spend a little more time evaluating this possibility before we commit to adding configuration that may later be changed. If you're able to review #21928 and #21929, this would help me move forward on the refactoring I'd like to try.

@github-actions bot commented Jun 1, 2023

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jun 1, 2023
@djaglowski djaglowski removed the Stale label Jun 1, 2023
@djaglowski (Member)

I'm still intending to evaluate ways to simplify the adapter codebase, but I've been sidetracked by some more urgent issues. In the meantime, I'd like to reiterate that adding these settings behind a feature gate seems like a reasonable way to move forward if these controls are presently urgent for anyone.

@github-actions bot

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions bot commented Jul 4, 2023

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jul 4, 2023
@djaglowski djaglowski removed the Stale label Jul 5, 2023
@github-actions bot

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions bot commented Aug 4, 2023

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 4, 2023
@MovieStoreGuy (Contributor)

Was there any update on these changes, @dmitryax, @djaglowski?

@djaglowski (Member)

Unfortunately, I have not found time for this. The package remains a high priority for me, but lately I've been focused on making the fileconsumer codebase more maintainable.

@github-actions github-actions bot removed the Stale label Aug 10, 2023
@github-actions bot

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 25, 2023
@orloffv commented Aug 28, 2023

@djaglowski any updates?

@github-actions github-actions bot removed the Stale label Aug 29, 2023
@dmitryax (Member, Author)

@orloffv, I wanted to get this in to remove the batch processor and establish a resilient logs pipeline; the default batching of the filelog receiver is not as performant as the batch processor. This would have been a short-term solution. Now I'm working on a long-term solution that will be applicable to all data types: open-telemetry/opentelemetry-collector#8122. So I'm no longer pursuing this PR.

@github-actions bot

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 15, 2023
@github-actions bot

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Sep 30, 2023