Skip to content
This repository has been archived by the owner on Sep 21, 2023. It is now read-only.

Add integration tests for the shipper output #23

Closed
Tracked by #8 ...
rdner opened this issue Apr 13, 2022 · 2 comments · Fixed by elastic/beats#31558
Closed
Tracked by #8 ...

Add integration tests for the shipper output #23

rdner opened this issue Apr 13, 2022 · 2 comments · Fixed by elastic/beats#31558
Assignees
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team v8.3.0

Comments

@rdner
Copy link
Member

rdner commented Apr 13, 2022

  • An event batch is published from an input to the shipper gRPC server
  • An event batch is not dropped when the gRPC server is not available but starts later
  • An event batch is not dropped when ResourceExhausted code is returned from the gRPC
@rdner rdner changed the title Add integration tests that cover the following test cases: Add integration tests for the shipper output Apr 13, 2022
@rdner rdner self-assigned this Apr 13, 2022
@rdner rdner added v8.3.0 Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Apr 13, 2022
@rdner
Copy link
Member Author

rdner commented May 9, 2022

@cmacknz @faec

Looking at the interface again, it occurred to me that I never asked one of the questions I had:

// Publishes an event via the Elastic agent shipper.
//
// Blocks until all processing steps complete and data is written to the queue. Returns a
// RESOURCE_EXHAUSTED gRPC status code if the queue is full.
//
// Inputs may execute multiple concurrent Produce requests for independent data streams.
// The order in which concurrent requests complete is not guaranteed. Use sequential requests to
// control ordering.
rpc PublishEvents(PublishRequest) returns (PublishReply);

is it fair to assume that regardless the returned error I must always check the list of results and mark the listed events as "accepted to the queue"?

I'll give you an example:

  1. I sent 50 events to the gRPC server
  2. It responds with an error with the gRPC code ResourceExhausted
  3. However, it also returns a list of 20 event results

The way I understand it now, I should treat these 20 events as accepted to the queue and I must retry the rest of the 30 events.

Can you please both confirm that this is the right behaviour that we're going to implement on the server?

Also, is this the case with the ResourceExhausted code only, or regardless the error I must always look the results up and retry the unaccepted events?

@cmacknz
Copy link
Member

cmacknz commented May 9, 2022

The way I understand it now, I should treat these 20 events as accepted to the queue and I must retry the rest of the 30 events.

Yes, I think it is reasonable for us to handle partially published batches. If we start implementing it and decide it is too difficult or complex we can re-evaluate that.

Also, is this the case with the ResourceExhausted code only, or regardless the error I must always look the results up and retry the unaccepted events?

ResourceExhausted is the most obvious case where some events could succeed but others may fail (because the queue is full). For now you can assume it is the only error with this behaviour.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team v8.3.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants