[shipper] Make the memory queue accept opaque pointers #31356
Conversation
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
// there might also be free space before region A. In that
// case new events must be inserted in region B, but the
// queue isn't at capacity.
avail = len(b.entries) - b.regA.index - b.regA.size
What is the follow up from your comment about this possibly not being right? Is it too much work to fix, or not worth fixing?
If I understand the intention correctly then it's an easy fix, just removing b.regA.index from the right side. I've been leaving it for last because I don't want to intentionally change the functional logic until everything is at full parity with the old version (which right now is just pending on the stress tests).
Hah -- this turned out to be the cause of the test failure 😅 This is the same computation as the old version, but before it was only made on a specific state transition, and now that it's checked every loop iteration it ended up blocking the queue. Switching to the correct calculation here makes the tests pass locally, so fingers crossed on the CI now.
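For context, a minimal sketch of the two-region free-space calculation being discussed, with illustrative field names loosely mirroring the snippet above (this is not the actual beats code); the corrected form drops b.regA.index from the subtraction:

```go
package memqueue

// Illustrative two-region ring buffer. Region A is the primary occupied
// region; region B (when present) always starts at index 0 and holds
// entries that wrapped around.
type region struct {
	index int // start position of the region within entries
	size  int // number of occupied slots in the region
}

type ringBuffer struct {
	entries []interface{} // slots for queue entries (stand-in element type)
	regA    region
	regB    region
}

// freeSlots reports how many more entries can be inserted.
func (b *ringBuffer) freeSlots() int {
	if b.regB.size > 0 {
		// Region B already exists: new entries go into the gap between the
		// end of region B (which starts at 0) and the start of region A.
		return b.regA.index - b.regB.size
	}
	// Only region A exists. Free space is everything outside it: the tail
	// after region A plus the space before it, which would become region B.
	// The buggy form also subtracted b.regA.index, which ignored the space
	// before region A and made the queue look full when it wasn't.
	return len(b.entries) - b.regA.size
}
```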
for {
	var pushChan chan pushRequest
Is this duplicated entirely between here and newDirectEventLoop? It is hard for me to spot if there is some subtle difference between the two just scrolling up and down.
It's not quite duplicated -- it's the same logical sequence, but because the containing struct is different the conditions don't match (e.g. here we check if the queue is full by comparing eventCount to maxEvents, but in the version above, directEventLoop has no field analogous to eventCount so it uses a different test).
I suspect that having these two almost-identical objects with such completely divergent implementations just for a special case doesn't help performance enough to justify the complexity, and if I get a chance I'd like to merge these into a single helper, but that seemed out of scope for now :-)
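For readers following along, here is a minimal, self-contained sketch of the run-loop pattern being discussed, using the buffered loop's eventCount/maxEvents fullness test; all names and behavior here are simplified stand-ins, not the actual beats implementation:

```go
package memqueue

// Simplified stand-ins for the real request types.
type pushRequest struct{ event interface{} }
type getRequest struct{ responseChan chan []interface{} }

// bufferedEventLoop is a toy version of the buffered loop discussed above.
type bufferedEventLoop struct {
	maxEvents  int
	eventCount int // number of events currently buffered
	buffered   []interface{}
	pushChan   chan pushRequest
	getChan    chan getRequest
	doneChan   chan struct{}
}

func (l *bufferedEventLoop) run() {
	for {
		// Recompute the eligible channels on every iteration; receiving from
		// a nil channel blocks forever, so a nil channel disables its case.
		var pushChan chan pushRequest
		if l.eventCount < l.maxEvents { // the fullness test mentioned above
			pushChan = l.pushChan
		}
		var getChan chan getRequest
		if l.eventCount > 0 { // nothing to hand out while empty
			getChan = l.getChan
		}

		select {
		case req := <-pushChan:
			// Accept an event (the real loop stores a queueEntry and
			// notifies the producer).
			l.buffered = append(l.buffered, req.event)
			l.eventCount++
		case req := <-getChan:
			// Hand out everything buffered (the real loop assembles a batch
			// of the requested size and tracks ACK state).
			req.responseChan <- l.buffered
			l.buffered = nil
			l.eventCount = 0
		case <-l.doneChan:
			return
		}
	}
}
```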
@@ -99,80 +83,73 @@ func (l *directEventLoop) run() {
	)

	for {
		var pushChan chan pushRequest
This is significantly more obvious than what was going on before. Nice!
Just looking at the diff I can't spot any major issues. I'll try to check this out and build more of an understanding of what this is doing later (after your refactoring, which is much easier to follow). Also I had never seen that failure before.
Yea, I've never triggered that one before either, but it's related to sending loads through the pipeline so it's almost certainly a real failure. Right now I'm debugging it expecting that I missed a race condition somewhere.
LGTM, give the rest of the team some time to look at it before merging though
Thank you, awesome as always!
What does this PR do?
Refactors the memory queue internal data structures to accept opaque pointers (interface{}) for its events rather than an explicit publisher.Event. This is needed for the queue to store the event representations anticipated in https://github.com/elastic/elastic-agent-shipper. This doesn't fully resolve #31307 because it doesn't yet expose a type-agnostic public interface. This PR is already pretty big and I don't want it to eat into ON week, so I'm deferring those questions until I can give them full attention.
This change should, in a perfect world, be a functional no-op: it changes internal handling but the exposed API is unchanged. The main changes are:
- Merged events and clients into queueEntry. The memory queue previously stored events as publisher.Event and their metadata in clientState. These were stored in separate arrays with shared indices, propagated in various ways. The new code creates the type queueEntry as its underlying buffer type, which contains the event (an interface{}, which in beats has underlying type *publisher.Event) and its metadata. This change had to be propagated through a number of internal helpers like memqueue.ringBuffer and the ackState in memqueue.batch. There were also some fields that were duplicates of others -- in eventloop.go the event loops had pointers to their associated broker and their own unaltered copies of several of its fields. I removed these when I could. (A rough sketch of the consolidated entry type follows this list.)
- The event loops' request channels (pushRequest, getRequest, etc.) were selectively nulled out on appropriate state changes (e.g. if the queue is full after a pushRequest then the push channel is set to nil to block additional requests). This got quite hard to follow during the changes, since the fields were mutated throughout the code and their semantics were undocumented. I moved the channels into local variables in the run loop, initializing them immediately before their use in select. This keeps the logic in one place, and it's clearer now what specific circumstances can enable / disable each channel.
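To make the first bullet concrete, here's a rough sketch of what the consolidated entry type can look like; the field names and the clientState contents are illustrative approximations, not a verbatim copy of the PR:

```go
package memqueue

// clientState stands in for the per-producer metadata the queue keeps next
// to each event (who to notify on ACK, sequence bookkeeping, etc.). The
// exact fields here are illustrative.
type clientState struct {
	seq      uint32
	producer interface{} // owner to notify when the event is acknowledged
}

// queueEntry is the single element type stored in the queue's buffer. The
// event itself is an opaque interface{}; in beats its underlying type is
// *publisher.Event, but the queue itself no longer needs to know that.
type queueEntry struct {
	event  interface{}
	client clientState
}

// Previously the queue held two parallel slices with shared indices,
// roughly:
//
//	events  []publisher.Event
//	clients []clientState
//
// After the change the ring buffer holds a single slice:
//
//	entries []queueEntry
```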
Checklist

- I have made corresponding changes to the documentation
- I have made corresponding change to the default configuration files
- I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Performance tests
I ran some extra benchmarks using libbeat/publisher/pipeline/stress. The main configurations I tested were buffered vs direct event loop. The tests were 1-minute samples sending a continuous stream of events through the queue, run first on the base branch and then again after switching to the PR branch. The top-level metrics were:
- Event throughput (per minute)
- Total allocations
- In-use allocations
As expected for the nature of the change, the total allocations are noticeably higher, since a lot of the complexity of the publisher.Event handling was to avoid allocating temporary values. However, the throughput is fine. While in-use memory is up, it is still reasonable (8MB to send 69 million events).

I also tested these configurations using the blocking output test (out=blocking in the test name), which adds a min_wait to the configuration. Remarkably, both old and new queues had exactly the same throughputs (though on the order of 30K rather than 70M). The total allocations were up in the new version but in-use was slightly down.

Overall these results look to me like we are paying slightly for this simplification, but nothing that seems worrying. I'm expecting to do more pipeline performance work soon and this cleanup gives a good baseline for tracking down our real bottlenecks.
Related issues