
Remote Write Allocs Improvements #5614

Merged: 6 commits into prometheus:master from rw-allocs, Jun 27, 2019

Conversation

@csmarchbanks (Member) commented May 30, 2019

A few different commits to improve allocations when running remote write.

  1. Remove an unneeded temp variable
  2. Only allocate pendingSamples once per shard (see the sketch after this list)
  3. Allocate the send slice far fewer times
  4. Use a mask rather than always copy samples into a temporary slice
  5. Allocate a snappy buffer per shard rather than create one per request.
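A rough sketch of the per-shard reuse behind items 2 and 3 (illustrative types and names only, not the code in this PR): the shard allocates its pending-samples slice once and truncates it between batches instead of building a new slice per send.

```go
package main

import "fmt"

// timeSeries stands in for prompb.TimeSeries; the fields are illustrative.
type timeSeries struct {
	labels string
	value  float64
	ts     int64
}

// shard owns a pending-samples slice that is allocated once and reused.
type shard struct {
	pendingSamples []timeSeries
}

func newShard(maxSamplesPerSend int) *shard {
	return &shard{
		// Allocate once per shard; the capacity is a sizing hint, not a limit.
		pendingSamples: make([]timeSeries, 0, maxSamplesPerSend),
	}
}

func (s *shard) queue(ts timeSeries) {
	// append reuses the preallocated backing array until it fills up.
	s.pendingSamples = append(s.pendingSamples, ts)
}

func (s *shard) flush() {
	fmt.Printf("sending %d samples\n", len(s.pendingSamples))
	// Truncate rather than reallocate so the next batch reuses the memory.
	s.pendingSamples = s.pendingSamples[:0]
}

func main() {
	s := newShard(100)
	s.queue(timeSeries{labels: `{__name__="up"}`, value: 1, ts: 1559174400000})
	s.flush()
	s.queue(timeSeries{labels: `{__name__="up"}`, value: 0, ts: 1559174415000})
	s.flush()
}
```

The snappy buffer reuse from item 5 follows the same pattern; see the sketch further down in the buildWriteRequest discussion.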

Added benchmark:

benchmark                     old ns/op     new ns/op     delta
BenchmarkSampleDelivery-4     1206794       1141762       -5.39%

benchmark                     old allocs     new allocs     delta
BenchmarkSampleDelivery-4     7152           7121           -0.43%

benchmark                     old bytes     new bytes     delta
BenchmarkSampleDelivery-4     909406        610669        -32.85%

Most of the remaining allocations are from proto encoding/decoding.
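The table above is benchcmp-style output from a Go benchmark run with -benchmem. For readers unfamiliar with the format, a stub-only sketch of the shape of such a benchmark (not the benchmark added here, which exercises the actual remote-write send path):

```go
package remote_test

import "testing"

type sample struct {
	ts  int64
	val float64
}

// deliverStub stands in for the shard send path in this sketch only.
func deliverStub(samples []sample) int { return len(samples) }

// sink prevents the compiler from optimizing the benchmarked call away.
var sink int

// Run with: go test -bench=SampleDeliverySketch -benchmem
func BenchmarkSampleDeliverySketch(b *testing.B) {
	samples := make([]sample, 10000)
	b.ReportAllocs() // reports allocs/op and bytes/op, as in the allocs and bytes tables above
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = deliverStub(samples)
	}
}
```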

Reviewing commit by commit will show each change in isolation.

@cstyan Have you been running any benchmarks for these? I only see one that requires a real WAL somewhere.

@csmarchbanks force-pushed the rw-allocs branch 2 times, most recently from 0d65880 to c5be9ce on May 31, 2019 04:12
Review comment on storage/remote/queue_manager.go (outdated):
max := s.qm.cfg.MaxSamplesPerSend
pendingSamples := make([]prompb.TimeSeries, 0, max)
Member

I think this has the potential to cause more backup problems.

Member Author

Could you explain this comment more? I am not seeing how this would cause backup problems.

Member

Sorry, I was thinking this would enforce there being no more than maxSamplesPerSend pending samples, but that's not how slices work 🤦‍♂️

Backup meaning the shards not being able to send samples at the rate they're being scraped and written to the WAL. At the moment we send as soon as we reach maxSamplesPerSend or hit the flush timeout, and we can potentially have some multiple of maxSamplesPerSend samples pending, meaning that after a successful send there's theoretically less time before we have enough samples for the next send. If we enforced having only maxSamplesPerSend pending, it could take longer to fill up pendingSamples given that we block on the shard's queue.
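On the slices point: make([]prompb.TimeSeries, 0, max) only preallocates capacity, it does not bound the slice; append keeps growing it past max by reallocating. A tiny standalone illustration (not PR code):

```go
package main

import "fmt"

func main() {
	const max = 4
	pending := make([]int, 0, max) // capacity is a hint, not a limit

	for i := 0; i < 10; i++ {
		pending = append(pending, i) // grows past max; append reallocates as needed
	}
	fmt.Println(len(pending), cap(pending)) // len is 10, cap is at least 10
}
```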

@csmarchbanks (Member Author)

When looking further through the code I am pretty convinced that seriesMtx is unnecessary, which allows me to simplify Append to only have one loop and no tempSamples. I have been running remote write locally for a while now with -race, and only found the race condition in maxGauge that I fixed with the most recent commit.
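The maxGauge fix mentioned above is, in spirit, an "only ever raise the value" wrapper around a gauge. A minimal race-free sketch, assuming prometheus/client_golang and not necessarily matching the exact code in this branch:

```go
package main

import (
	"fmt"
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

// maxGauge only ever raises the underlying gauge. Without the mutex, two
// goroutines calling Set concurrently would race on reading and writing value.
type maxGauge struct {
	mtx   sync.Mutex
	value float64
	gauge prometheus.Gauge
}

func (m *maxGauge) Set(v float64) {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	if v > m.value {
		m.value = v
		m.gauge.Set(v)
	}
}

func main() {
	g := &maxGauge{gauge: prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "example_max_gauge", // illustrative metric name
	})}
	var wg sync.WaitGroup
	for i := 1; i <= 8; i++ {
		wg.Add(1)
		go func(v float64) {
			defer wg.Done()
			g.Set(v)
		}(float64(i))
	}
	wg.Wait()
	fmt.Println("max observed:", g.value) // 8
}
```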

@csmarchbanks (Member Author)

Update: I ran this branch all weekend in a cluster without issue. I am reasonably sure that the mutex is not needed.

  backoff := s.qm.cfg.MinBackoff
- req, highest, err := buildWriteRequest(samples)
+ req, highest, err := buildWriteRequest(samples, *buf)
+ *buf = req
Member

Do you think this assignment to the pointer should be in buildWriteRequest?

Member Author

I put it here because it is the last place that deals with *shards and I wanted to keep the mutation isolated to those functions. I am happy to move it down to buildWriteRequest if you have a strong preference.
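For reference, a minimal sketch of the handshake being discussed, with simplified names and the proto marshalling elided (the real buildWriteRequest also returns a highest-timestamp value, dropped here): the caller hands its reusable buffer in and stores whatever buffer comes back, so the next send can reuse the allocation.

```go
package main

import (
	"fmt"

	"github.com/golang/snappy"
)

// buildWriteRequest compresses payload, reusing buf when it is big enough,
// and returns the (possibly reallocated) buffer for the caller to keep.
func buildWriteRequest(payload, buf []byte) ([]byte, error) {
	// snappy.Encode checks len(dst), not cap(dst), when deciding whether it
	// can write into dst, so stretch the buffer to its full capacity first.
	if buf != nil {
		buf = buf[:cap(buf)]
	}
	compressed := snappy.Encode(buf, payload)
	return compressed, nil
}

// sendOnce mirrors the call site above: the assignment back through *buf is
// what keeps the allocation alive on the shard between sends.
func sendOnce(payload []byte, buf *[]byte) error {
	req, err := buildWriteRequest(payload, *buf)
	if err != nil {
		return err
	}
	*buf = req
	fmt.Printf("would send %d compressed bytes\n", len(req))
	return nil
}

func main() {
	buf := make([]byte, 0, 1024)
	_ = sendOnce([]byte("example write request payload"), &buf)
	_ = sendOnce([]byte("second payload reusing the same buffer"), &buf)
}
```

Either placement works; keeping the assignment at the call site keeps buildWriteRequest free of shard state, while moving it inside would mean passing a *[]byte and hiding the buffer bookkeeping from callers.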

@tomwilkie (Member)

> Update: I ran this branch all weekend in a cluster without issue. I am reasonably sure that the mutex is not needed.

I think you're right, as there is a single goroutine in the watcher driving all these interactions. This will become more obvious when we refactor the WAL Watcher.

LGTM from me but I'll hold off on merging until @cstyan has had a chance to run this in our env.

@csmarchbanks (Member Author)

@tomwilkie I know @cstyan ran this for a little bit in one of your environments. Do you need to run it longer or is this good to go?

@tomwilkie (Member)

Let me ask @cstyan. Sorry for dropping the ball on this.

Commit message (pushed to the branch): It is not possible for any of the places protected by the seriesMtx to be called concurrently, so it is safe to remove. By removing the mutex we can simplify the Append code to one loop.

Signed-off-by: Chris Marchbanks <[email protected]>
@cstyan (Member) commented Jun 27, 2019

I just deployed a new build of this image to some test Prometheus instances; it will run overnight at the least, and I'll check resource usage in the morning.

@tomwilkie (Member)

LGTM
[Screenshot: 2019-06-27 at 19:47:25]

@tomwilkie merged commit 06bdaf0 into prometheus:master on Jun 27, 2019
@csmarchbanks deleted the rw-allocs branch on June 27, 2019 at 18:49