Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exporter/prometheusremotewriter: sort Sample by Timestamp to avoid out of order errors #2941

Conversation

odeke-em
Copy link
Member

Ensures that before a prompb.WriteRequest is created, that the
TimeSeries contained have their Sample values sorted chronologically
by Prometheus to avoid out of order errors reported by Prometheus
barfing.

Thanks to @rakyll for a reproducer and for diagnosing the problem, which
helped distill the issue from a very complex setup that required super
expensive Kubernetes clusters with many replicas, but essentially the problem
became more apparently when the number of TimeSeries grew.

It is important to note that the presence of such a bug signifies that
with a large number of replicas are being scraped from, this stalls
scraping and takes a long time which means that targets scraped in a
round-robin fashion experience staleness when many. This might be even
more reasons for setups to adopt a push model as opposed to scrape
endpoints.

Fixes #2315
Fixes open-telemetry/wg-prometheus#10

@odeke-em odeke-em requested a review from a team April 14, 2021 23:38
@codecov
Copy link

codecov bot commented Apr 14, 2021

Codecov Report

Merging #2941 (8b97422) into main (b4f495f) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2941   +/-   ##
=======================================
  Coverage   91.66%   91.66%           
=======================================
  Files         312      312           
  Lines       15312    15317    +5     
=======================================
+ Hits        14036    14041    +5     
  Misses        870      870           
  Partials      406      406           
Impacted Files Coverage Δ
exporter/prometheusremotewriteexporter/helper.go 99.55% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4f495f...8b97422. Read the comment docs.

@odeke-em odeke-em force-pushed the prometheus-remotewriteexporter-fix-out-of-order-samples branch from 4c6e148 to 3f40ffc Compare April 15, 2021 00:13
…t of order errors

Ensures that before a prompb.WriteRequest is created, that the
TimeSeries contained have their Sample values sorted chronologically
by Prometheus to avoid out of order errors reported by Prometheus
barfing.

Thanks to @rakyll for a reproducer and for diagnosing the problem, which
helped distill the issue from a very complex setup that required super
expensive Kubernetes clusters with many replicas, but essentially the problem
became more apparently when the number of TimeSeries grew.

It is important to note that the presence of such a bug signifies that
with a large number of replicas are being scraped from, this stalls
scraping and takes a long time which means that targets scraped in a
round-robin fashion experience staleness when many. This might be even
more reasons for setups to adopt a push model as opposed to scrape
endpoints.

Fixes open-telemetry#2315
Fixes open-telemetry/wg-prometheus#10
@odeke-em odeke-em force-pushed the prometheus-remotewriteexporter-fix-out-of-order-samples branch from 3f40ffc to 8b97422 Compare April 15, 2021 04:24
@bogdandrutu bogdandrutu merged commit 03c7bf9 into open-telemetry:main Apr 15, 2021
@rakyll
Copy link
Contributor

rakyll commented Apr 16, 2021

Thanks much for this PR! I've been testing the exporter with this change and can't reproduce the bug anymore. I'm using relabel_configs to have pod name in the samples to avoid the duplicates that might be potentially coming from the replicas of my app.

@rakyll
Copy link
Contributor

rakyll commented Apr 16, 2021

I spoke too soon, filed #2949 for a follow up.

@odeke-em odeke-em deleted the prometheus-remotewriteexporter-fix-out-of-order-samples branch June 12, 2021 20:19
hughesjj pushed a commit to hughesjj/opentelemetry-collector that referenced this pull request Apr 27, 2023
Bumps [peter-evans/create-pull-request](https://github.com/peter-evans/create-pull-request) from 4 to 5.
- [Release notes](https://github.com/peter-evans/create-pull-request/releases)
- [Commits](peter-evans/create-pull-request@v4...v5)

---
updated-dependencies:
- dependency-name: peter-evans/create-pull-request
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants