Hedgin Bets and Requests #750

Merged Jun 14, 2021 (21 commits)

@joe-elliott (Member) commented Jun 9, 2021

What this PR does:

Hedges GCS/S3/Azure requests using the hedgedhttp library. This initially supported only S3 and GCS: Azure does not seem to give access to the HTTP transport, so I wasn't sure if we could use this solution there. The PR was ready for review at that point, while I was still evaluating options for adding tests.
Update: Azure support added!

With a hedge_requests_at value of 500ms, we are seeing the following impact in ops:

frontend latency:

  • p99 from 9.8s -> 2.5s
  • p50/p90 down by tens of milliseconds

gcs requests/second:

  • slightly elevated? Almost too little to tell. This is unsurprising, since 500ms is above our p99.


Thx to @cristaloleg and @storozhukBM for their help in getting hedgedhttp to support GCS/HTTP2.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <[email protected]>
@cristaloleg (Contributor) left a comment

SGTM. Minor thing: the 2nd param to NewClient can be extracted to a named const; that will make it easier to see what 2 means.

@cristaloleg (Contributor) commented

Made a minor change to parameter validation, no other changes: https://github.com/cristalhq/hedgedhttp/releases/tag/v0.5.0

Also, @storozhukBM suggested that hedgedhttp can easily be abstracted to any other hedging-like work, such as adding the same feature to the Azure client even without access to http.Transport.

No sooner said than done: https://github.com/cristalhq/synx/blob/main/hedged.go. It is not tested or well designed yet, and will be polished over the next week(s).

@annanay25 (Contributor) left a comment

This is amazing 🚀

One more thought:

  • Should we add a metric for the number of hedged requests so that we can track it?

@joe-elliott (Member, Author) commented

@grafana/tempo Please re-review

  • tests added
  • azure support added

This is ready to go from my perspective.

@kvrhdn (Member) left a comment

LGTM! 👍 Left some remarks, nothing blocking.

Also kudos for the tests, this kind of stuff is tricky to test 🙂

@@ -9,4 +13,5 @@ type Config struct {
Endpoint string `yaml:"endpoint-suffix"`
MaxBuffers int `yaml:"max-buffers"`
BufferSize int `yaml:"buffer-size"`
HedgeRequestsAt time.Duration `yaml:"hedge-requests-at"`
Review comment (Member):

Not specific to this PR: I noticed this config uses kebab-case instead of snake_case like the other configs. Is this intentional, or is it just debt we can no longer get rid of?

Reply (Member, Author):

Yeah, I noticed this for the first time when adding this config option. I think everything is snake_case except for this Azure config. We should consider a breaking-change PR that moves Azure to the same standard as everything else.

@storozhukBM commented

@joe-elliott FYI: the new version, https://github.com/cristalhq/hedgedhttp/releases/tag/v0.6.0, has stats that you can use for metrics.

@annanay25 (Contributor) left a comment

Thanks @cristaloleg @storozhukBM for the super useful library and great collaboration 🚀

I agree we can move ahead with this. I'll open an issue for the metrics, which we can work on in the future.
