feat(server): Org rate limit per metric bucket #2758
Conversation
Just some quick nits; I'll have to look into the outcome logging more carefully later.
I would like it if we didn't have to duplicate this and could somehow share as much code as possible with the ManagedEnvelope logic (where, when we drop the envelope, transaction and profile outcomes are created). But I'll need more time to think about this.
let _bucket_qty = buckets.len();
let bucket_partitions = partition_buckets(scoping.project_key, buckets, partitions);

#[cfg(feature = "processing")]
I think it makes sense to check the cached rate limits even on non-processing relays
Well, at the moment they just get updated from Redis, which isn't available on non-processing relays, so I'm not sure that would make sense.
I am not sure exactly how propagation of quotas/limits to PoPs works, but I don't see a reason why we can't check it now; once we actually have working propagation, everything will just work.
Does this PR rate-limit buckets on a binary basis? Do we want to consider scenarios in which some buckets are rate-limited but others aren't?
relay-server/src/actors/processor.rs
self.drop_buckets_with_outcomes(
    reason_code,
    total_batches,
What's the difference between total_batches and bucket_qty (the function param)? Is it the maximum size of the bucket? Can we use just one of the two? It seems odd to me that sometimes we use one and sometimes the other (see when emitting outcomes and in log messages).
bucket_qty is the total number of buckets in the flush; total_batches is the number of batches we are writing upstream (i.e. the number of requests to the upstream, or the number of messages on the Kafka topic).
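To make the distinction concrete, here is a minimal sketch of the relationship between the two counts. The names, the fixed batch size, and the helper are illustrative assumptions for this example, not Relay's actual API: the point is simply that one flush of bucket_qty buckets is split into ceil(bucket_qty / batch_size) upstream batches.

```rust
// Hypothetical illustration: `bucket_qty` counts the buckets in a flush,
// while `total_batches` counts the upstream requests / Kafka messages
// those buckets are split into.

const MAX_BATCH_SIZE: usize = 100; // assumed per-batch bucket cap

fn batch_counts(buckets: &[u32]) -> (usize, usize) {
    let bucket_qty = buckets.len();
    // Integer ceiling division: number of chunks of at most MAX_BATCH_SIZE.
    let total_batches = (bucket_qty + MAX_BATCH_SIZE - 1) / MAX_BATCH_SIZE;
    (bucket_qty, total_batches)
}

fn main() {
    let buckets: Vec<u32> = (0..250).collect();
    let (qty, batches) = batch_counts(&buckets);
    // 250 buckets split into batches of up to 100 -> 3 batches.
    println!("bucket_qty={qty}, total_batches={batches}");
}
```

Since the two numbers measure different things (data volume vs. request volume), it makes sense that outcomes and log messages might deliberately use different ones.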
relay-server/src/actors/processor.rs
event_id: None,
remote_addr: None,
category: DataCategory::Metrics,
quantity: total_batches as u32,
Should we exclude buckets with the usage metric from rate-limited outcomes? These metrics are data we (Sentry) generate that the user doesn't care about, and I assume these outcomes will be presented to them.
This is on the roadmap but not planned for this PR; the idea is to rate limit based on namespace (there is some info in the epic: #2716).
This looks good to me. Could you add an integration test?
/**
 * Metric bucket.
 */
RELAY_DATA_CATEGORY_METRIC_BUCKET = 15,
Note that this will require a release of the Python library and a version bump in Sentry and Snuba.
CHANGELOG.md
@@ -10,6 +10,8 @@
- Return global config ready status to downstream relays. ([#2765](https://github.com/getsentry/relay/pull/2765))
- Add Mixed JS/Android Profiles events processing. ([#2706](https://github.com/getsentry/relay/pull/2706))
- Allow to ingest measurements on a span. ([#2792](https://github.com/getsentry/relay/pull/2792))
- Add size limits on metric related envelope items. ([#2800](https://github.com/getsentry/relay/pull/2800))
- Include the size offending item in the size limit error message. ([#2801](https://github.com/getsentry/relay/pull/2801))
These changes shouldn't be here.
relay-server/src/actors/processor.rs
source_quantities += source_quantities_from_buckets(&BucketsView::new(buckets), mode);
}

let timestamp = UnixTimestamp::now().as_datetime().unwrap_or_else(Utc::now);
Is this not just Utc::now()? Why the round trip through UnixTimestamp?
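One plausible reason for the round trip (an assumption on my part, not confirmed in the thread): going through a whole-second unix timestamp truncates sub-second precision, which keeps the value stable within a second, and the conversion back to a datetime is fallible, hence the unwrap_or_else fallback. A stdlib-only sketch of the whole-second truncation, with the helper name being hypothetical:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Illustrative only: Relay's UnixTimestamp/Utc come from its own crates
// and chrono. This just shows that "now" expressed as whole unix seconds
// drops sub-second precision, unlike a full-resolution `Utc::now()`.
fn unix_seconds_now() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before 1970")
        .as_secs()
}

fn main() {
    let secs = unix_seconds_now();
    println!("whole unix seconds: {secs}");
}
```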
]

def generate_ticks():
    # Generate a new timestamp for every bucket, so they do not get merged by the aggregator
We can also use tags for this; it should be easier than having timestamps tied to bucket_interval.
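The reason either approach works: a metric aggregator typically keys buckets on the combination of metric name, timestamp slot, and tags, merging values only on an exact key collision. Varying any component of the key keeps buckets distinct. A minimal sketch with hypothetical types (not Relay's actual aggregator):

```rust
use std::collections::HashMap;

// Assumed bucket key: (metric name, timestamp slot, sorted tag pairs).
type BucketKey = (String, u64, Vec<(String, String)>);

// Merge incoming metric values into buckets keyed as above.
fn aggregate(metrics: Vec<(BucketKey, f64)>) -> HashMap<BucketKey, Vec<f64>> {
    let mut buckets: HashMap<BucketKey, Vec<f64>> = HashMap::new();
    for (key, value) in metrics {
        buckets.entry(key).or_default().push(value);
    }
    buckets
}

fn main() {
    let name = "c:custom/foo".to_string();
    let metrics = vec![
        // Same name + timestamp + tags: these two merge into one bucket.
        ((name.clone(), 1000, vec![]), 1.0),
        ((name.clone(), 1000, vec![]), 2.0),
        // A different timestamp OR a different tag yields a separate bucket.
        ((name.clone(), 1010, vec![]), 3.0),
        ((name.clone(), 1000, vec![("i".into(), "1".into())]), 4.0),
    ];
    let buckets = aggregate(metrics);
    println!("distinct buckets: {}", buckets.len());
}
```

So a unique tag value per bucket prevents merging just as well as a unique timestamp, without the test needing to know bucket_interval.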
Reverts #2758. This was deployed before Sentry updated librelay, so Sentry crashed with an unknown data category.
We had an incident when deploying #2758 because we unexpectedly sent outcomes of type ItemBucket without having updated Sentry. This PR only adds the new data category so we can properly synchronize both ends before adding the business logic. #2821 (comment)
Part of: #2716. In order to protect our Kafka metric consumers, we want a way of rate limiting based on the number of buckets, as that is what determines the load placed on our Kafka topics. We are starting out with just the org throughput limits, but this will be expanded upon further as outlined in the linked epic. This is a re-revert of #2758 (reverted in #2821).
Co-authored-by: Joris Bayer <[email protected]>
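The core idea of rate limiting on bucket counts can be sketched as a per-organization budget that a flush must fit into. Everything below is an illustrative simplification: the struct, its names, and the fixed window are assumptions, and the real implementation shares the count across processing relays via Redis rather than in-process state.

```rust
use std::collections::HashMap;

// Hypothetical fixed-window limiter: admit a flush only while the org's
// bucket budget for the current window has room, otherwise rate limit it
// (drop the buckets and emit rate-limited outcomes).
struct OrgBucketLimiter {
    limit_per_window: u64,
    counts: HashMap<u64, u64>, // org_id -> buckets seen this window
}

impl OrgBucketLimiter {
    fn new(limit_per_window: u64) -> Self {
        Self {
            limit_per_window,
            counts: HashMap::new(),
        }
    }

    /// Returns true if the flush is admitted, false if rate limited.
    fn check(&mut self, org_id: u64, bucket_qty: u64) -> bool {
        let count = self.counts.entry(org_id).or_insert(0);
        if *count + bucket_qty > self.limit_per_window {
            return false;
        }
        *count += bucket_qty;
        true
    }
}

fn main() {
    let mut limiter = OrgBucketLimiter::new(100);
    assert!(limiter.check(1, 80)); // admitted, 80/100 used
    assert!(!limiter.check(1, 30)); // 80 + 30 > 100 -> limited
    assert!(limiter.check(2, 30)); // separate org, separate budget
    println!("ok");
}
```

Note that this sketch limits a whole flush atomically, which matches the earlier review question about buckets being rate-limited on a binary basis.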