
Rolling window limit support #193

Closed
walbertus wants to merge 34 commits

Conversation

walbertus (Contributor):

Add support for rolling window limit #32

@mattklein123 mattklein123 self-assigned this Nov 18, 2020
@walbertus walbertus marked this pull request as ready for review November 23, 2020 03:08
@walbertus walbertus changed the title [WIP] Rolling window limit support Rolling window limit support Nov 23, 2020
walbertus (Contributor Author) commented Nov 30, 2020:

Hi @mattklein123, is there anything else I can do for this MR?

mattklein123 (Member) left a comment:

Thanks for implementing this. Flushing out the first round of comments. Thank you!

/wait

localCache,
s.NearLimitRatio)
} else {
logger.Fatalf("Unknown rate limit algorithm. %s\n", s.RateLimitAlgorithm)
Member:

I think this is going to crash in some strange way after returning nil? Would it be better to panic here or should we have some more graceful handling somewhere?

Contributor Author:

Throwing an error and letting the runner handle it. The log level parser also triggers a panic in the runner.
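
For illustration, a minimal sketch of the error-returning approach being discussed (the type and function names here are hypothetical, not the PR's actual API):

```go
package main

import (
	"fmt"
	"log"
)

// Cache is a stand-in for the limiter cache interface; the real type in
// the PR differs. All names in this sketch are hypothetical.
type Cache interface{}

type fixedWindowCache struct{}
type rollingWindowCache struct{}

// newCache returns an error for an unknown algorithm instead of calling
// logger.Fatalf inside the factory, so the runner decides how to fail.
func newCache(algorithm string) (Cache, error) {
	switch algorithm {
	case "fixed":
		return fixedWindowCache{}, nil
	case "rolling":
		return rollingWindowCache{}, nil
	default:
		return nil, fmt.Errorf("unknown rate limit algorithm: %q", algorithm)
	}
}

func main() {
	if _, err := newCache("bogus"); err != nil {
		log.Fatal(err) // the runner is the one place that aborts the process
	}
}
```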

Comment on lines 33 to 45
func maxInt64(a int64, b int64) int64 {
if a > b {
return a
}
return b
}

func minInt64(a int64, b int64) int64 {
if a < b {
return a
}
return b
}
Member:

The fact that this is implemented here and not in the std lib makes me cry, but so it goes. If this is not present elsewhere in the code can you at least put it in a common utility? If it is already elsewhere please share it?

Contributor Author:

Moving max and min related code to utils
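
If useful, a sketch of what the shared utility could look like (the package name is an assumption; on Go 1.21+ the built-in min and max would make these helpers unnecessary, but they did not exist at the time of this PR):

```go
package utils

// MaxInt64 returns the larger of a and b.
func MaxInt64(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

// MinInt64 returns the smaller of a and b.
func MinInt64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}
```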

Comment on lines 47 to 55
func nanosecondsToDuration(nanoseconds int64) *duration.Duration {
nanos := nanoseconds
secs := nanos / 1e9
nanos -= secs * 1e9
return &duration.Duration{Seconds: secs, Nanos: int32(nanos)}
}

func secondsToNanoseconds(second int64) int64 {
return second * 1e9
}

func nanosecondsToSeconds(nanoseconds int64) int64 {
return nanoseconds / 1e9
}
Member:

Are there no helpers in the Go proto code that already do this? There are for C++.

Contributor Author:

Couldn't find it in the proto duration package.
Should I move this code to utils too?
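
For reference, the Go protobuf well-known-types package does ship Duration conversion helpers, going through time.Duration (which is itself an int64 nanosecond count); a small sketch:

```go
package main

import (
	"fmt"
	"time"

	"google.golang.org/protobuf/types/known/durationpb"
)

func main() {
	// time.Duration is an int64 nanosecond count, so a nanosecond value
	// converts through the standard library directly.
	nanos := int64(1_500_000_000) // 1.5s

	d := durationpb.New(time.Duration(nanos))
	fmt.Println(d.GetSeconds(), d.GetNanos()) // 1 500000000

	// And back to nanoseconds:
	fmt.Println(d.AsDuration().Nanoseconds()) // 1500000000
}
```

The older github.com/golang/protobuf/ptypes package exposes the equivalent ptypes.DurationProto and ptypes.Duration functions.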

*pipeline = client.PipeAppend(*pipeline, result, "GET", key)
}

func windowedSetNewTatPipelineAppend(client Client, pipeline *Pipeline, key string, newTat int64, expirationSeconds int64) {
Member:

What is "Tat"? Can you spell this out or add more comments?

"golang.org/x/net/context"
)

type windowedRateLimitCacheImpl struct {
Member:

Can you add some high level comments on how rolling window limits are implemented? It will help the reader and my own review.

Contributor Author:

Adding some explanation and a reference for GCRA.
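
For readers of the thread: GCRA stores a single timestamp per key, the theoretical arrival time, which is the "Tat" asked about in the earlier comment. A minimal, backend-agnostic sketch of the decision step (illustrative names and shapes, not the PR's code):

```go
package main

import (
	"fmt"
	"time"
)

// gcraAllow decides one request under GCRA. tat is the stored theoretical
// arrival time for the key (zero value if the key is unseen); now is the
// request's arrival time. emissionInterval is period/limit (100ms for
// 10 req/s), and burstTolerance = emissionInterval * limit permits an
// initial burst of the full limit.
func gcraAllow(tat, now time.Time, emissionInterval, burstTolerance time.Duration) (allowed bool, newTat time.Time) {
	if tat.Before(now) {
		tat = now // key was idle; start counting from now
	}
	// Allow when granting this request keeps TAT within the burst window.
	if tat.Sub(now) <= burstTolerance-emissionInterval {
		return true, tat.Add(emissionInterval)
	}
	return false, tat // denied; TAT is left unchanged
}

func main() {
	now := time.Now()
	interval := 100 * time.Millisecond // 10 requests/second
	burst := 10 * interval
	var tat time.Time
	for i := 1; i <= 12; i++ {
		var ok bool
		ok, tat = gcraAllow(tat, now, interval, burst)
		fmt.Printf("request %d allowed=%v\n", i, ok) // 1-10 allowed, 11-12 denied
	}
}
```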

perSecondPipeline = nil
}

// Rate limit GCRA logic
Member:

This code needs more comments per above. Maybe an overview comment will help me.

Also, a bunch of non-trivial code in this file is shared with the fixed impl. Is it possible to share more code via a shared utility or mix-in?

Contributor Author:

Hi, can you please give me examples in ratelimit of the shared utility and mix-in approach?
Thanks
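
Go approximates mix-ins with struct embedding; a hypothetical sketch of how shared window logic might be factored (all names illustrative):

```go
package main

import "fmt"

// baseWindow holds logic shared by fixed and rolling windows.
type baseWindow struct {
	limit int64
}

func (b *baseWindow) overLimit(hits int64) bool {
	return hits > b.limit
}

// rollingWindow embeds baseWindow, inheriting its methods ("mix-in"
// style), and adds only the algorithm-specific pieces.
type rollingWindow struct {
	baseWindow
	emissionIntervalNanos int64
}

func main() {
	w := rollingWindow{baseWindow{10}, 100_000_000}
	fmt.Println(w.overLimit(11), w.emissionIntervalNanos) // true 100000000
}
```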

mattklein123 (Member):
@nezdolik could I kindly ask you to review this one also? It's going to conflict with the memcached PR and I want to make sure we are heading in the right direction and we sequence the PRs correctly. Thank you!

nezdolik (Member) left a comment:

I have a comment about code structure below. Will take one more pass tomorrow.

@@ -0,0 +1,262 @@
package redis
Member:

2 things to possibly consider:

  • There is some difference between the fixed and floating windows code-wise, but a big piece of code is duplicated. Would it be possible to parametrise the cache impl with an algorithm object of the user's choice? The algorithm would know how to launch the redis pipeline and how to calculate the duration until reset (see the sketch after this list).
  • (Could be a follow-up PR) Make the algorithm agnostic of the backend type. If we parametrise the cache impl with an algorithm, we could further parametrise the algorithm itself with an object that is aware of the backend type (redis, memcache); maybe interaction with the backend should not even be part of the algorithm type.
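
A sketch of the parametrisation proposed above (the interface shape and method names are assumptions, not existing code):

```go
package limiter

import "time"

// Algorithm is what the cache impl would be parametrised with: it owns
// the limit math while the cache owns backend access (redis, memcached).
// These names are illustrative, not the PR's.
type Algorithm interface {
	// IsOverLimit decides a request given the stored state and current time.
	IsOverLimit(stored int64, now time.Time) bool
	// DurationUntilReset reports how long until capacity is restored,
	// which differs between fixed and rolling windows.
	DurationUntilReset(stored int64, now time.Time) time.Duration
}

// cacheImpl is backend- and algorithm-agnostic glue: it fetches state,
// asks the Algorithm for a verdict, and writes the new state back.
type cacheImpl struct {
	algo Algorithm
}
```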

}

func TestRedisWindowedWithJitter(t *testing.T) {
assert := assert.New(t)
Member:

Can the code that sets up all the mocks and creates the cache impl object be pulled out of each test into some sort of SetUp method and called from each test?
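
A sketch of the suggested helper (the real version would wire the gomock controller, mock time source, and redis client mocks; everything here is illustrative):

```go
package redis_test

import "testing"

// fakeCache stands in for the real cache impl under test.
type fakeCache struct{ jitterMax int64 }

// setUp centralises the wiring each test currently repeats and registers
// cleanup, so individual tests stay focused on assertions.
func setUp(t *testing.T, jitterMax int64) *fakeCache {
	t.Helper()
	cache := &fakeCache{jitterMax: jitterMax}
	t.Cleanup(func() { /* ctrl.Finish() would go here */ })
	return cache
}

func TestRedisWindowedWithJitter(t *testing.T) {
	cache := setUp(t, 300)
	if cache.jitterMax != 300 {
		t.Fatalf("expected jitterMax 300, got %d", cache.jitterMax)
	}
}
```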

nezdolik (Member) left a comment:

Could you add an integration test?

nearLimitRatio float32
}

func nanosecondsToDuration(nanoseconds int64) *duration.Duration {
Member:

should those methods be part of some sort of time util?


stale bot commented Dec 24, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale label Dec 24, 2020
@stale stale bot removed the stale label Dec 28, 2020
walbertus (Contributor Author):

Hi, sorry for the delay. I'll continue working on the integration test and refactor this week.

@walbertus walbertus marked this pull request as draft January 5, 2021 09:54
mattklein123 (Member):

OK, I think with @nezdolik's refactor in, there should hopefully be less duplicated code now.

/wait

Signed-off-by: zufardhiyaulhaq <[email protected]>
zufardhiyaulhaq (Contributor):

Is the CI flaky somehow? I don't see any error when executing make docker_tests locally, and I cannot retry the CI in GitHub Actions.

ok      github.com/envoyproxy/ratelimit/test/integration        58.436s
ok      github.com/envoyproxy/ratelimit/test/memcached  (cached)
?       github.com/envoyproxy/ratelimit/test/mocks      [no test files]
?       github.com/envoyproxy/ratelimit/test/mocks/algorithm    [no test files]
?       github.com/envoyproxy/ratelimit/test/mocks/config       [no test files]
?       github.com/envoyproxy/ratelimit/test/mocks/limiter      [no test files]
?       github.com/envoyproxy/ratelimit/test/mocks/memcached/driver     [no test files]
?       github.com/envoyproxy/ratelimit/test/mocks/redis/driver [no test files]
?       github.com/envoyproxy/ratelimit/test/mocks/rls  [no test files]
?       github.com/envoyproxy/ratelimit/test/mocks/runtime/loader       [no test files]
?       github.com/envoyproxy/ratelimit/test/mocks/runtime/snapshot     [no test files]
?       github.com/envoyproxy/ratelimit/test/mocks/utils        [no test files]
ok      github.com/envoyproxy/ratelimit/test/redis      (cached)
ok      github.com/envoyproxy/ratelimit/test/server     (cached)
ok      github.com/envoyproxy/ratelimit/test/service    (cached)

zufardhiyaulhaq (Contributor):

@dweitzman we are trying to refactor the code so it can support other algorithms in the future. GCRA is like a leaky bucket but uses only time as its measurement. I am still not sure how the approach in https://blog.cloudflare.com/counting-things-a-lot-of-different-things/ compares. You can read about the GCRA calculation here: https://blog.ian.stapletoncordas.co/2018/12/understanding-generic-cell-rate-limiting.html

Here is a diagram to better understand GCRA: [diagram: Generic Cell Rate Limiting]
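
A worked example of the calculation (illustrative numbers, following the formulation in the post linked above): with a limit of 10 requests/second, the emission interval is T = 100 ms and the tolerance is τ = (10 − 1) × T = 900 ms. GCRA keeps one value per key, the theoretical arrival time (TAT): on each request, TAT is first clamped up to now, the request is allowed if TAT − now ≤ τ, and an allowed request advances TAT by T. Starting idle, 10 requests at t = 0 are all allowed (TAT climbs to 1.0 s), the 11th is rejected (TAT − now = 1.0 s > 0.9 s), and capacity returns at one request every 100 ms as the clock advances.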

@mattklein123 @nezdolik I finished adding unit tests for the algorithm and the rolling window implementation for Memcached & Redis; can you review it? Thanks

nezdolik (Member):

Thanks @zufardhiyaulhaq, will take one more pass tomorrow.

zufardhiyaulhaq (Contributor) commented Feb 11, 2021:

@nezdolik @mattklein123 here are the results of testing the rolling window rate limit with Istio.

  • Memcached rolling window rate limit (sending 10 requests/second for 4 minutes, rate limited at 10 requests/minute)
    "requests": 2400,
    "rate": 10.004171883777294,
    "throughput": 0.20424766658417626,
    "success": 0.020416666666666666,
    "status_codes": {
      "200": 49,
      "429": 2348,
      "500": 3
    },
    "errors": [
      "429 Too Many Requests",
      "500 Internal Server Error"
    ]
  • Redis rolling window rate limit (sending 10 requests/second for 4 minutes, rate limited at 10 requests/minute)
    "requests": 2400,
    "rate": 10.004173279566375,
    "throughput": 0.2000788245626525,
    "success": 0.02,
    "status_codes": {
      "200": 48,
      "429": 2347,
      "500": 5
    },
    "errors": [
      "429 Too Many Requests",
      "500 Internal Server Error"
    ]

nezdolik (Member) left a comment:

@zufardhiyaulhaq I spent some time today refactoring the duplicated code across the fixed and rolling windows; check out this commit: nezdolik@0ff4b0b
Feel free to use the code; it could be improved further. There was a bunch of duplicated code across the window implementations and it was not a big effort to refactor it.
As per @dweitzman's comment, it would be valuable to compare the two algorithms and reason about which one should end up upstream.

README.md Outdated
Fixed window algorithm does not care when did the request arrive, all 60 can arrive at 01:01 or 01:50 and the limit will still reset at 02:00.

2. Rolling window
For a limit of 60 requests per hour. Initially it is able to take a burst of 60 requests at once, then the limit is restored by 1 each minute. Requests are allowed as long as there's still some available limit.
Member:

:s/it is able/it is possible

Member:

It would be even clearer to rephrase it like 'Initially the rate limiter can take a burst of...'

zufardhiyaulhaq (Contributor):

Thanks @nezdolik, will check tomorrow.

@zufardhiyaulhaq zufardhiyaulhaq force-pushed the rolling-window-limit branch 2 times, most recently from 650aa12 to 092ad2c Compare February 20, 2021 06:39
Kateryna Nezdolii and others added 2 commits February 20, 2021 07:50
zufardhiyaulhaq (Contributor) commented Feb 20, 2021:

@nezdolik @mattklein123 refactor done; I also added a unit test for the base algorithm. Test results from Istio:

  • Redis rolling window rate limit (sending 10 requests/second for 4 minutes, rate limited at 10 requests/minute)
    "requests": 2400,
    "rate": 10.00416761918139,
    "throughput": 0.20424889245049338,
    "success": 0.020416666666666666,
    "status_codes": {
      "200": 49,
      "429": 2347,
      "500": 4
    },
    "errors": [
      "429 Too Many Requests",
      "500 Internal Server Error"
    ]
  • Memcached rolling window rate limit (sending 10 requests/second for 4 minutes, rate limited at 10 requests/minute)
    "requests": 2400,
    "rate": 10.004172146579775,
    "throughput": 0.20425008105086626,
    "success": 0.020416666666666666,
    "status_codes": {
      "200": 49,
      "429": 2350,
      "500": 1
    },
    "errors": [
      "429 Too Many Requests",
      "500 Internal Server Error"
    ]

zufardhiyaulhaq (Contributor):

@nezdolik @mattklein123 can you help review again? Thanks

nezdolik (Member):

@zufardhiyaulhaq I will be able to take one more pass tomorrow.

mattklein123 (Member):

Thanks for this feature. This PR is too many lines of code to review. Can we please split this into 2 parts:

  1. Part 1: Just the refactor to move files, split logic, etc.
  2. Part 2: Actually add the rolling window support

Thank you

/wait

nezdolik (Member) left a comment:

+1 on splitting the PR into two parts, would be much easier to review.

timeSource utils.TimeSource
jitterRand *rand.Rand
expirationJitterMaxSeconds int64
localCache *freecache.Cache
Member:

We no longer need this field in both window impls, right? (It is part of WindowImpl now.)


type RollingWindowImpl struct {
timeSource utils.TimeSource
cacheKeyGenerator utils.CacheKeyGenerator
Member:

Same for cacheKeyGenerator: we no longer need this field in both window impls, as it has been moved to the base window.

zufardhiyaulhaq (Contributor):

should we create a new MR for this feature?
@mattklein123 @nezdolik

mattklein123 (Member):

> should we create a new MR for this feature?

I'm not sure what an MR is but as I mentioned this PR is too large to review/merge. Please do code movement / refactoring first.

github-actions bot:

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale label Jun 24, 2021
github-actions bot commented Jul 1, 2021:

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
