[live-data-fetcher] Distribute read load across objects in S3 #817
Conversation
be3b35a to 52516a2
```python
default_file_key = "test_file_key"
for file_shard_num in range(NUM_FILE_SHARDS):
    if file_shard_num == 0:
        # The first copy should just be the original file.
        sharded_file_key = default_file_key
    else:
        # All other copies should include the sharded prefix.
        sharded_file_key = str(file_shard_num) + "/" + default_file_key
    s3_client.put_object(
        Bucket=bucket_name,
        Key=sharded_file_key,
        Body=json.dumps(s3_data).encode(),
        SSECustomerKey="test_decryption_key",
        SSECustomerAlgorithm="AES256",
    )
```
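For context, the fetch side has to invert this mapping: pick a random shard number and turn it into a key. A rough sketch of that logic (the real helper is `_generate_sharded_file_key`; the exact boundary handling here is an assumption, not the PR's actual code):

```python
import random

def generate_sharded_file_key(num_file_shards, file_key):
    # Values like 0, 1, None (or negatives) mean "no sharding": fetch
    # the original, unprefixed file.
    if not num_file_shards or num_file_shards <= 1:
        return file_key
    # Pick one of num_file_shards copies at random; copy 0 is the
    # original file, the rest carry an "N/" prefix.
    shard_num = random.randint(0, num_file_shards - 1)
    if shard_num == 0:
        return file_key
    return f"{shard_num}/{file_key}"
```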
For the unit tests, we can upload all the copies of the file in setUp(). Then each unit test can optionally provide the num_file_shards key in the ZK content. If the key is missing, we should expect the test to still pass, since it should fetch the original file with no prefix. If the key is present, we should also expect it to pass, by fetching one of the prefixed files.
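That suggestion could be sketched roughly like this (class and helper names are placeholders, not the PR's actual test code):

```python
import unittest

NUM_FILE_SHARDS = 5
DEFAULT_FILE_KEY = "test_file_key"

def uploaded_keys(num_file_shards):
    # Keys uploaded in setUp(): the original plus the prefixed copies.
    return {DEFAULT_FILE_KEY} | {
        f"{n}/{DEFAULT_FILE_KEY}" for n in range(1, num_file_shards)
    }

class ShardedFetchSketch(unittest.TestCase):
    def test_zk_node_without_num_file_shards(self):
        # Key missing: only the original, unprefixed file should be fetched.
        self.assertEqual(uploaded_keys(1), {DEFAULT_FILE_KEY})

    def test_zk_node_with_num_file_shards(self):
        # Key present: any of the uploaded copies is a valid fetch target.
        self.assertEqual(len(uploaded_keys(NUM_FILE_SHARDS)), NUM_FILE_SHARDS)
```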
```python
# For good measure, run this 20 times. It should succeed every time.
# We've uploaded 5 files to S3 in setUp() and num_file_shards=5 in the
# ZK node, so we should be fetching one of these 5 files randomly (and
# successfully), and all should have the same content.
for i in range(20):
```
I couldn't think of a better way to do this than to run on_change() some number of times. The only output of the function under test is the contents of the manifest, so functionally that's all I can check; on_change() has the random number generation logic internally. By running on_change() repeatedly, I would expect different random numbers to be generated at least once, so we would pull from one of the 5 files uploaded in setUp() at random, and I would expect each call to succeed and return the same contents.
@areitz Just wanted to get the PR out there. We can still wait for the doc to be reviewed first, and the Experiments Service PR also has to go in first: https://github.snooguts.net/reddit/reddit-service-experiment-config/pull/3616
```diff
@@ -11,8 +11,11 @@
 from moto import mock_s3

 from baseplate.sidecars.live_data_watcher import _generate_sharded_file_key
```
I know `_generate_sharded_file_key` has an underscore prefix, but in reality we're only using it internally. I just need to call it here as part of the unit test.
This looks super awesome, thanks!
I had a few minor questions, nothing blocking really. I can approve without code changes once I understand the answers to my questions.
```python
expected_sharded_file_key = "test_file_key"
self.assertEqual(actual_sharded_file_key, expected_sharded_file_key)
possible_no_sharding_values = [0, 1, None]
```
Nice!
One suggestion for how to make the random part of the tests deterministic:
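One way to sketch that, assuming the shard choice ultimately goes through Python's `random` module (the patch target in the real code would be the module path where `random` is imported, and `pick_shard` is an illustrative stand-in):

```python
import random
from unittest import mock

def pick_shard(num_file_shards):
    # Stand-in for the random choice inside on_change().
    return random.randint(0, num_file_shards - 1)

# Pin the random call so the test exercises one known shard every run,
# instead of looping and hoping all shards get hit.
with mock.patch("random.randint", return_value=3):
    shard = pick_shard(5)

assert shard == 3
```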
```diff
  # Default # of retries in legacy mode (current mode) is 5.
  s3_client = boto3.client(
      "s3",
-     config=Config(signature_version=UNSIGNED, retries={"total_max_attempts": 10}),
+     config=Config(signature_version=UNSIGNED),
      region_name=region_name,
  )
```
Per https://reddit.slack.com/archives/C067Q6N14LA/p1701373678708859?thread_ts=1701370864.667409&cid=C067Q6N14LA, we want to revert the change made in PR #813, where we increased the retry total_max_attempts from 5 to 10.
* Add logic to distribute read load across different files in S3
* lint
* add test for live data watcher
* intentionally fail test to verify
* intentionally fail test to verify
* remove logs
* verify this fails
* and it does
* change range
* change range
* change range
* comments
* comments
* use function
* move more things into helper
* fix unit test
* fix unit test
* rename unit test
* rename unit test
* import order
* add logging since a test failed
* import order
* move logger up
* oops, the delimiter should have been / not _
* add 0 and 1 to no sharding unit test
* clean up python
* clean up python
* comments
* add negative values to test
* revert retries from 10 -> 5

Co-authored-by: Siddharth Manoj <[email protected]>
See Experiments - S3 Prefix Sharding Design Doc for more info.
See #incident-3788--slower-experiment-changes-due-to-s3-rate-limiting for full incident context.
Problem (as copied from doc)
During the week of November 27th, we started to observe that the rate of requests for experiment config data to the S3 bucket was breaching the S3 rate limit of 5500 QPS. The main trigger condition is when experiments are changed, the manifest is uploaded (every X min), and a ZooKeeper data watch fires. This subsequently triggers all service pods to refetch a copy of the new experiment config data (with a retry policy of 5 attempts).
When all pods attempt to fetch the same S3 file at once, we breach the 5500 QPS limit set by AWS, sending a large number of pods into a retry state (up to 5 retries).
These retries use boto3's built-in exponential backoff strategy with a factor of 2 (with jitter). We observed that while a retrying pod may not receive the experiment config on the first try, 99% of them receive it by at most the fifth try. However, 1% of pods do not get it even on the fifth retry and then stop retrying; these pods do not receive the most up-to-date experiment data until another manifest upload and ZK data watch is triggered.
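The retry pattern described above can be modeled roughly like this (a simplified full-jitter sketch; boto3's exact legacy-mode delay formula may differ):

```python
import random

def retry_delays(max_attempts=5, factor=2.0, cap=20.0):
    """Approximate per-attempt sleep times for exponential backoff.

    Each attempt's delay grows by the backoff factor and is randomized
    (jitter) so pods don't retry in lockstep against S3. The cap bounds
    the worst-case wait.
    """
    return [
        min(cap, random.uniform(0, factor ** attempt))
        for attempt in range(max_attempts)
    ]
```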
Solution
Experiments service will send over multiple copies of the manifest - each prefixed numerically. See: https://github.snooguts.net/reddit/reddit-service-experiment-config/pull/3616
In the baseplate sidecar, we read the `num_file_shards` key from ZK and randomly generate a number from 1 to num_file_shards. We prepend that prefix to the file name, followed by the delimiter ("/"), and then attempt to fetch that file from S3.
Benefits
By prefixing the files this way, we make use of S3's built-in prefix partitioning for read performance. This means that pod reads can be distributed among the files (random load balancing).
See the AWS Design Practices doc for more info.
Is this backwards compatible with older versions of the sidecar?
Yes. Old versions of the sidecar won't know about the `num_file_shards` key in ZK and will just pull down the original file provided in the ZK node (as it works today). Only new, updated versions will attempt to prepend a prefix.
Old versions will always pull a file that looks like:
2023-10-31T16:13:18Z_3437442509_29cd4048e52252cb_experiment-config.json (original file)
New versions will pull files that look like: