
[live-data-fetcher] Distribute read load across objects in S3 #817

Merged: 30 commits into reddit:develop on Nov 30, 2023

Conversation

SiddharthManoj
Contributor

@SiddharthManoj SiddharthManoj commented Nov 30, 2023

See Experiments - S3 Prefix Sharding Design Doc for more info.

See #incident-3788--slower-experiment-changes-due-to-s3-rate-limiting for full incident context.

Problem (as copied from doc)
During the week of November 27th, we started to observe that the rate of requests for experiment config data to the S3 bucket was breaching the S3 rate limit of 5500 read requests per second (per prefix). The main trigger condition is when experiments are changed, the manifest is uploaded (every X min), and a ZooKeeper Data Watch fires. This subsequently triggers all service pods to refetch a copy of the new experiment config data (with a retry policy of 5 attempts).

When all pods attempt to fetch the same S3 object at once, we breach the 5500 QPS limit set by AWS, sending a large number of pods into a retry state (up to 5 retries).

These retries use boto3's built-in exponential backoff strategy with a factor of 2 (with jitter). Ultimately we observed that while a retrying pod may not receive the experiment config on the 1st try, 99% of pods receive it by at most the 5th try. However, 1% of pods do not get it even on the 5th retry and stop retrying, so these pods do not receive the most up-to-date experiment data until another manifest upload and ZK data watch is triggered.
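
For reference, a minimal sketch of how those retries are configured on the client side (illustrative only; the real fetcher's client setup appears in the diff further down, and the bucket/key names here are placeholders):

import boto3
from botocore.config import Config

# boto3's default ("legacy") retry mode backs off exponentially with a growth
# factor of 2 and jitter between attempts. total_max_attempts caps the total
# number of attempts (the initial request plus retries).
s3_client = boto3.client(
    "s3",
    config=Config(retries={"mode": "legacy", "total_max_attempts": 5}),
)

# Placeholder bucket/key: a pod fetching the manifest issues a read like this,
# and botocore retries it automatically on throttled/5xx responses.
response = s3_client.get_object(Bucket="example-bucket", Key="example-manifest.json")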

Solution

The experiments service will upload multiple copies of the manifest, each prefixed numerically. See: https://github.snooguts.net/reddit/reddit-service-experiment-config/pull/3616

In the baseplate sidecar, we read the num_file_shards key from ZK and randomly generate a number from 1 to num_file_shards. We prepend that number and the delimiter ("/") to the file name and then attempt to fetch that object from S3.
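
A minimal sketch of that key selection (hypothetical, not the actual helper: the signature is an assumption, and the range handling follows the unit test's setUp() shown below, where shard 0 keeps the original, unprefixed key):

import random
from typing import Optional


def _generate_sharded_file_key(num_file_shards: Optional[int], original_file_key: str) -> str:
    # With num_file_shards of None, 0, or 1 there is nothing to shard, so the
    # original key is returned unchanged (matching the old-sidecar behavior).
    if num_file_shards is None or num_file_shards <= 1:
        return original_file_key
    # Assumption: shard 0 maps to the original key and shards 1..num_file_shards-1
    # map to numerically prefixed copies, mirroring the test's upload loop.
    file_shard_num = random.randrange(num_file_shards)
    if file_shard_num == 0:
        return original_file_key
    return str(file_shard_num) + "/" + original_file_key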

Benefits

By prefixing the files in this way, we make use of S3's built-in prefix partitioning for read performance. This means that pod reads can be distributed among the files (random load balancing).

See the AWS Design Practices doc for more info.

Is this backwards compatible with older versions of the sidecar?

Yes. Old versions of the sidecar won't know about the num_file_shards key in ZK and will just pull down the original file provided in the ZK node (as they do today).

Only new updated versions will attempt to prepend a prefix.

Old versions will always pull a file that looks like:

2023-10-31T16:13:18Z_3437442509_29cd4048e52252cb_experiment-config.json (original file)

New versions will pull files that look like:

1/2023-10-31T16:13:18Z_3437442509_29cd4048e52252cb_experiment-config.json

2/2023-10-31T16:13:18Z_3437442509_29cd4048e52252cb_experiment-config.json

3/2023-10-31T16:13:18Z_3437442509_29cd4048e52252cb_experiment-config.json

4/2023-10-31T16:13:18Z_3437442509_29cd4048e52252cb_experiment-config.json

@SiddharthManoj SiddharthManoj changed the title [live-data-fetcher] Distribute read load across objects in S3 if num_file_shards key is present [live-data-fetcher] Distribute read load across objects in S3 Nov 30, 2023
Comment on lines +31 to +45
default_file_key = "test_file_key"
for file_shard_num in range(NUM_FILE_SHARDS):
if file_shard_num == 0:
# The first copy should just be the original file.
sharded_file_key = default_file_key
else:
# All other copies should include the sharded prefix.
sharded_file_key = str(file_shard_num) + "/" + default_file_key
s3_client.put_object(
Bucket=bucket_name,
Key=sharded_file_key,
Body=json.dumps(s3_data).encode(),
SSECustomerKey="test_decryption_key",
SSECustomerAlgorithm="AES256",
)
Contributor Author

For the unit test, we can upload all the copies of the file in setUp().

And then in each unit test, we can optionally provide the num_file_shards key in the ZK content. If the key is missing, we should expect the unit test to still pass since it should fetch the original file with no prefix.

If the key is present, we should also expect it to pass by fetching one of the prefixed files.
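
A rough sketch of what those two cases could look like when exercising the helper directly (illustrative only; the class name and assertions are assumptions, and the helper signature matches the sketch above rather than the actual code):

import unittest

from baseplate.sidecars.live_data_watcher import _generate_sharded_file_key


class GenerateShardedFileKeyTests(unittest.TestCase):
    def test_missing_sharding_key_uses_original_file(self):
        # No usable num_file_shards in the ZK content: the original, unprefixed key is used.
        for no_sharding_value in (None, 0, 1):
            self.assertEqual(
                _generate_sharded_file_key(no_sharding_value, "test_file_key"),
                "test_file_key",
            )

    def test_sharding_key_uses_original_or_prefixed_file(self):
        # num_file_shards=5: the key should be the original file or one of the
        # numerically prefixed copies uploaded in setUp().
        expected_keys = {"test_file_key"} | {f"{i}/test_file_key" for i in range(1, 5)}
        for _ in range(20):
            self.assertIn(_generate_sharded_file_key(5, "test_file_key"), expected_keys)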

Comment on lines 73 to 77
# For safe measure, run this 20 times. It should succeed every time.
# We've uploaded 5 files to S3 in setUp() and num_file_shards=5 in the
# ZK node so we should be fetching one of these 5 files randomly (and successfully)
# and all should have the same content.
for i in range(20):
Contributor Author

@SiddharthManoj SiddharthManoj Nov 30, 2023


I couldn't think of a better way to do this than to run on_change() X number of times. This is because the only output of the function being unit tested is the contents of the manifest, so functionally that's all I can test.

on_change() has the random number generation logic internally.

By running on_change() that many times, I would expect different random numbers to be generated at least once, so we would pull from one of the 5 files uploaded in setUp() randomly, and I would expect it to succeed and have the same contents each time. (With 20 runs across 5 shards, the chance that any particular shard is never selected is (4/5)^20, roughly 1%, so the loop almost certainly exercises several different shards.)

@SiddharthManoj SiddharthManoj marked this pull request as ready for review November 30, 2023 15:22
@SiddharthManoj SiddharthManoj requested a review from a team as a code owner November 30, 2023 15:22
@SiddharthManoj
Contributor Author

@areitz Just wanted to get the PR out there. We can still wait for the doc to be reviewed first, and the Experiments Service PR has to go in first: https://github.snooguts.net/reddit/reddit-service-experiment-config/pull/3616

@@ -11,8 +11,11 @@

from moto import mock_s3

from baseplate.sidecars.live_data_watcher import _generate_sharded_file_key
Contributor Author

@SiddharthManoj SiddharthManoj Nov 30, 2023


I know _generate_sharded_file_key has an underscore prefix, but in reality we only use it internally.

I just need to call it here as part of the unit test.


@areitz areitz left a comment


This looks super awesome, thanks!

I had a few minor questions, nothing blocking really. I can approve without code changes once I understand the answers to my questions.

baseplate/sidecars/live_data_watcher.py (review thread, resolved)
tests/unit/sidecars/live_data_watcher_tests.py (review thread, resolved, outdated)
expected_sharded_file_key = "test_file_key"
self.assertEqual(actual_sharded_file_key, expected_sharded_file_key)
possible_no_sharding_values = [0, 1, None]


Nice!


@mrlevitas mrlevitas left a comment


One suggestion for how to make the random part of the tests deterministic.
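
For reference, one common way to do that (purely illustrative, and not necessarily the suggestion from the resolved thread; this assumes the helper uses random.randrange internally, as in the sketch earlier):

from unittest import mock

from baseplate.sidecars.live_data_watcher import _generate_sharded_file_key

# Pinning the random draw makes the selected shard predictable, so a test can
# assert on an exact S3 key instead of looping over many random runs.
with mock.patch("baseplate.sidecars.live_data_watcher.random.randrange", return_value=3):
    assert _generate_sharded_file_key(5, "test_file_key") == "3/test_file_key"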

baseplate/sidecars/live_data_watcher.py (review thread, resolved, outdated)
Comment on lines +170 to 175
# Default # of retries in legacy mode (current mode) is 5.
s3_client = boto3.client(
    "s3",
-    config=Config(signature_version=UNSIGNED, retries={"total_max_attempts": 10}),
+    config=Config(signature_version=UNSIGNED),
    region_name=region_name,
)
Contributor Author


Per https://reddit.slack.com/archives/C067Q6N14LA/p1701373678708859?thread_ts=1701370864.667409&cid=C067Q6N14LA we want to revert the change made in PR 813, where we increased the retry total_max_attempts from 5 to 10.

cc @areitz @jerroydmoore

@SiddharthManoj SiddharthManoj merged commit c132681 into reddit:develop Nov 30, 2023
3 checks passed
areitz pushed a commit that referenced this pull request Nov 30, 2023
* Add logic to distribute read load across different files in S3

* lint

* add test for live data watcher

* intentionally fail test to verify

* intentionally fail test to verify

* remove logs

* verify this fails

* and it does

* change range

* change range

* change range

* comments

* comments

* use function

* move more things into helper

* fix unit test

* fix unit test

* rename unit test

* rename unit test

* import order

* add logging since a test failed

* import order

* move logger up

* oops, the delimiter should have been / not _

* add 0 and 1 to no sharding unit test

* clean up python

* clean up python

* comments

* add negative values to test

* revert retries from 10 -> 5

---------

Co-authored-by: Siddharth Manoj <[email protected]>