Tensorstore key value store driver not Retrying on 429 Responses with S3 API #194

AdityaCohere · 2024-09-03T21:34:46Z

I'm using Tensorstore's key value store driver with OCI object storage using their S3 compatibility layer and am seeing that when the OCI Object Storage service sends 429s Tensorstore fails immediately without any retries. This was tested with debug logging enabled which also didn't show any retries happening either with the default value of 32 or with s3_request_retries set to a custom number.
I also tried testing this @sjperkins 's PR here to see if that may enable retries to work here however I'm hitting bazel build issues on a M1 Mac of:

configurable attribute "content" in @com_github_aws_c_common//:write_config_h doesn't match this configuration. Would a default condition help?

I was wondering if there was anything I could do to debug the lack of retries further or any modifications I could make to force retries.

The text was updated successfully, but these errors were encountered:

laramiel · 2024-09-03T21:41:24Z

The retry decision happens here (or one of the other methods in the file):

tensorstore/tensorstore/kvstore/s3/s3_metadata.cc

Line 218 in 2a44dc7

absl::Status AwsHttpResponseToStatus(const HttpResponse& response,

You could turn on logging. TENSORSTORE_VERBOSE_LOGGING=s3=1.

sjperkins · 2024-09-04T06:48:06Z

Unfortunately #149 doesn't integrate with any kvstore at present.

AdityaCohere · 2024-09-04T16:01:28Z

The retry decision happens here (or one of the other methods in the file):

tensorstore/tensorstore/kvstore/s3/s3_metadata.cc

Line 218 in 2a44dc7

absl::Status AwsHttpResponseToStatus(const HttpResponse& response,

You could turn on logging. TENSORSTORE_VERBOSE_LOGGING=s3=1.

Yea, I've tried turning on Logging as well with no output specifying any retries. I see that the 429 response is received and nothing beyond that other than a failure.
I will repro this and get the debug logs

AdityaCohere · 2024-09-04T16:01:52Z

Unfortunately #149 doesn't integrate with any kvstore at present.

Ah, this is unfortunate to hear, would it non-trivial to integrate?

sjperkins · 2024-09-04T16:11:10Z

Unfortunately #149 doesn't integrate with any kvstore at present.

Ah, this is unfortunate to hear, would it non-trivial to integrate?

It'd take some work, I guess the amount of code required would be smaller than the current s3 kvstore as it could defer to the C++ S3 SDK.

However, the C++ SDK is a fairly heavy dependency: the S3 C CRT looks much lighter. That's next, when I find the time.

laramiel · 2024-09-04T16:29:24Z

You could turn on logging. TENSORSTORE_VERBOSE_LOGGING=s3=1.

Yea, I've tried turning on Logging as well with no output specifying any retries. I see that the 429 response is received and nothing beyond that other than a failure. I will repro this and get the debug logs

With s3=1 you should see a line from s3_key_value_store.cc:435 ReadTask or s3_key_value_store.cc:717 WriteTask or such. Could you attach those lines?

Also, you could attempt to run localstack_test against OCI object storage.

git clone https://github.com/google/tensorstore.git
cd tenstorstore
bazelisk.py build tensorstore/kvstore/s3:localstack_test
./bazel-bin/tensorstore/kvstore/s3/localstack_test --tensorstore_verbose_logging=all=2 --localstack_endpoint=...  <other flags>

AdityaCohere · 2024-09-11T16:06:12Z

Hey, I've tried turning on logging and I don't seem to see any logs regarding s3 being printed.
I get a rate limited error but no read/write tasks.
I've tried TENSORSTORE_VERBOSE_LOGGING=s3=1 as well as TENSORSTORE_VERBOSE_LOGGING=all
Is there anything else I can try?

laramiel · 2024-09-26T02:25:38Z

Ok, so there was an issue that tensorstore logging when used from python didn't initialize the absl logging level in a way which provides output. This was fixed by 041a999 and I will make another release soon. I would expect that logging in gunit tests would work, though you may need gunit specific flags for that such as --gmock_verbose=info

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tensorstore key value store driver not Retrying on 429 Responses with S3 API #194

Tensorstore key value store driver not Retrying on 429 Responses with S3 API #194

AdityaCohere commented Sep 3, 2024

laramiel commented Sep 3, 2024

sjperkins commented Sep 4, 2024

AdityaCohere commented Sep 4, 2024

AdityaCohere commented Sep 4, 2024

sjperkins commented Sep 4, 2024

laramiel commented Sep 4, 2024

AdityaCohere commented Sep 11, 2024

laramiel commented Sep 26, 2024

Tensorstore key value store driver not Retrying on 429 Responses with S3 API #194

Tensorstore key value store driver not Retrying on 429 Responses with S3 API #194

Comments

AdityaCohere commented Sep 3, 2024

laramiel commented Sep 3, 2024

sjperkins commented Sep 4, 2024

AdityaCohere commented Sep 4, 2024

AdityaCohere commented Sep 4, 2024

sjperkins commented Sep 4, 2024

laramiel commented Sep 4, 2024

AdityaCohere commented Sep 11, 2024

laramiel commented Sep 26, 2024