Availability: Improve fallback for 503 scenarios when ApplicationRegion is not set #4639

Open
GokulPrasad-Work opened this issue Aug 12, 2024 · 1 comment
Labels
customer-reported Issue created by a customer feature-request New feature or request

Comments

@GokulPrasad-Work

GokulPrasad-Work commented Aug 12, 2024

Is your feature request related to a problem? Please describe.

During a regional outage, the Cosmos DB (CDB) client received 503 errors from the service in the primary region. The account is configured with 3 read replicas, but the client did not fall back to the other replicas until the primary region was marked offline. It kept receiving 503 errors for an hour or so.
During the investigation, we found that ApplicationRegion or ApplicationPreferredRegions must be set for reads to fall back on a 503. Otherwise the client considers only the primary region for reads.
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/troubleshoot-sdk-availability

So the client's behavior on a 503 differs depending on whether ApplicationRegion is set. Moreover, this is not very intuitive and does not take into account the multiple replicas configured in the database.

Describe the solution you'd like
Solution 1:

“If ApplicationRegion is set, reads go to the preferred available region closest to the application region. If ApplicationRegion is not set, reads go to the preferred available database region in order of failover priority (as configured in the database).”

This ensures consistent behavior whether AppRegion is set or not and improves the success rate, at the cost of a potential increase in latency during failover.
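To make the rule in Solution 1 concrete, here is a minimal Python sketch of the region ordering it describes. This is an illustrative model only, not SDK code: `read_region_order`, `failover_priority`, and `distance_from_app` are hypothetical names, and the distance values a caller passes in are assumptions standing in for whatever proximity measure the SDK uses.

```python
from typing import Dict, List, Optional


def read_region_order(
    failover_priority: Dict[str, int],
    application_region: Optional[str] = None,
    distance_from_app: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Return read regions in the order a client would try them (illustrative model)."""
    regions = list(failover_priority)
    if application_region is not None:
        if distance_from_app is None:
            raise ValueError("distance_from_app is required when application_region is set")
        # Application region set: nearest region first (assumed proximity measure).
        return sorted(regions, key=lambda region: distance_from_app[region])
    # Application region not set: proposed behavior is failover-priority order,
    # instead of considering only the first region.
    return sorted(regions, key=lambda region: failover_priority[region])
```

With no application region, the list is simply the account's regions sorted by failover priority, so a 503 from the first region still leaves candidates to try.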

Describe alternatives you've considered
Solution 2:

Given how important regional preferences (ApplicationRegion or ApplicationPreferredRegions) are for the client to be resilient to regional outages, could this be made a required field during client initialization?

This ensures callers always set this information, so it does not get missed and the behavior is the same in all scenarios.
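As a rough illustration of Solution 2, the check could look like the Python sketch below. `ClientOptions` and `validate_region_preference` are hypothetical names invented for this example; they only model the idea of rejecting a client configuration that carries no regional preference and are not the SDK's options type or validation logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClientOptions:
    """Hypothetical options object used only for this sketch."""
    application_region: Optional[str] = None
    application_preferred_regions: List[str] = field(default_factory=list)


def validate_region_preference(options: ClientOptions) -> None:
    """Fail fast at initialization when no regional preference is configured."""
    if options.application_region is None and not options.application_preferred_regions:
        raise ValueError(
            "Set application_region or application_preferred_regions "
            "so reads can fall back to another region on 503."
        )
```

Failing at initialization surfaces the missing setting during development rather than during an outage.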

Additional context
More context to Solution 1:

| Scenario | User Action | Current Behavior | Proposed Change | Expected Outcome |
| --- | --- | --- | --- | --- |
| 1 | User sets AppRegion | All read regions are populated as preferred, sorted by proximity to AppRegion. Fallback to next closer region on 503. | | |
| 2 | User does not set AppRegion | SDK defaults to first DB region (irrespective of the latency). No fallback on 503. | SDK uses first DB region and considers all DB regions for fallback on 503 (in the order of failover priority) | Consistent behavior whether AppRegion is set or not; improved success rate; potential increase in latency during failover. |

For example, consider a Cosmos DB instance 'CDB A' with three regions: East US (EUS) with priority 0, West US (WUS) with priority 1, and West Europe (WEU) with priority 2.
Currently, if a user sets ApplicationRegion to EUS, the SDK populates all read regions as preferred regions, sorted by their proximity to EUS. On a 503 error from EUS, the SDK falls back to WUS, the next-closest region.
If the user does not set ApplicationRegion, the SDK defaults to the first database region, EUS. On 503 errors, the SDK has no fallback mechanism and the reads fail.
The proposed enhancement is to consider all the read replicas and fall back to the next replica (in order of failover priority) when ApplicationRegion is not set.
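The difference between the current and proposed behavior in this example can be simulated with a short Python sketch. It is a toy model under stated assumptions, not the SDK's retry logic: `ServiceUnavailable`, `read_with_fallback`, and `make_reader` are hypothetical helpers, and the region lists are taken from the 'CDB A' example above.

```python
from typing import Callable, Iterable, List


class ServiceUnavailable(Exception):
    """Stands in for an HTTP 503 response from a region."""


def read_with_fallback(regions: List[str], read_from: Callable[[str], str]) -> str:
    """Try each region in order and move to the next one on a 503."""
    last_error = None
    for region in regions:
        try:
            return read_from(region)
        except ServiceUnavailable as exc:
            last_error = exc  # remember the failure and try the next region
    raise ServiceUnavailable(f"all regions unavailable: {regions}") from last_error


def make_reader(unavailable: Iterable[str]) -> Callable[[str], str]:
    """Build a fake read function that returns 503 for the given regions."""
    down = set(unavailable)

    def read_from(region: str) -> str:
        if region in down:
            raise ServiceUnavailable(region)
        return f"read served by {region}"

    return read_from


# ApplicationRegion not set, current behavior: only the first region is considered.
CURRENT_REGIONS = ["EUS"]
# ApplicationRegion not set, proposed behavior: all regions in failover-priority order.
PROPOSED_REGIONS = ["EUS", "WUS", "WEU"]
```

With EUS returning 503, `read_with_fallback(CURRENT_REGIONS, make_reader({"EUS"}))` raises `ServiceUnavailable`, while the same call with `PROPOSED_REGIONS` is served by WUS.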

@ealsur ealsur added feature-request New feature or request and removed feature-request New feature or request needs-investigation labels Aug 12, 2024
@kirankumarkolli
Member

Tracked as part of #4665
