Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ServiceBus] AutoLockRewner lock renew strategy alignment #18513

Closed
yunhaoling opened this issue May 5, 2021 · 2 comments
Closed

[ServiceBus] AutoLockRewner lock renew strategy alignment #18513

yunhaoling opened this issue May 5, 2021 · 2 comments
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. Messaging Messaging crew Service Bus
Milestone

Comments

@yunhaoling
Copy link
Contributor

yunhaoling commented May 5, 2021

Problem:

Some user might have clock skew issue between their dev machine/server and the service server. And our current AutoLockRewner only renews lock before 10 seconds of the expiration time which might lead to lock renew failure.

example issue logging:

May  4 14:35:01 internal0sps0 DEBUGENVIRONMENT [31610]: INF azureServiceBus:get_message:MainThread[543]: The message is locked until 2021-05-04 18:36:27.490000+00:00
May  4 14:35:01 internal0sps0 DEBUGENVIRONMENT [31610]: DEB auto_lock_renewer:_auto_lock_renew:ThreadPoolExecutor-0_0[133]: Running lock auto-renew thread for 300 seconds
May  4 14:36:17 internal0sps0 DEBUGENV[31610]: 2021-05-04 14:36:17,981 DEBUGENVIRONMENT [31610]: DEB auto_lock_renewer:_auto_lock_renew:ThreadPoolExecutor-0_0[155]: 10 seconds or less until lock expires - auto renewing.
May  4 14:36:18 internal0sps0 DEBUGENV[31610]: 2021-05-04 14:36:18,231 DEBUGENVIRONMENT [31610]: DEB auto_lock_renewer:_auto_lock_renew:ThreadPoolExecutor-0_0[172]: Failed to auto-renew lock: MessageLockLostError("Message lock renew failed. b'The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue. Reference:3737095c-8f5d-48b7-a7af-717f5c089b18, TrackingId:5585b9c0-ad2a-4d2f-8b06-3277d32f6ae7_B66, SystemTracker:spsdev0servicebus:Queue:debugenv_queue, Timestamp:2021-05-04T18:37:13'. Error condition: com.microsoft:message-lock-lost. Status Code: 410."). Closing thread.

Current:

  • lock renew strategy
    • have a thread for each lock renew object
    • run while loop, check if lock_until_uct - utc now < 10 seconds
      • renew if it meets the condition
      • otherwise sleep 1 second

Improvement we could probably do:

  • expose the configuration _sleep_time and _renew_period
  • have a different strategy
    • update the lock every x seconds

Action item:

  • cross language check for strategy first: Yes
  • whether we should align the strategy/introduce new strategy: Yes
  • whether we should expose the settings: No, not to further confuse users
  • update the code/api surface: No

Latest update:

follow NET's algorithm which is:
If < 400ms, renew immediately.
Else, renew = remainingTime - bufferTime.
bufferTime = Math.min((remainingTime/2), 10 seconds)

@yunhaoling yunhaoling added Service Bus Client This issue points to a problem in the data-plane of the library. labels May 5, 2021
@yunhaoling yunhaoling added this to the Backlog milestone May 5, 2021
@yunhaoling yunhaoling self-assigned this May 5, 2021
@yunhaoling yunhaoling added the Messaging Messaging crew label May 12, 2021
@swathipil swathipil assigned swathipil and unassigned yunhaoling May 17, 2021
@swathipil
Copy link
Member

ask in channel if other langs renew every 30 secs/implementation?

@lmazuel lmazuel assigned yunhaoling and unassigned swathipil May 19, 2021
@yunhaoling yunhaoling changed the title [ServiceBus] Configurable behavior for AutoLockRewner [ServiceBus] AutoLockRewner lock renew strategy alignment Dec 18, 2021
@yunhaoling yunhaoling modified the milestones: Backlog, [2022] April Mar 15, 2022
@yunhaoling
Copy link
Contributor Author

auto lock renewer alignment increases the chance of auto lock renew failure due to the python single thread model and the way auto lock renewer is implemented:

  • in current impl, whenever the renewable's expiration time is less than 10s from now, the auto lock renewer will try renew, with sleeping 1s interval in the while loop
  • in alignment, we're shrinking the time frame from expiration to now in each cycle, e.g., initially 10s from now, then 5s from now, then 2s from now. leading to less auto lock renewer execution and then failure.

macos and windows tests keep failing in the PR: #23557 due to the auto lock renewer change.

I think for Python it's better to keep the current strategy and deviate from other languages on this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Client This issue points to a problem in the data-plane of the library. Messaging Messaging crew Service Bus
Projects
None yet
Development

No branches or pull requests

2 participants