Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[service-bus] Closing some open areas where we could mask errors, and fixing flaky tests to be more reliable. #15761

Merged

Conversation

richardpark-msft
Copy link
Member

@richardpark-msft richardpark-msft commented Jun 16, 2021

Making changes to simplify a flaky test (and hopefully make it more reliable).

The main issue with the 'handle interrupted detach' method was that it relied on too many moving parts to work reliably. We could just eternally loop like we'd expect customers to do, but in the end we have a very simple test we're trying to perform - we want to receive, and while we're in the process of draining, cause a detach and have it early exit and reject/resolve immediately rather than waiting for the timeout.

I reworked the test to make that simpler by just removing the unneeded connection.idle() and just calling directly into the onDetached method. Because it happens prior to rhea even seeing that we're draining we should reliably win that race each time.

There were a couple of other things changed for this PR as well:

  • The max time per test was lowered accidentally. Bringing it back what's been used as the standard time in other libraries
  • Fixed a spot in receiveMessages() where, if the link had been closed, we'd falsly return an empty array instead of throwing an exception indicating the link closed. This didn't appear to be related to the bug but it's incorrect and can hide bugs so I'd rather just throw the error than eat the condition and return an empty array. It's rare, but when it does happen the empty array response isn't right either - we're probably in the middle of a connection reset/change event.

Fixes #13461

@ghost ghost added the Service Bus label Jun 16, 2021
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft richardpark-msft changed the title [service-bus] receiveMessages can fail to signal a fatal error, returning no messages instead. [service-bus] Closing some open areas where we could mask errors, and fixing flaky tests to be more reliable. Jun 17, 2021
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

… what's been used as the standard time in other libraries

- Fixes to simplify the flaky tests. Mostly just reducing the number of non-deterministic events (connections restarting, etc..) that were being heavily depended on, and just changing them to manually initiate the events we need instead (ie, force detach).
- Fixed a spot in receiveMessages() where, if the link had been closed, we'd falsly return an empty array instead of throwing an exception indicating the link closed.
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

1 similar comment
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft richardpark-msft marked this pull request as ready for review June 18, 2021 03:06
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

- The "open a receiver on a non-existent entity" should wait a bit longer since establishing the connection can be slow if network conditions are bad.
- Updating to throw a more realistic error inside of the onDetached() call (it'll be a non-retryable exception, not a ServiceBusError like I used).
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

…added when I was trying to allow use of a deprecated field)

- Fixing assert to look for SessionLockLost error with peekLock+session disconnection tests.
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

1 run passed with latest fixes.

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

Another!

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@chradek chradek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as this does improve the situation with some of our flaky tests.

@richardpark-msft
Copy link
Member Author

The failure that's in there currently is not related to my change - it's in the streaming receiver (and sessions to boot!).

I'm going to go ahead and commit this PR, and focus on the session flaky test next.

@richardpark-msft
Copy link
Member Author

/check-enforcer override

@richardpark-msft richardpark-msft merged commit d4d2c75 into Azure:main Jun 21, 2021
@richardpark-msft richardpark-msft deleted the sb-flakytests-disconnects branch June 21, 2021 17:58
openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-js that referenced this pull request Aug 26, 2021
Correct stable 2020 01 01 (Azure#15761)

* Remove definitions that don't exist in 2020-01-01 and extract Incident Severity to a definition.

* Use IncidentSeverityEnum instead of defining an incorrect enum which have 'Critical' value.

* Remove duplicate definition of IncidentInfo which is already defined in Bookmarks.json

* Remove Settings from SecurityInsights.json

* Make prettier

Co-authored-by: Anat Gilenson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Service Bus] Flaky test - batch receiver "disconnect" in receiveAndDelete mode
3 participants