Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Fixes TaskCanceledException being converted to InternalServerError and losing diagnostics #2359

Closed
wants to merge 8 commits into from

Conversation

bchong95
Copy link
Contributor

@bchong95 bchong95 commented Apr 1, 2021

Tracing: Adds support for CancellationTokens on Feed paths.

Description

The SDK has a soft contract where CancellationTokens throw a CosmosOperationCancelledException which is an OperationCancellationTokenException with CosmosDiagnostics. This PR adds baseline tests, so we can assert the behavior. An example of the tracing on an cancellation path looks likes so:

└── Root Trace(fa9634c1-23f0-4f4f-9385-562254d5882b)  Unknown-Component  [email protected]:97  03:24:17:442  138.15 milliseconds  
    │   (
    │       [Operation Cancelled Exception]
    │       System.OperationCanceledException: The operation was canceled.
    │          at System.Threading.CancellationToken.ThrowOperationCanceledException()
    │          at Microsoft.Azure.Cosmos.Common.CollectionCache.ResolveByNameAsync(String apiVersion, String resourceAddress, Boolean forceRefesh, CancellationToken cancellationToken, ITrace trace) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Routing\CollectionCache.cs:line 257
    │          at Microsoft.Azure.Cosmos.Common.CollectionCache.ResolveCollectionAsync(DocumentServiceRequest request, CancellationToken cancellationToken, ITrace trace) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Routing\CollectionCache.cs:line 145
    │          at Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler.SendAsync(String resourceUriString, ResourceType resourceType, OperationType operationType, RequestOptions requestOptions, ContainerInternal cosmosContainerCore, FeedRange feedRange, Stream streamPayload, Action`1 requestEnricher, ITrace trace, CancellationToken cancellationToken) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Handler\RequestInvokerHandler.cs:line 201
    │          at Microsoft.Azure.Cosmos.Pagination.NetworkAttachedDocumentContainer.MonadicReadFeedAsync(FeedRangeState`1 feedRangeState, ReadFeedPaginationOptions readFeedPaginationOptions, ITrace trace, CancellationToken cancellationToken) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Pagination\NetworkAttachedDocumentContainer.cs:line 171
    │          at Microsoft.Azure.Cosmos.Pagination.PartitionRangePageAsyncEnumerator`2.MoveNextAsync(ITrace trace) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Pagination\PartitionRangePageAsyncEnumerator.cs:line 56
    │          at Microsoft.Azure.Cosmos.Pagination.CrossPartitionRangePageAsyncEnumerator`2.MoveNextAsync(ITrace trace) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Pagination\CrossPartitionRangePageAsyncEnumerator.cs:line 126
    │          at Microsoft.Azure.Cosmos.ReadFeed.Pagination.CrossPartitionReadFeedAsyncEnumerator.MoveNextAsync(ITrace trace) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\ReadFeed\Pagination\CrossPartitionReadFeedAsyncEnumerator.cs:line 43
    │          at Microsoft.Azure.Cosmos.ReadFeed.ReadFeedIteratorCore.ReadNextAsync(ITrace trace, CancellationToken cancellationToken) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\ReadFeed\ReadFeedIteratorCore.cs:line 216
    │   )
    ├── MoveNextAsync(cb8de160-b5a8-4279-abab-a84ea2786f01)  ReadFeed-Component  [email protected]:41  03:24:17:447  121.39 milliseconds  
    │   └── MoveNextAsync(ecee72d2-9e81-43c6-93a0-a73e1403ce62)  Pagination-Component  [email protected]:112  03:24:17:451  117.27 milliseconds  
    │       ├── [,FF) move next(58796559-1c67-4dcc-bec7-e2f35773ec7c)  Pagination-Component  [email protected]:49  03:24:17:464  17.16 milliseconds  
    │       │   └── Prefetch(ccee5e65-7572-4132-8cdd-7536b4e0b36f)  Pagination-Component  [email protected]:50  03:24:17:466  14.39 milliseconds  
    │       │       └── [,FF) move next(7442d072-4aa8-4b34-a56f-0a8bbc4abc53)  Pagination-Component  [email protected]:49  03:24:17:467  13.08 milliseconds  
    │       │           └── Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler(907cf401-8eec-4b6a-b49c-0ff6c488d8c3)  RequestHandler-Component  [email protected]:148  03:24:17:468  6.76 milliseconds  
    │       │               ├── Get Partition Key Range Cache(a604247c-d18c-4708-a59e-fc0703741263)  Routing-Component  [email protected]:551  03:24:17:469  0.00 milliseconds  
    │       │               ├── Get Collection Cache(c95a73d9-e782-4ce2-90b5-51fa48c80afe)  Routing-Component  [email protected]:542  03:24:17:469  0.00 milliseconds  
    │       │               └── Try Get Overlapping Ranges(7a5626ab-76da-4080-b303-31741fa81756)  Routing-Component  [email protected]:49  03:24:17:470  3.50 milliseconds  
    │       ├── Get Container Properties(e2efd643-219d-4b2b-bfa4-55fd52dc7696)  Transport-Component  [email protected]:363  03:24:17:484  0.03 milliseconds  
    │       │   └── Get Collection Cache(7b5e4d45-18bf-4214-9916-8fd4a0cb73a2)  Routing-Component  [email protected]:542  03:24:17:484  0.00 milliseconds  
    │       ├── Get RID(4b20bdb4-2849-4062-a278-1f72d434d637)  Routing-Component  [email protected]:436  03:24:17:485  0.01 milliseconds  
    │       ├── Get Collection Cache(bfe65c11-f10f-41d4-9978-9b0ea0f298ef)  Routing-Component  [email protected]:542  03:24:17:485  0.00 milliseconds  
    │       ├── Get Overlapping Feed Ranges(55861794-61f0-444b-8feb-3ebc0c794216)  Routing-Component  GetTargetPartitionKeyRangeByFeedRangeAsync@CosmosQueryClientCore.cs:207  03:24:17:486  6.44 milliseconds  
    │       │   └── Get Partition Key Ranges(45fe734f-f63c-4b00-94a9-04139dd24fd3)  Routing-Component  [email protected]:240  03:24:17:488  4.19 milliseconds  
    │       │       └── Try Get Overlapping Ranges(ef1d629e-c7e9-484d-9723-a962999e7030)  Routing-Component  [email protected]:49  03:24:17:492  0.06 milliseconds  
    │       └── MoveNextAsync(c6b69b92-bb33-43a6-b4bd-2c644d5e64f8)  Pagination-Component  [email protected]:112  03:24:17:497  71.50 milliseconds  
    │           └── [,05C1CFFFFFFFF8) move next(058e3329-e35d-4e1a-95de-c027e30bdac4)  Pagination-Component  [email protected]:49  03:24:17:497  67.11 milliseconds  
    │               └── Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler(7835b22d-d7ab-4820-9680-dc5bb84be01b)  RequestHandler-Component  [email protected]:148  03:24:17:497  38.81 milliseconds  
    │                   ├── Get Partition Key Range Cache(a2e0986f-f092-432a-ab42-281fe189f622)  Routing-Component  [email protected]:551  03:24:17:497  0.00 milliseconds  
    │                   ├── Get Collection Cache(f528680c-90ae-40d9-8b73-71c13ccb907d)  Routing-Component  [email protected]:542  03:24:17:497  0.00 milliseconds  
    │                   ├── Try Get Overlapping Ranges(08926340-ab2e-4ef7-a870-571f4cc68dd9)  Routing-Component  [email protected]:49  03:24:17:497  0.03 milliseconds  
    │                   └── Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler(5a405249-73d4-4e23-9253-74577f8888f1)  RequestHandler-Component  [email protected]:51  03:24:17:501  35.02 milliseconds  
    │                       ├── Get Collection Cache(2f927fe2-3e7a-46b2-bc20-63c1e7fabe3e)  Routing-Component  [email protected]:542  03:24:17:503  0.00 milliseconds  
    │                       └── Send Async(351e657d-7f2b-41b2-9446-d813b8cd3027)  RequestHandler-Component  [email protected]:50  03:24:17:504  31.84 milliseconds  
    │                           │   (
    │                           │       [CPU Load History]
    │                           │       (2021-04-01T15:24:14.3548650Z 62.500)
    │                           │   )
    │                           └── Send Async(694070e3-073e-47c8-995f-cdfc34022e77)  RequestHandler-Component  [email protected]:50  03:24:17:504  31.84 milliseconds  
    │                               └── Send Async(f085971a-f8f8-4617-916b-453426dd1c6d)  RequestHandler-Component  [email protected]:23  03:24:17:504  31.84 milliseconds  
    │                                   └── Send Async(4dff0480-bc34-41dd-a63a-0b56161cbf94)  RequestHandler-Component  [email protected]:50  03:24:17:504  31.83 milliseconds  
    │                                       └── Microsoft.Azure.Documents.ServerStoreModel Transport Request(6cf8550a-74e4-481d-bb3e-f9024baa380f)  Transport-Component  [email protected]:87  03:24:17:504  31.79 milliseconds  
    │                                               (
    │                                                   [User Agent]
    │                                                   cosmos-netstandard-sdk/3.17.0|3.17.1|01|X64|Microsoft Windows 10.0.19042 |.NET Core 4.6.29812.02|
    │                                                   [Client Side Request Stats]
    │                                                   Start Time: 03:24:17:504
    │                                                   End Time: 03:24:17:534
    │                                                   Contacted Replicas
    │                                                     rntbd://127.0.0.1:10253/apps/DocDbApp/services/DocDbServer0/partitions/a4cb494c-38c8-11e6-8106-8cdcd42c33be/replicas/1p/: 1
    │                                                   Failed to Contact Replicas
    │                                                   Regions Contacted
    │                                                     rntbd://127.0.0.1:10253/apps/DocDbApp/services/DocDbServer0/partitions/a4cb494c-38c8-11e6-8106-8cdcd42c33be/replicas/1p/
    │                                                   Address Resolution Statistics
    │                                                   ┌────────────────┬──────────────┬──────────────────────────────────────────────────┐
    │                                                   │Start Time (utc)│End Time (utc)│Endpoint                                          │
    │                                                   ├────────────────┼──────────────┼──────────────────────────────────────────────────┤
    │                                                   └────────────────┴──────────────┴──────────────────────────────────────────────────┘
    │                                                   Store Response Statistics
    │                                                     Start Time: 03:24:17:508
    │                                                     End Time: 03:24:17:534
    │                                                     Resource Type: Document
    │                                                     Operation Type: ReadFeed
    │                                                     Store Result
    │                                                       Activity Id: 15522c24-f97f-4962-be9c-1823ddba1fa2
    │                                                       Store Physical Address: rntbd://127.0.0.1:10253/apps/DocDbApp/services/DocDbServer0/partitions/a4cb494c-38c8-11e6-8106-8cdcd42c33be/replicas/1p/
    │                                                       Status Code: Ok/Unknown
    │                                                       Is Valid: True
    │                                                       LSN Info
    │                                                         LSN: 51
    │                                                         Item LSN: -1
    │                                                         Global LSN: -1
    │                                                         Quorum Acked LSN: 51
    │                                                         Using LSN: True
    │                                                       Session Token: -1#51
    │                                                       Quorum Info
    │                                                         Current Replica Set Size: 1
    │                                                         Current Write Quorum: 1
    │                                                       Is Client CPU Overloaded: False
    │                                                       Exception
    │                                               )
    └── MoveNextAsync(856686f7-3041-4451-9ec6-ffbd86986811)  ReadFeed-Component  [email protected]:41  03:24:17:579  1.35 milliseconds  
        └── MoveNextAsync(061d861e-57e5-4068-b8a8-fa3f371cc8d1)  Pagination-Component  [email protected]:112  03:24:17:579  1.32 milliseconds  
            └── [,05C1CFFFFFFFF8) move next(2eee26bf-dde7-42cd-a351-dab1b9230536)  Pagination-Component  [email protected]:49  03:24:17:579  1.15 milliseconds  
                └── Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler(9872645f-a19d-4763-b993-6b76cf6d1234)  RequestHandler-Component  [email protected]:148  03:24:17:579  0.24 milliseconds  
                    ├── Get Partition Key Range Cache(84783cd1-fa67-4ad2-932e-0c53c7649d6c)  Routing-Component  [email protected]:551  03:24:17:579  0.00 milliseconds  
                    └── Get Collection Cache(0866c559-849d-4ab0-8acf-8d78b6609f97)  Routing-Component  [email protected]:542  03:24:17:579  0.00 milliseconds  

Which shows the OperationCancelledException and the last methods the request made it to before getting cancelled.

closes ##2065 Introduced in 3.14.0 in #1812

sboshra
sboshra previously approved these changes Apr 1, 2021
Copy link
Contributor

@sboshra sboshra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

neildsh
neildsh previously approved these changes Apr 22, 2021
@j82w
Copy link
Contributor

j82w commented Apr 22, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@neildsh
Copy link
Contributor

neildsh commented Apr 22, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@j82w
Copy link
Contributor

j82w commented Apr 22, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@j82w j82w changed the title Tracing: Adds support for CancellationTokens on Feed paths. Query: Fixes exception losing diagnostics and incorrectly converting operation cancelled exceptions to internal server errors Apr 23, 2021
@j82w
Copy link
Contributor

j82w commented Apr 23, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@j82w
Copy link
Contributor

j82w commented Apr 23, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@j82w j82w changed the title Query: Fixes exception losing diagnostics and incorrectly converting operation cancelled exceptions to internal server errors Query: Fixes TaskCanceledException being converted to InternalServerError and losing diagnostics Apr 23, 2021
@j82w
Copy link
Contributor

j82w commented Apr 23, 2021

This PR is moved to #2424

@j82w j82w closed this Apr 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants