Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace timeline seems wrong #841

Closed
Kralizek opened this issue Nov 15, 2023 · 16 comments · Fixed by #1049
Closed

Trace timeline seems wrong #841

Kralizek opened this issue Nov 15, 2023 · 16 comments · Fixed by #1049
Assignees

Comments

@Kralizek
Copy link

A page in a Razor Web app calls different endpoints of the same API.

The trace timeline seems wrong.

image
@davidfowl
Copy link
Member

What does "seems wrong" mean? Can you provide more details?

@Kralizek
Copy link
Author

Kralizek commented Nov 15, 2023

My bad. I was into the flow and the wrongness seemed clear to me.

This is the OnGetAsync method of the razor web app

    public async Task<ActionResult> OnGetAsync(CancellationToken cancellationToken)
    {
        var something = await adminApi.Http.GetFromJsonAsync<Something>($"endpoint1", cancellationToken);

        if (provider is null)
        {
            return NotFound();
        }

        Something = something;
        
        Something2s = await adminApi.Http.GetFromJsonAsync<Something2[]>($"endpoint2", cancellationToken);

        Something3s = await adminApi.Http.GetFromJsonAsync<Something3[]>($"endpoint3", cancellationToken);
        
        Something4s = await adminApi.Http.GetFromJsonAsync<Something4[]>($"endpoint4", cancellationToken);

        return Page();
    }

Given this snippet, I had the assumption that the trace timeline had the following characteristics:

  • The first span, representing the whole request, should be the first to start and the last to end
  • The second and third spans should be nested within the first span, sometimes after its start. The third span should start after the start of the second and end before its end.
  • The fourth and fifth span, should start after the end of the second span and similarly to the previous two spans, the fifth should be nested within the fourth,
  • finally, sixth and seventh span should start after the end of the fifth and behave like the other two.

Here is my wonderful ASCII representation.

1st xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
2nd      xxxxxxxxxxxx
3rd           xxxxx
4th                                 xxxxxxxxxxxxxxxx
5th                                      xxxxxxxxxx
6th                                                                   xxxxxxxxxxxxxxxxx
7th                                                                       xxxxxxxxxxx

I hope this is clearer.

@davidfowl
Copy link
Member

davidfowl commented Nov 15, 2023

No the trace looks correct to me, you have one parent, the incoming request, and 4 sibling requests (endpoints 1 to 4). Each of those have one child.

Edit: The first request does look strange.

cc @JamesNK

@Kralizek
Copy link
Author

Edit: The first request does look strange.

To me also the right part of the time diagram (the bars). the name column looks correct.

@danmoseley
Copy link
Member

@JamesNK thoughts?

@JamesNK
Copy link
Member

JamesNK commented Nov 16, 2023

I agree it's odd.

One possibility: You can have a situation where the root span for a trace isn't available yet, which means results can be unexpected.

For example:

  1. The server receives a HTTP request and starts a trace and a span.
  2. The server makes HTTP calls to other servers that creates new spans in the trace.
  3. The HTTP calls complete, the HTTP call spans complete, and they're sent via OTLP.
  4. The first HTTP request completes, and its span is sent via OTLP.

In the time period after 3, but before 4, there are child spans in the trace without a root span. It can look like a trace has multiple root spans.

However, I don't think that is the issue because the first two HTTP call spans are nested under the providers/{providerId:guid} span. Also, if there are multiple root spans, they're ordered by time to not appear out of order.

Another possibility is different apps have different clocks. I haven't seen that issue ever, and I don't think it is the case here because the providers/{providerId:guid} span and HTTP call spans are from the same app.

TLDR; I don't know. I'd be interested if there is a reproduction that I could look at.

@Kralizek
Copy link
Author

Kralizek commented Nov 16, 2023

I was hoping to get out a full representation of the whole trace but couldn't find how. (maybe a feature?)

Anyway, clicking on each row on the name column I got the following information (which confirms my hypothesis that the timeline graph is wrong)

# Name Service Duration Start time
1 providers/{providerId:guid} admin 18,35ms 0μs
2 HTTP GET admin 4ms 1,76ms
3 /providers/f0bc5c95-2d24-45fe-8d7e-9ca5bb724f57 api 3,55ms 2,12ms
4 HTTP GET admin 3,07ms 6,41ms
5 /providers/f0bc5c95-2d24-45fe-8d7e-9ca5bb724f57/courses api 2,64ms 6,73ms
6 HTTP GET admin 3,1ms 10,08ms
7 /providers/f0bc5c95-2d24-45fe-8d7e-9ca5bb724f57/sites api 2,78ms 10,33ms
8 HTTP GET admin 3,31ms 13,63ms
9 /providers/f0bc5c95-2d24-45fe-8d7e-9ca5bb724f57/users api 2,95ms 13,89ms

This is the timeline I got passing this data to GPT 4

image

@Kralizek
Copy link
Author

@JamesNK @danmoseley @davidfowl
Is there anything else other than a repro repo I can provide to help triage the issue?

@DamianEdwards
Copy link
Member

I tried to reproduce this based on your description and got the result you expected:

image

Here's my Razor Page's model class:

public class IndexModel(ILogger<IndexModel> logger, AdminApiClient adminApi) : PageModel
{
    private string? provider = "something";

    public int Something { get; set; }
    public int Something2s { get; set; }
    public int Something3s { get; set; }
    public int Something4s { get; set; }

    public async Task OnGetAsync()
    {
        var something = await adminApi.GetApi1();

        if (provider is null)
        {
            return;
        }

        Something = something;

        Something2s = await adminApi.GetApi2();

        Something3s = await adminApi.GetApi3();

        Something4s = await adminApi.GetApi4();
    }
}

And the apis:

app.MapGet("/api1", async () =>
{
    await Task.Delay(100);
    return 1;
});

app.MapGet("/api2", async () =>
{
    await Task.Delay(100);
    return 2;
});

app.MapGet("/api3", async () =>
{
    await Task.Delay(100);
    return 3;
});

app.MapGet("/api4", async () =>
{
    await Task.Delay(100);
    return 4;
});

At this point I think we'd need a repro repo.

@Kralizek
Copy link
Author

I'll put something together!

@Kralizek
Copy link
Author

@DamianEdwards could you share your repro?

I tried running a "dotnet new aspire-starter" application and I got this timeline. See the first line.

image

repro:

$ md Aspire2
$ cd Aspire2
$ dotnet new aspire-starter
$ dotnet run --project .\Aspire2.AppHost\

@DamianEdwards
Copy link
Member

@DamianEdwards
Copy link
Member

Is it possible this is a localization issue with the trace store sorting? I note that your culture is using , for thousands separators.

@Kralizek
Copy link
Author

Kralizek commented Nov 21, 2023

You might be onto something. This is what I got when I ran your repro fresh of checkout

image

That being said, the 0 μs should be 0 regardless of the digital separator.

@JamesNK
Copy link
Member

JamesNK commented Nov 21, 2023

I'm guessing the problem is here:

<div class="span-container" style="grid-template-columns: @context.LeftOffset.ToString("F2")% @context.Width.ToString("F2")% min-content;">

The ToStrings should use invariant culture.
The long-term solution is making sure culture code analysis is enabled (or maybe it is but doesn't run on Razor pages? IDK).

@DamianEdwards
Copy link
Member

@kvenkatrajan would be good to get this one fixed in preview 2

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants