Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TRACE_GC build #108089

Closed
wants to merge 3 commits into from
Closed

Fix TRACE_GC build #108089

wants to merge 3 commits into from

Conversation

jkoritzinsky
Copy link
Member

Allow floats to be passed on 64-bit builds.

@Maoni0 in #105545, you added too many arguments to the gcDetailedStartMsg and passed the 16-arg maximum (which is causing the build to fail). Also, by changing that message, using the StressLogAnalyzer on logs with the old "gcDetailedStartMsg" value (or in the non-BACKGROUND_GC builds) will not recognize GC start/end with the current native StressLogAnalyzer. Can we split the message into two separate messages to fix these issues?

Copy link
Contributor

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@Maoni0
Copy link
Member

Maoni0 commented Sep 21, 2024

thanks for looking into this @jkoritzinsky!

can you please explain why the changes to gcDetailedStartMsg breaks stresslog analyzer? I thought the analyzer just used the first field (which I didn't change) to get the gc index and that's all it used from this message?

I can just not change it for stresslog, ie,keep the old msg when SIMPLE_DPRINTF is not defined and use the new one when it is defined. do you want me to submit a separate PR for that?

it's unfortunate how we have this limit for # of args for stress log entries...

@jkoritzinsky
Copy link
Member Author

StressLogAnalyzer identifies the start of a GC by the exact format string returned by gcDetailedStartMsg, not the address of the string in the memory mapped log. As a result, changing the string breaks the matching for older GC versions with a newer log reader.

I'm fine with either a separate PR or updates to this PR. We can also make another well-known message that StressLogAnalyzer shows by default.

Yeah the arg limit is annoying, but it's part of the GC/EE contract so it's difficult to up the limit in its current form.

@Maoni0
Copy link
Member

Maoni0 commented Sep 22, 2024

thanks @jkoritzinsky. I think we should just get rid of this exact string match mechanism of deciding if something is marked as interesting. I was remembering only this part

    case    IS_GCSTART:
    {
        int gcIndex = (int)(size_t)args[0];
        if (gcIndex < MAX_GC_INDEX)
        {
            s_gcStartEnd[gcIndex].startTime = deltaTime;
        }
        return s_showDefaultMessages;
    }

so I thought as long as I kept the first field as the index it should be fine. but as you pointed out we needed to have the string recognized as "interesting" first which is an exact match. it'd be good make it so that as long as we have specific fields in the contract it only needs to match those fields. not being able to change the string at all is unnecessarily limiting. we'll submit a separate PR for this.

Yeah the arg limit is annoying, but it's part of the GC/EE contract so it's difficult to up the limit in its current form.

it'd be a great to have a new version of stress log API that doesn't have the arg limit so we can start using that new one instead. would it be a lot of work to make such a new version?

Copy link
Member

@markples markples left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like enough to merge on its own. I'm working on the gc start detection and the 17 argument issue.

(edit: that is, without the TRACE_GC build enabled since this doesn't fix it)

@jkoritzinsky
Copy link
Member Author

@markples I've updated the CI changes and opened #108628.

I'll merge this in once it's green.

@@ -81,6 +81,25 @@ extends:
eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_non_wasm.containsChange'], true),
eq(variables['isRollingBuild'], true))

# Re-enable when fixing https://github.com/dotnet/runtime/issues/108628
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to build the TRACE_GC instrumentation into the checked builds by default and have a config switch it to enable/disable it? It is what we do everywhere else.

Adding a new CI leg to validate a few dozens of ifdefs is not very scalable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with that, but I'd want @Maoni0 @cshung or @markples to sign-off on that design choice before I do it (as they own the usage here).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still sorting through the details on this. The GC appears to have some STRESS_LOG instrumentation that is already available by default (normal stress log). TRACE_GC (without SIMPLE_DPRINTF) allows GC dprintfs to be added to the stress log, but it abuses the LogFacility value:

    static void LogMsg(unsigned dprintfLevel, const StressLogMsg& msg)
    {
        GCToEEInterface::LogStressMsg(LL_ALWAYS, LF_ALWAYS|(dprintfLevel<<16)|LF_GC, msg);
    }

in a way that seems like it would be undesirable for others. LogStressAnalyzer filters on this. (I don't know the history, but it looks like the facility might have previously been 16 bits because I feel like I've seen various assumptions about that as I poke around.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that the "available by default" instrumentation is for things like roots. The internal GC instrumentation is all under TRACE_GC. Some of that instrumentation at a very low level (used for tracking movement within the heap) that is believed to have substantial overhead (even if off).

It looks like TRACE_GC and DOTNET_StressLog (if not restricted by DOTNET_LogFacility) can collide, though the existence of the low bit (LF_GC) can be used to disambiguate (even though it otherwise seems to be intended as a pure bitmask). Interestingly,

I'm contemplating putting in a pre-STRESS_LOG_VA condition. In fact, the checked-in code already has a "commonly used line" -

#define dprintf(l,x) STRESS_LOG_VA(l,x);
//#define dprintf(l,x) {if ((l <= 2) || (l == 6666)) {STRESS_LOG_VA(l,x);}}

Using

#define dprintf(l,x) {if (l == *1*) {STRESS_LOG_VA(*0*,x);}}

is subtle but might work (need to verify), and then developers could opt-in as they please. Having a static check like this would mitigate the cost in checked/release builds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #108655

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As your PR has all of the fixes, would you feel comfortable if we close my PR and merge yours instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - happy to either include it or deal with the merge. The only question would be if mine has grown too big or if the GC changes need more discussion and delay it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to do the whole "Fix the TRACE_GC" build work in one PR. I'll close this one.

@jkoritzinsky jkoritzinsky deleted the tracegc-fixes branch October 16, 2024 22:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants