-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TRACE_GC build #108089
Fix TRACE_GC build #108089
Conversation
…CI to avoid breaks
…s 64-bit floating point values)
Tagging subscribers to this area: @dotnet/gc |
thanks for looking into this @jkoritzinsky! can you please explain why the changes to gcDetailedStartMsg breaks stresslog analyzer? I thought the analyzer just used the first field (which I didn't change) to get the gc index and that's all it used from this message? I can just not change it for stresslog, ie,keep the old msg when it's unfortunate how we have this limit for # of args for stress log entries... |
StressLogAnalyzer identifies the start of a GC by the exact format string returned by gcDetailedStartMsg, not the address of the string in the memory mapped log. As a result, changing the string breaks the matching for older GC versions with a newer log reader. I'm fine with either a separate PR or updates to this PR. We can also make another well-known message that StressLogAnalyzer shows by default. Yeah the arg limit is annoying, but it's part of the GC/EE contract so it's difficult to up the limit in its current form. |
thanks @jkoritzinsky. I think we should just get rid of this exact string match mechanism of deciding if something is marked as interesting. I was remembering only this part
so I thought as long as I kept the first field as the index it should be fine. but as you pointed out we needed to have the string recognized as "interesting" first which is an exact match. it'd be good make it so that as long as we have specific fields in the contract it only needs to match those fields. not being able to change the string at all is unnecessarily limiting. we'll submit a separate PR for this.
it'd be a great to have a new version of stress log API that doesn't have the arg limit so we can start using that new one instead. would it be a lot of work to make such a new version? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like enough to merge on its own. I'm working on the gc start detection and the 17 argument issue.
(edit: that is, without the TRACE_GC build enabled since this doesn't fix it)
@@ -81,6 +81,25 @@ extends: | |||
eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_non_wasm.containsChange'], true), | |||
eq(variables['isRollingBuild'], true)) | |||
|
|||
# Re-enable when fixing https://github.com/dotnet/runtime/issues/108628 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to build the TRACE_GC
instrumentation into the checked builds by default and have a config switch it to enable/disable it? It is what we do everywhere else.
Adding a new CI leg to validate a few dozens of ifdefs is not very scalable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still sorting through the details on this. The GC appears to have some STRESS_LOG instrumentation that is already available by default (normal stress log). TRACE_GC (without SIMPLE_DPRINTF) allows GC dprintfs to be added to the stress log, but it abuses the LogFacility value:
static void LogMsg(unsigned dprintfLevel, const StressLogMsg& msg)
{
GCToEEInterface::LogStressMsg(LL_ALWAYS, LF_ALWAYS|(dprintfLevel<<16)|LF_GC, msg);
}
in a way that seems like it would be undesirable for others. LogStressAnalyzer filters on this. (I don't know the history, but it looks like the facility might have previously been 16 bits because I feel like I've seen various assumptions about that as I poke around.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that the "available by default" instrumentation is for things like roots. The internal GC instrumentation is all under TRACE_GC. Some of that instrumentation at a very low level (used for tracking movement within the heap) that is believed to have substantial overhead (even if off).
It looks like TRACE_GC and DOTNET_StressLog (if not restricted by DOTNET_LogFacility) can collide, though the existence of the low bit (LF_GC) can be used to disambiguate (even though it otherwise seems to be intended as a pure bitmask). Interestingly,
I'm contemplating putting in a pre-STRESS_LOG_VA condition. In fact, the checked-in code already has a "commonly used line" -
#define dprintf(l,x) STRESS_LOG_VA(l,x);
//#define dprintf(l,x) {if ((l <= 2) || (l == 6666)) {STRESS_LOG_VA(l,x);}}
Using
#define dprintf(l,x) {if (l == *1*) {STRESS_LOG_VA(*0*,x);}}
is subtle but might work (need to verify), and then developers could opt-in as they please. Having a static check like this would mitigate the cost in checked/release builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see #108655
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As your PR has all of the fixes, would you feel comfortable if we close my PR and merge yours instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - happy to either include it or deal with the merge. The only question would be if mine has grown too big or if the GC changes need more discussion and delay it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes more sense to do the whole "Fix the TRACE_GC" build work in one PR. I'll close this one.
Allow floats to be passed on 64-bit builds.
@Maoni0 in #105545, you added too many arguments to the
gcDetailedStartMsg
and passed the 16-arg maximum (which is causing the build to fail). Also, by changing that message, using the StressLogAnalyzer on logs with the old "gcDetailedStartMsg" value (or in the non-BACKGROUND_GC builds) will not recognize GC start/end with the current native StressLogAnalyzer. Can we split the message into two separate messages to fix these issues?