Recurring test failures in tracing due to class loader alloc tracker heap corruption #57461
This and #57254 appear to be the same failure:
This looks like it is the same failure that was happening in #54469.
I opened the dumps from these failures. The crash appears to come from here, where the alloc tracker's destructor runs at the end of its scope: runtime/src/coreclr/vm/clsload.cpp, lines 3123 to 3133 at commit a1eefbb.
If I understand the alloc tracker code correctly, the scope of the tracking is limited to this class load. We reach this point while loading from an R2R image and applying fixups. @davidwrighton, @mangod9, or @janvorli: is there something I should look at that might give us more information? I have the MD for the method being loaded: it's
Let me amend that previous comment: the method being reverse P/Invoked is
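To make this failure mode concrete, here is a minimal C++ sketch of a scope-bound allocation tracker. It is a hypothetical model, not CoreCLR's actual tracker; the class name ScopedAllocTracker and its methods are invented for illustration. Each allocation made during one load is linked into a list, and the destructor walks that list at scope exit. If any node's next pointer has been overwritten by a stray write, the crash surfaces in the destructor rather than at the write that caused it.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical model of a scope-bound allocation tracker (NOT CoreCLR code).
// Allocations made during one "load" are linked into a singly linked list;
// everything is released when the tracker goes out of scope.
class ScopedAllocTracker {
    struct Node {
        Node* next;   // link to the previously tracked allocation
        void* block;  // the memory handed out to the caller
    };
    Node* head_ = nullptr;
    std::size_t count_ = 0;

public:
    ScopedAllocTracker() = default;
    ScopedAllocTracker(const ScopedAllocTracker&) = delete;
    ScopedAllocTracker& operator=(const ScopedAllocTracker&) = delete;

    // Allocates `size` bytes and records the block in the list.
    // Returns nullptr if the block itself cannot be allocated.
    void* Track(std::size_t size) {
        void* block = std::malloc(size);
        if (block == nullptr) return nullptr;
        head_ = new Node{head_, block};
        ++count_;
        return block;
    }

    std::size_t Count() const { return count_; }

    // Runs at end of scope. This walk trusts every `next` pointer, so a node
    // corrupted earlier by an unrelated write is where the crash shows up.
    ~ScopedAllocTracker() {
        Node* n = head_;
        while (n != nullptr) {
            Node* next = n->next;
            std::free(n->block);
            delete n;
            n = next;
        }
    }
};
```

Usage: construct a tracker at the start of a load, call Track for each allocation, and let it go out of scope. The list walk in the destructor is the first point where corruption of a node becomes visible, which is consistent with the destructor showing up at the top of the crashing stacks.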
I'm unable to reproduce this locally. @hoyosjs helped me scrape some more stats about these failures:
There is at least one example of this failure happening in a CI run: https://dev.azure.com/dnceng/public/_build/results?buildId=1285558&view=ms.vss-test-web.build-test-results-tab&runId=37949368&resultId=110729&paneView=debug
Looking at the failure there, it gets a little farther but fails to load a different type.
The corrupted node in the linked list of allocations always seems to come from the same place: runtime/src/coreclr/vm/methodtablebuilder.cpp, lines 11671 to 11683 at commit 2144431.
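One common way to catch this kind of corruption closer to its source is to give each list node a guard value plus a record of where it was allocated, then validate the list while walking it. The sketch below assumes that approach; TrackedNode, kNodeGuard, InitNode, and FindCorruptNode are hypothetical names, not CoreCLR APIs. A guard only detects overwrites that actually reach it, and if the stray write also covered the site field, that field cannot be trusted either.

```cpp
#include <cstdint>

// Hypothetical debugging aid (not CoreCLR code). Each node carries a guard
// value and the allocation site that created it, so a validating walk can
// report the first damaged node and where it came from.
constexpr std::uint32_t kNodeGuard = 0xA110C8EDu;

struct TrackedNode {
    std::uint32_t guard;  // equals kNodeGuard while the node is intact
    const char* site;     // caller-supplied label, e.g. "file.cpp:123"
    TrackedNode* next;
};

inline void InitNode(TrackedNode* n, const char* site, TrackedNode* next) {
    n->guard = kNodeGuard;
    n->site = site;
    n->next = next;
}

// Returns the first node whose guard has been overwritten, or nullptr if all
// nodes check out. The walk stops at a bad node instead of following its
// possibly garbage `next` pointer.
inline const TrackedNode* FindCorruptNode(const TrackedNode* head) {
    for (const TrackedNode* n = head; n != nullptr; n = n->next) {
        if (n->guard != kNodeGuard) return n;
    }
    return nullptr;
}
```

Calling a check like FindCorruptNode at strategic points (for example, before and after a suspect operation) narrows down when the overwrite happens, rather than only learning about it at scope exit.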
cc @dotnet/dotnet-diag
@josalem This looks to me like something is setting the
However, I think the actual fault is likely tied to the IPC work happening in EventPipe. In particular, if attaching and detaching EventPipe has timeout or cancellation paths, there is a risk that an async operation completes at an unexpected time and writes to a local value on the stack, if any stack pointers are ever passed to the file I/O functions. Of particular interest to me are the various failure paths that involve GetOverlappedResult failing. If that happens, I believe the contract is that the async operation could still complete even if the I/O has been cancelled, and as such you can't reuse OVERLAPPED structures, etc., until the I/O is done. However, I'm not an expert in Windows I/O API usage, so I'd push for someone who is to take a look. The other possibility I see is that the EventPipe code frees a heap object but continues to use it. (I suspect the issue is EventPipe-related because these failures only seem to happen with the EventPipe shutdown code on the stack.)
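The lifetime hazard described above can be modelled portably. This is not Win32 code: a worker thread stands in for the OS completing an asynchronous read, and PendingRead, StartRead, and WaitForRead are hypothetical names standing in for an OVERLAPPED structure plus its buffer. The caller may stop waiting before the operation finishes, yet the operation still writes its result afterwards. Because the state lives on the heap under shared ownership, that late write lands in live memory; had the caller passed pointers to stack locals, it would land in a dead stack frame.

```cpp
#include <chrono>
#include <cstddef>
#include <future>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for an OVERLAPPED structure plus its data buffer.
struct PendingRead {
    std::vector<char> buffer;
    std::size_t bytes = 0;
    std::promise<void> done;
    std::shared_future<void> ready;  // can be waited on more than once
};

// Starts a simulated asynchronous read that "completes" after `latency`.
inline std::shared_ptr<PendingRead> StartRead(std::size_t size,
                                              std::chrono::milliseconds latency) {
    auto op = std::make_shared<PendingRead>();
    op->buffer.resize(size);
    op->ready = op->done.get_future().share();
    // The worker captures its own shared_ptr, so the state stays alive until
    // the write finishes even if every caller has already given up.
    std::thread([op, latency] {
        std::this_thread::sleep_for(latency);
        for (char& c : op->buffer) c = 'x';  // the "device" fills the buffer
        op->bytes = op->buffer.size();
        op->done.set_value();                // publish the completion
    }).detach();
    return op;
}

// Waits up to `timeout`. Returning false does NOT stop the pending write;
// it only means this caller stopped waiting for it.
inline bool WaitForRead(const std::shared_ptr<PendingRead>& op,
                        std::chrono::milliseconds timeout) {
    return op->ready.wait_for(timeout) == std::future_status::ready;
}
```

On Windows the corresponding rule is that an OVERLAPPED structure and its buffer must stay valid until the operation actually completes, even after it has been cancelled. Waiting for that completion before the owning scope returns is the usual alternative to extending the lifetime as this sketch does.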
Found a case where we can set
Failures in https://dev.azure.com/dnceng/public/_build/results?buildId=1350047 are different, most likely due to changes made in the PR; several tests hit OOM:
so they are not the same as the crash originally identified by this issue.
@lateralusX I believe the problem identified in this issue was fixed with #58710. Can you confirm whether we can close this? |
I believe we can close it; I have not seen or heard of any similar failures since this fix went in.
Run: runtime-coreclr jitstress-isas-x86 20210814.1
Failed test:
Error message:
Runfo Tracking Issue: reverseouter tests
Build Result Summary