
Test Failed: Assert failure: 'sz == _idCodeSize' #12840

Closed
VincentBu opened this issue Jun 10, 2019 · 13 comments · Fixed by #57179
Assignees
Labels
arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-windows
Milestone

Comments

@VincentBu
Contributor

Job:
https://mc.dot.net/#/user/coreclr-outerloop-jitstress2-jitstressregs/ci~2Fdotnet~2Fcoreclr~2Frefs~2Fheads~2Fmaster/test~2Ffunctional~2Fcli~2F/20190609.1/workItem/JIT.Regression.CLR-x86-JIT.V1-M09-M11/analysis/xunit/JIT_Regression._CLR_x86_JIT_V1_M09_5_PDC_b25463_b25463_b25463_~2F_CLR_x86_JIT_V1_M09_5_PDC_b25463_b25463_b25463_cmd

Failed tests:
JIT_Regression.CLR_x86_JIT_V1_M09_5_PDC_b25463_b25463_b25463._CLR_x86_JIT_V1_M09_5_PDC_b25463_b25463_b25463_cmd

Log:

Assert failure(PID 3192 [0x00000c78], Thread: 408 [0x0198]): Assertion failed 'sz == _idCodeSize' in 'ILGEN_0x7122ba20:main():int' (IL size 171)

    File: f:\workspace\_work\1\s\src\jit\emit.h Line: 888
    Image: C:\dotnetbuild\work\6c91f97e-ceba-49f1-b03d-10156f43cafe\Payload\CoreRun.exe

Return code:      1
Raw output file:      C:\dotnetbuild\work\6c91f97e-ceba-49f1-b03d-10156f43cafe\Work\c71234d0-eff6-4156-98e3-78836636de0c\Exec\JIT\Regression\Reports\JIT.Regression\CLR-x86-JIT\V1-M09.5-PDC\b25463\b25463\b25463.output.txt
Raw output:
BEGIN EXECUTION
 "C:\dotnetbuild\work\6c91f97e-ceba-49f1-b03d-10156f43cafe\Payload\corerun.exe" b25463.exe 
Expected: 100
Actual: -1073740286
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=C:\dotnetbuild\work\6c91f97e-ceba-49f1-b03d-10156f43cafe\Payload
> C:\dotnetbuild\work\6c91f97e-ceba-49f1-b03d-10156f43cafe\Work\c71234d0-eff6-4156-98e3-78836636de0c\Exec\JIT\Regression\CLR-x86-JIT\V1-M09.5-PDC\b25463\b25463\b25463.cmd
Expected: True
Actual:   False

category:implementation
theme:ir
skill-level:intermediate
cost:large

@BruceForstall
Member

Looks like Windows/x64, JitStress=2, JitStressRegs=1/4/8

@BruceForstall
Member

@dotnet/jit-contrib

@sandreenko sandreenko self-assigned this Jun 10, 2019

@sandreenko
Contributor

The problem is in emitInsSizeCV, which overestimates the real instruction size and can return a value greater than 15, the maximum decodable length of an Intel instruction.
The fast and awful solution would be to change instrDesc::idCodeSize(unsigned sz) to shrink sz to 15 when it is bigger. Codegen would then know the real size and fail with its own assert if the real size is > 15.

The good solution would be to clean up emitInsSizeCV and the similar functions (emitInsSizeSV, emitInsSizeRR, emitInsSizeAM), but that looks costly: at first glance I was not able to extract any logical parts from that code (where the opcode size is calculated, where the prefixes are calculated, etc.). The advantage of this approach is that we would generally allocate less memory for code from the VM.

@CarolEidt
Contributor

Here's the general issue for refactoring the emitter, and the specific comment relating to instruction size: https://github.com/dotnet/coreclr/issues/23006#issuecomment-471114107

sandreenko referenced this issue in sandreenko/coreclr Jun 21, 2019
A temporary workaround to push the milestone for this issue to 3.next.
@sandreenko
Contributor

sandreenko commented Jun 21, 2019

I have published a PR with the first solution; meanwhile, I am working on the good one, which uses emitOutputInstr to generate the real bytes for each instruction and get precise results.

That would allow us to delete all the functions that do estimates now; their code is very unclear and has to be updated every time we change emitOutputInstr.

There are a few problems so far:

  1. we can't predict jump sizes, so we have to assume the worst;
  2. emitOutputInstr updates GC info, which we need to skip when estimating;
  3. emitOutputInstr registers relocations, which we also do not need during estimates;
  4. a few methods in emitOutputInstr use predicted values from idCodeSize() to generate code (for nops, for example).

2 and 3 are tricky because the code that does them is spread across different places, but I am trying to extract it (for now I have a template argument that I use to guard such places).

I think 2 weeks should be enough to get it into a decent state.

@mikedn
Contributor

mikedn commented Jun 21, 2019

I am working on the good one, which uses emitOutputInstr to generate the real bytes for each instruction and get precise results.

That's interesting and scary at the same time - any idea what perf impact such an approach has?

@sandreenko
Contributor

That's interesting and scary at the same time - any idea what perf impact such an approach has?

In terms of throughput, the first iteration would be a bit slower, but emitOutputInstr is almost as cheap as our current estimating version once you skip its most expensive parts: the GC updates (which I think should not be there at all, but that is another issue) and the communication with the VM about relocs and static field addresses.

The second iteration could memoize the generated bytes and keep them in JIT-arena-allocated memory, then memcpy them to VM memory and apply fixups. That could make TP better than it is now, but it is a distant goal.

I do not expect any significant steady-state perf impact. We will allocate less memory for each method, so crossgen images could probably be smaller and we could have more short jumps, but it is hard to guess right now.

@mikedn
Contributor

mikedn commented Jun 21, 2019

but emitOutputInstr is almost as cheap as our current estimating version once you skip its most expensive parts: the GC updates (which I think should not be there at all, but that is another issue) and the communication with the VM about relocs and static field addresses.

Right. And if the mechanism that skips GC updates, relocations, and instruction-byte emission is efficient (e.g. some template wizardry), the overhead could be very low, perhaps even zero. This sounds very tempting.

The second iteration could memoize the generated bytes and keep them in JIT-arena-allocated memory, then memcpy them to VM memory and apply fixups. That could make TP better than it is now, but it is a distant goal.

Yeah, I was wondering why we don't simply encode instructions directly; there should be another way to deal with short/long jumps than what the emitter does now.

@sandreenko
Contributor

sandreenko commented Jun 21, 2019

Yeah, I was wondering why we don't simply encode instructions directly; there should be another way to deal with short/long jumps than what the emitter does now.

Because we do not have memory to put them in, and we do not know hot/cold block sizes for the jumps. I think these estimates became a real problem when we started adding many new instructions (mostly for HW intrinsics): because the estimating code was allowed to over-estimate, it did not complain when the new predictions were very inaccurate.

Note: on arm32/arm64 we do not want to use a temporary JIT memory block to put instructions in and copy to the VM afterwards; there we can estimate more simply and accurately than on XARCH, avoiding the alloc/memcpy.

sandreenko referenced this issue in dotnet/coreclr Jun 21, 2019
* WorkAround for #25050.

A temporary workaround to push the milestone for this issue to 3.next.

* Response review.
@jeffschwMSFT jeffschwMSFT changed the title Assert failure: 'sz == _idCodeSize' Test Failed: Assert failure: 'sz == _idCodeSize' Aug 22, 2019
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@AndyAyersMS
Member

@sandreenko do you think we should try and fix this for 5.0? How much overestimating is going on?

@sandreenko sandreenko modified the milestones: 5.0, Future Apr 30, 2020
@sandreenko
Contributor

I don't remember the exact number, but it was very small, like 0.1%.

@kunalspathak
Member

There is a follow-up conversation at #8748 (comment) about the over-estimation of certain instructions.

kunalspathak added a commit that referenced this issue Aug 14, 2021
* Add assert

* Remove the assert in emitInsSizeCV

* Add a check for includeRexPrefixSize

* Remove the codeSize() capping code added to fix #12840

* Make immediate only 4bytes long for non-mov instructions

* Delete a commented code
@ghost ghost locked as resolved and limited conversation to collaborators Sep 13, 2021