-
Notifications
You must be signed in to change notification settings - Fork 561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASSERT: Invariant test failure: Too many reads (AArch64+sve) #6499
Comments
Side question: why do the read records in the trace snippet all have 0 bytes? Something looks off there. Main question: this is definitely an issue with the invariant checker. One solution is to mark scatter/gather instructions as predicated since with their masks for each load/store being optional they are dynamically variable in whether they do each load/store. We marked x86 rep string instructions this way: #1556, #6284. (We would then also mark x86 scatter/gather as well.). The invariant checker gives up on this check for predicated instructions. If there are aspects that can be checked by the invariant checker, we should try to add them: e.g., checking the sizes of any loads/stores -- that's statically known from the decoding? That would then report an error on these 0-byte loads. |
For the side question, I suspect it may be due to the predicate being all zeros. We can probably know the number of loads and stores by looking at the destination register and the vector length, we're also likely to know the sizes of the loads and stores as that's usually directly related to the element size. |
If a predicate bit is zero, then no load is done, so there should be no read record in the trace for that element. If the scatter/gather expansion is leaving an app load instr in place for a zero predicate, that is incorrect: that load needs to be skipped. |
I've been looking into it and considering your previous comments the obvious solution is to update I had a basic go at this and it does seem to resolve the problem of the assert in question but it doesn't remove all of the related "too many writes" assert. This is due to using this method to detect if something is predicated
This will catch most predicated instructions. AArch64 has several ways of using predicates, and to have the correct return for The simplest is what is caught by the above snippet:
These are predicates with modes which directly imply that these are governing predicates We can also reasonably assume that predicates in ld/st instructions like
imply that they are predicated instructions. The problem comes with instructions like
which have governing predicates, and are thus predicated instructions, but have no
where the predicates are merely being manipulated rather than modifying the reads or writes of the instruction. (There are a very limited number of this type of instruction). If we want to make sure that
Do you have any opinion on how we should proceed with this? |
For x86 we have a flag in the codec/table for predicates of various types, which turns into a DR_PRED_* instr_t-level flag. I would vote for a similar approach for aarch64 as well. |
Filed #6510 for x86. |
Hey @derekbruening, I ended up modifying the codec so that properties can be added to operands with a dot notation. For example p10_lo.gov uses p10_lo to decode the operand but it is then marked as a governing operand. Instructions with at least one governing operand are then marked as predicated by modifying the decode function generator to apply the DR_PRED_GOVERNING predicate value. This means we can then skip those instructions during the invariant checker and both the instructions which are predicated and the governing predicate are marked in the API for anyone who is interested in that information |
For the aarch64-internal codec that sounds fine. For the API-visible part, since the x86 and SVE concepts are so similar, we should try to share the interface to make it easier on clients. I'll add such a discussion to PR #6535. |
When debugging i#6499 we noticed that drcachesim was producing 0 byte read/write records for some SVE load/store instructions: ``` ifetch 4 byte(s) @ 0x0000000000405b3c a54a4681 ld1w (%x20,%x10,lsl #2) %p1/z -> %z1.s read 0 byte(s) @ 0x0000000000954e80 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e84 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e88 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e8c by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e90 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e94 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e98 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e9c by PC 0x0000000000405b3c ifetch 4 byte(s) @ 0x0000000000405b4 ``` This turned out to be due to drdecode being linked into drcachesim twice: once into the drcachesim executable, once into libdynamorio. drdecode uses a global variable to store the SVE vector length to use when decoding so we end up with two copies of that variable and only one was being initialized. To fix this properly we would need to refactor the libraries so that there is only one copy of the sve_veclen global variable, or change the way that the decoder gets the vector length so its no longer stored in a global variable. In the mean time we have a workaround which makes sure both copies of the variable get initialized and drcachesim produces correct results. With that workaround in place however, the results were still wrong. For expanded scatter/gather instructions when you are using an offline trace, raw2trace doesn't have access to the load/store instructions from the expansion, only the original app scatter/gather instruction. It has to create the read/write records using only information from the original scatter/gather instruction and it uses the size of the memory operand to determine the size of each read/write. This works for x86 because the x86 IR uses the per-element data size as for the memory operand of scatter/gather instructions. This doesn't work for AArch64 because the AArch64 codec uses the maximum data transferred (per-element data size * number of elements) like other SIMD load/store instructions. We plan to make the AArch64 IR consistent with x86 by changing it to use the same convention as x86 for scatter/gather instructions but in the mean time we can work around the inconsistency by fixing the size in raw2trace based on the instruction's opcode. Issues: #6499, #5365
…predicated (#6535) This patch adds the DR_PRED_MASKED flag to instructions containing any governing predicate registers as well as tagging the governing register with DR_OPND_IS_MASKED. To do this, it extends the codec functionality such that operands may have "flags" which allow us to add some special code generation when they are encountered. In this case we amend the instr decode/encode functions to both tag the governing predicate and the instr itself. issue: #6499
With the above merged patch, this issue is no longer occurring in our tests |
When debugging i#6499 we noticed that drcachesim was producing 0 byte read/write records for some SVE load/store instructions: ``` ifetch 4 byte(s) @ 0x0000000000405b3c a54a4681 ld1w (%x20,%x10,lsl #2) %p1/z -> %z1.s read 0 byte(s) @ 0x0000000000954e80 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e84 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e88 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e8c by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e90 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e94 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e98 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e9c by PC 0x0000000000405b3c ifetch 4 byte(s) @ 0x0000000000405b4 ``` This turned out to be due to drdecode being linked into drcachesim twice: once into the drcachesim executable, once into libdynamorio. drdecode uses a global variable to store the SVE vector length to use when decoding so we end up with two copies of that variable and only one was being initialized. To fix this properly we would need to refactor the libraries so that there is only one copy of the sve_veclen global variable, or change the way that the decoder gets the vector length so its no longer stored in a global variable. In the mean time we have a workaround which makes sure both copies of the variable get initialized and drcachesim produces correct results. With that workaround in place however, the results were still wrong. For expanded scatter/gather instructions when you are using an offline trace, raw2trace doesn't have access to the load/store instructions from the expansion, only the original app scatter/gather instruction. It has to create the read/write records using only information from the original scatter/gather instruction and it uses the size of the memory operand to determine the size of each read/write. This works for x86 because the x86 IR uses the per-element data size as for the memory operand of scatter/gather instructions. This doesn't work for AArch64 because the AArch64 codec uses the maximum data transferred (per-element data size * number of elements) like other SIMD load/store instructions. We plan to make the AArch64 IR consistent with x86 by changing it to use the same convention as x86 for scatter/gather instructions but in the mean time we can work around the inconsistency by fixing the size in raw2trace based on the instruction's opcode. Issues: #6499, #5365, #5036
Describe the bug
When running various applications, including fleetbench, we run into the following assert:
This failing test compares a predicted number of reads with the actual number of reads recorded by dynamorio.
The above case is failing at this point, when the second read is considered:
As this is run on a machine with a 256 bit vector length, we see 8 reads as expected for this instruction.
The invariant checker predicts the reads using the following function
This means that we can only ever expect one read per memory accessing operand (effectively one per instruction).
Should we expect sve instructions to "roll up" the memory accesses into a single entry, or do we have to do a better job at predicting the number of reads and writes?
Versions
The text was updated successfully, but these errors were encountered: