Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raw2trace incorrectly tries to use the host machine's vector length when processing a trace from another architecture #6585

Closed
derekbruening opened this issue Jan 25, 2024 · 4 comments · Fixed by #6603
Assignees

Comments

@derekbruening
Copy link
Contributor

PR #6544 introduced a bug where raw2trace calls proc_get_vector_length to set dr_set_sve_vector_length. This is incorrect as the trace being processed may have been gathered on another architecture: we often trace on aarch64 and process on x86. In that case the proc routine returns 0, and the 0 hits a bug in dr_set_sve_vector_length which is a buffer overflow #6584 which was there since that function was added in PR #5776.

The vector length needs to be stored in the trace instead.

We are having to patch this code locally to get things to work. (If we had noticed sooner we would have requested to revert PR #6544.)

Also, the vector length code in proc.c should be audited for whether it should be using HOST_AARCHXX instead of AARCHXX defines: I don't recall how the proc code in general is handled for cross-arch today. We could file a separate bug on that if we determine change is needed there, as it's separate from storing the length in the trace.

@AssadHashmi
Copy link
Contributor

AssadHashmi commented Jan 25, 2024

The vector length needs to be stored in the trace instead.

To be clear, when the trace is generated on SVE h/w, the vector length (VL) used should be stored in the trace header so that it can be read when processing the trace on non-SVE h/w (e.g. x86-64)?

As a quick temporary workaround, can the VL be independent of the trace? For example reading from an environment variable or a command line option on non-SVE h/w?

Should VL be recorded in bytes or bits for trace reading purposes?

I've tried to see how this fails on x86-64 with a simple test case and the invariant checker as well as all the other -simulator_type options but it doesn't fail:

  1. Generate a trace on SVE h/w and check it's valid:
$ drrun -t drcachesim -offline -outdir ./i6585_for_x86 -- ./sve_test_case
$ drrun -t drcachesim -simulator_type invariant_checker -indir ./i6585_for_x86/drmemtrace. sve_test_case.4175747.4638.dir
Trace invariant checks passed
$
  1. scp the trace directory i6585_for_x86 to an x86-64 machine and run the checker on there:
$ drrun -t drcachesim -simulator_type invariant_checker -indir ./i6585_for_x86/drmemtrace.sve_test_case.4175747.4638.dir
Trace invariant checks passed
$

What part of the trace reading process requires the vector length in order to work properly?
Do you have a test case you can share which shows the tracer failing when it doesn't get the right VL?

@derekbruening
Copy link
Contributor Author

derekbruening commented Jan 25, 2024

I've tried to see how this fails on x86-64 with a simple test case and the invariant checker as well as all the other -simulator_type options but it doesn't fail:

The failure is in raw2trace, which you ran on the same machine where you gathered the trace. You need to scp the directory with just the raw subdirectory inside it before you run any other command (any analyzer like the checker you ran will go run raw2trace first if not already run). Then on the x86 machine your command would fail even sooner b/c presumably you have an x86 build. You need a special arm-on-x86 build (because drdecode doesn't yet support multi-arch from the same build: #1684) via -DTARGET_ARCH=aarch64.

@derekbruening
Copy link
Contributor Author

The vector length needs to be stored in the trace instead.

To be clear, when the trace is generated on SVE h/w, the vector length (VL) used should be stored in the trace header so that it can be read when processing the trace on non-SVE h/w (e.g. x86-64)?

Yes. The vector length should become a new marker record type (similar to TRACE_MARKER_TYPE_CACHE_LINE_SIZE) and be present in every aarch64 trace.

As a quick temporary workaround, can the VL be independent of the trace? For example reading from an environment variable or a command line option on non-SVE h/w?

For now I put a patch in place that hardcodes a constant s/proc_get_vector_length_bytes() * 8/256/ (though we'll have to verify whether 256 is right).

Should VL be recorded in bytes or bits for trace reading purposes?

Probably doesn't matter so long as it's consistently interpreted. Matching existing interfaces sounds reasonable -- though the proc_get_vector_length_bytes() is in bytes but dr_set_sve_vector_lengtyh() is bits.

@derekbruening
Copy link
Contributor Author

The same marker can be repeated (with a new value) if the app changes the vector length, as noted at #6544 (comment)

jackgallagher-arm added a commit that referenced this issue Jan 31, 2024
Adds a new trace marker
    TRACE_MARKER_TYPE_DYNAMIC_VECTOR_LENGTH
to drcachesim that indicates the current vector length for architectures
which have a dynamic vector length that can't be statically determined
from the instruction.

The marker is emitted as part of the thread header when running on AArch64
with SVE support, but in the future could also be used to track changes
in the vector length after prctl(PR_SVE_SET_VL, ..) system calls.

Some SVE load and store instructions such as
```    LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]```
or
```    ST1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]```

scale the immediate offset based on the hardware vector length so
knowing the correct vector length for the traced application is
important to properly decode and analyse these instructions.

Fixes: #6585
jackgallagher-arm added a commit that referenced this issue Feb 9, 2024
Adds a new trace marker 'TRACE_MARKER_TYPE_DYNAMIC_VECTOR_LENGTH'
to drcachesim that indicates the current vector length for architectures which have a dynamic vector length that can't be statically determined from the instruction.

The marker is emitted as part of the thread header when running on AArch64 with SVE support, but in the future could also be used to track changes in the vector length after prctl(PR_SVE_SET_VL, ..) system calls (i#6625).

Some SVE load and store instructions such as
    LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
or
    ST1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

scale the immediate offset based on the hardware vector length so knowing the correct vector length for the traced application is important to properly decode and analyse these instructions.

Fixes: #6585
Issues: #6625
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants