Inaccurate device counter trace #318

Open
sfantao opened this issue Nov 29, 2023 · 1 comment

sfantao commented Nov 29, 2023

Using as an example https://github.com/amd/HPCTrainingExamples/tree/main/HIPIFY/mini-nbody/hip, if I get device counters with rocprof using:

> cat $wd/counters.txt
pmc : WriteSize FetchSize
> bash -c "export ROCR_VISIBLE_DEVICES=0 ; rocprof -i $wd/counters.txt ./nbody-orig $((12*65536))"

I get:

Index,KernelName,gpu-id,queue-id,queue-index,pid,tid,grd,wgr,lds,scr,arch_vgpr,accum_vgpr,sgpr,wave_size,sig,obj,WriteSize,FetchSize
0,"bodyForce(Body*, float, int) [clone .kd]",4,0,0,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,36723.0000000000,524628.5625000000
1,"bodyForce(Body*, float, int) [clone .kd]",4,0,2,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,17505.1250000000,488091.6250000000
2,"bodyForce(Body*, float, int) [clone .kd]",4,0,4,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,17510.6250000000,487910.1250000000
3,"bodyForce(Body*, float, int) [clone .kd]",4,0,6,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,33072.5000000000,2820859.8125000000
4,"bodyForce(Body*, float, int) [clone .kd]",4,0,8,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,32875.0000000000,1719172.6875000000
5,"bodyForce(Body*, float, int) [clone .kd]",4,0,10,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,31081.0000000000,668958.1250000000
6,"bodyForce(Body*, float, int) [clone .kd]",4,0,12,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,17516.0000000000,488220.2500000000
7,"bodyForce(Body*, float, int) [clone .kd]",4,0,14,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,32861.8750000000,3522902.0625000000
8,"bodyForce(Body*, float, int) [clone .kd]",4,0,16,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,17505.0000000000,488151.7500000000
9,"bodyForce(Body*, float, int) [clone .kd]",4,0,18,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,32938.8750000000,2949121.8750000000
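As a rough way to quantify the spread in the rocprof output above, here is a minimal Python sketch. It assumes the WriteSize and FetchSize columns have been copied by hand into an inline string; the names `ROCPROF_VALUES` and `counter_spread` are hypothetical and are not part of rocprof or omnitrace.

```python
import csv
import io

# WriteSize and FetchSize columns copied from the rocprof output above.
# (Illustrative inline data; in practice these would be read from the CSV
# file that rocprof writes.)
ROCPROF_VALUES = """Index,WriteSize,FetchSize
0,36723.0,524628.5625
1,17505.125,488091.625
2,17510.625,487910.125
3,33072.5,2820859.8125
4,32875.0,1719172.6875
5,31081.0,668958.125
6,17516.0,488220.25
7,32861.875,3522902.0625
8,17505.0,488151.75
9,32938.875,2949121.875
"""


def counter_spread(csv_text, column):
    """Return (min, max, max/min) of one counter column across all kernels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    lowest, highest = min(values), max(values)
    return lowest, highest, highest / lowest


if __name__ == "__main__":
    for name in ("WriteSize", "FetchSize"):
        lowest, highest, ratio = counter_spread(ROCPROF_VALUES, name)
        print(f"{name}: min={lowest} max={highest} max/min={ratio:.2f}")
```

For these ten kernel dispatches, FetchSize varies by roughly a factor of seven between its smallest and largest value, which is the fluctuation I would expect to see reflected in the trace.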

If I use omnitrace with a configuration containing:

OMNITRACE_ROCM_EVENTS                              = FetchSize:device=0 WriteSize:device=0

and run:

bash -c "export ROCR_VISIBLE_DEVICES=0 ; omnitrace-sample ./nbody-orig $((12*65536))"

I get:
image

i.e., the counters do not show the fluctuation that the rocprof output suggests they should.

Tested on ROCm 5.7.0 with omnitrace installed from omnitrace-1.10.4-ubuntu-20.04-ROCm-50700-PAPI-OMPT-Python3.sh.

For completeness, on a different machine with ROCm 5.6.1 I see things like:

image

Again there are no fluctuations, and for the first kernel the reading starts out correct but shifts partway through the kernel.

@jrmadsen
Collaborator

There are a couple of things going on here. First, I believe the default view of the timelines is the accumulation of the counters, so you will not see them fluctuate but instead grow over time. If you click on the lightning-bolt icon, you can change the view; I think one of the options will be the delta. Second, there are likely some discrepancies from mapping hardware counters for kernels onto the kernel-independent timeline. Third, I don't have much confidence in the timing alignment between omnitrace's current use of roctracer for kernel timing and the kernel timings reported by rocprofiler when it reports the HW counters. This needs to be investigated.
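To illustrate the accumulation-versus-delta point, here is a minimal Python sketch using the per-kernel FetchSize values from the rocprof output in the report. It only assumes that an accumulated view plots the running sum of per-kernel values; it does not reproduce how omnitrace or the trace viewer actually compute their tracks, and the function names are hypothetical.

```python
from itertools import accumulate

# Per-kernel FetchSize values copied from the rocprof output in the report.
FETCH_SIZE = [
    524628.5625, 488091.625, 487910.125, 2820859.8125, 1719172.6875,
    668958.125, 488220.25, 3522902.0625, 488151.75, 2949121.875,
]


def accumulated_view(per_kernel):
    """Running sum: what an accumulating counter track would plot."""
    return list(accumulate(per_kernel))


def delta_view(accumulated):
    """Differences between consecutive accumulated samples."""
    previous = 0.0
    deltas = []
    for value in accumulated:
        deltas.append(value - previous)
        previous = value
    return deltas


if __name__ == "__main__":
    running = accumulated_view(FETCH_SIZE)
    for total, delta in zip(running, delta_view(running)):
        print(f"accumulated={total:>16.4f}  delta={delta:>14.4f}")
```

The accumulated series only ever grows, so the kernel-to-kernel variation is visible as changes in slope rather than as peaks and dips, whereas the delta series recovers the per-kernel values.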
