CYCLE_ACTIVITY.STALLS_L1D_PENDING is always zero #18

Closed
tootoonchian opened this issue Mar 9, 2015 · 4 comments

Comments

@tootoonchian

I noticed that level 3 stats printed for memory bound workloads are incorrect on my machine (Xeon E5-2658 v3, Linux 3.19). Here is a sample output with a program that is DRAM bound (Intel MLC):

BE      Backend_Bound:                                90.68% 
BE/Mem  Backend_Bound.Memory_Bound:                   84.30% 
BE/Mem  Backend_Bound.Memory_Bound.L1_Bound:          84.35% 
BE/Mem  Backend_Bound.Memory_Bound.L3_Bound:          22.48% 
BE/Mem  Backend_Bound.Memory_Bound.MEM_Bound:         61.69% 

L1_Bound value is incorrect. I traced the issue to perf always reporting zero for CYCLE_ACTIVITY.STALLS_L1D_PENDING. Here is a sample perf output for that event:

perf stat -I 1000 -e cpu/event=0xa3,umask=0xc,cmask=12/ -a sleep 5
#           time             counts unit events
     1.000206434                  0      cpu/event=0xa3,umask=0xc,cmask=12/
     2.000452095                  0      cpu/event=0xa3,umask=0xc,cmask=12/
     3.000657316                  0      cpu/event=0xa3,umask=0xc,cmask=12/
     4.000875653                  0      cpu/event=0xa3,umask=0xc,cmask=12/
     5.001068298                  0      cpu/event=0xa3,umask=0xc,cmask=12/

With cmask=4, perf returns a value that seems correct. I double-checked SDM Vol. 3B, and a cmask value of 12 (0xc) appears to be the right one. I understand this is not directly a pmu-tools bug, but I was hoping to hear whether others are affected too.
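
For readers unfamiliar with the `cpu/event=...,umask=...,cmask=.../` syntax, here is a minimal sketch of how those three fields pack into a single raw config value. It assumes the usual x86 core PMU layout (event select in bits 0-7, umask in bits 8-15, cmask in bits 24-31); the helper name `raw_config` is made up for illustration and is not part of perf or pmu-tools.

```python
def raw_config(event: int, umask: int, cmask: int = 0) -> int:
    """Pack event, umask and cmask into one raw config value.

    Assumed layout: event in bits 0-7, umask in bits 8-15, cmask in bits 24-31.
    """
    for name, value in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value:#x}")
    return event | (umask << 8) | (cmask << 24)


# The event from the perf command above: event=0xa3, umask=0xc, cmask=12.
config = raw_config(event=0xA3, umask=0x0C, cmask=12)
print(f"raw config: {config:#010x}")
print(f"event+umask part: {config & 0xFFFF:#06x}")
```

The low 16 bits (event plus umask) are what the kernel constraint table in the patch further down is keyed on.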

@andikleen
Owner

Does cmask=12 work correctly too?

Yes it looks like a bug in the event list. I'll report it.
For now you can patch the event list manually in ~/.cache/pmu-events/GenuineIntel-6-3F.json
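
A rough sketch of what inspecting or patching an entry in that file could look like. It assumes the event list is a JSON array of objects with string fields such as `EventName` and `CounterMask`; the real file may be structured differently, so check it before relying on this. The helper names and the inline sample entry are hypothetical.

```python
import json


def find_event(events, name):
    """Return the first entry whose EventName equals name, or None."""
    for entry in events:
        if entry.get("EventName") == name:
            return entry
    return None


def set_counter_mask(events, name, cmask):
    """Overwrite CounterMask for the named event and return the old value."""
    entry = find_event(events, name)
    if entry is None:
        raise KeyError(name)
    old = entry.get("CounterMask")
    entry["CounterMask"] = str(cmask)
    return old


# Hypothetical one-entry stand-in for the cached event list.
sample = json.loads("""
[{"EventCode": "0xA3", "UMask": "0x0C",
  "EventName": "CYCLE_ACTIVITY.STALLS_L1D_PENDING", "CounterMask": "12"}]
""")

entry = find_event(sample, "CYCLE_ACTIVITY.STALLS_L1D_PENDING")
print(entry["EventCode"], entry["UMask"], entry["CounterMask"])
```

To work on the real file, load it with `json.load`, apply the change, and write it back with `json.dump`, keeping a backup copy first.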

@andikleen
Owner

It turned out to be a kernel bug. 12 is the correct cmask, but the kernel would schedule it on the wrong counter.

Kernel patch (cut'n'pasted, may need to be applied manually)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 9f1dd18..35d9f5a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -212,11 +212,11 @@ static struct event_constraint intel_hsw_event_constraints[] = {
 	INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
 	INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
 	/* CYCLE_ACTIVITY.CYCLES_L1D_PENDING */
-	INTEL_EVENT_CONSTRAINT(0x08a3, 0x4),
+	INTEL_UEVENT_CONSTRAINT(0x08a3, 0x4),
 	/* CYCLE_ACTIVITY.STALLS_L1D_PENDING */
-	INTEL_EVENT_CONSTRAINT(0x0ca3, 0x4),
+	INTEL_UEVENT_CONSTRAINT(0x0ca3, 0x4),
 	/* CYCLE_ACTIVITY.CYCLES_NO_EXECUTE */
-	INTEL_EVENT_CONSTRAINT(0x04a3, 0xf),
+	INTEL_UEVENT_CONSTRAINT(0x04a3, 0xf),
 	EVENT_CONSTRAINT_END
 };
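
Why swapping `INTEL_EVENT_CONSTRAINT` for `INTEL_UEVENT_CONSTRAINT` matters can be illustrated with a small model. This is a simplified sketch, not kernel code: it assumes a constraint applies when `(config & mask) == code`, that the EVENT variant compares only the event select byte (mask 0xff), and that the UEVENT variant compares event select plus umask (mask 0xffff). The function names are made up for illustration.

```python
EVENT_MASK = 0x00FF   # event select only (assumed for INTEL_EVENT_CONSTRAINT)
UEVENT_MASK = 0xFFFF  # event select + umask (assumed for INTEL_UEVENT_CONSTRAINT)


def constraint_matches(config: int, code: int, mask: int) -> bool:
    """Model of the check that decides whether a constraint applies to an event."""
    return (config & mask) == code


def allowed_counters(counter_bitmask: int, num_counters: int = 8) -> list:
    """Expand a counter bitmask such as 0x4 into a list of counter indices."""
    return [i for i in range(num_counters) if (counter_bitmask >> i) & 1]


# Raw config for event=0xa3, umask=0xc, cmask=12.
STALLS_L1D_PENDING = 0x0C000CA3

# Before the patch: the code 0x0ca3 includes umask bits, but only the event
# byte is compared, so the constraint can never apply.
print(constraint_matches(STALLS_L1D_PENDING, 0x0CA3, EVENT_MASK))

# After the patch: event and umask are both compared, so it applies.
print(constraint_matches(STALLS_L1D_PENDING, 0x0CA3, UEVENT_MASK))

# The 0x4 in the table is a bitmask of permitted counters.
print(allowed_counters(0x4))
```

Under these assumptions, the old table entry never applied, so the event was free to land on a counter other than counter 2, which is consistent with the always-zero readings reported above.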

@andikleen
Owner

@tootoonchian
Author

Thanks! I just tried with a patched kernel. perf now reports the numbers correctly.
