E cores? #3338

Open
Manouchehri opened this issue Jul 24, 2022 · 6 comments

@Manouchehri
Contributor

At first glance, it looks like the E cores on newer Intel CPUs don't have perf counters. Can/should rr automatically pin itself to only run on cores that are going to work?

dave@intel:~/obj$ perf stat -ddd /bin/date
Mon Jul 25 01:11:14 AM CEST 2022

 Performance counter stats for '/bin/date':

              0.17 msec task-clock                #    0.539 CPUs utilized
                 0      context-switches          #    0.000 /sec
                 0      cpu-migrations            #    0.000 /sec
                68      page-faults               #  396.197 K/sec
           867,913      cpu_core/cycles/          #    5.057 G/sec
     <not counted>      cpu_atom/cycles/                                              (0.00%)
         1,029,910      cpu_core/instructions/    #    6.001 G/sec
     <not counted>      cpu_atom/instructions/                                        (0.00%)
           202,177      cpu_core/branches/        #    1.178 G/sec
     <not counted>      cpu_atom/branches/                                            (0.00%)
             6,623      cpu_core/branch-misses/   #   38.588 M/sec
     <not counted>      cpu_atom/branch-misses/                                       (0.00%)

       0.000318438 seconds time elapsed

       0.000341000 seconds user
       0.000000000 seconds sys
dave@intel:~/obj$ ./bin/rr record date
rr: Saving execution to trace directory `/home/dave/.local/share/rr/date-1'.
[FATAL src/PerfCounters.cc:378:check_working_counters() errno: EDOM]
Got 0 branch events, expected at least 500.

The hardware performance counter seems to not be working. Check
that hardware performance counters are working by running
  perf stat -e r5111c4 true
and checking that it reports a nonzero number of events.
If performance counters seem to be working with 'perf', file an
rr issue, otherwise check your hardware/OS/VM configuration. Also
check that other software is not using performance counters on
this CPU.
=== Start rr backtrace:
./bin/rr(_ZN2rr13dump_rr_stackEv+0x5a)[0x55f65f89530a]
./bin/rr(_ZN2rr15notifying_abortEv+0x14)[0x55f65f8977e4]
./bin/rr(+0x1f4854)[0x55f65f8b3854]
./bin/rr(_ZN2rr12PerfCounters5resetEl+0xc1c)[0x55f65f7a11cc]
./bin/rr(_ZN2rr4Task16resume_executionENS_13ResumeRequestENS_11WaitRequestENS_12TicksRequestEi+0x7f2)[0x55f65f869592]
./bin/rr(_ZN2rr13RecordSession13task_continueERKNS0_9StepStateE+0x33e)[0x55f65f7a796e]
./bin/rr(_ZN2rr13RecordSession11record_stepEv+0x34c)[0x55f65f7b6bec]
./bin/rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0xc5d)[0x55f65f7a5e9d]
./bin/rr(main+0x1c8)[0x55f65f70bcc8]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fbe93dadd90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fbe93dade40]
./bin/rr(_start+0x25)[0x55f65f70be85]
=== End rr backtrace
Aborted (core dumped)
dave@intel:~/obj$ sudo perf stat -e r5111c4 true
WARNING: event 'N/A' not valid (bits 16,20,22 of config '5111c4' not supported by kernel)!

 Performance counter stats for 'true':

            93,138      cpu_core/r5111c4/
     <not counted>      cpu_atom/r5111c4/                                             (0.00%)

       0.000496501 seconds time elapsed

       0.000514000 seconds user
       0.000000000 seconds sys
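
A note on the warning in the output above: the bits it names (16, 20 and 22) line up with the USR, INT and EN flags of Intel's IA32_PERFEVTSELx register layout, while the low 16 bits carry the event select and unit mask. The sketch below splits a raw code into those fields. `decode_raw_event` is a hypothetical helper (not part of perf or rr), and the field layout is an assumption taken from that register format.

```shell
# Hypothetical helper: split an x86 raw perf event code into fields.
# Pass the hex digits without the leading "r", e.g. 5111c4.
# Assumed layout (Intel IA32_PERFEVTSELx): bits 0-7 event select,
# bits 8-15 unit mask, bit 16 USR, bit 20 INT, bit 22 EN.
decode_raw_event() {
    cfg=$((0x$1))
    printf 'event=0x%02x umask=0x%02x usr=%d int=%d en=%d\n' \
        $((cfg & 0xff)) $(((cfg >> 8) & 0xff)) \
        $(((cfg >> 16) & 1)) $(((cfg >> 20) & 1)) $(((cfg >> 22) & 1))
}
```

Calling `decode_raw_event 5111c4` prints `event=0xc4 umask=0x11 usr=1 int=1 en=1`, so the three bits perf warns about are all set in this code, and perf still reported a count for cpu_core.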

Workaround for the i9-12900K (restrict rr to CPUs 0-15, the P cores):

taskset -c 0-15 rr record [whatever]

Binding rr to a single P-core CPU seems to work too.

./bin/rr record --bind-to-cpu=0 date

Taking the E cores (CPUs 16-23) offline works too, though I'm not sure whether doing so has any performance benefit.

for i in {16..23}; do echo 0 | sudo tee /sys/devices/system/cpu/cpu${i}/online; done

Source: https://unix.stackexchange.com/questions/686459/disable-intel-alder-lake-efficiency-cores-on-linux
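
The CPU ranges above are specific to the i9-12900K. A minimal sketch of a more portable variant follows, assuming the kernel exposes the P-core PMU's CPU list at `/sys/devices/cpu_core/cpus` (the `cpu_core` name matches the perf output above, but the path is an assumption about hybrid-aware kernels). `pcore_cpus` and `rr_record_on_pcores` are hypothetical helper names.

```shell
# Hypothetical helper: print the CPU list of the P-core PMU, or nothing
# if the file is missing (non-hybrid CPU or older kernel).
# Assumed default path: /sys/devices/cpu_core/cpus
pcore_cpus() {
    f="${1:-/sys/devices/cpu_core/cpus}"
    if [ -r "$f" ]; then
        cat "$f"
    fi
}

# Hypothetical wrapper: run "rr record" restricted to the P cores when
# the list is available, otherwise run it unrestricted.
rr_record_on_pcores() {
    cpus="$(pcore_cpus)"
    if [ -n "$cpus" ]; then
        taskset -c "$cpus" rr record "$@"
    else
        rr record "$@"
    fi
}
```

With these defined, `rr_record_on_pcores date` would stand in for the hard-coded `taskset -c 0-15 rr record date`.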

Related: #2997 #3032

@Keno
Member

Keno commented Jul 24, 2022

Can you see if r517ec4 works for the E cores?

@Manouchehri
Contributor Author

Manouchehri commented Jul 24, 2022


dave@intel:~$ taskset -c 16-23 perf stat -e r517ec4 true
WARNING: event 'N/A' not valid (bits 16,20,22 of config '517ec4' not supported by kernel)!

 Performance counter stats for 'true':

     <not counted>      cpu_core/r517ec4/                                             (0.00%)
            94,944      cpu_atom/r517ec4/

       0.000518950 seconds time elapsed

       0.000551000 seconds user
       0.000000000 seconds sys

@Keno
Member

Keno commented Jul 24, 2022

Alright, seems to be working. We just need to hook up the core-specific perf-counter selection that we already do on ARM, then. Somewhat annoyingly, Intel issued a microcode update that makes all these chips report the same CPUID on both core types, though there's a different CPUID leaf that can be used to tell them apart: https://www.intel.com/content/www/us/en/developer/articles/guide/12th-gen-intel-core-processor-gamedev-guide.html
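
For reference, the leaf usually used for this is CPUID leaf 0x1A (hybrid information), where bits 31:24 of EAX encode the core type: 0x20 for Atom (E core) and 0x40 for Core (P core). A minimal sketch of the decoding step follows. `core_type_from_leaf_1a_eax` is a hypothetical helper, and obtaining the EAX value for a given CPU (for example from a raw CPUID dump taken while pinned to that CPU) is left out.

```shell
# Hypothetical helper: classify a core from the EAX value returned by
# CPUID leaf 0x1A on that core. Bits 31:24 hold the core type:
# 0x20 (32) = Intel Atom (E core), 0x40 (64) = Intel Core (P core).
core_type_from_leaf_1a_eax() {
    case $(( ($1 >> 24) & 0xff )) in
        32) echo atom ;;
        64) echo core ;;
        *)  echo unknown ;;
    esac
}
```

The argument can be given in hex, e.g. `core_type_from_leaf_1a_eax 0x40000001`.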

@mdavidsaver
Contributor

mdavidsaver commented Aug 10, 2023

I have run into this same issue with an i7-1250U. At least I found this ticket before creating a duplicate. Pinning to core 0 seems to be a sufficient workaround.

With intel-microcode 3.20230512.1 and Linux 6.1.0-10 (Debian) the cpuid utility (version 20230120) reports this as:

$ cpuid  | egrep -i 'hybrid|core type'
      hybrid part                              = true
      core type               = Intel Core
      hybrid part                              = true
      core type               = Intel Core
      hybrid part                              = true
      core type               = Intel Core
      hybrid part                              = true
      core type               = Intel Core
      hybrid part                              = true
      core type               = Intel Atom
      hybrid part                              = true
      core type               = Intel Atom
      hybrid part                              = true
      core type               = Intel Atom
      hybrid part                              = true
      core type               = Intel Atom
      hybrid part                              = true
      core type               = Intel Atom
      hybrid part                              = true
      core type               = Intel Atom
      hybrid part                              = true
      core type               = Intel Atom
      hybrid part                              = true
      core type               = Intel Atom

Note "Core" vs. "Atom". I have hyperthreading enabled, so the two P cores appear as 4.
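
A long listing like the one above can be tallied per core type. The sketch below reads `cpuid` output on standard input and counts the "core type" lines; `count_core_types` is a hypothetical helper name, and it assumes the `name = value` line format shown above.

```shell
# Hypothetical helper: count logical CPUs per core type from cpuid
# output on stdin. Assumes lines of the form "core type = Intel Core".
count_core_types() {
    awk -F'= *' 'tolower($0) ~ /core type/ { n[$2]++ }
                 END { for (t in n) print n[t], t }' | sort
}
```

Used as `cpuid | count_core_types`, the listing above would give `4 Intel Core` and `8 Intel Atom`.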

@jtunhag

jtunhag commented Nov 13, 2023

Same here for an i7-13700H (and the taskset -c 0-15 rr record [whatever] workaround works)

$ perf stat -e r5111c4 true
WARNING: event 'N/A' not valid (bits 16,20,22 of config '5111c4' not supported by kernel)!

 Performance counter stats for 'true':

           145 702      cpu_core/r5111c4/                                                     
     <not counted>      cpu_atom/r5111c4/                                                       (0,00%)

       0,001158560 seconds time elapsed

       0,001197000 seconds user
       0,000000000 seconds sys

@rocallahan
Collaborator

Not only do we need to hook up the multi-PMU support, but we should also try to bind to a P-core by default.
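
Until that lands, the default can be approximated from the shell. A minimal sketch follows, assuming the P-core CPU list is readable from `/sys/devices/cpu_core/cpus` (an assumption about hybrid-aware kernels) and using the `--bind-to-cpu` option shown earlier in this thread. `first_cpu_in_list` and `rr_bind_arg` are hypothetical helper names.

```shell
# Hypothetical helper: first CPU number in a list such as "0-15" or "4,6-8".
first_cpu_in_list() {
    printf '%s\n' "$1" | sed 's/[,-].*//'
}

# Hypothetical helper: print a --bind-to-cpu argument for the first
# P core, or nothing if the P-core list is unavailable.
# Assumed default path: /sys/devices/cpu_core/cpus
rr_bind_arg() {
    f="${1:-/sys/devices/cpu_core/cpus}"
    if [ -r "$f" ]; then
        cpu="$(first_cpu_in_list "$(cat "$f")")"
        if [ -n "$cpu" ]; then
            printf '%s%s\n' '--bind-to-cpu=' "$cpu"
        fi
    fi
    return 0
}
```

It could be used as `rr record $(rr_bind_arg) date`, where the unquoted substitution expands to nothing on machines without a `cpu_core` PMU.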
