3-9: monotype 'perf'
dankamongmen committed Sep 9, 2024
1 parent 22e9133 commit 568fb19
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions chapters/3-CPU-Microarchitecture/3-9 PMU.md
@@ -30,7 +30,7 @@ If we imagine a simplified view of a processor, it may look something like what

![Simplified view of a CPU with a performance monitoring counter.](../../img/uarch/PMC.png){#fig:PMC width=60%}

-Typically, PMCs are 48-bit wide, which enables analysis tools to run for a long time without interrupting a program's execution.[^2] Performance counter is a hardware register implemented as a Model-Specific Register (MSR). That means the number of counters and their width can vary from model to model, and you cannot rely on the same number of counters in your CPU. You should always query that first, using tools like `cpuid`, for example. PMCs are accessible via the `RDMSR` and `WRMSR` instructions, which can only be executed from kernel space. Luckily, you only have to care about this if you're a developer of a performance analysis tool, like Linux perf or Intel VTune profiler. Those tools handle all the complexity of programming PMCs.
+Typically, PMCs are 48-bit wide, which enables analysis tools to run for a long time without interrupting a program's execution.[^2] Performance counter is a hardware register implemented as a Model-Specific Register (MSR). That means the number of counters and their width can vary from model to model, and you cannot rely on the same number of counters in your CPU. You should always query that first, using tools like `cpuid`, for example. PMCs are accessible via the `RDMSR` and `WRMSR` instructions, which can only be executed from kernel space. Luckily, you only have to care about this if you're a developer of a performance analysis tool, like Linux `perf` or Intel VTune profiler. Those tools handle all the complexity of programming PMCs.

When engineers analyze their applications, it is common for them to collect the number of executed instructions and elapsed cycles. That is the reason why some PMUs have dedicated PMCs for collecting such events. Fixed counters always measure the same thing inside the CPU core. With programmable counters, it's up to the user to choose what they want to measure.

@@ -39,7 +39,7 @@ For example, in the Intel Skylake architecture (PMU version 4, see [@lst:QueryPM
It's not unusual for the PMU to provide more than one hundred events available for monitoring. Figure @fig:PMU shows just a small subset of the performance monitoring events available for monitoring on a modern Intel CPU. It's not hard to notice that the number of available PMCs is much smaller than the number of performance events. It's not possible to count all the events at the same time, but analysis tools solve this problem by multiplexing between groups of performance events during the execution of a program (see [@sec:secMultiplex]).

- For Intel CPUs, the complete list of performance events can be found in [@IntelOptimizationManual, Volume 3B, Chapter 20] or at [perfmon-events.intel.com](https://perfmon-events.intel.com/).
-- ADM doesn't publish a list of performance monitoring events for every AMD processor. Curious readers may find some information in the Linux perf source [code](https://github.com/torvalds/linux/blob/master/arch/x86/events/amd/core.c)[^3]. Also, you can list performance events available for monitoring using the AMD uProf command line tool. General information about AMD performance counters can be found in [@AMDProgrammingManual, 13.2 Performance Monitoring Counters].
+- ADM doesn't publish a list of performance monitoring events for every AMD processor. Curious readers may find some information in the Linux `perf` source [code](https://github.com/torvalds/linux/blob/master/arch/x86/events/amd/core.c)[^3]. Also, you can list performance events available for monitoring using the AMD uProf command line tool. General information about AMD performance counters can be found in [@AMDProgrammingManual, 13.2 Performance Monitoring Counters].
- For ARM chips, performance events are not so well defined. Vendors implement cores following an ARM architecture, but performance events vary widely, both in what they mean and what events are supported. For the ARM Neoverse V1 processor, that ARM designs themselves, the list of performance events can be found in [@ARMNeoverseV1].

[^2]: When the value of PMCs overflows, the execution of a program must be interrupted. The profiling tool then should save the fact of an overflow. We will discuss it in more detail in [@sec:sec_PerfApproaches].
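
As a side note on the 48-bit width and the overflow footnote in the text above, the arithmetic can be sketched in a few lines. This is only an illustration, not part of the commit: the function name `seconds_until_overflow` is hypothetical, and the 4 GHz event rate is an assumption (a counter that increments once per cycle on a 4 GHz core).

```python
def seconds_until_overflow(width_bits: int, events_per_second: float) -> float:
    """Time for a counter that starts at zero to wrap around.

    A counter of width_bits bits can hold 2**width_bits distinct values,
    so it wraps after that many increments.
    """
    return (2 ** width_bits) / events_per_second


if __name__ == "__main__":
    # Assumption: the counter counts core cycles at a steady 4 GHz.
    ASSUMED_CLOCK_HZ = 4e9

    hours_48 = seconds_until_overflow(48, ASSUMED_CLOCK_HZ) / 3600
    secs_32 = seconds_until_overflow(32, ASSUMED_CLOCK_HZ)

    print(f"48-bit counter wraps after about {hours_48:.1f} hours")
    print(f"32-bit counter wraps after about {secs_32:.2f} seconds")
```

Under these assumptions a 48-bit counter lasts roughly 19.5 hours before wrapping, while a 32-bit counter would wrap in about a second, which is consistent with the text's point that wide counters let tools run for a long time without interrupting the program.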