Commit

Added review comments +++
dendibakh committed Feb 21, 2024
1 parent 5578f01 commit 0d1a84c
Showing 4 changed files with 8 additions and 0 deletions.
@@ -124,6 +124,8 @@ In PEBS, the feature that allows this to happen is called Data Address Profiling

With IBS Execute and ARM SPE sampling, you can also do an in-depth analysis of memory accesses performed by an application. One approach is to dump collected samples and process them manually. IBS saves the exact linear address, its latency, where the access was served from (cache or DRAM), and whether it hit or missed in the DTLB. SPE can be used to estimate the latency and bandwidth of the memory subsystem components, estimate memory latencies of individual loads/stores, and more.
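
As a hedged aside (not one of the book's listings), on Linux the `perf mem` command builds on these same facilities (PEBS on Intel, IBS on AMD, SPE on Arm) to sample individual memory accesses; `./a.out` below is a placeholder for your application:

```bash
# Sample loads and stores executed by the application
perf mem record -- ./a.out
# Report where the sampled accesses were served from and their latencies
perf mem report --stdio
```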

[TODO]: refer to the section in Chapter 8 that shows `perf mem` as an example of using PEBS/IBS/SPE features.

One of the most important use cases for these extensions is detecting True and False Sharing, which we will discuss in [@sec:TrueFalseSharing]. The Linux `perf c2c` tool heavily relies on all three mechanisms (PEBS, IBS, and SPE) to find contested memory accesses that could experience True/False Sharing: it matches load/store addresses for different threads and checks if the hit occurs in a cache line modified by other threads.
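
As a minimal sketch (assuming `./a.out` is a multithreaded application; this is not one of the book's listings), a typical `perf c2c` session looks like this:

```bash
# Sample loads/stores and record which cache lines they touch
perf c2c record -- ./a.out
# Report the most contended cache lines and the accesses that hit
# modified data owned by another core (HITM)
perf c2c report --stdio
```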

[^1]: PEBS grabber tool - [https://github.com/andikleen/pmu-tools/tree/master/pebs-grabber](https://github.com/andikleen/pmu-tools/tree/master/pebs-grabber). Requires root access.
@@ -4,6 +4,8 @@ typora-root-url: ..\..\img

# Optimizing Memory Accesses {#sec:MemBound}

[TODO]: maybe add example of using `perf mem`.

Modern computers are still built based on the classical Von Neumann architecture, which decouples the CPU, memory, and input/output units. Operations with memory (loads and stores) account for the largest portion of performance bottlenecks and power consumption, so it is no surprise that we start with this category.

The importance of memory hierarchy performance is backed by Figure @fig:CpuMemGap, which shows the growth of the gap in performance between memory and processors. The vertical axis is on a logarithmic scale. The memory baseline is the latency of memory access of 64 KB DRAM chips from 1980. Typical DRAM performance improvement is 7% per year, while CPUs enjoy 20-50% improvement per year.[@Hennessy]
@@ -33,6 +33,8 @@ Additionally, choose the data storage, bearing in mind what the code will do with it

### Packing the Data.

[TODO]: include example of using data-type profiling (https://lwn.net/Articles/955709/).
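
A tentative sketch of what such an example might look like, assuming a recent Linux perf with the data-type profiling support described in the linked LWN article and a binary (`./a.out`, a placeholder) built with debug info; exact options may differ between versions:

```bash
# Collect memory access samples
perf mem record -- ./a.out
# Group the samples by the data type being accessed
perf report -s type --stdio
# Break a hot data type down by individual field offsets
perf annotate --data-type
```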

Memory hierarchy utilization can be improved by making the data more compact. There are many ways to pack data. One of the classic examples is to use bitfields. An example of code where packing data might be profitable is shown in [@lst:PackingData1]. If we know that `a`, `b`, and `c` represent enum values that take a certain number of bits to encode, we can reduce the storage of the struct `S` (see [@lst:PackingData2]).

Listing: Packing Data: baseline struct.
2 changes: 2 additions & 0 deletions chapters/8-Optimizing-Memory-Accesses/8-5 Memory Profiling.md
@@ -1,5 +1,7 @@
## Memory Profiling {#sec:MemoryProfiling}

[TODO]: maybe rename the section to avoid confusion. This section discusses how to measure memory usage and memory footprint, which is application-level memory profiling. But using `perf mem` can also be called "memory profiling", so maybe I should rename this section to "Memory Usage and Footprint" or split it into two level-2-header sections.

So far in this chapter, we have discussed a few techniques to optimize memory accesses in a particular piece of code. In this section, we will learn how to collect high-level information about a program's interaction with memory. This process is usually called *memory profiling*. Memory profiling helps you understand how an application uses memory over time and build the right mental model of a program's behavior. Here are some questions it can answer:

* What is a program's total memory consumption, and how does it change over time?
