[Chapter7] Memory Profiling (#27)
* Memory Profiling part1

* Worked a little on memory footprint section

* Working on memory footprint section

* Fixed a few TODOs

* Working on memory footprint section

* Working on memory footprint section

* Finished memory footprint case study

* Finished Stockfish section

* [MemProfiling] fixes after rebase

* [MemProfiling] cosmetics. part1

* [MemProfiling] Wrote about limitations

* [MemProfiling] Cosmetics for Mem usage and footprint of matmul

* working on reuse distance

* described modern tools and gave pointers

* Finished data locality section

* Fixed compile error

* Run through grammarly

* added url for RDX paper

* Added review comments

* Added review comment ++

* Added review comments +++

* Added review comments ++++

* small fix

* Removed section about SDE

* Reworked memory usage section

* Working on memory footprint section

* Fixed some TODO comments in the beginning

* Fixed the ending

* Moved it to chapter 7 - overview of tools

* Reworked memory footprint section

* Fixed intro

* Grammarly

* cleanup

---------

Co-authored-by: dbakhval <dbakhval@DBAKHVAL-MOBL>
dendibakh and dbakhval authored Mar 1, 2024
1 parent 6a1b2fd commit fb94fa5
Showing 15 changed files with 113 additions and 6 deletions.
@@ -1,4 +1,4 @@
-## Case Study: Measuring Code Footprint
+## Case Study: Measuring Code Footprint {#sec:CodeFootprint}

[TODO]: define hot and non-cold code; maybe get rid of non-cold; also there is warm and cold code with a threshold.

@@ -130,6 +130,8 @@ In PEBS, the feature that allows this to happen is called Data Address Profiling

With IBS Execute and ARM SPE sampling, you can also do in-depth analysis of memory accesses performed by an application. One approach is to dump collected samples and process them manually. IBS saves the exact linear address, its latency, where the access was served from (cache or DRAM), and whether it hit or missed in the DTLB. SPE can be used to estimate latency and bandwidth of the memory subsystem components, estimate memory latencies of individual loads/stores, and more.
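
As an illustration of processing dumped samples manually, the sketch below groups samples by where the access was served from and computes the average latency and DTLB miss count per group. The `LoadSample` record, the `DataSource` enum, and the `summarize` function are hypothetical names invented for this example; real IBS or SPE dumps have their own formats and carry more fields.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified view of one sampled memory access.
enum class DataSource { Cache, DRAM };

struct LoadSample {
  std::uint64_t address;        // linear address of the access
  std::uint32_t latencyCycles;  // measured latency of the access
  DataSource source;            // where the access was served from
  bool dtlbMiss;                // true if the access missed in the DTLB
};

// Aggregated statistics for all samples served from one data source.
struct SourceStats {
  std::uint64_t count = 0;
  std::uint64_t totalLatency = 0;
  std::uint64_t dtlbMisses = 0;

  double averageLatency() const {
    return count ? static_cast<double>(totalLatency) / count : 0.0;
  }
};

// Group samples by data source and accumulate latency and DTLB misses.
std::map<DataSource, SourceStats> summarize(
    const std::vector<LoadSample>& samples) {
  std::map<DataSource, SourceStats> stats;
  for (const LoadSample& s : samples) {
    SourceStats& entry = stats[s.source];
    entry.count += 1;
    entry.totalLatency += s.latencyCycles;
    if (s.dtlbMiss) entry.dtlbMisses += 1;
  }
  return stats;
}
```

Calling `summarize` on a vector of parsed samples returns one `SourceStats` entry per data source that appeared in the input, which is often enough to see whether DRAM accesses dominate the total latency.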

[TODO]: refer to section that shows using `perf mem` in Chapter 8 as an example of using PEBS/IBS/SPE features.

One of the most important use cases for these extensions is detecting True and False Sharing, which we will discuss in [@sec:TrueFalseSharing]. The Linux `perf c2c` tool relies heavily on all three mechanisms (PEBS, IBS, and SPE) to find contested memory accesses that could experience True/False Sharing: it matches load/store addresses for different threads and checks if the hit occurs in a cache line modified by other threads.
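
The address-matching idea can be sketched as follows. This is not how `perf c2c` is implemented; it is a minimal illustration that assumes 64-byte cache lines and accesses that do not straddle a line boundary. The `MemSample` record and the `classify` function are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical, simplified record of one sampled memory access.
struct MemSample {
  int threadId;
  std::uint64_t address;  // linear address of the access
  std::uint32_t size;     // access size in bytes
  bool isStore;
};

enum class Sharing { None, TrueSharing, FalseSharing };

// Assumption: 64-byte cache lines, as on most current x86 and ARM cores.
constexpr std::uint64_t kCacheLineSize = 64;

// Classify a pair of accesses. Contention requires two different threads,
// the same cache line, and at least one store. If the accessed byte ranges
// overlap, the threads really share data (True Sharing); otherwise they only
// share the cache line (False Sharing).
inline Sharing classify(const MemSample& a, const MemSample& b) {
  if (a.threadId == b.threadId) return Sharing::None;
  if (!a.isStore && !b.isStore) return Sharing::None;  // read-only sharing
  if (a.address / kCacheLineSize != b.address / kCacheLineSize)
    return Sharing::None;
  const bool overlap =
      a.address < b.address + b.size && b.address < a.address + a.size;
  return overlap ? Sharing::TrueSharing : Sharing::FalseSharing;
}
```

For example, a 4-byte store to `0x1000` by one thread and a 4-byte load from `0x1004` by another fall in the same line but touch different bytes, so this sketch reports them as False Sharing.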

[^1]: PEBS grabber tool - [https://github.com/andikleen/pmu-tools/tree/master/pebs-grabber](https://github.com/andikleen/pmu-tools/tree/master/pebs-grabber). Requires root access.

Large diffs are not rendered by default.

@@ -4,6 +4,8 @@ typora-root-url: ..\..\img

# Optimizing Memory Accesses {#sec:MemBound}

[TODO]: maybe add example of using `perf mem`.

Modern computers are still built on the classical von Neumann architecture, which decouples the CPU, memory, and input/output units. Operations with memory (loads and stores) account for the largest portion of performance bottlenecks and power consumption. It is no surprise that we start with this category.

The statement that the memory hierarchy performance is very important is backed by Figure @fig:CpuMemGap. It shows the growth of the gap in performance between memory and processors. The vertical axis is on a logarithmic scale and shows the growth of the CPU-DRAM performance gap. The memory baseline is the latency of memory access of 64 KB DRAM chips from 1980. Typical DRAM performance improvement is 7% per year, while CPUs enjoy 20-50% improvement per year.[@Hennessy]
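
To see how quickly these rates diverge, the small sketch below compounds them over a number of years. It assumes constant yearly improvement rates, which is a simplification, and the function name `relativeGap` is made up for this example.

```cpp
#include <cmath>

// Ratio of CPU to DRAM performance after `years`, with both normalized
// to 1.0 at the start. Rates are fractions, e.g. 0.07 for 7% per year.
// Assumption: both rates stay constant over the whole period.
double relativeGap(double cpuRate, double dramRate, int years) {
  return std::pow((1.0 + cpuRate) / (1.0 + dramRate), years);
}
```

Under these assumptions, `relativeGap(0.20, 0.07, 10)` is about 3.1 and `relativeGap(0.50, 0.07, 10)` is about 29, so even the low end of the CPU range roughly triples the gap within a decade.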
@@ -33,6 +33,8 @@ Additionally, choose the data storage, bearing in mind what the code will do wit

### Packing the Data.

[TODO]: include example of using data-type profiling (https://lwn.net/Articles/955709/).

Memory hierarchy utilization can be improved by making the data more compact. There are many ways to pack data. One of the classic examples is to use bitfields. An example of code where packing data might be profitable is shown in [@lst:PackingData1]. If we know that `a`, `b`, and `c` represent enum values that take a certain number of bits to encode, we can reduce the storage of the struct `S` (see [@lst:PackingData2]).

Listing: Packing Data: baseline struct.
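
The book's own listings are collapsed in this diff view, so the following is only an independent sketch of the bitfield technique described above, not a reproduction of them. The struct names `Unpacked` and `Packed` and the field widths (4, 2, and 2 bits) are assumptions chosen for illustration.

```cpp
// Baseline: each field occupies a full `unsigned`, typically 4 bytes each.
struct Unpacked {
  unsigned a;
  unsigned b;
  unsigned c;
};

// Packed: assuming `a` needs 4 bits and `b` and `c` need 2 bits each,
// all three fields fit into a single `unsigned` storage unit.
// Note: the exact layout of bitfields is implementation-defined.
struct Packed {
  unsigned a : 4;  // values 0..15
  unsigned b : 2;  // values 0..3
  unsigned c : 2;  // values 0..3
};

// On common ABIs this is 4 bytes versus 12 bytes.
static_assert(sizeof(Packed) < sizeof(Unpacked),
              "packing should shrink the struct");
```

The smaller struct lets more objects fit in each cache line, at the cost of extra shift and mask instructions on every field access. Values that do not fit in the declared width are truncated, so the widths must cover the full range of each enum.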
5 changes: 0 additions & 5 deletions chapters/8-Optimizing-Memory-Accesses/8-3 Memory Profiling.md

This file was deleted.

Binary file added img/memory-access-opts/MemoryUsageAIBench.png
Binary file added img/memory-access-opts/StockfishSummary.png
Binary file added img/memory-access-opts/Stockfish_allocations.png
Binary file added img/memory-access-opts/Stockfish_consumed.png
Binary file added img/memory-access-opts/Stockfish_flamegraph.png