Releases: dendibakh/perf-book

Q2.2024

26 Jun 16:39

New Content

  • Chapter 13 "Optimizing Multithreaded Applications" - major rewrite of the chapter. I added new sections about Thread Count Scalability, Task Scheduling, and updated the remaining parts of this chapter.
  • Chapter 8 "Optimizing Memory Accesses" - added a section about field reordering in data structures. (5e80a16)
  • More proofreading fixes throughout the book.
  • I'm currently working on the last big piece of content for the second edition. This will be a section in chapter 12 titled "CPU-specific optimizations", where I will touch on some aspects of optimizing for a specific platform. It will cover topics such as ISA extensions, instruction latencies and throughput, and some common microarchitecture-specific issues. (#56)
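
To illustrate the field reordering mentioned for chapter 8 above, here is a minimal sketch that is not taken from the book. It uses Python's `ctypes` to mimic a C struct layout; the struct names `Unordered` and `Reordered`, their fields, and the helper `layout` are hypothetical and chosen only for illustration.

```python
import ctypes


class Unordered(ctypes.Structure):
    # Hypothetical struct: the 1-byte flag comes first, so the compiler-style
    # layout has to pad before the 8-byte double and again at the tail.
    _fields_ = [
        ("flag", ctypes.c_bool),
        ("value", ctypes.c_double),
        ("count", ctypes.c_int32),
    ]


class Reordered(ctypes.Structure):
    # Same fields, ordered from the largest to the smallest type.
    _fields_ = [
        ("value", ctypes.c_double),
        ("count", ctypes.c_int32),
        ("flag", ctypes.c_bool),
    ]


def layout(struct):
    """Return ([(field name, offset, size), ...], total size in bytes)."""
    fields = [
        (name, getattr(struct, name).offset, getattr(struct, name).size)
        for name, _ in struct._fields_
    ]
    return fields, ctypes.sizeof(struct)


for s in (Unordered, Reordered):
    fields, total = layout(s)
    print(s.__name__, fields, "total bytes:", total)
```

On a typical 64-bit platform the first layout usually occupies 24 bytes and the second 16, although the exact numbers depend on the ABI. The same idea applies to C and C++ structs, where a smaller footprint means more objects fit into each cache line.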

Pull requests:

@pveentjer #44 #45 #47 #51
@cf-natali #46 #48

Full Changelog: Q1.2024...Q2.2024

Q1.2024

02 Apr 14:47

New Content

  • A case study about L3 cache sensitivity, contributed by @chusAB (chapter 12, pull request #39). It shows how you can determine whether an application is sensitive to the size of the last-level cache (LLC). Using this information, you can make educated decisions when buying HW components for your computing systems. Similarly, you can later determine sensitivity to other factors, such as memory bandwidth, core count, and processor frequency.
  • I wrote a section about how to measure hot code footprint (chapter 11, commit 2183eda). Applications with large amounts of hot code usually put pressure on the CPU front end (I-cache and TLBs). Knowing how many cache lines or pages of a program's code are hot can be an additional argument for investing time in machine code layout optimizations. Thanks to @aaupov for his review and comments.
  • I wrote a new section about memory profiling. It discusses how to measure memory usage (VSZ and RSS), how to analyze heap allocations and more. (chapter 7, pull request #27)
  • I've made some big updates to chapter 8, "Optimizing Memory Accesses". I wrote about some new data structure reorganization techniques that were not present in the first edition. I also improved the two sections about dynamic memory allocation and about what to do when you hit a memory bandwidth limit.
  • I have fixed ~10 TODOs. There are still ~60 items left.
  • I fixed many proofreading comments (thanks to Ciaran).
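
As a rough illustration of the VSZ and RSS measurements mentioned in the memory profiling item above, the following sketch reads both values for the current process. It assumes a Linux system that exposes `/proc/self/status`; the helper name `parse_status` and the result keys are hypothetical, and this is not the method described in the book.

```python
from pathlib import Path


def parse_status(text):
    """Extract VmSize (VSZ) and VmRSS (RSS), in kB, from /proc/<pid>/status text."""
    wanted = {"VmSize": "vsz_kb", "VmRSS": "rss_kb"}
    result = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "VmRSS:      1234 kB"; keep the number only.
            result[wanted[key]] = int(rest.split()[0])
    return result


status = Path("/proc/self/status")
if status.exists():  # Linux only; other systems need a different source
    print(parse_status(status.read_text()))
```

VSZ counts all virtual address space the process has reserved, while RSS counts only the pages currently resident in physical memory, so RSS is normally the smaller of the two.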

Full Changelog: Q4.2023...Q1.2024

Q4.2023

22 Dec 19:09

Release notes:

  1. Low-latency techniques (#33, authored by Mark Dawson)
  2. New section about data dependency chains
  3. Updated the chapter about Front-End bound optimizations (now with pretty images), expanded the section about PGO and BOLT (thanks to @aaupov).
  4. Major update of the PMU chapter, including performance monitoring features of AMD and ARM-based processors. (WIP)
  5. A LOT of proofreading comments (thanks to Ciaran).

Q3.2023

01 Oct 00:32

Release notes:

  1. PRs merged:
  2. Finished chapter 7 "Overview Of Performance Analysis Tools" (+AMD uProf, +Xcode Instruments, +flamegraphs)
  3. Many changes in chapters 1-5
    • Chapters 1 and 2: mostly cosmetics
    • Chapter 3: TLB hierarchy, store optimizations
    • Chapter 4: major updates for sections about UOPs, IPC, pipeline slots
    • Chapter 5: many updates for sections on sampling, static performance analysis (+UICA), and compiler opt reports.
  4. Updated a section about FP subnormals

Q2.2023

29 Jun 15:34

Release notes:

  1. Two PRs merged:
  2. Major update to chapter 3, CPU-Microarchitecture
    • DRAM rank, channels, interleaving
    • Multicore, SMT, and Hybrid CPUs.
    • Branch prediction section.
    • Updated section "Modern CPU design" (Skylake -> Goldencove), deep dive into CPU Front-End, Back-End, Load-Store unit, and TLB hierarchy.
  3. Added questions and exercises throughout the book.
  4. Major rewrite of section 6.1 Code Instrumentation.
  5. Updated intro for the second part (section 9.0).
  6. Split the former chapter 9 BackendBound into two chapters: chapter 9 MemoryBound and chapter 10 CoreBound.

Q1 2023 release

03 Apr 23:14

The book has several updated chapters and sections:

  • performance metrics (major update): secondary metrics, memory latency and bandwidth, case study
  • overview of performance analysis tools (new chapter, WIP)
  • huge pages (several places throughout the book)
  • sw memory prefetching
  • unroll and jam
  • draft of the cover image

First edition of the book

05 Dec 16:18
3073e6d

This is the first edition of the book, published in November 2020.
The PDF can also be found on this page: https://book.easyperf.net/perf_book.