[Grammar] Update 10-0 Optimizing bad speculation.md
dendibakh authored Sep 22, 2024
1 parent fdd9863 commit fa93464
Showing 1 changed file with 2 additions and 2 deletions.
@@ -6,7 +6,7 @@ So far we've been talking about optimizing memory accesses and computations. How

In general, modern processors are very good at predicting branch outcomes. They not only follow static prediction rules but also detect dynamic patterns. Usually, branch predictors save the history of previous outcomes for the branches and try to guess what the next result will be. However, when the pattern becomes hard for the CPU branch predictor to follow, it may hurt performance.

- Mispredicting a branch can add a significant speed penalty when it happens regularly. When such an event happens, a CPU is required to clear all the speculative work that was done ahead of time and later was proven to be wrong. It also needs to flush the pipeline and start filling it with instructions from the correct path. Typically, modern CPUs experience from 10 to 20 cycles penalty as a result of a branch misprediction. The exact number of cycles depends on the microarchitecture design, namely, on the depth of the pipeline and the mechanism used to recover from the mispredicts.
+ Mispredicting a branch can add a significant speed penalty when it happens regularly. When such an event occurs, a CPU is required to clear all the speculative work that was done ahead of time and later was proven to be wrong. It also needs to flush the pipeline and start filling it with instructions from the correct path. Typically, modern CPUs experience 10 to 20-cycle penalties as a result of a branch misprediction. The exact number of cycles depends on the microarchitecture design, namely, on the depth of the pipeline and the mechanism used to recover from the mispredicts.

Branch predictors use caches and history registers and therefore are susceptible to the issues related to caches, namely three C's:

@@ -18,7 +18,7 @@ A program will always experience a non-zero number of branch mispredictions. You

In the past, developers had an option of providing a prediction hint to an x86 processor in the form of an encoding prefix to the branch instruction (`0x2E: Branch Not Taken`, `0x3E: Branch Taken`). This could potentially improve performance on older microarchitectures, like Pentium 4. However, modern x86 processors used to ignore those hints until Intel's RedwoodCove started using it again. Its branch predictor is still good at finding dynamic patterns, but now it will use the encoded prediction hint for branches that have never been seen before (i.e. when there is no stored information about a branch). [@IntelOptimizationManual, Section 2.1.1.1 Branch Hint]

- There are indirect ways to reduce the branch misprediction rate by reducing the dynamic number of branch instructions. This approach helps because it alleviates the pressure on branch predictor structures. Compiler transformations such as loop unrolling and vectorization help in reducing the dynamic branch count, though they don't specifically aim at improving the prediction rate of any given conditional statement. Progile-Guided Optimizations (PGO) and post-link optimizers (e.g., BOLT) are also effective at reducing branch mispredictions thanks to improving fallthrough rate (straighten the code). We will discuss those techniques in the next chapter.[^1]
+ There are indirect ways to reduce the branch misprediction rate by reducing the dynamic number of branch instructions. This approach helps because it alleviates the pressure on branch predictor structures. Compiler transformations such as loop unrolling and vectorization help in reducing the dynamic branch count, though they don't specifically aim at improving the prediction rate of any given conditional statement. Profile-Guided Optimizations (PGO) and post-link optimizers (e.g., BOLT) are also effective at reducing branch mispredictions thanks to improving the fallthrough rate (straightening the code). We will discuss those techniques in the next chapter.[^1]

So perhaps the only direct way to get rid of branch mispredictions is to get rid of the branch itself. In subsequent sections, we will take a look at how branches can be replaced with lookup tables, arithmetic, and selection.
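
As a rough illustration of the three replacements named above (lookup tables, arithmetic, and selection), here is a minimal C++ sketch. It is not taken from the chapter: the function names, the threshold-counting task, and the weight values are all hypothetical and chosen only for illustration. Whether a compiler actually emits branchless machine code for any of these depends on the compiler, flags, and target, so treat it as a sketch of the idea, not a guaranteed outcome.

```cpp
#include <array>
#include <cstdint>

// Baseline with a data-dependent branch: the CPU has to predict
// `data[i] < threshold` on every iteration.
int32_t countBelowBranchy(const int32_t* data, int n, int32_t threshold) {
  int32_t count = 0;
  for (int i = 0; i < n; ++i) {
    if (data[i] < threshold)
      ++count;
  }
  return count;
}

// Arithmetic: the comparison yields 0 or 1, which is added directly,
// so the source no longer contains a conditional statement in the loop body.
int32_t countBelowArithmetic(const int32_t* data, int n, int32_t threshold) {
  int32_t count = 0;
  for (int i = 0; i < n; ++i)
    count += static_cast<int32_t>(data[i] < threshold);
  return count;
}

// Selection: both candidate values are available and one is picked.
// Compilers often lower this to a conditional move, but that is not guaranteed.
int32_t selectMax(int32_t a, int32_t b) {
  return a > b ? a : b;
}

// Lookup table: replaces an if/else chain over value ranges with one indexed load.
// The four weights are made-up illustration values.
int32_t weightOf(uint8_t v) {
  static constexpr std::array<int32_t, 4> kWeights = {10, 20, 30, 40};
  return kWeights[v >> 6];  // v >> 6 is always in [0, 3] for a uint8_t
}
```

Note that an optimizing compiler may already transform `countBelowBranchy` into branchless or vectorized code on its own, so it is worth inspecting the generated assembly and measuring before rewriting code by hand.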
