You have a Haskell program that's not performing how you'd like. Use this list to check that you've done the usual steps to performance nirvana:
✓ Are you compiling with -Wall?
✓ Are you compiling with -O or above?
✓ Have you run your code with the profiler?
✓ Have you checked for stack space leaks?
✓ Have you set up an isolated benchmark?
✓ Have you looked at strictness of your function arguments?
✓ Are you using the right data structure?
✓ Are your data types strict and/or unpacked?
✓ Did you check your code isn't too polymorphic?
✓ Do you have an explicit export list?
✓ Have you looked at the Core?
✓ Have you considered unboxed arrays/strefs/etc?
✓ Are you using Text or ByteString instead of String?
✓ Have you considered compiling with LLVM?
Running code in GHCi's interpreter will always be much slower than compiling it to a binary.
Make sure you're compiling your code with ghc.
With -Wall, GHC warns about type defaults and missing type signatures:
- If you let GHC default integer types, it will choose Integer. This can be around 10x slower than Int, so choose your types explicitly.
- Explicit type signatures keep you from missing something obviously slow in the types.
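A minimal sketch of the defaulting problem (the function name sumTo is hypothetical). With the signature removed, GHC would infer a polymorphic type, -Wall would report the missing signature, and the call in main would silently default to Integer; the explicit Int signature avoids both.

```haskell
{-# OPTIONS_GHC -Wall #-}
module Main (main) where

-- Explicit Int signature. Without it, GHC infers (Num a, Enum a) => a -> a,
-- -Wall reports the missing signature, and the call in main defaults to Integer.
sumTo :: Int -> Int
sumTo n = sum [1 .. n]

main :: IO ()
main = print (sumTo 1000000)
```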
By default GHC does not optimize your programs. Cabal and Stack enable optimization in the build process. If you're calling ghc directly, don't forget to add -O.
Enable -O2 for serious, non-dangerous optimizations.
Profiling is the standard way to find out, for each expression in your program:
- How many times does it run?
- How much does it allocate?
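As a sketch, assuming the program is built with profiling enabled (for example ghc -prof -fprof-auto, then run with +RTS -p), GHC reports time and allocation per cost centre in a .prof file. The SCC pragma below adds a manual cost centre with the hypothetical name "mean", so it shows up as its own entry in that report.

```haskell
module Main (main) where

-- Mean of a list. The SCC pragma creates a cost centre called "mean" that the
-- profiler reports separately; it has no effect when not building with -prof.
mean :: [Double] -> Double
mean xs = {-# SCC "mean" #-} sum xs / fromIntegral (length xs)

main :: IO ()
main = print (mean [1 .. 1000000])
```

Note that this mean traverses the list twice, which keeps the whole list alive in memory; that is exactly the kind of cost a heap profile tends to expose.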
Resources on profiling:
Check that your operations aren't allocating too much or more than you'd expect:
https://github.com/fpco/weigh#readme
Allocation in a garbage-collected heap is often claimed to be "fast", but not allocating at all is always faster.
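A small sketch in the style of the weigh README, assuming the weigh package is installed. Each func entry applies the function to its argument and reports the bytes allocated; count is a hypothetical function used only to have something to measure.

```haskell
module Main (main) where

import Weigh (func, mainWith)

-- Count down from n; exists only to give weigh something to measure.
count :: Integer -> ()
count 0 = ()
count n = count (n - 1)

main :: IO ()
main = mainWith $ do
  -- Each 'func' runs 'count' on the given argument and reports allocation.
  func "count 0" count 0
  func "count 1000" count 1000
```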
Most space leaks result in an excess use of stack. If you look for the part of the program that results in the largest stack usage, that is the most likely space leak, and the one that should be investigated first.
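A classic stack space leak, as a sketch: foldl with a lazy accumulator builds a chain of (+) thunks that is only evaluated at the end, and evaluating that nested chain needs stack proportional to the list length. foldl' forces the accumulator at each step instead. One way to surface such leaks (an assumption about your setup, not the only approach) is to link with -rtsopts and run with a deliberately small stack such as +RTS -K1K, so leaky code fails fast with a stack overflow. At -O, GHC's strictness analysis may fix the lazy version for you, so try this at -O0 too.

```haskell
module Main (main) where

import Data.List (foldl')

-- Lazy left fold: builds ((0 + 1) + 2) + ... as unevaluated thunks, then
-- evaluates the whole chain at the end, which needs stack proportional to n.
leakySum :: [Int] -> Int
leakySum = foldl (+) 0

-- Strict left fold: forces the accumulator at every step, so constant stack.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0

main :: IO ()
main = do
  print (strictSum [1 .. 1000000])
  print (leakySum [1 .. 1000000])
```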
Resource on stack space leaks:
Benchmarking is a tricky business to get right, especially when timing things at a smaller scale. Haskell is lucky to have a very good benchmarking package. If you are asking someone for help, you are helping them by providing benchmarks, and they are likely to ask for them.
Do it right and use Criterion.
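A minimal Criterion benchmark, assuming the criterion package; fib is a hypothetical, deliberately naive function chosen only to have something to time. whnf evaluates the result to its outermost constructor, while nf evaluates it fully, which matters for lazy structures like lists.

```haskell
module Main (main) where

import Criterion.Main (bench, bgroup, defaultMain, nf, whnf)

-- A deliberately naive function to benchmark.
fib :: Int -> Integer
fib n = if n < 2 then toInteger n else fib (n - 1) + fib (n - 2)

main :: IO ()
main = defaultMain
  [ bgroup "fib"
      [ bench "10" $ whnf fib 10
      , bench "20" $ whnf fib 20
      ]
  , bgroup "map"
      -- nf forces the whole list; whnf would only force the first cons cell.
      [ bench "double 1000" $ nf (map (* 2)) [1 .. 1000 :: Int]
      ]
  ]
```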
Resources on Criterion:
Resource on strictness of function arguments: https://wiki.haskell.org/Performance/Strictness
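A minimal sketch of strict function arguments using the BangPatterns extension; strictMean is a hypothetical example. The bangs force the running total and count on every call, so the accumulators never grow into chains of thunks, and the list is traversed only once.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Mean in a single pass. The bang patterns make both accumulator
-- arguments strict, so each addition happens immediately.
strictMean :: [Double] -> Double
strictMean = go 0 (0 :: Int)
  where
    go !total !len [] = total / fromIntegral len
    go !total !len (x : xs) = go (total + x) (len + 1) xs

main :: IO ()
main = print (strictMean [1 .. 1000000])
```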
This GitHub organization provides comparative benchmarks of a few kinds of data structures. You can often use them to decide which data structure best suits your problem:
- sets - for set-like things
- dictionaries - dictionaries, hashmaps, maps, etc.
- sequences - lists, vectors/arrays, sequences, etc.
Tip: Lists are almost always the wrong data structure. But sometimes they are the right one.
See also HaskellWiki on data structures.
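A small illustration of picking a dictionary over a list, assuming the containers package; wordFreq is a hypothetical example. Map.fromListWith combines duplicate keys in O(log n) per insertion, whereas counting with an association list would need an O(n) lookup per word.

```haskell
module Main (main) where

import qualified Data.Map.Strict as Map

-- Count word occurrences with a strict Map instead of an association list.
wordFreq :: [String] -> Map.Map String Int
wordFreq ws = Map.fromListWith (+) [(w, 1) | w <- ws]

main :: IO ()
main = print (Map.toList (wordFreq (words "the cat and the hat")))
```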
By default, Haskell fields are lazy and boxed. Making them strict can often (not always) give them more predictable performance, and unboxed fields (such as integers) do not require pointer indirection.
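A sketch of strict, unpacked fields; Point and translate are hypothetical names. The ! annotations make the fields strict, and the UNPACK pragma asks GHC to store the raw Double inline rather than a pointer to a boxed Double (GHC only honours UNPACK when optimizing).

```haskell
module Main (main) where

-- The lazy, boxed equivalent would be: data Point = Point Double Double
-- Here each field is strict (!) and unpacked into the constructor itself.
data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double
  deriving (Show, Eq)

-- Move a point diagonally; both new fields are evaluated when Point is built.
translate :: Double -> Point -> Point
translate d (Point x y) = Point (x + d) (y + d)

main :: IO ()
main = print (translate 1 (Point 2 3))
```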
Resources on data type strictness:
Code that is type-class-polymorphic, such as

genericLength :: Num n => [a] -> n

has to accept an extra dictionary argument that says which Num instance to use. That can make things slower.
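Two common ways around this, sketched with hypothetical functions: a SPECIALIZE pragma asks GHC to generate an extra copy of a polymorphic function for one concrete type, and a monomorphic signature removes the dictionary altogether. Whether either helps in practice is something to confirm with profiling or the Core.

```haskell
module Main (main) where

import Data.List (genericLength)

-- Polymorphic: callers pass a Fractional dictionary at runtime unless GHC
-- specialises or inlines the call.
average :: Fractional a => [a] -> a
average xs = sum xs / genericLength xs
{-# SPECIALIZE average :: [Double] -> Double #-}

-- Monomorphic alternative: no dictionary, and length is the Int version.
averageD :: [Double] -> Double
averageD xs = sum xs / fromIntegral (length xs)

main :: IO ()
main = do
  print (average [1, 2, 3 :: Double])
  print (averageD [1, 2, 3])
```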
Resources on overloading:
This is a suggestion from the HaskellWiki, but I believe it's based on out-of-date information about how GHC does inlining. It's left here for interested parties, however.
https://wiki.haskell.org/Performance/Modules
GHC compiles Haskell down to a small intermediate language, Core, which is much closer to the code that actually gets generated. This is where many optimization passes take place.
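One way to look at the Core, sketched under the assumption that you compile the module with ghc: the flags in the OPTIONS_GHC pragma ask GHC to dump the optimised Core (-ddump-simpl) to a .dump-simpl file instead of the terminal (-ddump-to-file), with most type and unique-name noise removed (-dsuppress-all). sumSquares is a hypothetical function to inspect.

```haskell
{-# OPTIONS_GHC -O -ddump-simpl -dsuppress-all -ddump-to-file #-}
module Main (main) where

-- A small function whose Core is worth reading: with -O you would hope to
-- see a tight loop over unboxed Int# values rather than boxed Ints and lists.
sumSquares :: Int -> Int
sumSquares n = sum [x * x | x <- [1 .. n]]

main :: IO ()
main = print (sumSquares 100)
```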
Resources on core:
An array with boxed elements such as Data.Vector.Vector a means each element is a pointer to the value, instead of containing the values inline.
Use an unboxed vector where you can (for Int, Double, and similar primitive types) to avoid the pointer indirection. Because the elements sit next to each other in memory, the vector works well with the CPU cache and needs far fewer trips to main memory.
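A side-by-side sketch, assuming the vector package; boxedSum and unboxedSum are hypothetical names. The two functions compute the same result, but the first builds an array of pointers to boxed Ints and the second a flat array of raw Ints. With -O, stream fusion may avoid materialising either vector in this tiny example, so measure on your real code.

```haskell
module Main (main) where

import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U

-- Boxed vector: each element is a pointer to a heap-allocated Int.
boxedSum :: Int -> Int
boxedSum n = V.sum (V.enumFromN 1 n)

-- Unboxed vector: the Ints are stored contiguously in one flat array.
unboxedSum :: Int -> Int
unboxedSum n = U.sum (U.enumFromN 1 n)

main :: IO ()
main = print (boxedSum 1000000, unboxedSum 1000000)
```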
Likewise, mutable references like IORef and STRef contain a pointer rather than the value. Use URef for an unboxed version.
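A minimal sketch using URef, assuming the Data.Mutable module from the mutable-containers package and its newRef, modifyRef' and readRef functions and IOURef type; sumWithURef is a hypothetical name. Unlike IORef Int, the counter is stored unboxed rather than behind a pointer to a boxed Int.

```haskell
module Main (main) where

import Control.Monad (forM_)
import Data.Mutable (IOURef, modifyRef', newRef, readRef)

-- Sum 1..n with an unboxed mutable reference (assumes mutable-containers).
sumWithURef :: Int -> IO Int
sumWithURef n = do
  ref <- newRef 0 :: IO (IOURef Int)
  -- modifyRef' applies the update strictly, so no thunks pile up.
  forM_ [1 .. n] $ \i -> modifyRef' ref (+ i)
  readRef ref

main :: IO ()
main = sumWithURef 1000000 >>= print
```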
The String type is slow for these reasons:
- It's a linked list, meaning access is linear.
- It's not a packed representation, so each character is a separate heap cell with a pointer to the next, and traversing it means chasing pointers through main memory.
- It allocates a lot more memory than packed representations.
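A short sketch with Data.Text from the text package; shout is a hypothetical function. Text stores its characters in a packed array, and the module provides list-like operations such as words, unwords and toUpper that work directly on that representation.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Upper-case every word and normalise spacing, all on packed Text.
shout :: T.Text -> T.Text
shout = T.unwords . map T.toUpper . T.words

main :: IO ()
main = TIO.putStrLn (shout "packed strings are compact")
```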
Resources on string types:
Case studies: