vek is a collection of SIMD accelerated vector functions for Go.

Most modern CPUs have special SIMD instructions (Single Instruction, Multiple Data) to
process data in parallel, but there is currently no way to use them in a pure Go program.
vek implements a large number of common vector operations in SIMD accelerated assembly
code and wraps them in a simple Go API. vek supports most modern x86 CPUs and falls
back to a pure Go implementation on unsupported platforms.
- Fast, average speedups of 10x for float32 vectors
- Fallback to pure Go on unsupported platforms
- Support for float64, float32 and bool vectors
- Zero allocation variations of each function
go get -u github.com/viterin/vek
Vectors are represented as plain old floating point slices; there are no special data
types in vek. All operations on float64 vectors reside in the vek package. It contains
all the basic arithmetic operations:
package main

import (
	"fmt"

	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4}

	// Multiply a vector by itself element-wise
	y := vek.Mul(x, x)
	fmt.Println(x, y) // [0 1 2 3 4] [0 1 4 9 16]

	// Multiply each element by a number
	y = vek.MulNumber(x, 2)
	fmt.Println(x, y) // [0 1 2 3 4] [0 2 4 6 8]
}
The vek32 package contains float32 versions of each operation:
package main

import (
	"fmt"

	"github.com/viterin/vek/vek32"
)

func main() {
	// Add a float32 number to each element
	x := []float32{0, 1, 2, 3, 4}
	y := vek32.AddNumber(x, 2)
	fmt.Println(x, y) // [0 1 2 3 4] [2 3 4 5 6]
}
Floating point vectors can be compared to other vectors or numbers. The result is a bool
vector indicating where the comparison holds true. bool vectors can be used to select
matching elements, count matches and more:
package main

import (
	"fmt"

	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4, 5}
	y := []float64{5, 4, 3, 2, 1, 0}

	// []bool indicating where x < y (less than)
	m := vek.Lt(x, y)
	fmt.Println(m)            // [true true true false false false]
	fmt.Println(vek.Count(m)) // 3

	// []bool indicating where x >= 2 (greater than or equal)
	m = vek.GteNumber(x, 2)
	fmt.Println(m)          // [false false true true true true]
	fmt.Println(vek.Any(m)) // true

	// Selection of non-zero elements less than y
	z := vek.Select(x,
		vek.And(
			vek.Lt(x, y),
			vek.NeqNumber(x, 0),
		),
	)
	fmt.Println(z) // [1 2]
}
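
For intuition, the mask-and-select pattern above behaves like the plain Go loops below. This is only a sketch of the semantics, not vek's implementation: the helpers lt, neqNumber, and, and selectBy are hypothetical stand-ins for vek.Lt, vek.NeqNumber, vek.And and vek.Select, and equal-length inputs are assumed.

```go
package main

import "fmt"

// lt returns a mask that is true wherever x[i] < y[i].
// Hypothetical stand-in for vek.Lt; assumes len(x) == len(y).
func lt(x, y []float64) []bool {
	m := make([]bool, len(x))
	for i := range x {
		m[i] = x[i] < y[i]
	}
	return m
}

// neqNumber returns a mask that is true wherever x[i] != a.
func neqNumber(x []float64, a float64) []bool {
	m := make([]bool, len(x))
	for i := range x {
		m[i] = x[i] != a
	}
	return m
}

// and combines two masks element-wise; assumes len(a) == len(b).
func and(a, b []bool) []bool {
	m := make([]bool, len(a))
	for i := range a {
		m[i] = a[i] && b[i]
	}
	return m
}

// selectBy keeps the elements of x whose mask entry is true.
func selectBy(x []float64, mask []bool) []float64 {
	out := make([]float64, 0, len(x))
	for i, keep := range mask {
		if keep {
			out = append(out, x[i])
		}
	}
	return out
}

func main() {
	x := []float64{0, 1, 2, 3, 4, 5}
	y := []float64{5, 4, 3, 2, 1, 0}

	// Non-zero elements of x that are less than y
	z := selectBy(x, and(lt(x, y), neqNumber(x, 0)))
	fmt.Println(z) // [1 2]
}
```

Each helper allocates a new mask, which mirrors the default allocating behaviour of the API; a real workload would more likely reuse buffers.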
vek has a number of functions to construct new vectors and convert between vector types efficiently:
package main

import (
	"fmt"

	"github.com/viterin/vek"
	"github.com/viterin/vek/vek32"
)

func main() {
	// Vector with number repeated n times
	x := vek.Repeat(2, 5)
	fmt.Println(x) // [2 2 2 2 2]

	// Vector ranging from a to b (excl.) in steps of 1
	x = vek.Range(-2, 3)
	fmt.Println(x) // [-2 -1 0 1 2]

	// Conversion from float64 to int32
	xi32 := vek.ToInt32(x)
	fmt.Println(xi32) // [-2 -1 0 1 2]

	// Conversion from int32 to float32
	x32 := vek32.FromInt32(xi32)
	fmt.Println(x32) // [-2 -1 0 1 2]
}
By default, functions allocate a new slice to store the result. Append _Inplace
to a function name to do the operation in place, overwriting the data of the first
argument slice with the result. Append _Into to write the result into a target
slice.
package main

import (
	"fmt"

	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4}
	vek.AddNumber_Inplace(x, 2)

	y := make([]float64, len(x))
	vek.AddNumber_Into(y, x, 2)

	fmt.Println(x, y) // [2 3 4 5 6] [4 5 6 7 8]
}
SIMD acceleration is enabled by default on supported platforms, which is any x86/amd64 CPU with
the AVX2 and FMA extensions. Use vek.Info() to see if hardware acceleration is enabled. Turn
it off or on with vek.SetAcceleration(). Acceleration is currently disabled by default on
macOS as I have no machine to test it on.
package main

import (
	"fmt"

	"github.com/viterin/vek"
)

func main() {
	fmt.Printf("%+v", vek.Info())
	// {CPUArchitecture:amd64 CPUFeatures:[AVX2 FMA ..] Acceleration:true}
}
function | description |
---|---|
Arithmetic | |
vek.Add(x, y) | element-wise addition |
vek.AddNumber(x, a) | add number to each element |
vek.Sub(x, y) | element-wise subtraction |
vek.SubNumber(x, a) | subtract number from each element |
vek.Mul(x, y) | element-wise multiplication |
vek.MulNumber(x, a) | multiply each element by number |
vek.Div(x, y) | element-wise division |
vek.DivNumber(x, a) | divide each element by number |
vek.Abs(x) | absolute values |
vek.Neg(x) | additive inverses |
vek.Inv(x) | multiplicative inverses |
Aggregates | |
vek.Sum(x) | sum of elements |
vek.CumSum(x) | cumulative sum |
vek.Prod(x) | product of elements |
vek.CumProd(x) | cumulative product |
vek.Mean(x) | mean |
vek.Median(x) | median |
vek.Quantile(x, q) | q-th quantile, 0 <= q <= 1 |
Distance | |
vek.Dot(x, y) | dot product |
vek.Norm(x) | euclidean norm (length) |
vek.Distance(x, y) | euclidean distance |
vek.ManhattanNorm(x) | sum of absolute values |
vek.ManhattanDistance(x, y) | sum of absolute differences |
vek.CosineSimilarity(x, y) | cosine similarity |
Matrices | |
vek.MatMul(x, y, n) | multiply m-by-n and n-by-p matrix (row-major) |
vek.Mat4Mul(x, y) | specialization for 4 by 4 matrices |
Special | |
vek.Sqrt(x) | square root of each element |
vek.Pow(x, y) | element-wise power |
vek.Round(x), Floor(x), Ceil(x) | round to nearest, lesser or greater integer |
Special (32-bit only) | |
vek32.Sin(x) | sine of each element |
vek32.Cos(x) | cosine of each element |
vek32.Exp(x) | exponential function |
vek32.Log(x), Log2(x), Log10(x) | natural, base 2 and base 10 logarithms |
Comparison | |
vek.Min(x) | minimum value |
vek.ArgMin(x) | first index of the minimum value |
vek.Minimum(x, y) | element-wise minimum values |
vek.MinimumNumber(x, a) | minimum of each element and number |
vek.Max(x) | maximum value |
vek.ArgMax(x) | first index of the maximum value |
vek.Maximum(x, y) | element-wise maximum values |
vek.MaximumNumber(x, a) | maximum of each element and number |
vek.Find(x, a) | first index of number, -1 if not found |
vek.Lt(x, y) | element-wise less than |
vek.LtNumber(x, a) | less than number |
vek.Lte(x, y) | element-wise less than or equal |
vek.LteNumber(x, a) | less than or equal to number |
vek.Gt(x, y) | element-wise greater than |
vek.GtNumber(x, a) | greater than number |
vek.Gte(x, y) | element-wise greater than or equal |
vek.GteNumber(x, a) | greater than or equal to number |
vek.Eq(x, y) | element-wise equality |
vek.EqNumber(x, a) | equal to number |
vek.Neq(x, y) | element-wise non-equality |
vek.NeqNumber(x, a) | not equal to number |
Boolean | |
vek.Not(x) | element-wise not |
vek.And(x, y) | element-wise and |
vek.Or(x, y) | element-wise or |
vek.Xor(x, y) | element-wise exclusive or |
vek.Select(x, y) | select elements using boolean vector |
vek.All(x) | all bools are true |
vek.Any(x) | at least one bool is true |
vek.None(x) | none of the bools are true |
vek.Count(x) | number of true bools |
Construction | |
vek.Zeros(n) | vector of zeros |
vek.Ones(n) | vector of ones |
vek.Repeat(a, n) | vector with number repeated |
vek.Range(a, b) | vector from a to b (excl.) in steps of 1 |
vek.Gather(x, idx) | select elements at given indices |
vek.Scatter(x, idx, size) | create vector with indices set to values |
vek.FromBool(x), FromInt64, .. | convert slice to floats |
vek.ToBool(x), ToInt64, .. | convert floats to other type |
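
The table lists vek.MatMul(x, y, n) for row-major matrices stored in flat slices. The sketch below shows one way to read that layout in plain Go. It is a reference loop, not vek's implementation: the helper matMul is hypothetical, and it assumes that m and p are derived from the slice lengths as len(x)/n and len(y)/n.

```go
package main

import "fmt"

// matMul multiplies an m-by-n matrix x with an n-by-p matrix y.
// Both are flat row-major slices: element (i, k) of x is x[i*n+k].
// Assumption: m and p are inferred from the slice lengths.
func matMul(x, y []float64, n int) []float64 {
	m := len(x) / n
	p := len(y) / n
	out := make([]float64, m*p)
	for i := 0; i < m; i++ {
		for k := 0; k < n; k++ {
			a := x[i*n+k]
			// Walk row k of y so memory access stays sequential.
			for j := 0; j < p; j++ {
				out[i*p+j] += a * y[k*p+j]
			}
		}
	}
	return out
}

func main() {
	// x is 2-by-3, y is 3-by-2, so n = 3 and the result is 2-by-2.
	x := []float64{
		1, 2, 3,
		4, 5, 6,
	}
	y := []float64{
		7, 8,
		9, 10,
		11, 12,
	}
	fmt.Println(matMul(x, y, 3)) // [58 64 139 154]
}
```

The result is again a flat row-major slice, so element (i, j) of the product sits at index i*p+j.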
vek32.xxx( .. )
The vek32 package contains identical functions for float32 vectors, e.g. vek32.Add(x, y).
vek.xxx_Inplace( .. )
Append _Inplace to the function name to mutate the argument vector in place, e.g.
vek.Add_Inplace(x, y). The first argument is the destination. It should not overlap
other argument slices.
vek.xxx_Into( dst, .. )
Append _Into to the function name to write the result into a target slice, e.g.
vek.Add_Into(dst, x, y). The destination should have sufficient capacity to hold the
result; its length can be anything. It should not overlap other argument slices. The
return value is the destination slice resized to the length of the result.
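
The _Into contract (capacity matters, length does not, and the returned slice is resized) can be illustrated with a plain Go sketch. The helper addInto is hypothetical and only mirrors the description above; how vek itself reacts to a destination that is too small is not specified here, so the panic is an assumption of this sketch.

```go
package main

import "fmt"

// addInto writes x[i] + y[i] into dst and returns dst resized to len(x).
// Hypothetical helper mirroring the _Into contract described above:
// dst needs enough capacity, and its current length is ignored.
func addInto(dst, x, y []float64) []float64 {
	if len(x) != len(y) {
		panic("slices must have equal length")
	}
	if cap(dst) < len(x) {
		panic("destination capacity too small") // assumption of this sketch
	}
	dst = dst[:len(x)] // reslice within capacity, no allocation
	for i := range x {
		dst[i] = x[i] + y[i]
	}
	return dst
}

func main() {
	x := []float64{1, 2, 3}
	y := []float64{10, 20, 30}

	// Length 0 but capacity 8: a reusable scratch buffer.
	buf := make([]float64, 0, 8)
	z := addInto(buf, x, y)
	fmt.Println(z, len(z), cap(z)) // [11 22 33] 3 8
}
```

Because the result shares the backing array of buf, the same buffer can be passed to later calls without further allocation, as long as the caller always uses the returned slice.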
Comparison of SIMD accelerated functions to the pure Go fallback versions for slices of different sizes. Times are in nanoseconds. The in-place variants of the functions are measured.
go test -benchmem -timeout 0 -run=^# -bench=. ./internal/...
function | 1k, Go | 1k, SIMD | 100k, Go | 100k, SIMD | speedup |
---|---|---|---|---|---|
vek.Add | 484 | 192 | 57,544 | 26,431 | 2x |
vek32.Add | 610 | 116 | 84,870 | 13,164 | 6x |
vek.Mul | 499 | 186 | 58,154 | 26,955 | 2x |
vek32.Mul | 607 | 126 | 83,486 | 13,056 | 6x |
vek.Abs | 794 | 123 | 120,018 | 19,680 | 6x |
vek32.Abs | 736 | 82 | 113,446 | 7,990 | 14x |
vek.Sum | 633 | 39 | 64,824 | 6,859 | 9x |
vek32.Sum | 631 | 20 | 65,007 | 3,191 | 20x |
vek.Quantile | 3,375 | 3,075 | 860,382 | 816,831 | 1x |
vek32.Quantile | 3,367 | 3,040 | 751,790 | 698,111 | 1x |
vek.Round | 1,485 | 161 | 250,316 | 21,622 | 11x |
vek32.Round | 1,812 | 102 | 250,035 | 9,722 | 25x |
vek.Sqrt | 1,900 | 614 | 326,998 | 85,986 | 4x |
vek32.Sqrt | 1,704 | 148 | 247,944 | 15,571 | 15x |
vek.Pow | 39,833 | 6,137 | 4,155,465 | 776,556 | 5x |
vek32.Pow | 30,386 | 2,091 | 4,070,793 | 292,980 | 14x |
vek32.Exp | 7,177 | 375 | 1,120,300 | 49,694 | 22x |
vek32.Log | 4,663 | 453 | 1,017,240 | 65,042 | 16x |
vek.Max | 734 | 62 | 43,412 | 7,568 | 6x |
vek32.Max | 731 | 27 | 44,349 | 3,484 | 13x |
vek.Maximum | 1,000 | 517 | 542,944 | 66,423 | 8x |
vek32.Maximum | 873 | 499 | 556,103 | 66,786 | 8x |
vek.Find | 294 | 77 | 21,989 | 7,256 | 3x |
vek32.Find | 223 | 35 | 21,813 | 3,010 | 7x |
vek.Lt | 543 | 195 | 64,136 | 23,548 | 3x |
vek32.Lt | 539 | 130 | 62,449 | 13,188 | 5x |
vek.And | 1,172 | 60 | 373,077 | 2,683 | 139x |
vek.All | 237 | 11 | 21,696 | 738 | 29x |
vek.Range | 647 | 59 | 65,403 | 7,889 | 8x |
vek32.Range | 633 | 32 | 65,155 | 3,252 | 20x |
vek.FromInt32 | 335 | 56 | 33,410 | 11,428 | 3x |
vek32.FromInt32 | 439 | 29 | 44,372 | 7,423 | 6x |

function | m=1k,n=1k,p=1, Go | m=1k,n=1k,p=1, SIMD | p=1k, Go | p=1k, SIMD | speedup |
---|---|---|---|---|---|
vek.MatMul | 258,418 | 38,835 | 152,726,512 | 20,823,962 | 7x |
vek32.MatMul | 256,453 | 28,403 | 147,474,083 | 10,479,834 | 14x |

function | m=4,n=4,p=4, Go | m=4,n=4,p=4, SIMD | speedup |
---|---|---|---|
vek.Mat4Mul | 26 | 5 | 5x |
vek32.Mat4Mul | 26 | 5 | 5x |