Commit

Added vek.Pow() and vek32.Pow(). Added vcl submodule.
olivier403 committed Oct 9, 2022
1 parent 97353b1 commit 5bc9dc6
Showing 17 changed files with 3,800 additions and 734 deletions.
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "asm/_cpp/vcl"]
path = asm/_cpp/vcl
url = https://github.com/vectorclass/version2
3 changes: 3 additions & 0 deletions README.md
@@ -227,6 +227,7 @@ func main() {
| vek.Mat4Mul(x, y) | specialization for 4 by 4 matrices |
| **Special** | |
| vek.Sqrt(x) | square root of each element |
| vek.Pow(x, y) | element-wise power |
| vek.Round(x), Floor(x), Ceil(x) | round to nearest, lesser or greater integer |
| **Special (32-bit only)** | |
| vek32.Sin(x) | sine of each element |
@@ -318,6 +319,8 @@ Times are in nanoseconds. Functions are inplace.
| **vek32.Round** | 1,812 | 102 | 250,035 | 9,722 | 25x |
| **vek.Sqrt** | 1,900 | 614 | 326,998 | 85,986 | 4x |
| **vek32.Sqrt** | 1,704 | 148 | 247,944 | 15,571 | 15x |
| **vek.Pow** | 39,833 | 6,137 | 4,155,465 | 776,556 | 5x |
| **vek32.Pow** | 30,386 | 2,091 | 4,070,793 | 292,980 | 14x |
| **vek32.Exp** | 7,177 | 375 | 1,120,300 | 49,694 | 22x |
| **vek32.Log** | 4,663 | 453 | 1,017,240 | 65,042 | 16x |
| **vek.Max** | 734 | 62 | 43,412 | 7,568 | 6x |
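
For readers unfamiliar with the term, the "element-wise power" in the new `vek.Pow(x, y)` row means that each output element is `x[i]` raised to `y[i]`. The following is a minimal scalar sketch in plain Go, not the library's SIMD implementation. The name `powScalar` is hypothetical, and truncating to the shorter slice is an assumption made for illustration; vek's own handling of mismatched lengths may differ.

```go
package main

import (
	"fmt"
	"math"
)

// powScalar is a hypothetical scalar reference, not part of vek.
// It returns z where z[i] = x[i] raised to the power y[i].
func powScalar(x, y []float64) []float64 {
	n := len(x)
	if len(y) < n {
		n = len(y) // assumption: only the overlapping prefix is processed
	}
	z := make([]float64, n)
	for i := 0; i < n; i++ {
		z[i] = math.Pow(x[i], y[i])
	}
	return z
}

func main() {
	x := []float64{2, 3, 4}
	y := []float64{3, 2, 0.5}
	fmt.Println(powScalar(x, y)) // prints [8 9 2]
}
```

A loop of this shape is the kind of scalar baseline that SIMD speedups are usually measured against, although the commit does not state exactly how the Go column of the benchmark table was produced.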
4 changes: 2 additions & 2 deletions asm/README.md
@@ -2,9 +2,9 @@

All SIMD functions are generated from C++. Most of it is portable, auto-vectorized code. To add a new function:

-1. Compile the C++ source code to AT&T assembly with Clang, flags `-Ofast -mfma -mavx2 -funroll-loops -fomit-frame-pointer`
+1. Compile the C++ source code to AT&T assembly with Clang, flags `-Ofast -mfma -mavx2 -funroll-loops -fomit-frame-pointer -std=c++17`
2. Convert the output to [avo](https://github.com/mmcloughlin/avo) instructions with `python asm/asm2avo.py --suffix AVX2 input.s --out output.go`
3. Fix potential issues with the output. Data sections need to be added manually and moves changed to unaligned access
4. Add the avo section to `asm/gen.go` and generate Go assembly and stubs with `go generate asm/gen.go`

-The C++ code was mostly compiled and analyzed with [godbolt](https://godbolt.org/). Other assembly output may need additional cleanup.
+The C++ code was mostly compiled and analyzed with [godbolt](https://godbolt.org/). Other assembly output may need additional cleanup. To compile code that uses vcl run godbolt [locally](https://github.com/compiler-explorer/compiler-explorer) and add `-I/path/to/vcl` to the compiler flags.