Added vek.Pow() and vek32.Pow(). Added vcl submodule.

viterin · Oct 9, 2022 · 5bc9dc6
1 parent 97353b1
commit 5bc9dc6

Showing 17 changed files with 3,800 additions and 734 deletions.
diff --git a/.gitmodules b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "asm/_cpp/vcl"]
+	path = asm/_cpp/vcl
+	url = https://github.com/vectorclass/version2
diff --git a/README.md b/README.md
@@ -227,6 +227,7 @@ func main() {
 | vek.Mat4Mul(x, y)               |            specialization for 4 by 4 matrices |
 | **Special**                     |                                               |
 | vek.Sqrt(x)                     |                   square root of each element |
+| vek.Pow(x, y)                   |                            element-wise power |
 | vek.Round(x), Floor(x), Ceil(x) |   round to nearest, lesser or greater integer |
 | **Special (32-bit only)**       |                                               |
 | vek32.Sin(x)                    |                          sine of each element |
@@ -318,6 +319,8 @@ Times are in nanoseconds. Functions are inplace.
 | **vek32.Round**     |      1,812 |          102 |      250,035 |          9,722 |         25x |
 | **vek.Sqrt**        |      1,900 |          614 |      326,998 |         85,986 |          4x |
 | **vek32.Sqrt**      |      1,704 |          148 |      247,944 |         15,571 |         15x |
+| **vek.Pow**         |     39,833 |        6,137 |    4,155,465 |        776,556 |          5x |
+| **vek32.Pow**       |     30,386 |        2,091 |    4,070,793 |        292,980 |         14x |
 | **vek32.Exp**       |      7,177 |          375 |    1,120,300 |         49,694 |         22x |
 | **vek32.Log**       |      4,663 |          453 |    1,017,240 |         65,042 |         16x |
 | **vek.Max**         |        734 |           62 |       43,412 |          7,568 |          6x |

diff --git a/asm/README.md b/asm/README.md
@@ -2,9 +2,9 @@
 
 All SIMD functions are generated from C++. Most of it is portable, auto-vectorized code. To add a new function:
 
-1. Compile the C++ source code to AT&T assembly with Clang, flags `-Ofast -mfma -mavx2 -funroll-loops -fomit-frame-pointer`
+1. Compile the C++ source code to AT&T assembly with Clang, flags `-Ofast -mfma -mavx2 -funroll-loops -fomit-frame-pointer -std=c++17`
 2. Convert the output to [avo](https://github.com/mmcloughlin/avo) instructions with `python asm/asm2avo.py --suffix AVX2 input.s --out output.go`
 3. Fix potential issues with the output. Data sections need to be added manually and moves changed to unaligned access
 4. Add the avo section to `asm/gen.go` and generate Go assembly and stubs with `go generate asm/gen.go`
 
-The C++ code was mostly compiled and analyzed with [godbolt](https://godbolt.org/). Other assembly output may need additional cleanup.
+The C++ code was mostly compiled and analyzed with [godbolt](https://godbolt.org/). Other assembly output may need additional cleanup. To compile code that uses vcl run godbolt [locally](https://github.com/compiler-explorer/compiler-explorer) and add `-I/path/to/vcl` to the compiler flags.
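
The README row added in this commit describes `vek.Pow(x, y)` as an element-wise power. As a rough scalar reference for what that means, here is a minimal sketch that uses only the Go standard library. `powRef` is a hypothetical helper and not part of vek; panicking on a length mismatch is an assumption made for the sketch, not documented library behavior.

```go
package main

import (
	"fmt"
	"math"
)

// powRef is a hypothetical scalar reference for an element-wise power:
// out[i] = x[i] raised to y[i]. It is not the SIMD implementation.
func powRef(x, y []float64) []float64 {
	if len(x) != len(y) {
		// Assumption for this sketch: mismatched lengths are a programmer error.
		panic("powRef: slices must have the same length")
	}
	out := make([]float64, len(x))
	for i := range x {
		out[i] = math.Pow(x[i], y[i])
	}
	return out
}

func main() {
	x := []float64{2, 3, 4}
	y := []float64{3, 2, 0.5}
	fmt.Println(powRef(x, y))
}
```

A loop like this can serve as a correctness baseline when comparing against a vectorized version, since the two should agree up to floating-point rounding.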