Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add routines for arithmetic operations with mixed real and complex types #3

Merged
merged 5 commits into from
Apr 3, 2023

Conversation

heltonmc
Copy link
Owner

@heltonmc heltonmc commented Apr 3, 2023

No description provided.

@codecov-commenter
Copy link

codecov-commenter commented Apr 3, 2023

Codecov Report

Merging #3 (371def4) into main (2cefd8a) will decrease coverage by 1.94%.
The diff coverage is 81.57%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main       #3      +/-   ##
==========================================
- Coverage   87.16%   85.22%   -1.94%     
==========================================
  Files           5        5              
  Lines         148      176      +28     
==========================================
+ Hits          129      150      +21     
- Misses         19       26       +7     
Impacted Files Coverage Δ
src/types.jl 100.00% <ø> (ø)
src/arithmetic.jl 45.65% <30.00%> (-4.35%) ⬇️
src/complex.jl 100.00% <100.00%> (ø)
src/horner.jl 99.05% <100.00%> (+0.14%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@heltonmc
Copy link
Owner Author

heltonmc commented Apr 3, 2023

Added support for complex SIMD evalpoly routines when argument is complex and coefficients are real. This is needed upstream in Bessels.jl

The regular evalpoly routine already SIMDs well for a single polynomial evaluation to the differnet complex evalpoly routine. So matching the single polynomial performance when evaluating two or four polynomials in parallel can't be achieved. But this quickly gains significant advantages compared to sequentially evaluating them.

julia> z
1.1 + 1.2im
julia> length(P1)
15

julia> const pp4 = pack_poly((P1, P2, P3, P4))

julia> @benchmark SIMDMath.horner_simd($z, pp4)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  6.458 ns  9.000 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     6.583 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.589 ns ± 0.057 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                              █                              
  ▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▃▄▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▂ ▂
  6.46 ns        Histogram: frequency by time       6.71 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark evalpoly($z, $P1)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  4.000 ns  20.291 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     4.084 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.109 ns ±  0.311 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                              █▁             ▃                
  ▂▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▂
  4 ns           Histogram: frequency by time        4.17 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark (evalpoly($z, $P1), evalpoly($z, $P2))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  7.417 ns  31.208 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     7.542 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.566 ns ±  0.451 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                          █                                   
  ▂▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▅▃▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▂ ▂
  7.42 ns        Histogram: frequency by time        7.71 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark (evalpoly($z, $P1), evalpoly($z, $P2), evalpoly($z, $P3))
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min  max):  11.177 ns  16.266 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     11.261 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.285 ns ±  0.074 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                         █           ▅                         
  ▂▁▁▁▁▁▁▁▁▁▁▃▃▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▂ ▂
  11.2 ns         Histogram: frequency by time        11.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark (evalpoly($z, $P1), evalpoly($z, $P2), evalpoly($z, $P3), evalpoly($z, $P4))
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min  max):  14.695 ns  41.458 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     14.822 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   14.851 ns ±  0.634 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▃▂        ▇         █         ▆         ▄        ▂ ▂
  ▇▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█ █
  14.7 ns      Histogram: log(frequency) by time      14.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

So the horner_simd routine is evaluating four different polynomials of length 15 with a complex argument in about 6.5 ns.

EDIT: I've switched the muladd function to match the usual instrinsics fmadd to match the other methods. I believe this is the best way to go as I frequently ran into issues where method errors using Vecs would throw errors in base. These are hard to track down when overloading that function.

@heltonmc heltonmc merged commit 3470d10 into main Apr 3, 2023
@heltonmc heltonmc deleted the complex_simd branch April 3, 2023 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants