Add routines for arithmetic operations with mixed real and complex types #3
Codecov Report

```diff
@@            Coverage Diff             @@
##             main       #3      +/-   ##
==========================================
- Coverage   87.16%   85.22%   -1.94%
==========================================
  Files           5        5
  Lines         148      176      +28
==========================================
+ Hits          129      150      +21
- Misses         19       26       +7
```
Added support for SIMD `evalpoly` routines where the argument is complex and the coefficients are real. This is needed upstream in Bessels.jl.

The regular `evalpoly` routine already SIMDs well for a single polynomial evaluation, due to its separate algorithm for complex arguments, so evaluating two or four polynomials in parallel can't match single-polynomial performance. It does, however, quickly gain a significant advantage over evaluating the polynomials sequentially: in the benchmarks below, `horner_simd` on four packed polynomials takes 6.58 ns (median), compared with 4.08 ns for one `evalpoly` call, 7.54 ns for two sequential calls, and 14.82 ns for four.

```julia
julia> z
1.1 + 1.2im
julia> length(P1)
15
julia> const pp4 = pack_poly((P1, P2, P3, P4))
julia> @benchmark SIMDMath.horner_simd($z, pp4)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 6.458 ns … 9.000 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 6.583 ns ┊ GC (median): 0.00%
Time (mean ± σ): 6.589 ns ± 0.057 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▃▄▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▂ ▂
6.46 ns Histogram: frequency by time 6.71 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark evalpoly($z, $P1)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 4.000 ns … 20.291 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 4.084 ns ┊ GC (median): 0.00%
Time (mean ± σ): 4.109 ns ± 0.311 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▁ ▃
▂▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▂
4 ns Histogram: frequency by time 4.17 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark (evalpoly($z, $P1), evalpoly($z, $P2))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 7.417 ns … 31.208 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 7.542 ns ┊ GC (median): 0.00%
Time (mean ± σ): 7.566 ns ± 0.451 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
▂▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▅▃▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▂ ▂
7.42 ns Histogram: frequency by time 7.71 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark (evalpoly($z, $P1), evalpoly($z, $P2), evalpoly($z, $P3))
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
Range (min … max): 11.177 ns … 16.266 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 11.261 ns ┊ GC (median): 0.00%
Time (mean ± σ): 11.285 ns ± 0.074 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▅
▂▁▁▁▁▁▁▁▁▁▁▃▃▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▂ ▂
11.2 ns Histogram: frequency by time 11.4 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark (evalpoly($z, $P1), evalpoly($z, $P2), evalpoly($z, $P3), evalpoly($z, $P4))
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
Range (min … max): 14.695 ns … 41.458 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 14.822 ns ┊ GC (median): 0.00%
Time (mean ± σ): 14.851 ns ± 0.634 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▂ ▇ █ ▆ ▄ ▂ ▂
▇▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█ █
14.7 ns Histogram: log(frequency) by time 14.9 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
```
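For context on why a single `evalpoly` call is hard to beat: as far as I know, Base's complex-argument `evalpoly` method avoids complex multiplications by dividing the polynomial by the real quadratic x^2 - r*x + s, whose roots are z and conj(z) (r = 2*real(z), s = abs2(z)), so the loop runs entirely on real multiply-adds. Below is a minimal sketch of that idea, plus a lockstep variant that advances several polynomials through each step together, which is the kind of parallel evaluation this PR targets. `evalpoly_quadratic` and `evalpoly_lockstep` are hypothetical names for illustration; this is not SIMDMath's `horner_simd`/`pack_poly` implementation, and it relies on the compiler to vectorize plain tuples rather than using explicit SIMD vectors.

```julia
# Hypothetical helpers for illustration only; not SIMDMath.jl's API.

# Evaluate p(z) = c[1] + c[2]*z + ... + c[N]*z^(N-1) for complex z and real c.
# Dividing p by x^2 - r*x + s (roots z and conj(z)) leaves a remainder a*x + b,
# so p(z) = a*z + b, and the loop only needs real multiply-adds.
function evalpoly_quadratic(z::Complex{T}, c::NTuple{N,T}) where {T<:Real,N}
    N == 1 && return complex(c[1])
    r = 2 * real(z)
    s = abs2(z)
    a = c[N]
    b = c[N-1]
    for i in N-2:-1:1
        # Right-hand side uses the previous a for both updates.
        a, b = muladd(r, a, b), muladd(-s, a, c[i])
    end
    return muladd(a, z, b)
end

# Evaluate M equal-length real polynomials at the same complex z.
# Each recurrence step is applied to all M polynomials before moving on,
# giving independent lanes the compiler may vectorize.
function evalpoly_lockstep(z::Complex{T}, polys::NTuple{M,NTuple{N,T}}) where {T<:Real,M,N}
    N == 1 && return map(p -> complex(p[1]), polys)
    r = 2 * real(z)
    s = abs2(z)
    a = map(p -> p[N], polys)
    b = map(p -> p[N-1], polys)
    for i in N-2:-1:1
        ci = map(p -> p[i], polys)
        anew = map((ak, bk) -> muladd(r, ak, bk), a, b)
        b = map((ak, ck) -> muladd(-s, ak, ck), a, ci)
        a = anew
    end
    return map((ak, bk) -> muladd(ak, z, bk), a, b)
end
```

Polynomials of different lengths can share the lockstep loop by padding the shorter ones with trailing zero coefficients, which does not change their values.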
So the EDIT: I've switched the
No description provided.