Optimize clamp
#194
Codecov Report
@@            Coverage Diff             @@
##           master     #194      +/-   ##
==========================================
+ Coverage   89.11%   89.15%   +0.04%
==========================================
  Files           6        6
  Lines         450      452       +2
==========================================
+ Hits          401      403       +2
  Misses         49       49
julia> @btime clamp.(x, 0.1N0f8, 0.9N0f8) setup=(x=collect(rand(N0f8, 512, 512)));
460.800 μs (2 allocations: 256.14 KiB) # before (Julia v1.4.2)
11.801 μs (2 allocations: 256.14 KiB) # after (Julia v1.4.2)
It seems to be faster on Julia v1.6.0-DEV for large arrays.
At least on Julia 1.5, doesn't this give the same codegen?
julia> @code_typed clamp(0.5N0f8, 0N0f8, 1N0f8)
CodeInfo(
1 ─ %1 = Base.getfield(hi, :i)::UInt8
│ %2 = Base.getfield(x, :i)::UInt8
│ %3 = Base.ult_int(%1, %2)::Bool
│ %4 = Base.getfield(x, :i)::UInt8
│ %5 = Base.getfield(lo, :i)::UInt8
│ %6 = Base.ult_int(%4, %5)::Bool
│ %7 = Base.Math.ifelse(%6, lo, x)::Normed{UInt8,8}
│ %8 = Base.Math.ifelse(%3, hi, %7)::Normed{UInt8,8}
└── return %8
) => Normed{UInt8,8}
julia> newclamp(x::X, lo::X, hi::X) where {X <: FixedPoint} = X(clamp(x.i, lo.i, hi.i), 0)
newclamp (generic function with 1 method)
julia> @code_typed newclamp(0.5N0f8, 0N0f8, 1N0f8)
CodeInfo(
1 ─ %1 = Base.getfield(x, :i)::UInt8
│ %2 = Base.getfield(lo, :i)::UInt8
│ %3 = Base.getfield(hi, :i)::UInt8
│ %4 = Base.ult_int(%3, %1)::Bool
│ %5 = Base.ult_int(%1, %2)::Bool
│ %6 = Base.Math.ifelse(%5, %2, %1)::UInt8
│ %7 = Base.Math.ifelse(%4, %3, %6)::UInt8
│ %8 = %new(Normed{UInt8,8}, %7)::Normed{UInt8,8}
└── return %8
) => Normed{UInt8,8}
I believed so too (#179 (comment)). I don't know the cause, but the measured results differ (on v1.0.5, v1.4.2, v1.5.0-rc1 and v1.6.0-DEV). 😕
julia> @btime clamp(x,y,z) setup=(x=rand(N0f8); y=rand(N0f8); z=rand(N0f8)); # non-vectorized version is OK
1.099 ns (0 allocations: 0 bytes) # before (Julia v1.5.0-rc1)
1.099 ns (0 allocations: 0 bytes) # after (Julia v1.5.0-rc1)
julia> @btime clamp.(x, 0.1N0f8, 0.9N0f8) setup=(x=collect(rand(N0f8, 512, 512)));
160.399 μs (2 allocations: 256.14 KiB) # before (Julia v1.5.0-rc1)
15.300 μs (2 allocations: 256.14 KiB) # after (Julia v1.5.0-rc1)
That is pretty crazy. I don't understand it either, but this is fine. Since it is surprising, perhaps add a code comment linking to this PR? Then it's fine to merge.
I think it's a common thing with Julia's auto-vectorization. 😅
This adds a specialized method for clamp with all arguments in the same FixedPoint type. This also supports the new clamp(x, T) method (cf. JuliaLang/julia#34426).
This partially solves #179, but this is not helpful in cases where promotions occur.
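For readers without FixedPointNumbers at hand, the idea behind the specialized method can be sketched with a toy type. This is only an illustration under stated assumptions: `ToyNormed` is a hypothetical stand-in for a type like N0f8 (a single raw `UInt8` field `i`), and the method below mirrors the `newclamp` pattern from the discussion above rather than the exact code merged in this PR.

```julia
# Hypothetical stand-in for a FixedPoint type such as N0f8:
# the value is stored as a raw UInt8 in the field `i`.
struct ToyNormed <: Real
    i::UInt8
end

# Specialized clamp for three arguments of the same type:
# clamp the raw integers, then rewrap the result.
# (Assumption: ordering of values matches ordering of the raw integers.)
Base.clamp(x::ToyNormed, lo::ToyNormed, hi::ToyNormed) =
    ToyNormed(clamp(x.i, lo.i, hi.i))

lo, hi = ToyNormed(0x1a), ToyNormed(0xe6)

# Broadcasting over an array, as in the benchmarks above.
xs = [ToyNormed(r) for r in rand(UInt8, 512, 512)]
ys = clamp.(xs, lo, hi)
```

Because the method only applies when all three arguments share one type, calls with mixed types still fall back to the generic clamp, which is the promotion case the description says is not helped.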
cc: @johnnychen94