Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

type-stable inner loop for sqrtm #20214

Merged
merged 11 commits into from
Feb 3, 2017
44 changes: 25 additions & 19 deletions base/linalg/triangular.jl
Original file line number Diff line number Diff line change
Expand Up @@ -1867,48 +1867,54 @@ function logm{T<:Union{Float64,Complex{Float64}}}(A0::UpperTriangular{T})
end
logm(A::LowerTriangular) = logm(A.').'

function sqrtm{T}(A::UpperTriangular{T})
n = checksquare(A)
function sqrtm(A::UpperTriangular)
realmatrix = false
if isreal(A)
realmatrix = true
for i = 1:n
for i = 1:Base.LinAlg.checksquare(A)
if real(A[i,i]) < 0
realmatrix = false
break
end
end
end
sqrtm(A,Val{realmatrix})
end
function sqrtm{T,realmatrix}(A::UpperTriangular{T},::Type{Val{realmatrix}})
if realmatrix
TT = typeof(sqrt(zero(T)))
else
TT = typeof(sqrt(complex(-one(T))))
TT = typeof(sqrt(complex(zero(T))))
end
n = Base.LinAlg.checksquare(A)
R = zeros(TT, n, n)
Copy link
Member

@stevengj stevengj Jan 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~~My inclination would be to use R = Matrix{TT}(n, n) here. Then get rid of the r == 0 check below. sqrtm is unlikely to return a sparse result, so I don't see the point of pre-initializing the array to zero.~~~

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nevermind, I forgot that this is for the upper-triangular case. Then I guess initializing it to zero makes sense.

for j = 1:n
R[j,j] = realmatrix?sqrt(A[j,j]):sqrt(complex(A[j,j]))
for i = j-1:-1:1
r = A[i,j]
for k = i+1:j-1
r -= R[i,k]*R[k,j]
@inbounds begin
for j = 1:n
R[j,j] = realmatrix ? sqrt(A[j,j]) : sqrt(complex(A[j,j]))
for i = j-1:-1:1
r = A[i,j] + zero(TT)
Copy link
Member

@stevengj stevengj Jan 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even simpler:

r::TT = A[i,j]

Though (like a lot of our generic linear-algebra code), this isn't right for dimensionful quantities, because r has the units of A whereas R has the units of sqrt(A). If we want to get that right, it should really be something like:

TA = realmatrix ? float(T) : Complex{float(T)}

and then use r::TA.

Copy link
Contributor Author

@felixrehren felixrehren Jan 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, do you suggest replacing

TT = realmatrix ? typeof(sqrt(zero(T))) : typeof(sqrt(complex(zero(T))))

with

TT = realmatrix ? float(T) : Complex{float(T)}

or would TA be supplemental to TT? Because r should have the units of R = sqrtm(A)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TA would be supplemental to TT, because A and R have different units.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of introducing a new type variable then you can simply do

r = A[i,j] - zero(TT)*zero(TT)

which matches the operation in the loop and therefore correct. We should avoid explicit float calls if possible since it requires that float is implemented correctly for new types (which might not be the case).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this is addressed -- I guess we should write some tests? (for a different PR ...)

@simd for k = i+1:j-1
r -= R[i,k]*R[k,j]
end
r==0 || (R[i,j] = r / (R[i,i] + R[j,j]))
Copy link
Member

@stevengj stevengj Jan 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per @tkelman's comment below, for non-commutative number types (e.g. quaternions), this is not right.

I haven't gone through the algebra very carefully, but I think the final R[i,j] needs to solve R[i,i]*R[i,j] + R[i,j]*R[j,j] == r, i.e. a Sylvester equation. So, the right thing to do seems like:

R[i,j] = sylvester(R[i,i], R[j,j], -r)

and then add a method

sylvester(a::Union{Real,Complex},b::Union{Real,Complex},c::Union{Real,Complex}) = 
    (-c) / (a + b)

for the commutative Number cases. New Number types will have then to provide their own sylvester method if they want sqrtm to work. (We already have sylvester methods for matrices thanks to #7435.)

end
r==0 || (R[i,j] = r / (R[i,i] + R[j,j]))
end
end
return UpperTriangular(R)
end
function sqrtm{T}(A::UnitUpperTriangular{T})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we could replace this with

sqrtm{T}(A::UnitUpperTriangular{T}) = sqrtm(A, Val{true})

in terms of the abovementioned method. Given a Val method that is type-stable, the only remaining reason to duplicate the code here seems to be to save the sqrt call for the diagonal elements, but I doubt this is worth the trouble since there are only O(n) such calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look at this tomorrow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since UnitUpperTriangular{T} is not a subtype of UpperTriangular{T} (??), I think it would be

sqrtm(A::UnitUpperTriangular) = UnitUpperTriangular(sqrtm(UpperTriangular(A),Val{true}))

For some reason this is faster than the present method, which confuses me. This version involves two additional conversions and some unnecessary arithmetic operations in the loop (of order O(n) as you mention)! Are some ops on UnitUpperTriangular slower than the same op on UpperTriangular?

Will make this change later unless someone can enlighten me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Julia doesn't support inheritance of concrete types.)

Indexing expressions A[i,j] are slower for UnitUpperTriangular than for UpperTriangular, and both are slower than for Array, because it has to check if i <= j and (in the unit case) whether i == j.

Since by construction you only access the upper triangle, I would do:

B = A.data

at the beginning of the function (for both the UnitUpperTriangular and UpperTriangular methods), and then use B[i,j] rather than A[i,j].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There might be other methods operating on triangular matrix types that might benefit from a similar change.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @inbounds means that it never checks where i and j are in 1:n. It still checks whether i ≤ j, since if i > j then it just returns zero rather than looking at A.data.

Timing such a small operation is tricky; I would normally recommend BenchmarkTools instead of @time:

julia> using BenchmarkTools

julia> A = rand(10,10); T = UpperTriangular(A);

julia> @btime $A[3,5]; @btime $T[3,5];
  1.377 ns (0 allocations: 0 bytes)
  1.650 ns (0 allocations: 0 bytes)

Copy link
Member

@stevengj stevengj Jan 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It may be that the effect on the UpperTriangular case is too small to measure, however. Or maybe even LLVM is smart enough to figure out that i ≤ j is always true. But it wouldn't hurt to unwrap the type anyway.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After improving the type stability, the specialised implementation for UnitUpperTriangular is faster than converting.
Now, on 0.5.0 using @benchmarking, I see a slowdown when unwrapping the type first in sqrtm. Relatedly, on my machine (juliabox.com) there appears to be a slowdown for the atomic operations:

@benchmark $A[1,2]
  median time:      2.650 ns
@benchmark $T[1,2]
  median time:      2.223 ns

(what does the $ do here?)
This persists when wrapping these calls in a function. All in all I am hesitant to make the unwrapping change for sqrtm (but not against)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you unwrap the type? I hope you didn't do A = A.data, which hurts type stability.

I can't reproduce your benchmark results. Maybe the benchmarks aren't reliable for such a short operation, and one needs to put a bunch of indexing operations in a loop? (Probably you also want to time them with @inbounds). The $ splices the value of the variable directly into the expression, which gets rid of the penalty for benchmarking with a global variable (it means the compiler does not have to do runtime type inference to find the type of A).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:

julia> function sumupper(A)
           s = zero(eltype(A))
           for j = 1:size(A,2), i = 1:j
               s += A[i,j]
           end
           return s
       end
sumupper (generic function with 1 method)

julia> A = rand(100,100); T = UpperTriangular(A);

julia> using BenchmarkTools

julia> sumupper(A) == sumupper(T)
true

julia> @btime sumupper($A); @btime sumupper($T);
  4.122 μs (0 allocations: 0 bytes)
  5.102 μs (0 allocations: 0 bytes)

n = checksquare(A)
TT = typeof(sqrt(zero(T)))
R = zeros(TT, n, n)
for j = 1:n
R[j,j] = one(T)
for i = j-1:-1:1
r = A[i,j]
for k = i+1:j-1
r -= R[i,k]*R[k,j]
R = eye(TT, n, n)
@inbounds begin
for j = 1:n
for i = j-1:-1:1
r = A[i,j] + zero(TT)
@simd for k = i+1:j-1
r -= R[i,k]*R[k,j]
end
r==0 || (R[i,j] = r / (R[i,i] + R[j,j]))
end
r==0 || (R[i,j] = r / (R[i,i] + R[j,j]))
end
end
return UnitUpperTriangular(R)
Expand Down