Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

type-stable inner loop for sqrtm #20214

Merged
merged 11 commits into from
Feb 3, 2017
55 changes: 30 additions & 25 deletions base/linalg/triangular.jl
Original file line number Diff line number Diff line change
Expand Up @@ -1867,48 +1867,53 @@ function logm{T<:Union{Float64,Complex{Float64}}}(A0::UpperTriangular{T})
end
logm(A::LowerTriangular) = logm(A.').'

function sqrtm{T}(A::UpperTriangular{T})
n = checksquare(A)
function sqrtm(A::UpperTriangular)
realmatrix = false
if isreal(A)
realmatrix = true
for i = 1:n
for i = 1:checksquare(A)
if real(A[i,i]) < 0
realmatrix = false
break
end
end
end
if realmatrix
TT = typeof(sqrt(zero(T)))
else
TT = typeof(sqrt(complex(-one(T))))
end
R = zeros(TT, n, n)
for j = 1:n
R[j,j] = realmatrix?sqrt(A[j,j]):sqrt(complex(A[j,j]))
for i = j-1:-1:1
r = A[i,j]
for k = i+1:j-1
r -= R[i,k]*R[k,j]
sqrtm(A,Val{realmatrix})
end
function sqrtm{T,realmatrix}(A::UpperTriangular{T},::Type{Val{realmatrix}})
n = checksquare(A)
t = realmatrix ? typeof(sqrt(zero(T))) : typeof(sqrt(complex(zero(T))))
R = zeros(t, n, n)
tt = typeof(zero(t)*zero(t))
@inbounds begin
for j = 1:n
R[j,j] = realmatrix ? sqrt(A[j,j]) : sqrt(complex(A[j,j]))
for i = j-1:-1:1
r::tt = A[i,j]
@simd for k = i+1:j-1
r -= R[i,k]*R[k,j]
end
r==0 || (R[i,j] = r / (R[i,i] + R[j,j]))
Copy link
Member

@stevengj stevengj Jan 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per @tkelman's comment below, for non-commutative number types (e.g. quaternions), this is not right.

I haven't gone through the algebra very carefully, but I think the final R[i,j] needs to solve R[i,i]*R[i,j] + R[i,j]*R[j,j] == r, i.e. a Sylvester equation. So, the right thing to do seems like:

R[i,j] = sylvester(R[i,i], R[j,j], -r)

and then add a method

sylvester(a::Union{Real,Complex},b::Union{Real,Complex},c::Union{Real,Complex}) = 
    (-c) / (a + b)

for the commutative Number cases. New Number types will have then to provide their own sylvester method if they want sqrtm to work. (We already have sylvester methods for matrices thanks to #7435.)

end
r==0 || (R[i,j] = r / (R[i,i] + R[j,j]))
end
end
return UpperTriangular(R)
end
function sqrtm{T}(A::UnitUpperTriangular{T})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we could replace this with

sqrtm{T}(A::UnitUpperTriangular{T}) = sqrtm(A, Val{true})

in terms of the abovementioned method. Given a Val method that is type-stable, the only remaining reason to duplicate the code here seems to be to save the sqrt call for the diagonal elements, but I doubt this is worth the trouble since there are only O(n) such calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look at this tomorrow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since UnitUpperTriangular{T} is not a subtype of UpperTriangular{T} (??), I think it would be

sqrtm(A::UnitUpperTriangular) = UnitUpperTriangular(sqrtm(UpperTriangular(A),Val{true}))

For some reason this is faster than the present method, which confuses me. This version involves two additional conversions and some unnecessary arithmetic operations in the loop (of order O(n) as you mention)! Are some ops on UnitUpperTriangular slower than the same op on UpperTriangular?

Will make this change later unless someone can enlighten me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Julia doesn't support inheritance of concrete types.)

Indexing expressions A[i,j] are slower for UnitUpperTriangular than for UpperTriangular, and both are slower than for Array, because it has to check if i <= j and (in the unit case) whether i == j.

Since by construction you only access the upper triangle, I would do:

B = A.data

at the beginning of the function (for both the UnitUpperTriangular and UpperTriangular methods), and then use B[i,j] rather than A[i,j].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There might be other methods operating on triangular matrix types that might benefit from a similar change.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @inbounds means that it never checks where i and j are in 1:n. It still checks whether i ≤ j, since if i > j then it just returns zero rather than looking at A.data.

Timing such a small operation is tricky; I would normally recommend BenchmarkTools instead of @time:

julia> using BenchmarkTools

julia> A = rand(10,10); T = UpperTriangular(A);

julia> @btime $A[3,5]; @btime $T[3,5];
  1.377 ns (0 allocations: 0 bytes)
  1.650 ns (0 allocations: 0 bytes)

Copy link
Member

@stevengj stevengj Jan 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It may be that the effect on the UpperTriangular case is too small to measure, however. Or maybe even LLVM is smart enough to figure out that i ≤ j is always true. But it wouldn't hurt to unwrap the type anyway.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After improving the type stability, the specialised implementation for UnitUpperTriangular is faster than converting.
Now, on 0.5.0 using @benchmarking, I see a slowdown when unwrapping the type first in sqrtm. Relatedly, on my machine (juliabox.com) there appears to be a slowdown for the atomic operations:

@benchmark $A[1,2]
  median time:      2.650 ns
@benchmark $T[1,2]
  median time:      2.223 ns

(what does the $ do here?)
This persists when wrapping these calls in a function. All in all I am hesitant to make the unwrapping change for sqrtm (but not against)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you unwrap the type? I hope you didn't do A = A.data, which hurts type stability.

I can't reproduce your benchmark results. Maybe the benchmarks aren't reliable for such a short operation, and one needs to put a bunch of indexing operations in a loop? (Probably you also want to time them with @inbounds). The $ splices the value of the variable directly into the expression, which gets rid of the penalty for benchmarking with a global variable (it means the compiler does not have to do runtime type inference to find the type of A).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:

julia> function sumupper(A)
           s = zero(eltype(A))
           for j = 1:size(A,2), i = 1:j
               s += A[i,j]
           end
           return s
       end
sumupper (generic function with 1 method)

julia> A = rand(100,100); T = UpperTriangular(A);

julia> using BenchmarkTools

julia> sumupper(A) == sumupper(T)
true

julia> @btime sumupper($A); @btime sumupper($T);
  4.122 μs (0 allocations: 0 bytes)
  5.102 μs (0 allocations: 0 bytes)

n = checksquare(A)
TT = typeof(sqrt(zero(T)))
R = zeros(TT, n, n)
for j = 1:n
R[j,j] = one(T)
for i = j-1:-1:1
r = A[i,j]
for k = i+1:j-1
r -= R[i,k]*R[k,j]
t = typeof(sqrt(zero(T)))
R = eye(t, n, n)
tt = typeof(zero(t)*zero(t))
half = 1/(2*R[1,1])
@inbounds begin
for j = 1:n
for i = j-1:-1:1
r::tt = A[i,j]
@simd for k = i+1:j-1
r -= R[i,k]*R[k,j]
end
r==0 || (R[i,j] = half*r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old version of this function might not have worked correctly in this case either, but would this do the right thing if the element type was not commutative under multiplication?

Copy link
Member

@stevengj stevengj Jan 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above for the general case. However, even for non-commutative algebras the case here is special because R[i,i] is the identity, which always commutes with everything. So, the problem reduces to solving R[i,j] + R[i,j] = r. If the elements form a field over the reals, then the solution R[i,j] = r/2 is correct.

The case here is also wrong if e.g. the elements are matrices, in which case 1/(2*R[1,1]) will throw an error, although if you do half = inv(2*R[1,1]) it will work. However, it would be even better to just to R[i,j] = r/2 in that case, since it would avoid the matrix multiplication.

However, it is still wrong if the elements are some other number field where you can't do /2. The correct, general solution seems like it should be

half = inv(R[1,1] + R[1,1]) # don't use 1/(2R[1,1]) or half=0.5, to handle general algebraic types

end
r==0 || (R[i,j] = r / (R[i,i] + R[j,j]))
end
end
return UnitUpperTriangular(R)
Expand Down