
parallel LU factorization memory leak #15450

Closed
thraen opened this issue Mar 11, 2016 · 8 comments · Fixed by #30425
Labels
linear algebra Linear algebra parallelism Parallel or distributed computation sparse Sparse arrays

Comments

@thraen

thraen commented Mar 11, 2016

I encountered a memory leak when solving linear systems with an LU factorization in parallel:

```julia
const m = 100
const n = 100

function doit()
    x = speye(m*n)
    X = lufact(x)

    p = rand(m, n)

    for i = 1:10000
        @sync @parallel for t = 1:100
            ax = X \ p[:]
        end
        println(i)
        # @everywhere gc()
    end
end
doit()
```
@jiahao
Member

jiahao commented Mar 11, 2016

What do you mean by "memory leak"? Did you get an error? Did the code not do what you expected?

@andreasnoack
Member

I can confirm this. Sparse factorizations cannot usually be moved across workers, since the pointer is set to zero during the serializing process. However, in contrast to the Cholesky, the sparse LU recomputes the factorization instead of failing when it is detected that the pointer is null. I'm still not sure why this causes a leak.
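The mechanism can be illustrated directly. The following is a sketch against the Julia-0.4-era API used above (`speye`/`lufact` were later replaced by `sparse(1.0I, ...)`/`lu`), and `numeric`/`symbolic` are internal fields of the `UmfpackLU` wrapper, so treat this as illustrative rather than exact:

```julia
# Sketch: serializing an UmfpackLU zeroes its raw UMFPACK pointers.
x = speye(100)
F = lufact(x)        # F.symbolic / F.numeric point into UMFPACK-owned memory

io = IOBuffer()
serialize(io, F)     # the serializer writes null pointers instead
seekstart(io)
G = deserialize(io)

# G.numeric is now C_NULL, so the next solve detects the null pointer
# and silently recomputes the whole factorization.
G \ ones(100)
```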

@tkelman
Contributor

tkelman commented Mar 11, 2016

in contrast to the Cholesky, the sparse LU recomputes the factorization instead of failing when it is detected that the pointer is null

Is that difference intentional? Seems like we should stick to one or the other.

@thraen
Author

thraen commented Mar 11, 2016

Yes, and I think there should be a warning in the case of recomputing. I forgot to mention that the leak also happens when I define the LU factorization everywhere like this:

```julia
@everywhere const m = 100
@everywhere const n = 100
@everywhere const x = speye(m*n)
@everywhere const X = lufact(x)

function doit()
    p = rand(m, n)
    for i = 1:10000
        @sync @parallel for t = 1:100
            ax = X \ p[:]
        end
        println(i)
    end
end
doit()
```

I wasn't aware that even then the LU factorization is moved across workers (and recomputed on them).
In case somebody has the same problem: to prevent this, wrap the solve in a function defined on every worker:

```julia
@everywhere const m = 100
@everywhere const n = 100
@everywhere const x = speye(m*n)
@everywhere const X = lufact(x)
@everywhere function solveLU(b)
    return X \ b
end

function doit()
    p = rand(m, n)
    for i = 1:10000
        @sync @parallel for t = 1:100
            solveLU(p[:])
        end
        println(i)
    end
end
doit()
```

@andreasnoack
Member

Is that difference intentional?

I don't think so. UMFPACK and CHOLMOD have slightly different designs, but our wrappers were also written by different people. The present memory management model in the CHOLMOD wrappers is one I introduced out of necessity, but I've contributed almost nothing to the UMFPACK wrappers.

It might be possible to recompute a CHOLMOD.Factor after serialization, but that would require storing the original sparse matrix and some meta information with the factorization. That is what UmfpackLU does, but I also think that UMFPACK has less global state. CHOLMOD has the common struct that controls whether the factorization is a Cholesky or an LDLt. We would need to carry such information explicitly in the type as well to get recomputation after serialization working.

The usual tradeoffs between generality and hidden slowness also apply here. It might be possible to avoid the error when moving sparse factorizations, but it will be extremely inefficient to move and recompute the factorization on the new workers. Users probably use @parallel to get a speedup, but they won't get one in this case, and it will be tricky to figure out why.

...but back to this issue. I think I have an idea about what is causing the memory leak and will get back with an update on that.

@tkelman
Contributor

tkelman commented Mar 12, 2016

Maybe erroring instead of recomputing in the LU case would be a better option (and more consistent with Cholesky) than quietly re-factorizing.

@kshyatt kshyatt added linear algebra Linear algebra sparse Sparse arrays parallelism Parallel or distributed computation labels Jul 28, 2016
@ViralBShah
Member

@andreasnoack Was this resolved?

@andreasnoack
Member

andreasnoack commented Dec 17, 2018

No, this is still an issue. The problem is that we don't register a finalizer when deserializing the factorization, so the memory allocated when the factorization is recomputed is never released. The issue can therefore be reproduced with just

```julia
julia> using SparseArrays, SuiteSparse, LinearAlgebra, Serialization

julia> b = IOBuffer();

julia> F = lu(sparse(1.0I, 10000, 10000));

julia> foreach(1:1000) do i
         seekstart(b)
         serialize(b, F)
         seekstart(b)
         SuiteSparse.UMFPACK.umfpack_symbolic!(deserialize(b))
       end;
```

I think there are three possible solutions:

  1. Throw on serialization to avoid leaks. It also has the benefit of revealing the costly recomputation of the factorization which is kind of hidden in the original issue. This solution would be breaking, though, so probably not the best solution.
  2. Define a custom deserialization method that registers a finalizer. This should get rid of the leak but will still hide the costly recomputation.
  3. Define a proper serialization for these objects. This would probably require that we match the C structs with Julia structs and on deserialization copy the content to memory not managed by the GC. This might be slightly tricky.

I suspect we should start with 2 and then open a separate issue to track the development of 3.
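A minimal sketch of option 2 might look like the following. This is hypothetical, not the actual fix: `umfpack_free_symbolic` is the cleanup routine the stdlib UMFPACK wrapper registers as a finalizer in `lu` (it also frees the numeric factorization), and a real patch would more likely specialize `Serialization.deserialize` for the type rather than use a helper function:

```julia
# Hypothetical sketch: re-register the finalizer that lu() normally
# installs, so memory from a recomputed factorization is eventually freed.
using Serialization, SuiteSparse

function deserialize_lu(io::IO)
    F = deserialize(io)::SuiteSparse.UMFPACK.UmfpackLU
    finalizer(SuiteSparse.UMFPACK.umfpack_free_symbolic, F)
    return F
end
```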

KristofferC pushed a commit that referenced this issue Jan 11, 2019
KristofferC pushed a commit that referenced this issue Feb 4, 2019
KristofferC pushed a commit that referenced this issue Feb 11, 2019
KristofferC pushed a commit that referenced this issue Apr 20, 2019
KristofferC pushed a commit that referenced this issue Feb 20, 2020