Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sparse arrays with algebraic, non-numerical data #46

Closed
abraunst opened this issue Jan 3, 2019 · 29 comments
Closed

sparse arrays with algebraic, non-numerical data #46

abraunst opened this issue Jan 3, 2019 · 29 comments

Comments

@abraunst
Copy link
Contributor

abraunst commented Jan 3, 2019

using SparseArrays,StaticArrays

julia> T=SMatrix{2,2,Float64};

julia> s=sparse(1:10,1:10,fill(zero(T), 10));

julia> nnz(s)
10

julia> nnz(sparse(Matrix(s)))
100

This is just an example, there are other methods that don't work quite as they should (e.g. setindex!) The problem seems to be that the code for sparse matrices is riddled with != 0 checks instead of !iszero checks. This seems related to JuliaLang/julia#19561, but in that case the problem was for types without zero(T) for which sparse does not make real sense anyway. Maybe it would be reasonable to enforce the value type <: Number for SparseMatrixCSC? Are there people using it for non-number types? In that case one could fix all the != 0 instances and see if things work...

@ViralBShah
Copy link
Member

Happy to accept a PR and see this work.

@abraunst
Copy link
Contributor Author

abraunst commented Jan 3, 2019

Tried, with mixed results ;-) Doing so will forbid creating sparse vectors or matrices of types T with no defined zero(T) for instance in SparseVector{Tv,Ti}(s::AbstractVector{Tv}) as it needs to determine if the values are non-zero. That would be fine I think, except that in line 1246 of stdlib/SparseArrays/test/sparse.jl there is this test

   A = sparse(["a", "b"])
   @test_throws MethodError findmin(A, dims=1)

That now fail in the creation of the vector. I don't know what to think, is SparseVector{String,Ti} something that should be supported? All remaining tests pass (both for vector and matrices).
All in all I think that the best would be to be able to specify the non-stored value for sparse arrays.

@ViralBShah
Copy link
Member

I would be ok with forbidding creation of sparse matrices where zero(T) is not defined. Would be good to hear other thoughts.

@stevengj
Copy link
Contributor

stevengj commented Jan 4, 2019

See also JuliaLang/julia#19561, JuliaLang/julia#11408.

@abraunst
Copy link
Contributor Author

abraunst commented Jan 4, 2019

See also JuliaLang/julia#19561, JuliaLang/julia#11408.

Thanks @stevengj. I was unaware of JuliaLang/julia#11408. If there is no consensus regarding the drop of support for types T without zero(T), then maybe it is better close this (and the PR)?

@andreasnoack
Copy link
Member

I'm in favor of restricting SparseMatrixCSC to elements with a zero but regardless of that discussion, I think JuliaLang/julia#30580 improves that the situation for StaticArrays, right?

@abraunst
Copy link
Contributor Author

abraunst commented Jan 4, 2019

I'm in favor of restricting SparseMatrixCSC to elements with a zero but regardless of that discussion, I think JuliaLang/julia#30580 improves that the situation for StaticArrays, right?

Yes, but I admit that I found the example after I ran into the "bug" in the code. The current code uses != 0 in several places, which is of course wrong for non-numerical types that define zero; but at least it doesn't give an error for non-numerical types that do not define zero (although !=0 is always true there, so for instance the sparse-dense-sparse round-trip gives a matrix full of stored zeros). Using iszero would mean to really drop support for those types (e.g. String) even for simple storage (which I don't know if people really use, but at least it appears in some tests).

@yurivish
Copy link

yurivish commented Feb 2, 2019

D4M sometimes uses sparse arrays to store non-numeric data.

I found this issue after being tripped up by the fact that broadcast non-numeric data didn't work, and was a little surprised that you couldn't broadcast an Any[1, 2] with a sparse matrix either, though I see now why that's the case. Regardless of the conclusions here, it would be nice if that error message were improved to be a bit more specific. Thanks to everyone for taking the time to thoroughly map out the territory — I learned a lot reading through some of the issues just now.

I wonder whether Base should maybe define zero(::String) == "", which would at least make that specific case work.

@abraunst
Copy link
Contributor Author

abraunst commented Feb 3, 2019

I wonder whether Base should maybe define zero(::String) == "", which would at least make that specific case work.

I get your point, but at least from the mathematical point of view it seems an awkward stretch IMHO: "" is the identity element of the multiplicative (non-abelian) semigroup (*,String), whereas zero(T) should be the identity of the additive group (+,T). I think all this awkwardness could be avoided if one could specify the "non-stored" element of sparse arrays. On numerical data, linear operations could be fast-tracked if the non-stored element iszero so as to retain the current performance (and a sensible default could mimic current or desired behavior). I'm not saying all this is easy to implement though of course, but it seems like a fun project.

@StefanKarpinski
Copy link
Contributor

I wonder whether Base should maybe define zero(::String) == "", which would at least make that specific case work.

This should absolutely not be defined. The empty string is the multiplicative unit of the string semigroup which is why we have one(String) == "". There is no additive identity because we do not use + for concatenation. If we did, then "foo" + "bar" should produce the regular expression r"foo|bar". String and regular expressions form a semiring under the operations of * for concatenation and + for alternation. So the correct value for zero(String) if any, should be a regular expression that matches nothing, so we could have zero(::Type{String}) = r"(?!)".

@yurivish
Copy link

yurivish commented Feb 4, 2019

Right, yeah, I realized later that I was confused. Returning regular expressions seems mathematically correct but not very nice for performance (or type-stability).

@lr4d
Copy link

lr4d commented May 21, 2021

I ran into the following error today when trying to use spare arrays with non-numeric data types. I could create a SparseVector{Symbol} but couldn't use it as intended

julia> v = SparseArrays.sparsevec([1;22], [:a, :b])
22-element SparseVector{Symbol,Int64} with 2 stored entries:
  [1 ]  =  :a
  [22]  =  :b

julia> findfirst(isequal(:b), v)
ERROR: MethodError: no method matching zero(::Type{Symbol})

If these types do not really support working with non-numeric values, then I would also suggest to forbid their creation

@KlausC
Copy link
Contributor

KlausC commented May 21, 2021

As far as I know, the SparseVector and SparseMatrixCSC types have minimal requirements, which are support for iszero and zero. Both are essential for the intended functionalty of the structures - and make not much sense for Symbol or String.
Neverthless it is not desirable restrict the element types to (for example) Number s, because there are other senseful types, which have both iszero and zero.

@abraunst
Copy link
Contributor Author

I wonder if we could not simply add two new parameters SparseMatrixCSC{Tv,Ti,Z,IZ}, and make the constructors make them default to zero and iszero respectively. This would make using SparseMatrixCSC as storage for non-numerical types possible, on pair with Matrix.

@KlausC
Copy link
Contributor

KlausC commented May 22, 2021

I would prefer not to add functions as type parameters.
Alternatively introduce new functions like sparse_unstored and is_sparse_unstored, which default to zero and iszero for Number, and which can be redefined for other types.

@abraunst
Copy link
Contributor Author

Is there a particular reason why functions as type parameters are to be avoided? (honest question)
I don't like the sparse_unstored and is_sparse_unstored solution, because then user-defining it e.g. for String would be type piracy...

@KlausC
Copy link
Contributor

KlausC commented May 26, 2021

Is there a particular reason why functions as type parameters are to be avoided? (honest question)

The reason that made me feel uncomfortable with this was, that it introduced a breaking change into the type system.

would be type piracy...

Not, if we defined default implementation for String and other relevant types.

@abraunst
Copy link
Contributor Author

Not, if we defined default implementation for String and other relevant types.

For every type? This doesn't feel right to me, as in a sense would be making String aware of sparse structures. And the problem would still stand for types defined in packages, no?

@KlausC
Copy link
Contributor

KlausC commented May 28, 2021

It looks like we must either commit a "breaking change" or provoke a "type piracy" by the user, if we want to extend the element types of Sparse... beyond Numbers.
Defending my proposal of sparse_unstored etc.: it seems to be a crime of second degree, if we pirate these new functions compared to iszero and zero :-) .

Thinking more about adding type parameters, and why it disgusts me adding two functions as replacements of iszero and zero, I believe, these two functions are jointly representing one single concept (of storage). Maybe a question of economy and extensibility to encode this concept in a single parameter. With other words, can't we pass a single trait name to encompass what we want. That could look like so:

struct SparseMatrixCSC{Tv,Ti,NullityConcept} ... end

SparseMatrixCSC{Tv,Ti}(...) = SparseMatrixCSC{Tv,Ti,DefaultNullityTrait}(...)

iszero(x, ::Type{<:DefaultNullityTrait}) = iszero(x)
zero(T::Type, ::Type{<:DefaultNullityTrait}) = zero(T)

abstract type DefaultNullityTrait end

That would allow the use with different concepts to define

abstract type StringNullityTrait end
Base.iszero(x::AbstractString, ::Type{<:StringNullityTrait}) = isempty(x)
Base.zero(::Type{T}, ::Type{<:StringNullityTrait}) where T<:AbstractString = T("")

and inject this behavior into the sparse matrix using constructor SparseMatrixCSC{Tv,Ti,StringNullityTrait}(...)

@Suavesito-Olimpiada
Copy link

Hi! I was redirected to this issue form JuliaLang/julia#42536 since I'm using SparseArrays with Num from Symbolics.jl and all the (!)== 0 messes with the use since is returns an Expresion instead of a Bool.

Is there any consensus as to whether the iszero and zero interface for a type is required to work with SparseArrays?

XR: https://discourse.julialang.org/t/is-there-a-defined-minimal-interface-for-a-type-to-work-with-sparsearrays/69602

@jlapeyre
Copy link
Contributor

An alternative solution, with other benefits is here: https://github.com/JuliaLang/julia/issues/41036
This looks like it may be related to proposals involving isnullable, although I can't find them at the moment.

As far as I can tell https://github.com/JuliaLang/julia/issues/41036 involves no breakage, inefficiency, or pirating. In fact it clarifies the role of the unstored element.

@abraunst
Copy link
Contributor Author

An alternative solution, with other benefits is here: JuliaLang/julia#41036 This looks like it may be related to proposals involving isnullable, although I can't find them at the moment.

As far as I can tell JuliaLang/julia#41036 involves no breakage, inefficiency, or pirating. In fact it clarifies the role of the unstored element.

This is similar I think to the sparse_unstored proposal above in this thread by @KlausC and the nullable solution by @Suavesito-Olimpiada in https://discourse.julialang.org/t/is-there-a-defined-minimal-interface-for-a-type-to-work-with-sparsearrays/69602 if I'm not mistaken.

@Suavesito-Olimpiada
Copy link

I think that although the core idea of sparse arrays for linear algebra and storage is the same, they are quite different on the semantics.

Under the actual implementation, for linear algebra you want to have a additive "zero" element under a ring. But for storage sparse array, all you want to have is a "nullable" (unstored) element with no other properties attached to it. Under this circumstances the sparse array idea for linear algebra is a subset of the more general idea of a storage sparse array.

One solution that come to my mind that allow the current implementation to continue working is to define a AbstractStorageSparseArrays type where you can define a Unstored and IsUnstored type parameters which are functions that define the "default" unstored element and the check if an element is unstored.

Then base AbstractSparseArrrays on this new type and set this parameters as default to zero and iszero on this.

The basic idea is to have something as the following

abstract type AbstractStorageSparseArray{Tv, Ti, N, Unstored, IsUnstored} <: AbstractArray{Tv, N} end

abstract type AbstractSparseArray{Tv, Ti, N} <: AbstractStorageSparseArray{Tv, Ti, N, zero, iszero} end

const AbstractStorageSparseVector{Tv, Ti, Unstored, IsUnstored} = AbstractStorageSparseArray{Tv, Ti, 1, Unstored, IsUnstored}

const AbstractSparseVector{Tv, Ti} = AbstractSparseArray{Tv, Ti, 1}

struct SparseVector{Tv, Ti<:Integer} <: AbstractSparseVector{Tv, Ti}
    # fields...
end

And base the basic interface for sparse arrays on the new AbstractStorageSparseArray. The use of Unstored and IsUnstored as type parameters allows to define in the most convenient way what elements are not stored and the unstored value, and the definition of AbstractSparseArray does not change the current type.

This also avoids type piracy, allowing for example to do StorageSparseArray{String, Int, isempty, emptystring} (defining emptystring() = "" in the corresponding module).

Maybe I'm missing things, I've tried to go to read all the places where this came up, but it is really hard to find them all and remember all the points made.

@abraunst
Copy link
Contributor Author

I agree with almost everything (I proposed something similar some comments above), except that I think that having a generic unstored would be useful also for algebraic elements (e.g. Float64), so I would not separate between "storage" and "linear algebra" arrays. The distinction will come automatically because some algebraic operations will fail if the corresponding operations on elements are not defined...

@Suavesito-Olimpiada
Copy link

I agree with that, but the problem remains on that we cannot change type for the current SparseArray.

What I think is that this allows to solve most of the problems without breaking anything.

Also allows to be extensible, for example, someone could define a DefaultArray with a general default value (no strictly zero). This would allow to have something like what @StefanKarpinski proposed in some places before (cannot find the link rn), but the ideas was that some operation over matrices (including for pure function, map) are closed if we allow a general default value.

Again, this would be more elegant (and general) if we didn't make the distintion of the zero case, but right now that is the accepted and stable interface for stdlib SparseArray.

@KristofferC KristofferC transferred this issue from JuliaLang/julia Jan 14, 2022
@SobhanMP
Copy link
Member

this has been fixed

julia> nnz(sparse(Matrix(s)))
0

@Suavesito-Olimpiada
Copy link

Still not working with Symbolics. I think it should not be closed yet.

@SobhanMP
Copy link
Member

can you provide an MWE?

@ViralBShah ViralBShah reopened this Sep 20, 2022
@Suavesito-Olimpiada
Copy link

Sorry for the pollution, I just noticed that although I had deved the main branch from here, it would still use the sysimage version. Downloading Julia nightly, it works as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests