Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we expect floating point indexing to be implemented? #10154

Closed
jakebolewski opened this issue Feb 10, 2015 · 73 comments · Fixed by #10458
Closed

Should we expect floating point indexing to be implemented? #10154

jakebolewski opened this issue Feb 10, 2015 · 73 comments · Fixed by #10458
Labels
needs decision A decision on this change is needed

Comments

@jakebolewski
Copy link
Member

Handling (get/set)index(::AType, ::FloatingPoint....) is inconsistently implemented for AbstractArray subtypes in base and in user defined types for many packages. Should we expect this case be handled when defining (get/set)index behavior for user defined types? This needs to be documented (or a generic fallback defined), related to #10064.

@JeffBezanson
Copy link
Sponsor Member

In more detail, we call to_index in operators.jl on "non obvious" index types, and it does things like convert to Int.

I've never really liked this. This is a case where the types can accurately reflect what is actually supported (namely, integers). I wonder how useful it is in practice to be able to index with integer-valued floats, and get an InexactError for non-integer values, which is what we do now.

@jiahao
Copy link
Member

jiahao commented Feb 10, 2015

cc ApproxFun.jl authors @dlfivefifty @ajt60gaibb any thoughts?

@jakebolewski
Copy link
Member Author

I can see how this is useful functionality for AbstractArray subtypes, in that floating point indexing is useful when you want a user defined type that is represented as a function but is "array like". This is used in Interpolations.jl for instance. If this is useful then we should define a generic AbstractArray fallback.

I don't think this is useful for general types, such as indexing into a tuple.

@jakebolewski
Copy link
Member Author

However, in the continuous case size cannot be defined so in that sense these types cannot be thought of as AbstractArrays

@andreasnoack
Copy link
Member

Has the ability to overload call changed the need for non-integer indexing?

@jiahao jiahao added the needs decision A decision on this change is needed label Feb 10, 2015
@SimonDanisch
Copy link
Contributor

I like that you can index textures with floats in GLSL and get interpolated values back. But it's probably not justified to put something like this in Base, as it should rather live in an interpolation package.
Which is actually a case against explicitly allowing for floats, as an interpolation package couldn't overwrite it that easily than.
I'd think, that in general something went wrong if you index an array with a float.
There are a lot of use cases, where float intermediate representations are useful, but shouldn't the caller be forced, to use a function like floor,ceil, round, etc, to convert to an Integer before indexing?!
Throwing an inexact error gives me shivers, as it can lead to subtle bugs if the float is just sometimes a little bit off.

@dlfivefifty
Copy link
Contributor

In the case of ApproxFun this should be sufficient, as we have moved away from the “functions as vectors” in many other respects.

On 11 Feb 2015, at 7:03 am, Andreas Noack [email protected] wrote:

Has the ability to overload call changed the need for non-integer indexing?


Reply to this email directly or view it on GitHub #10154 (comment).

@JeffBezanson
Copy link
Sponsor Member

I agree it's typically better to handle a float index by specifying floor, ceil, etc. at the call site. Much clearer what's going on.
Types that do interpolation are clearly a separate case. It's totally fine if some AbstractArray subtypes support float indexing and others don't. #10064 is about specifying the minimal core interface.

@jakebolewski
Copy link
Member Author

Is there support for deprecating this behavior for AbstractArray's and other datatypes (ex. tuples) in base?

@stevengj
Copy link
Member

@jakebolewski, we could always just deprecate getindex(a, i::FloatingPoint...) without specifying the type of a, no?

@ViralBShah
Copy link
Member

That seems to be a reasonable approach.

@cortner
Copy link

cortner commented Mar 15, 2015

If I may: I think this is a bad idea. In many situations, it is very convenient to not worry about types. It is one of the reasons why switching to Julia was easy for me. I started a discussion at Google groups

@SimonDanisch
Copy link
Contributor

Can you give good examples, where this actually hurts?
I cried a little, when convert(Int, FloatingPoint) was starting to throw InexactError() errors.
But it actually made my code quite a bit safer and wasn't that hard to ship around.

@nalimilan
Copy link
Member

I had the same feeling as @cortner, but I failed to find good examples of where it's useful. I'm sure there are valid cases. (And this feature is much less dangerous than automatic rounding conversion from float to int, there is no loss of information here.)

@dlfivefifty
Copy link
Contributor

This change revealed several type in-safeties in my code, where what started as an int became a float: k/2 instead of div(k,2). So I'm in favour.

The deprecating of int() at the same time is a bit annoying, but given that it's ambiguous whether it means integer part or round, I'm ok with that.

Sent from my iPad

On 16 Mar 2015, at 4:00 am, Milan Bouchet-Valat [email protected] wrote:

I had the same feeling as @cortner, but I failed to find good examples of where it's useful. I'm sure there are valid cases. (And this feature is much less dangerous than automatic rounding conversion from float to int, there is no loss of information here.)


Reply to this email directly or view it on GitHub.

@IainNZ
Copy link
Member

IainNZ commented Mar 16, 2015

Only time I've used this is to select percentiles lazily, e.g.

n = 927
x = rand(n)
sort!(x)
@show x[n/4]
@show x[3n/4]

but the "cure" is div. I think the real problem will be people not knowing about div.

@mbauman
Copy link
Sponsor Member

mbauman commented Mar 16, 2015

Well if folks like my proposal to refactor getindex on AbstractArrays (#10525) then one of my reasons for liking this change will go away. As Tim noted, Real's placement in the type hierarchy makes extending getindex particularly tricky and annoying (#10458 (comment)). If we only have one getindex method defined for AbstractArrays (without specializing on the indices), then that concern is no longer relevant.

I still think this is a sensible restriction to make, but I'd be open to reconsidering it now.

@jiahao
Copy link
Member

jiahao commented Mar 16, 2015

As I had posted on julia-users, an edge case shows up when indexing with floats in regions where epsilon > 1:

julia> (1:10^17)[1e16:(1e16+5)] #Not the same as indexing with (10^16:10^16+5) 
5-element Array{Int64,1}: 
 10000000000000000 
 10000000000000000 
 10000000000000002 
 10000000000000004 
 10000000000000004

@cortner
Copy link

cortner commented Mar 16, 2015

I think it should be easy enough to throw an exception when a conversion is not "safe".

@cortner
Copy link

cortner commented Mar 16, 2015

Re use cases: it occurs e.g. when I compute an index using round, but more generally I've stumbled over it a few times. Always easy to fix, but annoying.

@cortner
Copy link

cortner commented Mar 16, 2015

@dlfivefifty : I am not convinced about the type instability. When there is a performance issue then a programmer who actually cares will be able to find it. Julia provides very nice tools to find type instability.

@dlfivefifty
Copy link
Contributor

Probably wasn't much performance impact, but it did bother me that something I called k was becoming a float...it's very unlikely I would have found this otherwise. (More unlikely that it would ever make a difference however..)

My feeling is 99% of cases where get index is called with float are unintentional.

(I do kinda think down the line there should be a "relaxed" mode where Julia is not so picky though..)

Sent from my iPad

On 16 Mar 2015, at 10:46 pm, Christoph Ortner [email protected] wrote:

@dlfivefifty : I am not convinced about the type instability. When there is a performance issue then a programmer who actually cares will be able to find it. Julia provides very nice tools to find type instability.


Reply to this email directly or view it on GitHub.

@jiahao
Copy link
Member

jiahao commented Mar 16, 2015

I think it should be easy enough to throw an exception when a conversion is not "safe".

@cortner If the index doesn't round to itself, sure. However, I don't think it is so easy to tell in the example I gave whether or not that command was intentional or a mistake. Maybe someone forgot to round the entries of the index set. But maybe someone genuinely intended to do precisely that (since you can index into an array with an arbitrary index set). Trying to second-guess a user's intent is a dangerous rabbit hole to go down.

@jiahao
Copy link
Member

jiahao commented Mar 16, 2015

AFAIK only Matlab/Octave and Mathematica support indexing into an array with a float. Maple, C, Python, Fortran do not.

@StefanKarpinski
Copy link
Sponsor Member

For dense arrays, this is not such a big concern since indexing into a Float64 vector with a length of maxintfloat() requires 64 petabytes of data. For sparse data structures, of course, this can actually happen, although I suspect it's still rare to have integral dimensions that are that large.

@johnmyleswhite
Copy link
Member

Once we have COO arrays, I think you should expect to see a lot of integral dimensions outside of the 2^52 range.

@jakebolewski
Copy link
Member Author

@cortner, @StefanKarpinski why should indexing for arrays be special cased here? Do you think that most users define indexing for all subtypes of Reals for user defined types? Almost no one does. To index consistently across all user defined types you have to explicitly convert to an Int anyways. Reasoning about Ints in the system is inescapable.

In fact, indexing by floating point values is not even consistently defined in Base. For example, the dense case is defined:

julia> A = randn(3,3)
3x3 Array{Float64,2}:
 -1.44595   1.07011    2.06128
 -0.46673   0.406243   3.70655
 -0.564156  0.494642  -0.936865

julia> A[1.0,1.0]
-1.4459545197847976

but for sparse arrays, and much of the AbstractArray subtypes it is not

julia> SA = sprandn(3,3,1.0)
3x3 sparse matrix with 9 Float64 entries:
    [1, 1]  =  -0.294094
    [2, 1]  =  0.00830028
    [3, 1]  =  -1.11293
    [1, 2]  =  0.304284
    [2, 2]  =  -0.786085
    [3, 2]  =  -0.591092
    [1, 3]  =  0.648532
    [2, 3]  =  1.72333
    [3, 3]  =  -0.0314884

julia> SA[1.0,1.0]
ERROR: MethodError: `getindex` has no method matching getindex(::Base.SparseMatrix.SparseMatrixCSC{Float64,Int64}, ::Float64, ::Float64)
Closest candidates are:
  getindex(::AbstractArray{T,2}, ::Any, ::Any, ::Any, ::Any...)
  getindex(::AbstractArray{T,N}, ::Real)
  getindex(::Base.SparseMatrix.SparseMatrixCSC{Tv,Ti<:Integer}, ::Integer)
  ...


julia> Diagonal(A)[1.0,1.0]
ERROR: MethodError: `getindex` has no method matching getindex(::Base.LinAlg.Diagonal{Float64}, ::Float64, ::Float64)
Closest candidates are:
  getindex(::AbstractArray{T,2}, ::Any, ::Any, ::Any, ::Any...)
  getindex(::AbstractArray{T,N}, ::Real)
  getindex(::Base.LinAlg.Diagonal{T}, ::Integer, ::Integer)
  ...

@mbauman work would help here for subtypes of AbstractArray, but that still does not address the behavior for user defined types. We could obviously export the conversion and promote it as the way to do indexing but that would be a big change to the Julia code which is currently out in the wild.

@StefanKarpinski
Copy link
Sponsor Member

To be clear, I'm kind of ambivalent about this. It's annoying how inconsistent this is across all of Base and various packages. I don't think that indexing is an exceptional case – I want a more consistent policy. But to have such a policy, I think we need to have a reason why we draw the line where we do. Currently, it's a very wiggly line.

@johnmyleswhite
Copy link
Member

For me, the bright line is just that indexing shouldn't use floating-point numbers in addition to allowing integers. The rationale I see is that:

  • Allowing floats doesn't increase the expressive power of the language
  • Allowing floats increases the amount of code that has to be written, documented, tested and maintained
  • Allowing floats contributes to encouraging people to develop a bad habit of conflating integers and floating point numbers

For me, floating point indexing is almost entirely lose-lose. It's kind of cute that it works, but it's not cute enough to justify its existence in the absence of some other compelling rule that would imply it ought to exist.

@jakebolewski
Copy link
Member Author

I just want some consistency in the interface as well, but I agree with @johnmyleswhite's points which is why I proposed the change.

@dlfivefifty
Copy link
Contributor

I would say as someone who has not really used MATLAB/R that it wasn’t “cute” but “bizarre” to find out that float indexing works… I just checked Mathematica (which does distinguish between Ints and Floats) which turns out also supports Floats as an index..but it would be bizarre if Mathematica’s behaviour wasn’t bizarre :)   

On 17 Mar 2015, at 6:26 am, John Myles White [email protected] wrote:

For me, the bright line is just that indexing shouldn't use floating-point numbers in addition to allowing integers. The rationale I see is that:

Allowing floats doesn't increase the expressive power of the language
Allowing floats increases the amount of code that has to be written, documented, tested and maintained
Allowing floats contributes to encouraging people to develop a bad habit of conflating integers and floating point numbers
For me, floating point indexing is almost entirely lose-lose. It's kind of cute that it works, but it's not cute enough to justify its existence in the absence of some other compelling rule that would imply it ought to exist.


Reply to this email directly or view it on GitHub #10154 (comment).

@goretkin
Copy link
Contributor

In the world where floats can be an index to an array, I'm confused about whether a[3.9] would be an error or whether there would be rounding.

If the proponents of that world think it should be an error, then we might as well allow a["4"]. There are many strings that don't represent integers, just as there are many floats that don't either. It's harder for strings to arise from numerical computations than it is for floats, but I think it would be consistent with doing a best-effort interpretation of a float index, right?

I don't think that's the right way to go, but doesn't it seem like either you'd have both or none?

@nalimilan
Copy link
Member

@goretkin Currently:

julia> Int(3.9)
ERROR: InexactError()
 in call at base.jl:36

julia> Int(4.0)
4

julia> Int("4")
ERROR: MethodError: `convert` has no method matching convert(::Type{Int64}, ::ASCIIString)

So it's not like float -> int and string -> int aren't treated differently already; "both or none" doesn't sound like a sufficient rule.

@cortner
Copy link

cortner commented Mar 17, 2015

@goretkin "4" is not a number but a string; 4.0 is a number.

@dlfivefifty
Copy link
Contributor

If BigInt("4") works then so should Int("4")...

Sent from my iPhone

On 17 Mar 2015, at 7:45 pm, Christoph Ortner [email protected] wrote:

@goretkin "4" is not a number but a string; 4.0 is a number.


Reply to this email directly or view it on GitHub.

@cortner
Copy link

cortner commented Mar 17, 2015

I just want to say this discussion got the attention I had hoped for and I am very happy with this, even if the developers decide to stick with integer-only indexing.

Just to reiterate, the much bigger issue for me is, that it would be great if the Julia developers could bear in mind that the majority of users and potential users (all of Matlab and R!) of the language are not and don't want to be computer scientists. Possibly I am wrong on this particular issue, but I just think decisions like having to write i = round(Int, x) instead of i = round(x) (so I can now use i as an index; and btw I can't write Int(a) if a is an vector, I have to write Vector{Int}(a)) and Dict(a=>b) instead of [a=>b] and Float64[ f(x) for x in X] instead of [f(x) for x in X] put me off just a little bit. It makes it a little less pleasant to write Julia code and having too many such cases will make teaching in Julia and conversion to Julia much harder.

@dlfivefifty
Copy link
Contributor

I have to say that I agree with all your examples, other than float for getindex 😀

Sent from my iPhone

On 17 Mar 2015, at 8:58 pm, Christoph Ortner [email protected] wrote:

I just want to say this discussion got the attention I had hoped for and I am very happy with this, even if the developers decide to stick with integer-only indexing.

Just to reiterate, the much bigger issue for me is, that it would be great if the Julia developers could bear in mind that the majority of users and potential users (all of Matlab and R!) of the language are not and don't want to be computer scientists. Possibly I am wrong on this particular issue, but I just think decisions like having to write i = round(Int, x) instead of i = round(x) (so I can now use i as an index; and btw I can't write Int(a) if a is an vector, I have to write Vector{Int}(a)) and Dict(a=>b) instead of [a=>b] and Float64[ f(x) for x in X] instead of [f(x) for x in X] put me off just a little bit. It makes it a little less pleasant to write Julia code and having too many such cases will make teaching in Julia and conversion to Julia much harder.


Reply to this email directly or view it on GitHub.

@nalimilan
Copy link
Member

Related debate: array constructors that accept any real: #4275, #9776

@garborg
Copy link
Contributor

garborg commented Mar 30, 2015

@cortner I've worked mostly using Stata and R doing economics and statistics in academia and business, and I suspect I'm in the minority, R- and Matlab- user-wise, but...

I think writing Int(a) if a is a vector is the greater evil -- special casing certain operations like that both obscures what that step is doing, and is, to me, idiomatic in a bad (inconsistent) way and more to store in one's head for newcomer and old user alike. map(Int, a) and Vector{Int}(a) seem like plenty -- with the latter hinting that conversion, not construction, is happening, and the former handling other cases (or if one prioritizes brevity over giving maximal hints to readers).

I'm more on the fence about floating point indexing, but I'm mostly bugged by running into libraries implementing it inconsistency, or partially -- I've never actually missed it personally.

@dlfivefifty
Copy link
Contributor

I think having one command that converts both vectors and scalar to a type is convenient for writing functions that work both on scalars and vectors. int(a) is cleaner than isa(a,Vector)?Vector{Int}(a):Int(a)

Sent from my iPad

On 31 Mar 2015, at 2:05 am, Sean Garborg [email protected] wrote:

@cortner I've worked mostly using Stata and R doing economics and statistics in academia and business, and I suspect I'm in the minority, R- and Matlab- user-wise, but...

I think writing Int(a) if a is a vector is the greater evil -- special casing certain operations like that both obscures what that step is doing, and is, to me, idiomatic in a bad (inconsistent) way and more to store in one's head for newcomer and old user alike. map(Int, a) and Vector{Int}(a) seem like plenty -- with the latter hinting that conversion, not construction, is happening, and the former handling other cases (or if one prioritizes brevity over giving maximal hints to readers).

I'm more on the fence about floating point indexing, but I'm mostly bugged by running into libraries implementing it inconsistency, or partially -- I've never actually missed it personally.


Reply to this email directly or view it on GitHub.

@StefanKarpinski
Copy link
Sponsor Member

Mapping Int already has this property:

julia> map(Int,1.0)
1

This is an argument in favor of considering scalars to be container-like.

@garborg
Copy link
Contributor

garborg commented Mar 30, 2015

@dlfivefifty IMO those are two separate issues -- either we want that behavior for callable things, not just conversion/coercion (in which case map or a special syntax for mapping -- open discussion in #8450 -- is the way to go), or we decide on balance to forgo it. I'd be sad to see it special cased again.

@ScottPJones
Copy link
Contributor

Interesting discussion here... but I ask, why shouldn't you be able to index by a floating point value (either binary or decimal)?
I've spent almost 30 years as one of the developers/architects for a language/database system where that is done all the time... it is heavily used (you're medical information is most likely stored using it,
if you live in the US, England, Sweden, Scotland, and many other places around the world... or if you get arrested in Belgium! 😀)
Basically, any string can be used as an index into a multidimensional associate array, and they
are sorted (a B+tree is used for persistent variables, and a ptrie is used for in-memory variables)
using a particular collation rule, which defines the numeric collation and string collation behavior.
The rule says 1) whether you have pure string collation, or where all numbers come first, sorted by numerical value [up to around 510 digits]), and 2) if there is some country/language specific collation sequence (i.e. like Spanish or French, where you want estas and estás to sort before estandar)

The indices (subscripts) all get encoded in a wonderfully special way 😀 so that "" sorts before everything, then all numbers (including floating point), and then all strings (unless the rule says to treat numbers as strings)... Unicode is also encoded in a very packed format, so that you normally have things sorted by the Unicode code points, but still can have collations where FOO and foo both sort before FOOBAR (case insensitive - but they are still distinct).

So, it is quite frequent to have an array with index with subscripts like "-100.234" 0.123e-6 12345.

It would be extremely limiting IMO if Julia did not allow for AbstractArrays to be able to handle this...

@timholy
Copy link
Sponsor Member

timholy commented Apr 25, 2015

#10525 is about to change this, in a way that's easy to make systematic.

@ScottPJones
Copy link
Contributor

@timholy that will be just what I need, actually, to implement associative arrays... I have to dig into this, will I be able to have an arbitrary number of dimensions? Some customers had many many dimensions (10 or more easily!)

@timholy
Copy link
Sponsor Member

timholy commented Apr 25, 2015

Yes, you'll be able to have arbitrary dimensions. "Easy & generic" arbitrary dimensions await #10911, but you can "hand-generate" all the methods you need up to some fixed dimensionality.

@stevengj
Copy link
Member

@ScottPJones, it sounds like you are thinking of the (already existing) Dict type (i.e. an associative array). Dict can indeed be indexed by anything you want. If you want ordering, then you can use an OrderedDict (in the DataStructures package) or a PriorityQueue, depending on your needs.

And, yes, a Dict can have an arbitrary number of dimensions, in the sense that you can index them like mydict[a,b,c,d] = e. (Technically, this is equivalent to a tuple key (a,b,c,d).)

The Array type, which is the topic of this issue, is something different. Arrays a high-performance data structure specialized for the case where you have a dense set of sequential integer keys (or tuples thereof).

@ScottPJones
Copy link
Contributor

@stevengj I'd want an ordered, multi-dimensional Dict then... also, can Dict's sort numerically? I hadn't thought so... but there is so much I don't yet know in Julia... (I also want a persistent OMD [ordered, multidimensional Dict 😀] I also have to look if Dict's allow any string, including "", as a key...

@stevengj
Copy link
Member

@ScottPJones, an OrderedDict can sort by any comparison function that you want. And yes, they allow any Julia object as a key, including empty strings.

@kmsquire
Copy link
Member

Actually, an OrderedDict maintains the insertion order of elements. A
SortedDict will maintain the sort order.

I started working on a patch to sort OrderedDicts a while ago, but never
finished it. I guess now would be as good of a time as any...

Cheers,
Kevin

On Sat, Apr 25, 2015 at 6:51 PM, Steven G. Johnson <[email protected]

wrote:

@ScottPJones https://github.com/ScottPJones, an OrderedDict can sort by
any comparison function that you want. And yes, they allow any Julia object
as a key, including empty strings.


Reply to this email directly or view it on GitHub
#10154 (comment).

@stevengj
Copy link
Member

Thanks @kmsquire, I got the two confused.

@StefanKarpinski
Copy link
Sponsor Member

I've spent almost 30 years as one of the developers/architects for a language/database system where that is done all the time... it is heavily used (you're medical information is most likely stored using it,
if you live in the US, England, Sweden, Scotland, and many other places around the world... or if you get arrested in Belgium! )

@ScottPJones, Please stop giving your resume whenever you're making an argument. Argument from authority is actively counter-productive to having a reasoned debate. Someone who has only been programming for a year could just as easily be right while people who have been programming for decades are wrong. If you can't make a solid argument without falling back on authority, you don't have a good case. Experience is a great resource, but it should provide good examples from which solid arguments can be made, not serve as a replacement for making a compelling case.

@ScottPJones
Copy link
Contributor

@StefanKarpinski Sorry about that, I was not trying to make an argument from authority, I hate that myself, but rather based on my experience, of real world applications, and the rest of that comment was an attempt to give an example of just what is done where people do want to use sorted floating point keys, coexisting with sorted, collated strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs decision A decision on this change is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.