Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[f(x) | x=y] not the same as map(f, y) #670

Closed
ksvanhorn opened this issue Apr 5, 2012 · 4 comments
Closed

[f(x) | x=y] not the same as map(f, y) #670

ksvanhorn opened this issue Apr 5, 2012 · 4 comments

Comments

@ksvanhorn
Copy link

typeof(map(x -> x^2, [1,2,3])) = Array{int64,1}

typeof([ x^2 | x=[1,2,3]) = Array{Any, 1}

It seems odd that the comprehension loses the type information, while map retains it.

@pao
Copy link
Member

pao commented Apr 5, 2012

Looks like another head of the hydra that is issue 524. (Edited to delink; see below.)

@StefanKarpinski
Copy link
Sponsor Member

No, this is actually a completely different issue and one of the most deep and vexing problems with Julia's approach to types. The problem is that in general we have semantics that only depend on run-time types, not on anything that the compiler can infer about types. However, we also store arrays with concrete element types like Vector{Float64} inline for efficiency and memory compatibility with C/Fortran, so that you can do things like call LAPACK on a float array.

The problem comes in when you want to map or comprehend and need to know what T should be in the resulting Vector{T} object. In general, what should the result type be for

map(f, v)

or

[ f(x) | x=v ]

?

In a statically typed language, the compiler has to be able to completely determine the type behavior of f, so it knows what the storage type of the resulting array ought to be. In a traditional dynamically language, like Python or Ruby, arrays can hold any kind of object because they're actually arrays of pointers to boxed objects, so there's nothing to worry about. In Julia, however, let's say we do this:

map(x->2x+1, [0:9])

The expression [0:9] produces a vector of type Vector{Int64} and we'd really like for the result to be of type Vector{Int64} too. But how do you figure that out? Here are a few options:

  1. compute all the values and figure out the tightest type they all fit it, in this case it would be Int64
  2. compute a single value and assume that everything else will have the same type
  3. use dynamic type inference to determine an upper bound on the element type

The first option sucks because you either end up storing the computed values in some inefficient intermediate location, or computing them all twice! The second option is what we're currently doing for map. I hate this option — it makes my skin crawl and my teeth ache. It must die. It's just wrong, wrong, wrong, wrong. The third option is what we're doing for comprehensions, but that ability isn't exposed to user code yet, so it can't be used in map — it's built into how comprehensions work.

The third option is the best, but it still sucks. Here's why: it makes program semantics depend on type inference — which is, in every other circumstance, just a heuristic that speeds programs up without changing their behavior. Suppose I have a situation where w = [ f(x) | x=v ] is inferred to have element type Union(Int64,ASCIIString). In some later code, you insert an ASCIIString value into w and it works just fine. Now, some time in the future, you upgrade to the latest and greatest Julia version, in which type inference has gotten sharper, which is generally a good thing. However, the type inference now determines that the element type of w actually has to be Int64 — f never actually returns an ASCIIString. Now your code breaks: when you try to insert a string into w, it doesn't work because the type of w is Vector{Int64}.

Jeff and I have talked about this a lot, especially over the past few days while we were at Lang.NEXT. I think we have worked out a way to do this that will be practical and make sure that programs that work will continue working even when the type inference is improved. Basically, the conservative thing to do here is to throw an error if the type inference can't determine a tight bound on the element type — and have the same behavior for both map and comprehensions. If you use a really complicated mapping function that type inference can't infer a tight bound for, then you will have to explicitly put a type annotation in, indicating the type you want your array to be of. That's ok. If the type inference can't figure out a tight element type, that's probably a good thing to do anyway.

@pao
Copy link
Member

pao commented Apr 5, 2012

I should really use more question marks. Thanks for the detailed comment. I've edited my first comment to delink.

@JeffBezanson
Copy link
Sponsor Member

duplicate of #210.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants