Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should null short-circuiting short-circuit spreads? #330

Closed
Tracked by #110
leafpetersen opened this issue Apr 24, 2019 · 24 comments
Closed
Tracked by #110

Should null short-circuiting short-circuit spreads? #330

leafpetersen opened this issue Apr 24, 2019 · 24 comments
Labels
nnbd NNBD related issues

Comments

@leafpetersen
Copy link
Member

As part of the NNBD language feature, we will allow null-aware operators to "short-circuit" a certain amount of the their continuation. So for example, a?.b.c will evaluate to null if a evaluates to null, instead of throwing a null pointer exception when c is called on null.

Should we extend this behavior to spreads? And if so, what semantics should it have? That is, given:

  void test() {
    List<int>? l = null;
    print([...l?.map((x) => x+1)]);
  }

If short-circuiting does not propagate through spreads, this is a compile time error, because l?.map(...) evaluates to either an Iterable or null, and it is an error to spread a nullable type. So you would have to change the code to ...?l?.map((x) => x+1).

If we make null-short-circuiting propagate through spreads, then this code could be allowed.

There is a question as to what semantics to give it. Consider ...l?.map((x) => x+1).

  • The treatment of short-circuiting elsewhere is that the short-circuited operation is only performed in the non-null continuation, which would suggest this semantics:
    • if (l == null) null else ...l.map((x) => x+1)
  • An alternative is to do the following:
    • if (l == null) ...?null else ...l.map((x) => x+1))

The latter semantics seems likely to be more useful.

@leafpetersen leafpetersen added the nnbd NNBD related issues label Apr 24, 2019
@leafpetersen
Copy link
Member Author

cc @munificent @eernstg @lrhn

@eernstg
Copy link
Member

eernstg commented Apr 24, 2019

We might also just stick to the explicit approach, and say that ...l?.map((x) => x+1) is a compile-time error because the type of l?.map((x) => x+1) is nullable. Developers would then be required to use ...?l?.map((x) => x+1).

In return, Dart would avoid having a situation where the ?s can propagate to the left, which seems to be an anomaly.

(I believe that ?s will otherwise always propagate to the right, in the sense that when a ? is added implicitly, it occurs in a position which is to the right of an existing explicit ?).

@leafpetersen
Copy link
Member Author

in the sense that when a ? is added implicitly

To be mildly pedantic, implicitly adding ? isn't really right (though it's a semi-reasonable informal way of thinking about it). To see this, note that:

  class A {
    dynamic foo() => null;
  }
  void test() {
    A? a = new A();
    print(a?.foo().isEven);
  }

evaluates to a null pointer error, rather than printing out null (which is what you would get if you just inserted a ? everywhere to the right).

@munificent
Copy link
Member

Should we extend this behavior to spreads?

I think so, yes, strongly.

I believe there is a clean narrative we can make for null-shorting which is "the scope of the shorting is the surrounding expressions that have null-aware forms". So, in a?.b.c()[f](g) + h, we null-short .c(), [f], and (g) because they each have null-aware forms (?.c(), ?.[f], ?.(g)) and we don't short out to + h because it does not. (Don't be misled by the fact that .c() appears after a?.b. .c() is the parent expression and a?.b is the receiver subexpressions).

If that narrative appeals, we have some work to do to make it actually true:

  1. Add null-aware forms of [] and () because we do want to short those.
  2. Short spreads since spreads do have a null-aware form.

I like this narrative, and I think 1 and 2 are pragmatically useful, so I think we should null short spreads as well as adding those other null-aware operators.

An equivalent narrative is: "If not shorting some operation X would cause a type error (because some null-aware subexpression has a nullable type in a position where a nullable type isn't allowed) and there is a null-aware variant of X, then X should null-shorted."

And if so, what semantics should it have?

It should insert no elements in the surrounding collection. That's what users want. Your second desugaring is what I would want, but ...?null is a little odd. I'd probably show it as:

[...(l != null ? l.map((x) => x+1) : const [])]

In return, Dart would avoid having a situation where the ?s can propagate to the left, which seems to be an anomaly.

The spread syntax is a little odd because it reads right-to-left, but I don't think "null shorting propagates to the right" is the clearest mental model. Instead, think of it as "null shorting propagates up the syntax tree". That's true both for method chains and for spreads, it's just that the root of the tree is textually on the left for spreads and the right for methods. There are other places in the language where execution goes right to left, like in a(b(c(d))).

So the idea is that a null-aware operator evaluating to null in certain subexpression positions may cause certain parent expressions to be skipped as well.

in the sense that when a ? is added implicitly

Yeah, sorry I mentioned that in the language meeting. That's not the right way to look at it. Sorry for the confusion.

@a14n
Copy link

a14n commented Apr 26, 2019

Shouldn't short circuit be handled on the spread operator?

[...?l.map((x) => x+1).where((x) => x > 1)];
// same as
[...?l?.map((x) => x+1)?.where((x) => x > 1)];

@eernstg
Copy link
Member

eernstg commented Apr 26, 2019

@leafpetersen wrote:

implicitly adding ? isn't really right
.. a?.foo().isEven ..
evaluates to a null pointer error

I'd certainly expect a?.foo().isEven to mean a?.foo()?.isEven if we introduce propagation of null-awareness. We've discussed this topic now and then at least since 2016 (e.g., dart-lang/sdk#26240), and I haven't heard anything to the contrary (we never went into enough detail to actually spell out any specific semantics, but there are a bunch of examples). If null-awareness propagation cannot be used to avoid a null error in a?.foo().isEven then what's it good for at all? ;-)

@lrhn
Copy link
Member

lrhn commented Apr 26, 2019

I would personally say that if null-awareness short-circuiting extends to spreads, then

[... l?.map(x) => x+1).where((x) => x > 1)]

should be valid. If l is null, this introduces no elements. in the list. Since l?. promises that the expression might be null, having to repeat the ? on the ... seems redundant and errorprone in the same way as having to repeat it for ?.where.

Short circuiting is not equivalent to just inserting extra ?s. Inserting the extra ?s will hide other null errors, and avoiding that it the reason to do proper short-circuiting.
(Requiring e1?.foo?.bar hides whether foo can be null. If you can just write e1?.foo.bar, then we can make that an error if foo is also nullable, and we can give a warning if you do write e1?.foo?.bar and foo cannot be null).

A context(expr?.tail) with short-circuiting can be rewritten as context(let tmp = expr in tmp == null ? null : tmp.tail) (where let is an expression-level variable introduction that the language doesn't have, but which makes things so much easier to explain).
If spread is included in the short-circuiting, it means it's part of the tail instead of the context. In practice ... l?.map(...).where(...) would be rewritten to let tmp = l in if (tmp != null) ... tmp.map(...).where(...).

(If you think of the problem using continuations, then a short-circuiting is a delimited continuation, so C2(C1(e)) (an expression with the continuation C2C1) becomes C2(let tmp = e in if (tmp == null) null else C1(tmp)). The big question is where to delimit the short-circuited continuation, and here whether ... C(e) includes the ... in the short-circuited continuation or not).

@eernstg
Copy link
Member

eernstg commented Apr 26, 2019

@lrhn wrote:

A context(expr?.tail) with short-circuiting can be rewritten as
context(let tmp = expr in tmp == null ? null : tmp.tail)

OK, that approach allows a?.foo().isEven to evaluate to null (rather than throw) when a is null, without changing .isEven to ?.isEven.

This means that we're considering null-awareness propagation as a semantic rather than a syntactic mechanism, hence the continuations. But this kind of specification cannot be used directly (unless we want to start talking about 'the expression evaluation completes with null-propagation', and that's surely more general than needed/appropriate). The problem is that it is ambiguous (depending on what context(...) means, but it looks like meta-syntax).

The current grammar allows for expressions of the form <primary> <selector>*, and it looks like that would be the intended forms covered by expr?.tail (so expr is derived from <primary> <selector>* and ?.tail is derived from <selector>+).

But this means that we'd have to say something like "and expr?.tail is not a subexpression of an expression derivable from expr?.tail <selector>+". We don't otherwise have that kind of "it must be the biggest expression" constraints in the spec today, and we'd need to consider the implications of having such things.

(If we do not add the "biggest expression" constraint then we would be allowed to rewrite x?.m1.m2 as both let v = x in v == null ? null : v.m1.m2 as well as (let v = x in v == null ? null : v.m1).m2, which is of course not the intention.)

So this means that context(expr?.tail) could stand for print(expr?.tail) or expr?.tail..toString(), but it could not stand for expr?.tail.baz.

We could consider <assignableExpression> as well, e.g., context(expr?.tail) could stand for expr?.tail.baz = 14, but we probably don't want to make that mean (let v = expr in v == null ? null : v.tail)?.baz = 14, and it will then soon be a compile-time error because expr?.tail has a nullable type.

If we use that desugaring then we'd automatically be able to skip over constructs that do not have a nullable form today, e.g., x?.m[y] would skip [y], x?.m(y)<int>() would skip <int>(), so if we only want to skip selectors that have a nullable form then we'd need to refine that "biggest expression" constraint a bit more. But we could also just skip the whole ?.tail based on the syntax, and then decide separately whether/when we want to add support for ?.[...], ?.<...>(...) (along with other non-selector proposals like +?).

But all these things don't change the fact that the skipping actions are always going left-to-right. So I still tend to prefer an explicit ...?, as in

[...? work.hard().to[3].getToA.listWithASomewhatLongerName?.map((x) => x+1).where((x) => x > 1)]

The ?. in the spread expression could potentially occur pretty far from the token ..., and there are lots of other ways for the spread expression to be nullable, so why is this one special? I would expect the developer who is trying to understand what's going on to be helped more by having ...? up front than the original author was helped by not having to write that ?. ;-)

@leafpetersen
Copy link
Member Author

leafpetersen commented Apr 26, 2019

@eernstg

OK, that approach allows a?.foo().isEven to evaluate to null (rather than throw) when a is null, without changing .isEven to ?.isEven.

Importantly, only if foo() has a non-nullable return type.

This means that we're considering null-awareness propagation as a semantic rather than a syntactic mechanism, hence the continuations.

Yes. I have a proposed semantics (given as a source to "mostly source" transform) in the NNBD spec proposal.

@leafpetersen
Copy link
Member Author

@a14n

Shouldn't short circuit be handled on the spread operator?

[...?l.map((x) => x+1).where((x) => x > 1)];
// same as
[...?l?.map((x) => x+1)?.where((x) => x > 1)];

I don't think so. The principle that we're going by is that we want to take things that would otherwise just always be a static error requiring you to add a bunch of ? markers and instead just make them do the right thing. But a crucial point is that we don't want to just silently hide nulls in the process. So we require you to explicitly add a ?. for every operation that can produce a null directly. We just don't require you to explicitly handle the propagation of that null past other operators that can't produce null. So in your example, if l has a nullable type, then we want you to have to write an explicit ? there, to say "Yes, I know that this operation can return null, and I want you to handle it for me by short-circuiting the rest of this sequence of operations". We don't require you to put a ? after the map call to say "I know that l?.map(..) can return null, and I want you to propagate it", because it's redundant: of course you know that l?.map(...) can return null - it says so right there!.

Your interpretation is more like: l.map(...).where(...) occurs in a context where null can be accepted, so I'll just silently drop any operations that return null on the way. This is dangerous, and not refactoring safe. I may have written the ...? because I know that .where returns null sometimes, but l and .map(...) might have non-nullable type. If someone refactors those to have nullable type - I want to be told that by the analyzer.

@lrhn
Copy link
Member

lrhn commented Apr 29, 2019

I'm leaning towards not including the spread operator in the null-propagation context.

I'm sure it's doable, but I think it might also be confusing to users.

It is already potentially confusing that you can do ... foo.bar?.baz but not ... foo.bar?.baz + 4 because we only propagate short-circuiting nulls through selectors, not weaker binding operators. I'd like to make the story about where null-propagation works very clear (not just consistent), so saying that it has effectively the same extent as a spread might help.

It does mean that you have to write ...? whenever the operand expression might be null, even if it is syntactically obvious that it might be null because it contains a ?..

@lrhn
Copy link
Member

lrhn commented Apr 29, 2019

Yes, I'm considering a semantic model where "the expression evaluation completes with null-propagation" is included.

We may want to introduce shorthands like "evaluate an expression to a null-aware value" vs "evaluate an expression to a value", where the former can evaluate to a "propagating null" and the latter would evaluate the same expression just to null.

Or maybe we won't need that because our grammar is of the form receiver selectors*. We can define evaluation of that recursively on the selectors, taking the receiver as a parameter.

Take e selectors. To evaluate that to a value, first evaluate e to a value v, then do "selector evaluation" of selectors with v as receiver.

Selector evaluation of no selectors on v evaluates to v.
Selector evaluation ?. method(args) selectors' on v is evaluated as:

  • if v is null, evaluate to null.
  • otherwise evaluate v.method(args) to a value v' and perform selector evaluation of selectors on v'.

(similarly for selectors ?.[e], ?.[e] = e2, ?.getter, ?.setter = e, without the first clause for the non-null-aware selectors).

@munificent
Copy link
Member

I'm leaning towards not including the spread operator in the null-propagation context.

I'm sure it's doable, but I think it might also be confusing to users.

I'm worried about that too, but my hunch is that it's practically useful enough that we'll regret not doing it. If we don't any time you use a null-aware operator somewhere inside the operand to a spread, you'll have to change the ... to ...? or you'll get a static error.

That sounds an awful lot to me like the original motivation for doing null-shorting in the first place, so...

Or maybe we won't need that because our grammar is of the form receiver selectors*. We can define evaluation of that recursivley on the selectors, taking the receiver as a parameter.

I have always found this corner of the grammar very odd. Semantically, the receiver is a subexpression of the invocation, so the "natural" grammar in my mind is:

invocation
    :    invocation '(' stuff... ')'
    |    invocation '[' expression ']'
    |    invocation '.' identifier
    |    primary
    ;

Do we not specify it that way because we want to avoid left recursion in the grammar, or is there some other reason?

@lrhn
Copy link
Member

lrhn commented May 3, 2019

@munificent There is the problem of cascades.

A foo..bar.baz..qux.wiz needs to be parsed as ((foo..bar.baz)..qux.wiz), but the value of foo..bar.baz is not the value of (foo..bar).baz, so I'm not sure how to grammatically associate that.

I guess we could have two different productions, one for normal invocations and one for cascade-content invocations, so the above becomes (((foo..bar)->baz)..qux)->wiz, and the value of foo..bar is some combination of foo and foo.bar, where the following selector determines which one is used,
so (foo..bar)->baz is foo..bar.baz and (foo..bar).baz is calling baz on foo.
Doesn't seem simpler, though.

For on-cascade invocations, I think it's only a matter of not duplicating productions unnecessarily.

@eernstg
Copy link
Member

eernstg commented May 3, 2019

@munificent wrote:

Do we not specify it that way because we want to avoid left recursion in the grammar

I think that might very well be the main reason. The form you mention is more compositional (it supports specifying the meaning of a term in the smallest possible increments) whereas the form <primary> <selector>* in principle requires us to talk about a the longest possible term of that form (because it gets inconvenient if we wish to specify the semantics of a selector in isolation).

But the specification already specifies the meaning of various terms of the form e.<selector>, so in that sense you could say that the specification implicitly assumes a transformation of the syntax tree to the more compositional structure, before we can even start determining what the program means. ;-)

@eernstg
Copy link
Member

eernstg commented May 3, 2019

@lrhn wrote:

the problem of cascades

We have the same syntactic structure for cascades as we do for selectors, <conditionalExpression> <cascadeSection>*, so the same approach should be applicable, assuming that we add support for ?...

However, it is not very meaningful to have ?.. in any other pattern than e?..cs1?..cs2?..cs3 etc. (that is: on every cascade section, if ?.. occurs at all), because each section will attempt to execute an instance method on the same receiver. So we could specify e?..cs1..cs2..cs3 to mean that we skip the entire cascade as in let v = e in v == null ? null : v..cs1..cs2..cs3, which would give it the same semantics as e?..cs1?..cs2?..cs3. (We can be more flexible with methods on Object?, if we find it worthwhile.)

In each cascade there could be occurrences of ?., and they should probably be handled in a similar way as the corresponding expressions of the form <primary> <selector>*, and similarly to assignable expressions when the cascade contains assignments.

For example, foo?..bar?.baz would desugar as

let v = foo in v == null
    ? null
    : (let v1 = v, _ = (let v2 = v1.bar in v2 == null ? null : v2.baz) in v1)

which might of course be further simplified because we're testing several times whether the same value is null, and because the variable _ is unused.

As before, <assignableExpression> is different from <primary> <selector>* so we'd need to consider cascade sections containing assignments separately. For instance, e..x?.y = e1 should probably not evaluate e1 if e evaluates to o and o.x is null (based on the treatment of x?.y = e1), etc.

@lrhn
Copy link
Member

lrhn commented May 3, 2019

The "problem with cascades" that I was eluding to here is that I can't find a nice compositional grammar for the individual selector chains. That's hardly surprising with the way cascades work.

Allowing e1?..s1..s2..s3 to short-circuit the entire cascade was the idea. I don't think it should short-circuit the baz in (foo?..bar).baz though. On thing at a time.

The desugaring of foo?..bar?.baz seems correct, although v1 could probably be skipped since it's just an alias for v.

And yes, the assignment is also correct, e..x?.y =e1 would be

let v= e in if e == null ? null : let _ = (let v2 = v.x in v2 == null ? null : v2.y=e1) in v

@leafpetersen
Copy link
Member Author

This never got resolved. Reading over the comments, I think @munificent is in favor, I'm not sure where @lrhn landed. Thoughts on this?

@lrhn
Copy link
Member

lrhn commented Dec 9, 2019

Still against it.
If it's not down-source in a selector chain from the ?. then it's not part of the short-circuiting. That's the only way I have found to explain what the scope is which isn't too complex.

(Which is also why we should fix it so that --x?.y doesn't work. Currently it does because it's considered equivalent to x?.y -= 1)..

@leafpetersen
Copy link
Member Author

@eernstg is also not in favor of this. @munificent do you want to make a strong pitch for this, or can we mark this is as not planned? It should be something we can change our minds on later, since adding it would only allow more programs.

@natebosch
Copy link
Member

I believe there is a clean narrative we can make for null-shorting which is "the scope of the shorting is the surrounding expressions that have null-aware forms".

If we use this as justification, is it going to limit us in what other null-aware forms we can add later?

@munificent
Copy link
Member

I don't know if I can make a principled argument for why we should do it, but I still think we should do it.

If a user writes a?.b.c. We short-circuit the .c because if we didn't the user would get a static error forcing them to write another ?. It's a pointless tax — we know exactly what the user wants but refuses to run until they write it.

As far as I can tell, that argument applies equally well to [...a?.b]. If we don't short-circuit, the user gets a static error forcing them to write a ?. We again no exactly what the user wants — insert no elements — but we force them to write it anyway.

I think if we don't do this, users will keep tripping over it and asking why we don't do it. (In fact, we've already gotten users asking why spreads don't implicitly skip null even without a null-aware operator.) Here's the potential user experiences as I see it:

  1. Don't short-circuit, user doesn't expect it to. Dart today. Harmless, but a little verbose because they have to write ? after the ....

  2. Don't short-circuit, user expects it to. They get an annoying static error that forces them to write a seemingly pointless ?. Language feels dumb to them.

  3. Do short-circuit, user doesn't expect it to. The only practical use for this case is if the user's code is wrong and the static error you get from not short-circuiting helps them discover that. Given that the scope of null-aware operators is pretty small and syntactically visible, I think mistakes around it are relatively uncommon. The ?. is right there and they wrote the ... too. It's hard to imagine what they expected it to do if not short-circuit. Call a .iterator extension method defined on null?

  4. Do short-circuit, user expects it to. Great. Does what they want, maximally terse. Language feels smart and helpful.

Looking at those, it seems like short-circuiting is useful and maybe a little delightful for the users who expect it, and not particularly harmful for those who don't.

I believe there is a clean narrative we can make for null-shorting which is "the scope of the shorting is the surrounding expressions that have null-aware forms".

If we use this as justification, is it going to limit us in what other null-aware forms we can add later?

Yes, it would. I think this came up in the language meeting when we talked about it. :-/

@munificent
Copy link
Member

I spent some time talking to @natebosch about this and I think I've been persuaded to the other side.

The two things I found most compelling are:

  • We have a request for null-aware collection elements, which I think is a good addition to the language. If we have that, then in something like [?a?.b], the first ? seems redundant. Why don't we short it? But doing so is very likely a bad idea because it implies shorting every collection element that uses ?.. Very likely not what users want?

    But if we don't null-short single elements, then it feels inconsistent if we do short spreads:

    [
      ...?a?.b, // <-- First '?' is unneeded here.
      ?c?.d     // <-- But required here.
    ]
  • If we don't short spreads, then the result is a compile-time error. That tells users exactly what the behavior is and how to get the result they want. So not shorting may be annoying, but it won't cause the program to do something the user didn't want. More importantly, it also means we could add shorting in the future without it being a breaking change.

Given that, and that the rest of the language team seems to be on the side of not shorting, I think that's probably the right choice, at least for now.

@eernstg
Copy link
Member

eernstg commented Feb 3, 2020

Closing: We do not introduce null-shorting for the spread operator. The nnbd spec has reflected this decision for quite some time already.

@eernstg eernstg closed this as completed Feb 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
nnbd NNBD related issues
Projects
None yet
Development

No branches or pull requests

6 participants