Allow variables named this #328

Open
eernstg opened this issue Apr 24, 2019 · 8 comments

Comments

@eernstg
Member

eernstg commented Apr 24, 2019

In response to #266, this issue proposes user-declared implicit member access: the ability for a variable (a formal parameter or any other kind of variable) or a getter to have the name this.

[Edit, May 3rd 2019: Clarified some errors, added an error when this has the type dynamic. May 7th 2019: Introducing a targeted this and more detailed lookup rules; added type-safe builder example.]

In Dart, the lexical scope is searched first, and any declaration found there is used. For instance, m(42) will call the library function named m in the body of m2, not the instance method m which is inherited from A:

m(x) {}

class A {
  m(x) {}
  m1() {}
}

class B extends A {
  m2() => m(42); // Calls top-level function `m`.
  m3() => m1(); // Means `this.m1()`, calls inherited method.
}

In the case where the lexical scope does not contain a declaration of the requested name, this. is prepended, and the resulting expression is used during subsequent static analysis. (So we check the lexical scopes, then this, but we may still end up with a compile-time error because the requested name is not found in either of those two ways.)
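As a runnable illustration of the existing lookup order (this is current Dart, not the proposed feature), the example above can be fleshed out so that each method reports which declaration was actually called. The return values and the extra method m4 are additions made only for this sketch.

```dart
// Top-level function: found first through the lexical scope.
String m(int x) => 'top-level m';

class A {
  String m(int x) => 'A.m';
  String m1() => 'A.m1';
}

class B extends A {
  // `m` is found in the library scope, so the inherited `A.m` is not used.
  String m2() => m(42);

  // No `m1` in any enclosing lexical scope, so this means `this.m1()`.
  String m3() => m1();

  // Hypothetical extra method: an explicit receiver skips the lexical lookup.
  String m4() => this.m(42);
}

void main() {
  final b = B();
  print(b.m2()); // top-level m
  print(b.m3()); // A.m1
  print(b.m4()); // A.m
}
```

The inherited A.m is only reachable here by writing this.m(42) explicitly, because inherited members are not part of the lexical scope of B.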

The feature proposed here is to allow declarations of variables and getters named this. The existing scope rules are used to obtain the effect that this can be an implicit receiver: The enclosing lexical scopes will still be searched first, and member accesses with this as an implicit receiver will be taken into account if nothing was found in the lexical scopes, and we still get a compile-time error if neither of those two approaches yields a successful lookup.

The rules proposed here specify that, in the body of an instance method, an instance member invocation is chosen whenever an instance member of the given name exists in the interface of the enclosing class (not just when such a member is declared in the lexical scope). This may seem like a breaking change (because it mentions the interface of the class, and the current rules only talk about the lexical scope), but it is in fact backward compatible because the existing rules unconditionally prescribed the addition of this as the last catch-all case.

The additional expressive power added by this feature is only the ability to give the name this to a larger set of objects, thus allowing for implicit member accesses to them.

Note that multiple declarations of this in different lexical scopes can shadow each other, just like other declarations, such that the innermost declaration wins:

class A {
  m() {}
}

class B extends A {
  m2() => m(); // Means `this.m()`, calls `A.m`.
  m3(C this) => m(42); // Means `this.m(42)`, calls `C.m` on the argument.
}

class C {
  m(int i) {}
}

This mechanism is likely to be very convenient in a number of situations.

In particular, it is commonly expected that implicit access to an object which is not the "current instance" of an enclosing class is available in extension methods (such as #41 and #177); that can be achieved simply by making the syntactic receiver of the extension method invocation be a parameter named this in the function which is the desugared version of the extension method. This means that we can use this general mechanism, and we don't need to invent a whole set of special rules to explain why it is possible to use this to access the syntactic receiver of an extension method invocation.
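For comparison, here is a small sketch of how implicit access to the receiver looks in extension methods as they exist in current Dart. It only shows the observable behaviour; it does not claim that the language specifies this via a parameter named this. The extension name Shout and its members are made up for the example.

```dart
extension Shout on String {
  // Unqualified `toUpperCase()` and `length` are resolved against the
  // syntactic receiver of the extension method invocation.
  String shout() => toUpperCase() + '!' * length;

  // An explicit `this` denotes the same receiver object.
  bool get isShort => this.length < 4;
}

void main() {
  print('abc'.shout()); // ABC!!!
  print('abc'.isShort); // true
}
```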

In general, the use of a variable or getter named this allows for concise access to the instance members of that object.

On the other hand, excessive usage of declarations named this will make it harder to read the source code. In the specification below we will leave it open how to constrain this mechanism, but it would of course be very easy to single out specific usages and make them a compile-time error (e.g., it could be a compile-time error for a top-level variable to have the name this).

If we prefer to take a more cautious path, we could make the choice to only allow declarations named this in a very small number of specific situations. In all other situations it would just be a compile-time error for a declaration to have the name this.

It would then presumably be quite easy (in terms of the specification as well as the implementation) to allow additional kinds of declarations to have the name this, whenever we have determined that this generalization is valuable in practice.

Syntax

The grammar is adjusted as follows in order to support user-declared implicit member access:

<variableIdentifier> ::= // NEW
    <identifier> | 'this'

<declaredIdentifier> ::= // MODIFIED
    'covariant'? <finalConstVarOrType> <variableIdentifier>

<simpleFormalParameter> ::= // MODIFIED
    <declaredIdentifier>
    | 'covariant'? <variableIdentifier>

<initializedIdentifier> ::= // MODIFIED
    <variableIdentifier> ('=' <expression>)?

<topLevelVariableDeclaration> ::=
    <varOrType> <variableIdentifier> ('=' <expression>)? (',' <initializedIdentifier>)*

<primary> ::= <thisExpression>
    |    <targetedThisExpression>
    |    'super' <unconditionalAssignableSelector>
    |    <constObjectExpression>
    |    <newExpression>
    |    <functionPrimary>
    |    '(' <expression> ')'
    |    <literal>
    |    <identifier>

<targetedThisExpression> ::= <typeIdentifier> '.' 'this'

<getterSignature> ::= <type>? 'get' <variableIdentifier>

Note that it is a syntax error to use the name this for a type variable, but other variables can have the name this.

Static Analysis

It is a compile-time error for a variable or getter with the name this to have the type dynamic.

This is true both in the case where the variable has a type annotation, and in the case where the type of the variable is inferred.
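To see why a dynamic receiver is worth ruling out, here is a minimal sketch in current Dart showing that member access on a receiver of type dynamic is not checked statically. With an implicit receiver of that type, every misspelled unqualified name would presumably be accepted in the same way. The function name and the member noSuchMember are made up for the example.

```dart
// Returns true if the invocation compiled but failed at run time.
bool failsOnlyAtRunTime() {
  dynamic receiver = 42;
  try {
    // Accepted by static analysis because `receiver` is `dynamic`.
    receiver.noSuchMember();
    return false;
  } on NoSuchMethodError {
    return true;
  }
}

void main() {
  print(failsOnlyAtRunTime()); // true
}
```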

It is a compile-time error for a <targetedThisExpression> of the form C.this to occur, unless it occurs in an instance method of a class named C. A <targetedThisExpression> is considered to be an identifier whose static type is the enclosing class, passing any declared type parameters as actual type arguments. (For instance, C.this has type C<X> if C is declared as class C<X> {...}.)

The static type of a <thisExpression> e is the type annotation of the declaration named this if any such declaration is in scope; otherwise, the static type of e is the enclosing class, when e occurs in an instance method; otherwise, e is a compile-time error.

In the rules for unqualified invocations, a new 5th bullet point is added (here), specifying that

Otherwise, if $i$ occurs in an instance method body and
the interface of the enclosing class $C$ contains a member named \id,
then $i$ is equivalent to
\code{$C$.\THIS.\id<$A_1, \ldots,\ A_r$>($a_1, \ldots,\ a_n,\ x_{n+1}$: $a_{n+1}, \ldots,\ x_{n+k}$: $a_{n+k}$)}.

and the last bullet is adjusted to say

Otherwise, if $i$ occurs in an instance method body, or a declaration named \THIS{} is in scope,
$i$ is equivalent to the ordinary method invocation
\code{\THIS{}.\id<$A_1, \ldots,\ A_r$>($a_1, \ldots,\ a_n,\ x_{n+1}$: $a_{n+1}, \ldots,\ x_{n+k}$: $a_{n+k}$)}.

Similar adjustments are made for identifier references here.

For instance, if an expression e of the form m(42) occurs in the body of an instance method of a class C then, if there is no declaration of m in the enclosing lexical scopes, e is desugared to this.m(42), and subsequent processing is based on the desugared expression. In particular, the static analysis of e will check that there is a member named m in the static interface of the type of this, and that 42 is an appropriate actual argument list for an invocation of m. Similarly, the dynamic semantics of e follows from the result of desugaring.

Dynamic Semantics

When an ordinary method invocation takes place and it invokes an instance method m in a class C with receiver o, the targeted this-expression C.this is bound to o; if no declaration named this is in scope, this is bound to o as well.

If a declaration named this is in scope then no special bindings for this are provided, and the corresponding declared entity is accessed using the ordinary rules for access to a declared entity.

Discussion

Kotlin supports the notion of a function literal with receiver and function type with receiver. An example is the following:

val sum = fun Int.(other: Int): Int = this + other

This specifies that the receiver type is Int, and it allows the call to pass an instance of type Int by providing it as the syntactic receiver (such as 4 in 4.sum(2)), and to access that instance in the body of the function using this.

The Kotlin mechanism differs from the proposal of this issue by being more constrained: It bundles together two different mechanisms, and they can't be used separately. The first one is that a certain function argument is passed as a receiver (that is, that the call is of the form r.f(...), but r will be passed to f as the first actual argument). The second mechanism is that the "receiver argument" is accessed in the body of the function using the name this, and it also allows for implicit accesses (such that m() in the body can denote the instance method invocation this.m()).

The feature proposed in this issue, user-declared implicit member access, is only concerned with the scoping and the implicit member access, that is, the second mechanism mentioned above.

Other mechanisms (like #41, #42, #177, #309) are used to enable the first mechanism, and they in turn rely on this feature. This allows us to specify extension methods, extension types and so on in terms of a common underlying mechanism which allows this to take on the appropriate meaning in each case, as opposed to an approach where we would specify a new set of ad-hoc rules for how to understand this in the context of each of those constructs.

A major use case for Kotlin's function types and literals with receiver is type-safe builders. Here is an example from here:

// Kotlin type-safe builder example.

class HTML {
    fun body() { ... }
}

fun html(init: HTML.() -> Unit): HTML {
    val html = HTML()  // create the receiver object
    html.init()  // pass the receiver object to the lambda
    return html
}

// Example expression using the type-safe builder.

html {  // lambda with receiver begins here
    body()  // calling a method on the receiver object
}

The point is that the construction of an instance of HTML is done in the function html, and the {...} construct at the end is an argument to html which is a function literal, and the body of that function literal gets to use methods on that HTML implicitly, because it is a function literal with receiver.

If we choose to give the implicit parameter the name this then we could do something similar in Dart:

// Dart counterpart. 

class HTML {
  void body() { ... }
}

HTML html(void Function(HTML) init) {
  final html = HTML();
  init(html);
  return html;
}

// Example expression using the `html` builder.

html({
  body(); // Calling a method on the receiver object
});

It's a bit more noisy because of the parentheses and the semicolons, but we do get the basic structure of a Kotlin style type-safe builder.

Note that Dart gives more information locally:

In Kotlin, you'd need to look up the type of the parameter of html (possibly in some other file) in order to understand that we may rely on an implicit receiver—for instance, body() calls a method on the argument. In Dart, we immediately know that body() can be a method call on the argument, because (if we choose the name this for the parameter) that's how abbreviated function literals work.
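For reference, here is a sketch of how the same builder can be written in current Dart, where the receiver needs an ordinary parameter name and every member access is qualified. The children list, the parameter name h and the local name receiver are assumptions added so that the example runs and has something to print.

```dart
class HTML {
  // Hypothetical state, added so that the effect of `body()` is observable.
  final List<String> children = [];

  void body() => children.add('body');
}

HTML html(void Function(HTML) init) {
  final receiver = HTML(); // Create the receiver object.
  init(receiver); // Pass it to the function literal as an ordinary argument.
  return receiver;
}

void main() {
  final doc = html((h) {
    h.body(); // Explicit receiver; the proposal would allow a plain `body()`.
  });
  print(doc.children); // [body]
}
```

The only difference from the proposed form is the explicit h. prefix on each call inside the function literal.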

@lrhn
Member

lrhn commented Apr 26, 2019

This only allows this as a parameter. Can it also be a local variable:

{
  var this = StringBuffer();
  write("42");
  writeAll(something);
  ...
  return toString();
}

It seems likely that a user will want to locally use a different this binding, and if you can only do that as (this) { .... override block .... } (someObject) then it is harder to read and doesn't afford the same control flow as if you can do an inline block.
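For comparison, the closest thing available in current Dart is probably a cascade, which gives a block of member accesses on one object but not the inline control flow mentioned above. A minimal sketch, assuming the StringBuffer members write and writeAll are what is meant, and with a made-up function name build:

```dart
String build(Iterable<Object> something) {
  // Each `..` section is a member access on the same StringBuffer.
  final buffer = StringBuffer()
    ..write('42')
    ..writeAll(something);
  return buffer.toString();
}

void main() {
  print(build(['a', 'b'])); // 42ab
}
```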

@eernstg
Member Author

eernstg commented Apr 26, 2019

@lrhn wrote:

This only allows this as a parameter.

It does allow for this to be a declared name using <declaredIdentifier> and <initializedIdentifier>, and that allows us to have instance variables and local variables named this as well. I had forgotten to adjust <topLevelVariableDeclaration>, however, but that's done now. So the intention was to allow this to be any kind of variable, and I believe that's true now.

I still expect that we'd want to constrain this mechanism somewhat, because it will be confusing if it is used too aggressively, but it's very easy to point out some cases and make them an error.

@lrhn
Member

lrhn commented Apr 30, 2019

I'd be very wary of allowing this as a top-level variable, or even static or instance variables.
If you declare dynamic this = null; at top-level, then you disable all invalid member errors. I don't see a good use-case for that.

I think I'd go for only allowing it for local variables.

  • An instance variable named this suggests that we also need a this-named getter/setter. It's also completely useless since instance methods presumably have a default implicit this parameter which would shadow the instance member, and static members cannot use the instance member anyway (even if it is in scope).
  • A static variable named this can only affect static members (again, the instance members have a shadowing implicit parameter), and then it gets really confusing to figure out what this means, when a static method and an instance method next to each other, with no visible this declaration in either, behave differently. Also means a this-named getter/setter.
  • A top-level variable named this would affect all invalid member declarations in the scope. It's a public name, so it's presumably exported. If you declare dynamic this; at top-level, you have effectively turned off all invalid member invocation checking. Too fragile, too dangerous. And again means getters/setters.
  • A type variable named this makes no sense.

So, I'd make this only work inside member bodies, where this already occurs.

One other worry: If you write an identifier foo, then foo is looked up in the lexical scope. If it is found as an instance member of the current class, then it is currently converted to this.foo. That would not be valid if there is a shadowing this declaration in scope. I guess we will have to say that if foo denotes an instance member and there is a this declaration shadowing the implicit this of the current instance member, then it is a compile-time error because the foo declaration is not accessible (just as if you had used foo in a static member).

@eernstg
Member Author

eernstg commented May 3, 2019

@lrhn wrote:

I'd be very wary at allowing this as ... instance variables

We do have a kind of a use case: It serves very nicely as a model for why extension methods can access the syntactic receiver using implicit member access and via an explicit this.

In general, it would enable all instance methods of a given class to behave as if the class were forwarding to a given instance variable. A similar property would apply for static variables, in which case static methods would also participate in the forwarding.

So I agree that it is a mechanism that should be used judiciously, but then it can actually be quite powerful.
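To make the forwarding idea concrete, here is a sketch of what has to be written in current Dart: every access to the delegate is spelled out. Under the proposal, naming the field this would presumably let the body of run use a plain log(...) and count, since Service itself declares no such members. The classes Logger and Service and all of their members are made up for the example.

```dart
class Logger {
  final List<String> lines = [];
  void log(String line) => lines.add(line);
  int get count => lines.length;
}

class Service {
  // Hypothetical delegate; under the proposal it could be named `this`.
  final Logger _delegate = Logger();

  int run() {
    _delegate.log('start'); // Would be `log('start')` with implicit access.
    _delegate.log('stop');
    return _delegate.count; // Would be `count`.
  }
}

void main() {
  print(Service().run()); // 2
}
```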

dynamic this .. disable all invalid member errors.

We might very well say that this can never have the type dynamic, and similarly a type variable can't have that name. The latter is actually already enforced by the grammar that I proposed. I added these things to the proposal.

@lrhn
Member

lrhn commented May 3, 2019

A more significant problem with making this an instance member is that if we assume all normal instance members get an implicit this parameter, then the instance variable won't shadow an implicit this parameter of an instance method (at least not without extra special-casing). If that's not the model, then I'm not sure what the model is ... maybe letting this refer to the current instance is the default if there is no this in the lexical scope, but what about inherited instance this members?

If you write this, and this is an inherited instance member, and there is no other this in scope, will it be rewritten to this.this? Will you ever stop rewriting? 😄

@eernstg
Member Author

eernstg commented May 7, 2019

@lrhn wrote:

if we assume all normal instance members get an implicit this parameter

The specification says that this has a binding during 'ordinary method invocation' and we do not have any reference to a desugaring step where that is achieved by means of an added parameter. So in that sense the access to this in an instance method is purely semantic.

(That location used to say and with \THIS{} bound to the current value of \THIS{}, but that will not work in the case where the receiver and the current binding of \THIS{} differ, so I corrected it in dart-lang/sdk@673d5f0; in any case, it has been specified directly in terms of the binding of this all the way back to Sep 2013 when the spec was added to the SDK repo.)

For the current proposal we'd need to add something like a notion of a "targeted this", which could be written as C.this as in Java, which would denote the current instance of the enclosing class C. We do not have any kind of nested classes (inner or static or whatever) so there would always be exactly one such enclosing instance that we would add to the bindings for each instance method invocation.

Then we'd execute instance methods of a class C such that C.this is bound to the enclosing instance of C, and the plain this would get the same binding except when there is a binding of this in a lexically enclosing scope.

For ordinary method invocations we would then have an additional bullet (here) specifying that $i$ is equivalent to \code{C.\THIS{}.\id...} in the case where the interface of the enclosing class contains a member named id.

This means it is always possible to access the current instance of an enclosing class in an instance method; so id(...) means id(...) if id denotes a top-level function (and in that case we use the rules for a function expression invocation), otherwise it means C.this.id(...) if id is an instance method of the enclosing class, otherwise it means this.id(...) (which yields an ordinary invocation for C.this as well as this).

When there is no declaration named this, the latter could invoke any method in the interface of the enclosing class (or it would be a compile-time error because there is no such method).

When there is a declaration named this then it (that is, this.id(...)) would invoke id on the binding of this, whatever that is (or it could be a compile-time error as usual).

We would need to change the last bullet such that it changes id(...) to this.id(...) when it occurs in an instance method and when a declaration named this is in scope.
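The lookup order described above can be modelled as a small function. This is only a sketch of my reading of the rules, not spec text: scopes and interfaces are reduced to sets of names, lexicalScope is assumed to hold only declarations that are not instance members (locals, statics, top-level declarations), and the function and parameter names are made up.

```dart
// Returns the desugared form of an unqualified reference to `id`,
// or 'error' when no rule applies.
String resolve(
  String id, {
  required Set<String> lexicalScope,
  required Set<String> enclosingInterface,
  required bool inInstanceMethod,
  required bool thisDeclared,
}) {
  // 1. A declaration in an enclosing lexical scope wins.
  if (lexicalScope.contains(id)) return id;
  // 2. New bullet: a member of the enclosing class goes to `C.this`.
  if (inInstanceMethod && enclosingInterface.contains(id)) {
    return 'C.this.$id';
  }
  // 3. Adjusted last bullet: fall back to whatever `this` denotes.
  if (inInstanceMethod || thisDeclared) return 'this.$id';
  return 'error';
}

void main() {
  print(resolve('id',
      lexicalScope: <String>{},
      enclosingInterface: {'id'},
      inInstanceMethod: true,
      thisDeclared: true)); // C.this.id
}
```

Note that step 3 can still end in a compile-time error later on, when this.id is analyzed and the static type of this has no member named id.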

Superinvocations would be defined in terms of C.this. I do not advocate introducing any mechanism for superinvocations on any other instance than the current instance of the enclosing class.

If you write this, and this is an inherited instance member, and there is no other this in scope,
.. Will you ever stop rewriting?

If we consider an explicit reference to this which occurs in a class C and assume that no declarations named this exist in the lexical scope then we hit the new bullet mentioned above and make it C.this.this which resolves to that inherited member (note that C.this is a name, we don't evaluate C first and then look up this).

class A {
  int this;
}

class B extends A {
  bool foo() => this.isEven; // Rewritten to `B.this.this.isEven`
}

Conversely, consider an unqualified invocation id(...) in a situation where there is no declaration of this in scope, but there is one in the interface of the current class (e.g., because we inherit it from a superclass):

class A {
  int this;
}

class B extends A {
  foo() {
    print(isEven); // Error.
  }
}

In this situation we transform isEven to this.isEven (because none of the bullets before the last one is applicable), and there is no declaration of this in scope, so the static analysis uses the interface of B, which does not contain a member named isEven: compile-time error. (This desugaring step should definitely be a run-once step, and I think that we never perform any desugaring which could be described as repeat-until-stable.) In that sense the inherited declaration of this is powerless.

If we want to have the effect where that int is an implicit receiver then we can put the name this into the lexical scope (in a sense confirming that we want a custom implicit receiver):

class A {
  int this;
}

class B extends A {
  int get this; // Abstract getter puts a declared `this` in scope.
  foo() {
    print(isEven); // Desugars to `this.isEven`, OK statically and dynamically.
  }
}

For the dynamic semantics, the term C.this is bound to the enclosing instance of C, and so is this if there is no declared this, and a method or getter lookup for a declared this will find the corresponding member as usual.

So I don't think we have an infinite regress problem.

Finally, it could be debated, but I would expect the ability to look up members named this to be unnecessary and possibly confusing (so I haven't added this as a possible selector). This means that terms like C.this.this.isEven can only occur after desugaring, they can't be written explicitly.

If we consider a field named this to be a kind of delegatee of a given class C, then this means that clients of a C cannot access that delegatee (C can of course declare a getter that returns it, but a client cannot access it directly). I think that's a meaningful restriction: Delegation should probably be a private matter.

@lrhn
Member

lrhn commented May 8, 2019

The use of C.this for overriding this resolution can work, but it precludes making this a static declaration.
Example:

class C {
  static final B this = ...;
  int foo(D this) =>
    this.foo() + C.this.foo();
}

Here C.this.foo can reference either the current C instance or the static B instance declared in C.

I can live with not allowing static/top-level this declarations at all. It's probably safer too.
If we disallow this as a general selector, then we have ruled out the static access anyway (except when it's in the lexical scope?)

@eernstg
Member Author

eernstg commented May 8, 2019

@lrhn wrote:

The use of C.this for overriding this resolution can work, but it precludes
making this a static declaration.

class C {
  static final B this = ...;
  int foo(D this) =>
    this.foo() + C.this.foo();
}

C.this.foo() will actually invoke the foo instance method of the enclosing instance of C. That's because I didn't add this as a selector, so C.this is always parsed as a <targetedThisExpression>, never as a class name followed by the name of a static member.

If we do insist on allowing this as a selector then we could remove the conflict by accessing the enclosing instance using class.this. As long as we don't do that, it is just one example of the general rule that we can't access that static member using C.this, anywhere, just like we can't access an instance member named this using anObject.this.

Conceptually, we could think of this as a certain kind of privacy: When we use a variable or getter named this it's intended to enable concise forwarding to that object in the scope of that declaration. For any location outside that scope, the declaration named this is inaccessible. If you want to have a declaration named this and also allow access to that object from the outside, declare a separate getter for it with a different name:

class C {
  static final B this = ...;
  static B get forwardee => this; // Allow remote access.
  int foo(D this) =>
      this.foo() + // Invoke `D.foo` on the argument.
      C.forwardee.foo(); // Invoke `B.foo()` on the static variable.
}
