Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Auto-propagated function parameters #2327

Open
yazaddaruvala opened this issue Feb 8, 2018 · 47 comments
Open

Proposal: Auto-propagated function parameters #2327

yazaddaruvala opened this issue Feb 8, 2018 · 47 comments
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@yazaddaruvala
Copy link

yazaddaruvala commented Feb 8, 2018

I've never created an RFC or proposal before, some mentoring would be appreciated!

Summary

The purpose of this proposal is to provide an ergonomic yet explicit way to get access to a set of variables deep in the call stack, or for a lot of sibling calls which need the same value. You may even call this feature, "compile time, strongly typed, dependency injection", or "context variables". This particular proposal is about an implementation based on the addition of an AST transform to give Rust some syntax sugar to achieve this.

Motivation

There are a lot of times you want to propagate a function parameter deep into the call stack or to a lot of child calls at the same level in the stack.

  • Recursive functions with a result accumulator (example below)
  • Request scoped structs in a web application, eg. Request, User, Metrics, Tracer (example below)
    • i.e. Structs which are used by basically every method
  • Possibly in Future's Context as an explicit param to poll

The goal of this proposal is to provide Rust as it is today, with some ergonomic wins. There should be minimal/zero drawbacks to this syntax sugar. Ideally the syntax would be extremely intuitive, if not, less than a page of documentation should be needed.

Ergonomic Goals

There are two places this feature can effect readability, a function's signature, and a function's implementation.

Function signatures:

These change marginally, as such I don't think there is an ergonomic win or loss while reading function signatures.

Function implementations:
  • Perspective of the writer:
    • Goal: None
    • Currently: This proposal happens to improve ergonomics
  • Perspective of the reader:
    • Goal: Improve ergonomics
    • Currently:
      • This proposal makes it harder to read a code sample for the first time in that domain.
      • This proposal makes it easier to re-read code samples in familiar domains.
        • Detailing the intents of the function rather than how the function is implemented.
        • Eliminating variables which are not valuable to improving the readers understanding of the purpose of the function.
          • i.e. This feature reduces noise, and improves signal, for all future re-reads.

A common starting point: I think we can all agree,for x in ... (vs it's legacy counterpart) is far more signal (i.e. more readable), even tho how we accomplish the iteration is no longer readable. This is because the purpose was to iterate, not to increment registers and dereference arrays.

Similarly, in some problem domains, while reading over the implementation of a particular function, certain variables are noise, not signal. Reading parameters which are mechanically passed over and over doesn't help me grok the purpose of the function.

Additionally, if you asked me, "In a web service, whats the purpose of the request variable if you can't see where it is used?"

  1. I would likely be able to guess without looking at any part of the implementation.
    • I know from years of experience what a request variable does.
  2. If it was relatively new to me (i.e. first ever read), it is very obvious where to find that answer: the function's signature/docs.
    • Having read the docs, the fact that it is used to call dependent services and construct a response becomes a given.

Current solutions

  • Utilize global variables
    • Current usage: e.g. env::var, or stdout.
    • Limitation: Read only / synchronization overheads
  • Utilize thread local storage (TLS)
    • Current usage:
      • e.g. futures: 0.1 (I think removed in 0.2); tokio's implicit core.
      • This solution has been extended in other languages e.g. Java with request scoped dependency injection, aspect oriented programming, and them together as aop-scoped-proxies.
    • Drawbacks:
      • Request scopes were built to ensure appropriate cleanup.
      • With manual TLS, there is no mechanism which enforces cleanup.
        • I've personally seen User objects stored in TLS and then not cleaned up before the next request
        • This results in "leaking" private data to a different user!
        • "This is a bad engineer", "better code reviews", Yes I agree!
        • That said, aren't we all using Rust because we believe the best engineers and the best process are still fallible? That as much as we can, we should explore how to get our tools to help fix today's problems?
    • Appropriate use:
      • e.g. Logging: where a thread should do lock free batching of log messages before performing disc io.
    • Inappropriate use: As a crutch for not wanting to pass around a few arguments.
      • If this is a problem, lets give Rust an alternative, so we avoid abusing TLS.
  • Utilize Context objects which encapsulate all the state needed
    • This forces all functions to take the whole context. It would be nice to write idiomatic functions, which only take the data they need.
    • A single Context object, needs to know about application specific types. Therefore it cannot improve the ergo for functions defined in a different crate (e.g. framework, plugin or middleware).
      • Workaround: You would need one context per crate, and nest them
    • Obfuscates parameters. i.e. Neither the caller nor the callee's function signature defines the used values.

Explanation

Note: Good syntax is very important, but for this iteration, I am going to use placeholder syntax. We can discuss syntax after, if the general concepts of this proposal are more or less discussed and well received.

Functions will have special syntax to define auto-propagated params and with it an auto-propagation scope. Auto-propagated params with identical names and types will be automatically propagated by the compiler to any sub-call within scope. Functions within an auto-propagation scope, can have any or all of their auto-propagated params overridden. When called without an auto-propagation scope (usually at the top of the call stack) overrides can be used to set the initial auto-propagation scope.

The best way to explain this is with example sugar/de-sugar variants.

// Imagine the database client is a plugin to a web framework.
// It only knows about `database::Query`s and `io::Metrics`, not about `jwt::User` or `framework::Request`.
// A single struct would not be known to the database client implementor.

// Imagine that you're using JWT auth to extract user information.
// I'm going to ignore that this would probably be middleware for the moment,
// but the problem is the same.

// Framework/Library author implemented
// `database.call` = fn call(query: database::Query, context { metrics: io::Metrics })

// Service owner implemented
// fn query_builder1(context { user: jwt::User })
// fn query_builder2(context { user: jwt::User, request: framework::Request })
// fn build_response(result1: ..., result2: ..., with { user: jwt::User, request: framework::Request })

// With sugar
fn handle_request(context { request: framework::Request, metrics: io::Metrics }) {
    // let's override the context to have a `user` parameter
    let context user = jwt::User::from_bearer(request.auth_header());

    let result1 = database.call(query_builder1());
    let result2 = database.call(query_builder2());

    ...

    return build_response(result1, result2);
}

// Without sugar
fn handle_request(request: framework::Request, metrics: io::Metrics) {
    let context user = jwt::User::from_bearer(request.auth_header());

    let result1 = database.call(query_builder1(user), metrics);
    let result2 = database.call(query_builder2(user, request), metrics);

    ...

    return build_response(user, request, result1, result2);
}
// With Sugar
fn find_goal_nodes(
    root: &TreeNode,
    context {
        // This function expects these parameters, but understands
        // they are generally passed down deep in the call stack, without modification.
        goal: Predicate<...>,
        results: &mut Vec<&TreeNode>,
    }
) {
    if (goal.evaluate(root)) {
        results.push(root);
    }
    for child in root.children() {
        // All parameters which share the same name, are auto-propagated to the sub-function.
        // i.e. The parameters `goal`, and `results` are automatically propagated.
        find_goal_nodes(child);

        // Also possible to override some params, but still inherit others.
        // i.e. `goal` is being overridden, but `results` is still auto-propagated.
        // Note: This is just like shadowing any other variable, and the override to the context dies at the end of the scope.
        let context goal = goal.negate();
        find_goal_nodes(child);

        // Disclaimer: Frequently overriding an auto-propagated param, would be bad practice.
        // The line above overriding `goal`, would be a good "hint" that this feature is being abused on this param.
    }
}

fn main() {
    let root: TreeNode = ...;
    let context goal: Predicate<...> = ...;
    let context mut results: Vec<&TreeNode> = Vec::new();

    // Initial call
    find_goal_nodes(&root);
}
// De-Sugar the above example
fn find_goal_nodes(
    root: &TreeNode,
    goal: Predicate<...>,
    results: &mut Vec<&TreeNode>,
) {
    if (goal.evaluate(root)) {
        results.push(root);
    }
    for child in root.children() {
        find_goal_nodes(child, goal, results);
        find_goal_nodes(child, goal.negate(), results);
    }
}

fn main() {
    let root: TreeNode = ...;
    let goal: Predicate<...> = ...;
    let mut results: Vec<&TreeNode> = Vec::new();

    find_goal_nodes(&root, goal, results);
}

The real ergonomic wins start showing in large code bases. As such no example can really do it justice.

Open Questions

  • Requiring that the names match makes the feature less flexible and useful (because everyone has to agree on using the same names). There are also questions around how to handle name/type collisions.

Drawbacks

  • At initial glance the syntax sugar seems like magic. However, a single page of documentation would highlight a de-sugared example, and it will be very straight forward / intuitive for all. This is similar to a first encounter with ? or Into or self.___().

    • However, unlike those there is no marker at the call site indicating implicitness.
      • We could add a marker:
        • database.call(query_builder(_), _);
        • database.call(query_builder(...), ...);
  • New syntax for "special" parameters. Users could be overwhelmed.

    • Possibly, but seems unlikely, the mental overhead is far less than generics or lifetimes.
  • Excessive use of generics can be an issue regardless of parameters being implicit, but this issue gets more-likely as we create more parameters for functions.

  • This particular syntax (we can fix this when we discuss syntax):

    • Basically introduces named params.
    • The initial call is less ergonomic. But perhaps that is ok.
    • Doesn't improve ergonomics for many sibling calls at the top level.

FAQ

  • Will this block any future features?
    • The concept in general is just syntax sugar, and as such should not block any future features
    • The syntax possibly could, but should be such that it doesn't block future enhancements to method signatures. We can discuss when we open up discussion about syntax.

Alternatives

@burdges suggested that we could possibly use something similar to dependent types and type inference. I'm happy to explore this route if everyone prefers it. Although it likely will limit the context to have unique types.

@burdges has also suggested a possible State monad with the possible inclusion of do notation.

@mitsuhiko has a suggestion, which I understood as TLS with a tighter scope. Basically instead of "thread local storage", it could be "call stack local storage" (my name). His proposal seems to suggest access through a global like function e.g. env::var. I would still like to put the function's dependencies into the function signature somehow. And ideally, get some compile time checks, but without adding the context to all function's signatures, this is compile time check is impossible because of "no global inference".

Conclusion

Adding a relatively simple AST transform would open Rust up to some pretty nice and easy ergonomic wins. The syntax would need to be discussed and could bring in more drawback, but at least for now the drawbacks seem minimal and within acceptable bounds.

I'd love to hear other people's thoughts:

  • Do you like it? (+1s will do)
    • What might your usecase be?
  • Drawbacks
    • Any of the drawbacks show-stoppers for you?
    • Any drawbacks I didn't forsee?
  • Compile/Lang teams
    • Is this even feasible? I have no idea if this can actually be an AST transform.
@ahmedcharles
Copy link

Scala is the only language I know of with this feature set. It's generally (ab)used to provide things like FromIterator to functions like collect. They also seem to result in code being really hard to understand and seem magical, especially when combined with the other features which use the implicit keyword.

The only scenario where this even initially seemed fine in one of the codebases I maintain was to pass a logging parameter through the callstack, but even when doing that it's occasionally confusing to get weird error messages about it being missing or not being found.

Besides, I don't think there's enough research and language design experience that suggests that this is consistently beneficial to implement it in a language with a production focus.

@Ixrec
Copy link
Contributor

Ixrec commented Feb 8, 2018

If I understand the example correctly, you still have to write with { } everywhere the arguments are getting passed, so it seems like this is already easy to do with an "options object".

struct FindGoalAccumulator {
    goal: Predicate<...>,
    results: &mut Vec<&TreeNode>,
}
impl FindGoalAccumulator { ... }

fn find_goal_nodes(
    root: &TreeNode,
    acc: &FindGoalAccumulator
) {
    ...
    find_goal_nodes(root, acc.negateGoal());
}

Which to me at least looks almost identical in readability and ergonomics. Perhaps writing the negateGoal method is a nuisance since Predicate already had a negate() method, and you do need to invent a name for this options object you're passing, but those seem like really tiny improvements compared to the downsides of having an entirely new language feature just for this pattern.

In other words, I'm just not seeing the motivation for this.

@yazaddaruvala
Copy link
Author

yazaddaruvala commented Feb 8, 2018

It's generally (ab)used to provide things like FromIterator to functions like collect.

@ahmedcharles I apologize for my lack of experience with Scala, I only know it at a basic level. Briefly looking at the Scala docs, there are two similar features around implicit.

  • Implicit Parameters
    • Looking at the docs, while close in spirit, I'm not sure this feature is identical in practice.
    • Although, looking at the docs' examples, I do see how this could be abused with excessive generics (I'll add this as a drawback)
      • That said, I'm not sure this feature increases the chances a library implementer would abuse generics in their function signatures.
  • Implicit Conversions
    • It seems like the feature which you describe is abusive.
    • Rust already has a solutions similar to this feature.
    • I am not advocating for this, and have little need for it.

Besides, I don't think there's enough research and language design experience that suggests that this is consistently beneficial to implement it in a language with a production focus.

@ahmedcharles I'm not exactly making up this problem. The java webservices I've built for production uses has this pattern (issue?) all over the place. I don't like these options, but a lot of people today (over the last decade) end up using either thread local storage or "request scoped" Dependency Injection (DI), which internally uses thread local storage, to "improve ergonomics". In java they even try building "method interceptors" called aspects (I'm not a fan), again using thread local storage. All of these are intricate and nuanced workarounds for this relatively simple, missing compile time feature.

You might even call this feature "compile time, strongly typed, dependency injection". Hopefully this helps alleviate your concerns around "enough research"?

@Ixrec yeah an "options" struct is a great alternative, thanks I'll add it to my proposal for visibility. However, while it can help improve the ergonomics, it does end up being kind of awkward and also limiting.

I blame myself for being lazy and only adding one example. Not all calls will need all of the information. Passing around an options struct, doesn't allow you to write functions idiomatically (i.e. limit variable scope as much as possible). Keep in mind middleware might be written in different crates, not exactly able to share a common options struct.

For example:

// Imagine the database client is a plugin to a web framework.
// It only knows about `Query`s and `Metrics`, not about `User` or `Request`.
// A single struct would not be known to the database client implementor.

// Framework/Library author implemented
// `database.call` = fn call(query: Query, with { metrics: Metrics })

// Service owner implemented
// fn query_builder1(with { user: User })
// fn query_builder2(with { user: User, request: Request })
// fn build_response(result1: ..., result2: ..., with { user: User, request: Request })

fn handle_request(with { request: Request, user: User, metrics: Metrics }) {
    let result1 = database.call(query_builder1());
    let result2 = database.call(query_builder2());

    ...

    return build_response(result1, result2);
}

You could just take my word for it, or look at most web applications, these query_builders and build_response methods usually end up going a few layers deep, passing the same object over and over. Per request handler methods, there are usually only 2-5 remote calls which end up needing metrics passed over and over, but I've seen implementations with remote calls in the teens. Overall a single (simple) service will end up passing metrics and sometimes a tracer down the call stack in 50+ locations.

Maybe this showcases the benefits of this proposal more? If so, I'll update my original post with these examples / motivations.

@yazaddaruvala
Copy link
Author

@Ixrec thinking about it a little more, I do think you could use an options struct, with a lot of impl From ..., and pass the options everywhere. But I hope you'd agree this wouldn't be an on-par ergonomic win. Or maybe you're thinking about it differently than I did. If so, I'd love to see the handle_request example ported with your suggestion.

@ahmedcharles
Copy link

If you look at the Scala example for implicit parameters, you'll note that the value of the implicit is purely based on matching types. I actually don't know what happens if there are two implicits declared in scope with the same type, though I hope it results in an ambiguity error. Basically, the names of the parameters are irrelevant.

You seem to be proposing that the names are important, but that basically results in adding named parameters to the language, which hasn't been done so far due to complexity. Requiring that the names match makes the feature less flexible and useful (because everyone has to agree on using the same names, or it ends up being more verbose than the current syntax). But alternatively, basing it purely on type results in the chaos (my word) seen in Scala.

Let's take your handle request example and make it more complete.

// Imagine the database client is a plugin to a web framework.
// It only knows about `Query`s and `Metrics`, not about `User` or `Request`.
// A single struct would not be known to the database client implementor.

// Imagine that you're using JWT auth to extract user information.
// I'm going to ignore that this would probably be middleware for the moment,
// but the problem is the same.

// Framework/Library author implemented
// `database.call` = fn call(query: Query, with { metrics: Metrics })

// Service owner implemented
// fn query_builder1(with { user: User })
// fn query_builder2(with { user: User, request: Request })
// fn build_response(result1: ..., result2: ..., with { user: User, request: Request })

fn handle_request(with { request: Request, metrics: Metrics }) {
    // let's pretend we have a way to make user act like it was a with parameter
    let with { user } = jwt::User::from_bearer(request.auth_header());
    let result1 = database.call(query_builder1());
    let result2 = database.call(query_builder2());

    ...

    return build_response(result1, result2);
}

This is all nice and fancy, but now the User object used by the JWT library has to match the one used by the database library. And well, lets assume that the JWT library also does metrics, perhaps you get lucky and it even uses the same library for metrics as the database library, but it calls the parameter metrics_client instead, since the dev figured that was clearer, cause metrics get reported to a server.

I suppose it's obvious now why Scala ignores parameter names. But it still requires the same types. In an ecosystem like Rust's where there are lots of small crates that integrate with each other, primarily using traits, I don't really see this feature being beneficial. As soon as you start mixing libraries, passing variables is the clearest solution and other than being verbose, I don't really see the problem.

@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the RFC. label Feb 8, 2018
@Centril
Copy link
Contributor

Centril commented Feb 8, 2018

Haskell supports a feature like this under -XImplicitParams so there's certainly prior art and research (Lewis2000, https://dl.acm.org/citation.cfm?doid=325694.325708) around this subject. My opinion of that language pragma has always been that it is a misfeature that optimizes for writing ergonomics to the detriment of reading ergonomics. As reading code happens more frequently than writing, I believe we should default to favoring the ergonomics of reading over writing whenever there is a conflict, unless there is a strong argument for an exception in the particular case. To me, the legibility of code will be the major drawback of this proposal should it come to fruition.

@yazaddaruvala
Copy link
Author

yazaddaruvala commented Feb 9, 2018

@Centril I'm entirely in agreement. My preference is always to optimize for readability. I'd take this one step further, as much as possible we should optimize for re-readability.

I'm not familiar with this features' implementation in Haskel or why it might be a miss feature there. I think wether this feature is a misfeature is highly dependent on what you believe is signal and what is noise.

There are two places this feature can effect readability, a function's signature, and a function's implementation.

The feature I'm requesting only changes the function signature marginally, as such I don't think there is an ergonomic win or loss while reading function signatures. The rest of my comment will cover the readability impact of this feature in function implementations.

A common starting point: I think we can both agree, for x in ... (vs it's legacy counterpart) is far more signal (i.e. more readable), even tho how we accomplish the iteration is no longer readable. This is because the purpose was to iterate, not to increment registers and dereference arrays.

Similarly, in some problem domains, while reading over the implementation of a particular function, certain variables are noise, not signal. Reading them over and over doesn't tell me more about the purpose of the function.

I'll de-sugar my earlier example:

fn handle_request(with { request: Request, user: User, metrics: Metrics }) {
    let result1 = database.call(query_builder1());
    let result2 = database.call(query_builder2());

    ...

    return build_response(result1, result2);
}

vs

fn handle_request(request: Request, user: User, metrics: Metrics) {
    let result1 = database.call(query_builder1(user), metrics);
    let result2 = database.call(query_builder2(user, request), metrics);

    ...

    return build_response(user, request, result1, result2);
}

I'm not able to contrive an adequately involved example, but note that user, request, and metrics end up getting mechanically passed to almost all methods (regardless of the number of methods), without adding any value to my understanding of the purpose of this function.

Additionally, if you asked me, "whats the purpose of the request variable if you can't see where it is used?"

  1. I would likely be able to guess without looking at any part of the implementation.
    • I know from years of experience what a request variable does.
  2. If it was relatively new to me (i.e. first ever read), it is very obvious where to find that answer, the function's signature/docs.
    • Having read the docs, the fact that it is used to call dependent services and construct a response is a given.

When it comes to groking a function for the 2nd, and especially 15th time, I would call these types of parameters which have repetitive, predetermined usage patterns, noise.

I'd postulate, that while I'm not familiar with them, there exists a similar set of "mechanically passed" parameters in every codebase which shares a problem domain (i.e. anywhere you'd build and share a framework).

@burdges
Copy link

burdges commented Feb 9, 2018

As an overly broad rule, syntax-level tricks bring confusion while type-level tricks bring power and clarity.

I think a measure of type inference plus dependent types provides a much better route to this :

fn find_goal_nodes<const goal: Predicate<...>>(
    root: &TreeNode,
    results: &mut Vec<&TreeNode>
) { ...

We'll only have const dependent types in near future, but one could imaging improving that to constant-relativet-to-the-local-call-stack, no-interior-mutability, etc. dependent types or whatever.

In practice, you actually find yourself reaching for implicit parameters not to avoid typing, but so that people cannot pass the wrong thing as easily, so dependent types provide one stronger solution, but so do other type system extensions:

Improving ephemeral types ala structural records aka anonymous structs permits doing things like

fn find_goal_nodes(args: struct {
    root: &TreeNode,
    goal: Predicate<...>,
    results: &mut Vec<&TreeNode>,
}) {
    ...
        find_goal_nodes({ root: child, ..args });

You can actually enforce invariants with design by contract or other verification tools, which frequently sounds better. In this vein, your find_goal_nodes(child, with { goal: goal.negate() }); is likely an anti-pattern unless based on something like structural records or design by contract.

@yazaddaruvala
Copy link
Author

yazaddaruvala commented Feb 9, 2018

@burdges I do not mind if we use the type system over an AST transform.

constant-relativet-to-the-local-call-stack, no-interior-mutability, etc. dependent types

Passing &muts, is a large part of the use case.

I think the type inference would be the blocker, but if the lang/compiler team feels its the better route, and can achieve the same effect, I'm happy to change this to go with that solution instead.

@epage
Copy link
Contributor

epage commented Feb 10, 2018

As for my thought with the RFC, I personally would like to get some runtime on seanmonstar's idea before deciding we need to extend the language and in what way. Granted, I'm always a bit conservative about adding language features due to concern with bloating the language (one of the things that turned me off from D).

For background, my use case of a "Context" is I maintain the Rust implementation of the Liquid templating language. It has "global" state in both the parser and the renderer and it'd be nice to find better ways to express that rather than passing those structs up and down the call stack.

Things I think this RFC could use

  • Pitfalls a user run into with this and how can we help them debug them
  • Some prior art has been mentioned. In addition, this sounds similar to the failed(?) area of Aspect Oriented Programming (just never hear it talked about anymore). I think it'd be good to analyze these to see how they differ, what works well for them, and what doesn't work well for the,

Auto-propagated params with identical names or types will be automatically propagated by the compiler to any sub-call within scope

I assume we can't auto-wire by name alone because then the types might be invalid. Do we need auto-wiring to match name and type to ensure we don't miswire things?

@udoprog
Copy link

udoprog commented Feb 10, 2018

I want to echo @Centril's sentiment about how this RFC seems to optimize for writing ergonomics rather than reading. I'd like to see stronger value to outweigh that.

What this RFC is primarily competing with is an explicit context argument. If this is a struct (or anonymous struct at some point), it can evolve with the library or application that uses it without having to refactor all call sites. For the places where it's changed we have similar unpacking/packing problems as we would have with this RFC.

this sounds similar to the failed(?) area of Aspect Oriented Programming (just never hear it talked about anymore).

In the Java world this was IMO mostly unsuccessful because there wasn't stable language support for it, leading to pains when integrating AOP into the real world.

As for my thought with the RFC, I personally would like to get some runtime on seanmonstar's idea before deciding we need to extend the language and in what way.

I agree with this. I just want to clarify that this RFC wants to add context-capabilities to any function (if I'm reading it correctly), while the proposal above is limited to receivers. Because of this, it doesn't propagate well across methods (this comment by withoutboats). That is a potential benefit that this RFC has compared to that proposal.

@mitsuhiko
Copy link
Contributor

mitsuhiko commented Feb 10, 2018

I think the example shown here has moved the conversation into an area that I personally don't find that useful where @seanmonstar's proposal can also fit. However the place that I find most interesting is context variables that can appear out of thin air by a previously unknown piece of code. eg: think of it more as env::var that is scoped locally to a task but might or might not be inherited to another task.

Because of asyncio Python recently gained context variables which are variables that are just generally available but can have different values per task: https://www.python.org/dev/peps/pep-0567/

There are a lot of uses for this:

  • For instance i18n utilities can discover the current language from anywhere. This means no matter how deep down a calls tack you are you can always get the appropriate language.
  • Auditing functionality can always discover the relevant information for logging (current security context, current user, current ip etc.). This means even the most remote utility function can issue a audit record without all callers having to be updated.
  • Security contexts. This is the one I care about the most. Our business is to write a multi tenant application and being able to know which is the current tenant at all times lets us write much more secure code. For instance if we know what the current tenant is we can automatically add constraints to the current tenant to all queries in a database. Or we can run assertions on result sets that if data for a different tenant is returned we filter it out or error instead of leaking information.
  • Systems such a sentry's raven clients can collect breadcrumbs of the current execution and scope it to the most relevant unit of work (for instance current http request etc.) to aid debugging later.

Such a system can also come in useful when tests are used. For instance right now println! can be redirected temporarily for the test runner alone on a per-thread basis through an undocumented and unstable API. With a context system one could imagine a world where global default streams can be tapped in on a per task basis.

I made an earlier proposal to Python where I imagined a system where the context variable can define what happens if a new task is spawned out of a task (does the value get copied, cleared etc.). I figured this is useful as there are typically two ways to spawn a task: a task that aids another task (http fetch / db query during web api http request handling) or a task that spawns a whole context by itself (http request for instance). There needs to be a generally well understood API to spawn such things differently or the contexts will all get messed up. There is an almost neverending list of complexities for this and I can see if I can find some of the notes I wrote. I also know that the node community didn't manage to settle on a new system yet but did leave some notes behind as well about how such a system could work.

In the process of working on this I noticed that there are quite a few systems that can be used for inspiration (.NET's logical call contexts and nodejs' domains and various libraries working with the async hooks and now Python 3's context variables).

//EDIT: one thing I forgot to mention is that it's I think very important that new data can be made up by 3rd party libraries into existing contexts. This is a crucial requirement for systems such as Sentry or auditing code.

@sandeep-datta
Copy link

What happened to "explicit over implicit"? Are we not doing that any more?

@Centril
Copy link
Contributor

Centril commented Feb 10, 2018

@sandeep-datta Type inference makes things implicit, lifetime elision also. I think the key is to be explicit where it matters most for clarity and implicit where it reduces pain the most... You might be interested to read @withoutboats's excellent blog post on this subject.

@ahmedcharles
Copy link

@Centril Lifetime elision isn't really implicit. The compiler isn't allowed to make a choice based on information that isn't local to the function definition and there's only one possibility. Granted, people may not know what that is in terms of having the lifetimes in the source, but then again, those same people wouldn't be likely to understand the lifetimes if they were visible in the source either.

This feature results in any function call within the definition of a function with N implicit parameters having 2^N possible ways of being called, ignoring the order of the parameters. I don't want to do the math with ordering being a consideration. 1 vs 2^N isn't really comparable, unless N is 0.

@mitsuhiko
Copy link
Contributor

What happened to "explicit over implicit"? Are we not doing that any more?

I think that's the wrong way to think about it. Well implemented context variables are just a superior version of env::vars.

@nox
Copy link
Contributor

nox commented Feb 10, 2018

I strongly disagree with any kind of implicit argument passing in any form or fashion. What is wrong with passing the damn arguments around? I cannot imagine any code review going faster thanks to implicit parameters.

At initial glance the syntax sugar seems like magic. However, a single page of documentation would highlight a de-sugared example, and it will be very straight forward / intuitive for all. This is similar to a first encounter with ? or Into or self.___().

I disagree with that. All three existing features you mentioned require something to be written where the code is impacted, be it a question mark or a method call. Implicitly passing arguments means I need to always remember what even was the function signature and what were its implicit arguments. Furthermore, it can make code way more confusing, if there is a function with implicit params which is named the same thing as a trait method with a different arity.

I think that's the wrong way to think about it. Well implemented context variables are just a superior version of env::vars.

Accessing environment variables from the middle of code is asking for problems, just like rampant use of global variables.

@catern
Copy link

catern commented Feb 10, 2018

To provide another piece of terminology that might help people understand this proposal: Essentially we are proposing adding dynamic scope. Scala implicits are a syntactically different but semantically identical example of dynamic scope. Environment variables are another example of dynamic scope. Exception handlers are typically dynamically scoped.

Dynamic scope is very useful - many languages have exceptions/exception handlers, and environment variables are widely used for configuration. Dynamic scope can be used to implement thread-locals/coroutine-locals in an elegant, unified way. But: dynamic scope can be easily abused, and probably should not be used outside of foundational libraries. So it's a bit of a toss-up.

@mitsuhiko
Copy link
Contributor

mitsuhiko commented Feb 10, 2018

I strongly disagree with any kind of implicit argument passing in any form or fashion. What is wrong with passing the damn arguments around?

See my comment above but the simple answer is that it's impossible. The goal of a context system is to make things which are currently global state (think of things like stdout, environment variables, execution flags such a if tracebacks should be shown etc.) slightly more local. Traditionally TLS is used for this, but this cannot work with async systems.

More importantly though we need to be able to compose code. Different systems will allow you to pass different data through. Even the best designed system will require TLS sooner or later to capture vital state. Security contexts in multi tenant systems for auditing are a really good example of something which is just too important to pass around. Same thing with logging integrations in general. I encourage you to look at how logical call contexts in .NET work to get an idea what they are used for. Likewise why the node community has domains and similar systems. They are vital for the operation of the raven-node client for instance.

(Any api that currently uses TLS needs to use a different system to work in an async environment.)

@nox
Copy link
Contributor

nox commented Feb 10, 2018

How can it be impossible, given this issue is about sugar that corresponds to passing arguments explicitly? Am I missing something?

@mitsuhiko
Copy link
Contributor

@nox i was talking about the need for context variables, not about syntactic sugar for implicit argument passing which I'm opposed to.

@kylone
Copy link

kylone commented Feb 10, 2018

@nox I'm thinking that this RFC isn't discussing the underlying motivations for the feature--to its detriment. @mitsuhiko described them a while up:

How do we make a new cross-cutting feature, like logging, auditing, collecting metrics, etc work with an existing library without needing to update the function signatures of the entire library for each feature?

@kylone
Copy link

kylone commented Feb 10, 2018

Also, after thinking about it, would a feasible solution be using a builder pattern that constructs a context , and move all of the functionality of the api to methods on the context? (And then wrap this for cross-cutting needs?)

@nox
Copy link
Contributor

nox commented Feb 10, 2018

How do we make a new cross-cutting feature, like logging, auditing, collecting metrics, etc work with an existing library without needing to update the function signatures of the entire library for each feature?

By updating the function signatures of the entire library, such that you can see the code flow and not be completely lost 3 months in the future.

@yazaddaruvala
Copy link
Author

yazaddaruvala commented Feb 11, 2018

similar to the failed(?) area of Aspect Oriented Programming

@epage I brought up aspects and my dislike for them in my second post. I'm just trying to batch a bunch of ideas, and I'll update the proposal.

Basically, regarding aspects. I do not like aspects because they end up needing TLS, and while I don't have a problem with TLS in general, I do have an issue when it is used as a crutch for a missing feature, and is possibly misused. I'll update the proposal with my thoughts.

@mitsuhiko I don't mind if we use env::var type syntax to achieve this (or any syntax). However, I would want to make sure these are not scoped to the thread, and instead these contexts cannot exceed the lifetime of the current call stack (CSLS - "call-stack local storage" if you will). I'd likely go so far as to say, they should be "owned" by the call stack.

Also, I do prefer if every function in the call stack (even if it doesn't need the context) have to declare the existence of the context. This is purely for compile time correctness, and based on Rust's rule of not doing global checking. But I could be convinced otherwise.

I strongly disagree with any kind of implicit argument passing in any form or fashion. What is wrong with passing the damn arguments around?

@nox I actually don't mind "passing the damn arguments around". I would prefer if it was more ergonomic :), but I end up using this "passing arguments" style in all of my production Java projects. That said, this isn't about you or me, I know that other people view this "passing ... around" as a code smell, and the conversations not only get contentious, but most developers go looking for any solution which might help (i.e. request scoped DI, AOP, manual TLS), and I've had to fix unforeseen consequences of this type of code.

I don't love it, but I could be convinced to introduced something like JavaScript's spread operator, ...args (or any other syntax), as a marker to make this implicit passing more explicit at the call sites, but still improve ergonomics? Like I said, I'm not in love with the idea, but I am curious about people's thoughts on that?

@nox
Copy link
Contributor

nox commented Feb 11, 2018

@yazaddaruvala The damning was unnecessary, sorry for that. For what it's worth it was about them plain old arguments and not the people wanting to omit them. :)

In my experience on Servo, at no point we ever wished for implicit argument passing, but we did just group around a bunch of arguments multiple time in some sort of Context structure in different parts of the system. This also helped documenting what was going on and grepping through the code base etc.

A small thing I often do is select an identifier in my editor and let it highlight all other occurrences to see where it's defined and used. This is also one of the key features of searchfox.org. Implicit argument passing breaks that.

@yazaddaruvala
Copy link
Author

yazaddaruvala commented Feb 11, 2018

In my experience on Servo, at no point we ever wished for implicit argument passing

@nox This makes a lot of sense to me, and I see more why you guys are pushing back. If you're not dealing with frameworks, you wouldn't run into this pain.

The work you've done for the Context in Servo was done once and won't be done again by those engineers. When you're working with web services framework, and as per normal, building many web services, and the same patterns need to be repeated again and again, it is felt as pain, undifferentiated "boilerplate".

Additionally, within a single application its relatively easy to build a single Context struct shared by all. However, with a framework, you use types from the framework (i.e. not controlled by you), plus types you've made (i.e. not controlled by the framework author), plus types from middleware (i.e. types not controlled by either the framework author or you).

Creating a single Context over and over gets tedious, but also doesn't work well. Your custom methods can take this new consolidated `Context, but middleware will still only take middleware + framework types, and framework functions will still only take framework types. And you end up with less, but similar noise.

@nox
Copy link
Contributor

nox commented Feb 11, 2018

The work you've done for the Context in Servo was done once and won't be done again by those engineers. When you're working with web services framework, and as per normal, building many web services, and the same patterns need to be repeated again and again, it is felt as pain, undifferentiated "boilerplate".

I worked 6 years for a dating website before I worked on Servo, I did not feel pain because I was missing implicit argument passing.

Additionally, within a single application its relatively easy to build a single Context struct shared by all. However, with a framework, you use types from the framework (i.e. not controlled by you), plus types you've made (i.e. not controlled by the framework author), plus types from middleware (i.e. types not controlled by either the framework author or you).

This applies to Servo, you would be surprised by how many independent parts there are in it.

Creating a single Context over and over gets tedious, but also doesn't work well. Your custom methods can take this new consolidated Context, but middleware will still only take middleware + framework types, and framework functions will still only take framework types. And you end up with less, but similar noise.

Again, this is entirely the same with implicit argument passing except that you can't track the code flow just by reading it. There is nothing that implicit argument passing can do that explicit argument passing can't, as per the topic discussed in this topic.

If you are actually arguing about something else, you should probably open a different issue.

@nielsle
Copy link

nielsle commented Feb 11, 2018

Rocket uses function decorators to validate the user when handling a HTTP request. Perhaps Yazaddaruvala can use a similar approach to solve the issue in the RFC - But it may only compile on nightly.

https://rocket.rs/guide/requests/#request-guards

@mitsuhiko
Copy link
Contributor

mitsuhiko commented Feb 11, 2018

I worked 6 years for a dating website before I worked on Servo, I did not feel pain because I was missing implicit argument passing.

And that website / service did not use thread local data? I find that hard to believe. In PHP effectively everything is thread local. Node uses domain information excessively behind the scenes. Django and Flask in the Python world are using TLS behind the scenes. Ruby on Rails uses TLS as well.

Rocket uses function decorators to validate the user when handling a HTTP request. Perhaps Yazaddaruvala can use a similar approach to solve the issue in the RFC - But it may only compile on nightly.

That does not work if data does not travel through the entrypoint. A helper function internally cannot depend on data that is not injected in the entrypoint. For instance an i18n system in rocket needs to have an injector in every view function and then pass an i18n object to all functions throughout the codebase manually.

@nox
Copy link
Contributor

nox commented Feb 11, 2018

And that website / service did not use thread local data? I find that hard to believe. In PHP effectively everything is thread local. Node uses domain information excessively behind the scenes. Django and Flask in the Python world are using TLS behind the scenes. Ruby on Rails uses TLS as well.

The things you cited correspond to code using globals, which, yet again, doesn't correspond to implicit argument passing. Could we focus on the merit of implicit argument passing, which by definition is syntactic sugar for something that corresponds exactly to passing arguments around explicitly, and not talk about similar-yet-unrelated features which should go in their own RFC or issue?

@mitsuhiko
Copy link
Contributor

Could we focus on the merit of implicit argument passing

So to avoid us talking about different things:

  • implicit argument parsing: some data available in the current stack is passed implicitly onwards to a direct caller
  • context variables: TLS that can be explicitly switched with other forms of concurrency (like tasks, greenlets etc.). More importantly multiple tasks within one thread might share the same context variable.

You replied to a comment about me arguing that implied function argument passing is not a sufficient solution for actual problems (need for context variables) but in a way that I was assuming you are arguing against the need of context variables / TLS supporting async tasks altogether. If that's not the case then please ignore my comments.

@burdges
Copy link

burdges commented Feb 11, 2018

As an aside, you can always do mutable implicit parameters with a state monad anyways, except maybe not your missfeature find_goal_nodes(child, with { goal: goal.negate() }); that I complained about upthread.

struct State<S,T>(FnMut(S) -> (T,S));
... // impl bind, etc. to make it monad-like and provide a do notation

fn find_goal_nodes(root: &TreeNode)
  -> State<(Predicate<...>, &mut Vec<&TreeNode>),()>
{ State(|(goal, results)| {
    ...
} } // State

You'll want a do notation to give this monad more implicit feeling parameters. Right now, there are two do notations in Rust created by the Try RFC #1859 and the generators eRFC #2033. I suspect either work actually but ? might require too strict a bind and maybe #[async] might permit hiding the closure.

In other words, you should attempt to do this using coroutines, almost as if it were an Iterator, maybe it'll actually look okay.

@burdges
Copy link

burdges commented Feb 11, 2018

At minimum, actually doing this via a state monad with coroutines/generators providing the do notation should be informative for coroutines/generators, like if anything does not work then certain iterators must be unnecessarily tricky to write as coroutines/generators.

@Ixrec
Copy link
Contributor

Ixrec commented Feb 11, 2018

Either I'm hopelessly confused, or most of this discussion is about hypothetical ideas that have very little in common with what originally was or currently is in the proposal at the top of the thread. Unfortunately this design space is one where it's almost impossible to fairly evaluate an idea without some concrete code samples, and all the samples presented here so far are only convincing me that most of these ideas are either not even a net win for write ergonomics, or are making a small win write ergonomics at a substantial cost in read ergonomics.

I say "most" because there's a handful of comments that, at least to me (I've never looked into AOP), are so vague I can't even form enough of a strawman proposal in my mind to ask specific questions about them. In particular:

The goal of a context system is to make things which are currently global state ... slightly more local. Traditionally TLS is used for this, but this cannot work with async systems.

What is a "context system" that's not just using a bunch of globals or passing a context argument around? I can imagine helpful sugars or conventions on top of those two options, but nothing that changes the fundamental trade-offs.

@yazaddaruvala
Copy link
Author

I've compiled all of the feedback, and updated the original proposal.

Changed the motivation, extended it to show the current solutions, why they are not adequate. I've added extra example and alternate proposed solutions to achieve the same end result, without using the same mechanism.

@sandeep-datta
Copy link

sandeep-datta commented Feb 12, 2018

There are two places this feature can effect readability, a function's signature, and a function's implementation.

What about function call sites? You don't always need to read the function signature if you can infer it from the function name and arguments used at the call site. If I had to read the signature for each function call in a body of code it will soon get very tiresome.

This is an ergonomic setback for readability.

@Centril
Copy link
Contributor

Centril commented Feb 12, 2018

Please stop pushing this misguided "ergonomic" improvement.

Please consider your tone here. It is not very cordial.

I assume @yazaddaruvala has legitimate reasons for pursuing this and considers this to be an improvement even if only for writing. Reasonable people can disagree over what the trade-offs are and where we land on scale. I believe if nothing else some interesting ideas have been spread here and discussed in this topic and we are all richer for it as people.

@sandeep-datta
Copy link

sandeep-datta commented Feb 12, 2018

I am sorry that came off a little harsh. I have removed the offending sentence now.

@burdges
Copy link

burdges commented Feb 12, 2018

"Whenever there is a difference of opinion between [the writers of code and the reader of code], the readers are always right and the writers are always wrong." - 10min into Yaron Minsky's Effective ML lecture

Said differently, any utility derived from making code shorter actually comes entirely from making the code more comprehensible, i.e. more readable, even say fn as opposed to function works due to improving visual tokenization.

Implicit parameters can only make code more readable when used in constrained ways, like DSLs or @mitsuhiko list. It's not the implicitness per se that makes implicit parameters a missfeature but lack of constraints that tell the reader what is going on.

There are however several type system approaches like do-notation for state monads, dependent types, and structural records that afaik provide any readability improvement you might gain through implicit parameters. I only named fancy ones so far, but simply using self more aggressively works too:

struct FindGoalNodes {
    goal: Predicate<...>, 
    results: &mut Vec<&TreeNode>)
}
impl FindGoalNodes {
    fn go(&mut self, root: &TreeNode) {
        if (self.goal.evaluate(root)) {
            self.results.push(root);
        }
        let goal = self.goal.negate();
        for child in root.children() {
            self.go(child);
            FindGoalNodes { goal, ..self }.go(child);
        }
    }
}

After writing that, I now realize you're likely fighting the borrow checker here, so actually NLL will largely solve your problems:

impl FindGoalNodes {
    fn go(&mut self, root: &TreeNode) {
        let FindGoalNodes { goal, ref mut results } = *self;
        if (goal.evaluate(root)) {
            results.push(root);
        }
        let goal = goal.negate();
        // our borrow of self.results ends here under NLL so self becomes usable again
        for child in root.children() {
            self.go(child);
            FindGoalNodes { goal, ..self }.go(child);
        }
    }
}

I'll point out that Haskell's FRUs are more terse so that last line becomes self { goal }.go(child); there. I've no opinion on Rust vs Haskell FRUs myself, like maybe Rust's more explicit FRUs are better, and Haskell just needs terse ones to make up for lacking mutability.

Also, I'll point out that if you wanted another self parameter then you can always do impl<'a> (&'a mut FindGoalNodes, &'a mut MyOtherSelf) { ... } or whatever. We might ask for @ to play nice with NLL so that let (fgn @ &mut FindGoalNodes { goal, ref mut results }, selfy) = self; permitted you to abandon results to get pack to fgn.

@burdges
Copy link

burdges commented Feb 12, 2018

You might also ask if there is much sense in destructuring self in fn declarations ala

impl<'a> (&'a mut FindGoalNodes, MyOtherSelf) {
    fn go(self: (fgn, selfy), root: &TreeNode)

@burdges
Copy link

burdges commented Feb 12, 2018

I just noticed the WithContext proposal for Futures which handles augmenting self nicely: rust-lang-nursery/futures-rfcs#2 (comment) via https://aturon.github.io/2018/02/09/amazing-week/

@udoprog
Copy link

udoprog commented Feb 12, 2018

@burdges this thread is pretty massive, but you (and others) might be interested in jumping back to this comment which discusses that approach.

@H2CO3
Copy link

H2CO3 commented Apr 12, 2018

No, no, no, please don't do this. It will be extremely confusing and hard to read. Passing around context parameters is not hard. I've done it several times, and it didn't hurt… it was like a 1 minute task to complete. On the other hand, I would be very unhappy if I had to read code that had who knows how many implicit name bindings who knows where, given especially that Rust is very lenient when it comes to shadowing, so this has the most realistic potential to introduce (or hide the existence of) serious bugs.

@Centril
Copy link
Contributor

Centril commented Apr 12, 2018

@yazaddaruvala Just a note on the RFC procedure: If you want to file a formal RFC to be considered by the language team for inclusion into Rust, this needs to be done as a pull request to this repository. Check out some merged and closed RFC to get a sense of what the process will look like.

@Centril
Copy link
Contributor

Centril commented May 15, 2018

Ping from triage: @yazaddaruvala, are you intending to make a formal RFC out of this?

@blanchet4forte
Copy link

blanchet4forte commented May 14, 2021

This is a really old RFC and a lot has changed in the Scala world which a lot of the implicit talk above references. Scala 3 is moving away from implicit in favor of something a little less magical. In particular given and using I think are the features most relevant to this RFC.

Taking a Scala 3 example and pseudo translating to Rust:

given globalEc: ExecutionContext = ForkJoinPool::new()

async fn whatever(value: i32)(using ec: ExecutionContext) { /* ... */ }

whatever(42)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests